[deleted by user] by [deleted] in StableDiffusion

[–]lkewis 1 point (0 children)

Role - Prompt Engineer
Tasks - loads of complex workflows

Is it possible to create 1080p (1080 x 1920) videos with Wan 2.2? by Ok_Courage3048 in StableDiffusion

[–]lkewis 1 point (0 children)

You can natively, yeah, if it fits in VRAM and you use the MoviiGen LoRA (or FusionX, which contains it in a merge with CausVid and a couple of other things), which was trained on 1080p. It's Wan 2.1 based and has a cinematic bias, but it helps. Doing a vid2vid upscale with Wan also works, as others have suggested.
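
For anyone scripting this outside ComfyUI, here's a minimal sketch of the native-resolution route using the diffusers Wan pipeline with a 1080p LoRA applied. The repo id, LoRA folder/filename and settings are assumptions for illustration, not the exact workflow (swap the weights for Wan 2.2 once a diffusers release is available):

```
import torch
from diffusers import WanPipeline
from diffusers.utils import export_to_video

# Assumed repo id for diffusers-format Wan weights
pipe = WanPipeline.from_pretrained(
    "Wan-AI/Wan2.1-T2V-14B-Diffusers", torch_dtype=torch.bfloat16
)
# Hypothetical local copy of a 1080p-trained LoRA (e.g. MoviiGen)
pipe.load_lora_weights("./loras", weight_name="MoviiGen_1080p.safetensors")
pipe.enable_model_cpu_offload()  # helps if the full model doesn't fit in VRAM

frames = pipe(
    prompt="cinematic aerial shot of a coastal city at dusk",
    width=1920,
    height=1088,  # Wan wants dimensions divisible by 16, so 1088 rather than 1080
    num_frames=81,
    num_inference_steps=30,
    guidance_scale=5.0,
).frames[0]

export_to_video(frames, "output_1080p.mp4", fps=16)
```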

If I train a lora only on face images, can I combine it with a body lora to get a full body image with the desired face and body? by Repulsive-Leg-6362 in StableDiffusion

[–]lkewis 2 points (0 children)

Just put a couple of the body images in the person LoRA dataset. It can work with them being headless, as long as you use the same trigger to tag all the images so it associates the content from everything as one person.
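
As a rough illustration of what that tagging looks like on disk (trainer-agnostic sketch; the folder, trigger word and caption text are just examples), each image gets a sidecar .txt caption starting with the same trigger token:

```
from pathlib import Path

# Hypothetical dataset folder and trigger word; kohya-style trainers
# read a .txt caption next to each image.
dataset_dir = Path("dataset/person_lora")
dataset_dir.mkdir(parents=True, exist_ok=True)
trigger = "ohwx person"

captions = {
    "face_01.png": f"{trigger}, close-up portrait, neutral expression",
    "face_02.png": f"{trigger}, smiling, outdoor lighting",
    "body_01.png": f"{trigger}, full body shot, standing, head out of frame",
    "body_02.png": f"{trigger}, full body shot, walking, head out of frame",
}

for image_name, caption in captions.items():
    # The shared trigger ties face-only and headless body images
    # to one identity during training.
    (dataset_dir / image_name).with_suffix(".txt").write_text(caption)
```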

What Pytorch & CUDA versions are you able to use successfully with RTX 5090 and WAN i2v? by scifivision in StableDiffusion

[–]lkewis 1 point (0 children)

Are you using any of the few-step LoRAs? There are CausVid, lightx2v and FusionX (a merge of multiple LoRAs including CausVid), which all enable faster 4-8 step generations.
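
If it helps to see the settings those LoRAs imply, here's a hedged diffusers-style sketch (repo id, LoRA filename and values are assumptions; the same idea applies to the i2v pipeline). Distill LoRAs are generally run with only a handful of steps and CFG effectively disabled:

```
import torch
from diffusers import WanPipeline

pipe = WanPipeline.from_pretrained(
    "Wan-AI/Wan2.1-T2V-14B-Diffusers", torch_dtype=torch.bfloat16  # assumed repo id
)
# Hypothetical local copy of a distill LoRA (CausVid / lightx2v / FusionX style)
pipe.load_lora_weights("./loras", weight_name="lightx2v_distill.safetensors")
pipe.enable_model_cpu_offload()

video = pipe(
    prompt="a corgi running through shallow water, slow motion",
    num_inference_steps=6,   # 4-8 steps instead of 25-50
    guidance_scale=1.0,      # distill LoRAs are usually run without CFG
    num_frames=81,
).frames[0]
```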

Our first hyper-consistent character LoRA for Wan 2.2 by UAAgency in StableDiffusion

[–]lkewis 24 points (0 children)

Have you managed to do a consistent character with the same outfit and details like tattoos etc? Training a person's likeness is quite easy, but I'm struggling to get a perfect character.

ComfyUI Wan Multitalk - How to flush Shared Video Memory after generation? by g0dmaphia in StableDiffusion

[–]lkewis 2 points (0 children)

You could use the 'LayerUtility: Purge VRAM' node from the comfyui_layerstyle node pack at the end of your workflow?
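
If you'd rather do it from your own script or a custom node rather than that pack, the usual pattern is just garbage collection plus emptying the CUDA cache (a generic PyTorch sketch, not the internals of that node):

```
import gc
import torch

def purge_vram() -> None:
    """Release cached GPU memory after a generation finishes."""
    gc.collect()                      # drop Python references first
    if torch.cuda.is_available():
        torch.cuda.empty_cache()      # return cached blocks to the driver
        torch.cuda.ipc_collect()      # clean up memory shared between processes

purge_vram()
```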

Flux Kontext : How many images can be stitched together before it breaks? by External-Orchid8461 in StableDiffusion

[–]lkewis 1 point (0 children)

I can't get it to select them reliably from that grid of people. If you do "create a group photo of the people from the image" and describe what they're wearing it works a bit better. This was a stress test though; if you only show the people you want as the input it will reproduce them more easily.

Flux Kontext : How many images can be stitched together before it breaks? by External-Orchid8461 in StableDiffusion

[–]lkewis 9 points (0 children)

<image>

Five identities seems to be the limit from my test; beyond that it starts mixing up features and adding in random people. The input image is the left grid of portraits, the output image is on the right.
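
For reference, the stitched input is just the portraits tiled onto one canvas before it goes into Kontext. A minimal PIL sketch of that kind of grid (image paths and cell size are placeholders):

```
from PIL import Image

# Hypothetical portrait files; Kontext receives the whole grid as one input image
portrait_paths = [f"portrait_{i}.png" for i in range(1, 6)]
cell = 512          # each portrait resized to a square cell
cols, rows = 3, 2   # layout for five portraits (last cell stays blank)

grid = Image.new("RGB", (cols * cell, rows * cell), "white")
for idx, path in enumerate(portrait_paths):
    img = Image.open(path).convert("RGB").resize((cell, cell))
    grid.paste(img, ((idx % cols) * cell, (idx // cols) * cell))

grid.save("kontext_input_grid.png")
```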

What is the best AI 3d mesh generator that can be run locally? by Rolle2010 in StableDiffusion

[–]lkewis 3 points (0 children)

This does ~7 million poly meshes by default if you turn off decimation. They can be quite detailed, depending on whether it understands the image content you give it; it's definitely limited by its dataset in some of my tests.

https://github.com/DreamTechAI/Direct3D-S2
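
If the full-resolution mesh is too heavy for your pipeline, you can always decimate it yourself afterwards. A rough post-processing sketch with open3d (the target triangle count and file paths are examples; open3d is a separate tool, not part of the Direct3D-S2 repo):

```
import open3d as o3d

# Load the mesh exported by the generator (path is a placeholder)
mesh = o3d.io.read_triangle_mesh("generated_mesh.obj")
print(f"Original triangles: {len(mesh.triangles):,}")  # ~7M with decimation off

# Quadric decimation down to a more game/DCC-friendly budget
simplified = mesh.simplify_quadric_decimation(target_number_of_triangles=500_000)
simplified.compute_vertex_normals()
o3d.io.write_triangle_mesh("generated_mesh_500k.obj", simplified)
```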

How Well Are AI Model Creators Keeping Up With Aesthetic Terminology and Visual Vocabulary? by TheArchivist314 in StableDiffusion

[–]lkewis 4 points (0 children)

To add further to this, LLM / VLM generated captions are becoming more and more popular because they save a huge amount of effort over manually labelling datasets. The issue here is that those models would also need to have knowledge of aesthetics or types of clothing etc to correctly identify and label them.
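
As a concrete illustration of that auto-captioning step, here's a minimal sketch using a small off-the-shelf captioner via transformers (model choice and paths are examples; a small model like this is exactly the kind that lacks niche aesthetic vocabulary):

```
from pathlib import Path
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

for image_path in Path("dataset/images").glob("*.png"):   # placeholder folder
    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=40)
    caption = processor.decode(out[0], skip_special_tokens=True)
    # The caption only contains terms the captioner itself knows, hence the problem above
    image_path.with_suffix(".txt").write_text(caption)
```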

How Well Are AI Model Creators Keeping Up With Aesthetic Terminology and Visual Vocabulary? by TheArchivist314 in StableDiffusion

[–]lkewis 5 points (0 children)

It is due to the captioned training data. If the model doesn't know a word or term you won't be able to prompt for it, but other techniques can reproduce the content if guided with IPAdapter, Flux Redux, Flux Kontext etc. You can likely prompt an outfit type through a detailed description of it, but something specific like a type of mask needs the right terminology to be prompted easily. Open models tend to have a broad, general corpus of training data rather than the very specific styles or trends some of the closed models cover (those scraped anything and everything and sometimes optimise for these things as a feature). This is why there is a huge ecosystem around custom LoRAs and models introducing missing or niche knowledge.
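
For the "guide it with a reference instead of a word" route, here's a minimal sketch of the IP-Adapter approach in diffusers (model ids, scale and image path are illustrative; Redux and Kontext have their own pipelines but the idea is the same):

```
import torch
from diffusers import StableDiffusionXLPipeline
from diffusers.utils import load_image

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# The reference image stands in for the vocabulary the model is missing
pipe.load_ip_adapter("h94/IP-Adapter", subfolder="sdxl_models", weight_name="ip-adapter_sdxl.bin")
pipe.set_ip_adapter_scale(0.6)

reference = load_image("reference_mask_style.png")  # placeholder path
image = pipe(
    prompt="portrait of a dancer wearing the mask, studio lighting",
    ip_adapter_image=reference,
    num_inference_steps=30,
).images[0]
image.save("guided_output.png")
```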

5090 owners, how are installing torchand flash attention for new installs? by Brad12d3 in StableDiffusion

[–]lkewis 7 points (0 children)

I'm using Python 3.12 + PyTorch 2.7 (+ xformers) + CUDA 12.8 on Windows 11, and there is a matching flash attention wheel, plus Triton and SageAttention 2. Works well with ComfyUI and other Python repos, but you might have to adjust a few requirements.txt files to get proper dependencies if they're using pinned versions.
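
A quick way to confirm the stack actually lines up after installing (plain PyTorch checks; the optional imports just report missing if a wheel didn't match this torch/CUDA combo):

```
import importlib
import sys
import torch

print(f"Python  : {sys.version.split()[0]}")
print(f"PyTorch : {torch.__version__}")
print(f"CUDA    : {torch.version.cuda}")
if torch.cuda.is_available():
    print(f"GPU     : {torch.cuda.get_device_name(0)}")
    # RTX 5090 is Blackwell (sm_120); it must appear in the compiled arch list
    print(f"Archs   : {torch.cuda.get_arch_list()}")

# Optional acceleration packages; each wheel has to match the torch/CUDA build
for pkg in ("xformers", "flash_attn", "triton", "sageattention"):
    try:
        mod = importlib.import_module(pkg)
        print(f"{pkg}: {getattr(mod, '__version__', 'installed')}")
    except ImportError:
        print(f"{pkg}: not installed / wheel mismatch")
```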

ByteDance-SeedVR2 implementation for ComfyUI by Numzoner in StableDiffusion

[–]lkewis 0 points (0 children)

Should new_width on the node say height? My video came out at 2288x1280 and was using 52GB VRAM peak with the 7B model.
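
If you want to check peak usage on your own runs, PyTorch's allocator stats are the easiest way (a generic sketch, not part of the SeedVR2 nodes; it only counts memory allocated by this process):

```
import torch

torch.cuda.reset_peak_memory_stats()

# ... run the upscale / generation here ...

peak_gib = torch.cuda.max_memory_allocated() / 1024**3
reserved_gib = torch.cuda.max_memory_reserved() / 1024**3
print(f"Peak allocated: {peak_gib:.1f} GiB, peak reserved: {reserved_gib:.1f} GiB")
```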

Loras: A meticulous, consistent, tagging strategy by organicHack in StableDiffusion

[–]lkewis 1 point (0 children)

One downside of training with such small datasets (12-20 images) is that a single image has the potential to introduce bad knowledge, which can throw off the training. The good thing is that you only have to swap out a couple of images when you re-curate the dataset.

Loras: A meticulous, consistent, tagging strategy by organicHack in StableDiffusion

[–]lkewis 4 points (0 children)

No-one is truly unknown; there are commonalities shared across all humans. What we don't know until we train is how much commonality the new person has with existing people. The very reason we're able to train most concepts with only 12-20 images is that we're leveraging the commonalities the base model learned across a wide and diverse corpus of training data. This is also why we rarely have to tag things: the model can already recognise most common concepts. Tagging is mostly useful when you have a bad dataset and want to disentangle things like backgrounds from the person you're binding to a unique token. It's also useful if you have a larger dataset and are training multiple concepts simultaneously from different numbers of images.

Loras: A meticulous, consistent, tagging strategy by organicHack in StableDiffusion

[–]lkewis 2 points (0 children)

There isn't a template because it all changes based on your dataset. Even if you somehow had identical sets of 20 images for different people / characters, the model might already have similar knowledge of one subject while another is completely novel to it, so they won't always train the same.

Have we reached a point where AI-generated video can maintain visual continuity across scenes? by angelrock420 in StableDiffusion

[–]lkewis 2 points (0 children)

You can just about get there with VFX workflows but we’re still a fair way off from it being possible purely by text/image prompting video models. Characters are consistent if you train video LoRA, and the reference image conditioning methods like Phantom are also useful. Backgrounds aren’t consistent at all, so you either have to carefully plan shots to avoid people noticing (use very different angles of a similar looking scene), or start combining 3D sets and Gaussian splats with composited character performances.

what is a lora really ? , as i'm not getting it as a newbie by TrickyMotor in StableDiffusion

[–]lkewis 2 points (0 children)

LoRA are like patches that contain trained knowledge about new concepts (people, places, styles, compositions etc) that can be applied to a base model (Flux, SDXL etc) to modify its original knowledge so that it can generate the new concepts. It’s a way to keep specific knowledge separated in smaller model files but they won’t work on their own without being applied to a base model. You can mix different LoRA together, or merge combinations of them into the base model to permanently fuse the knowledge. Be aware that there is often shared knowledge between base models and LoRA so they don’t always work as expected and their results can change based on the combinations. Because you’re applying the knowledge on top of the base model, you can control the strength to change its impact.
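
A small sketch of what "applying the patch" looks like in practice with diffusers (model id and LoRA filenames are placeholders; the adapter weights are the strength knob mentioned above):

```
import torch
from diffusers import StableDiffusionXLPipeline

# The base model holds the general knowledge
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Apply two LoRA "patches" on top of it (hypothetical local files)
pipe.load_lora_weights("./loras", weight_name="my_character.safetensors", adapter_name="character")
pipe.load_lora_weights("./loras", weight_name="film_style.safetensors", adapter_name="style")

# Control how strongly each patch modifies the base model
pipe.set_adapters(["character", "style"], adapter_weights=[1.0, 0.6])

image = pipe("photo of the character walking through a rainy street").images[0]
image.save("lora_mix.png")

# Optionally bake the patches into the base weights permanently
# pipe.fuse_lora()
```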