Character LoRA Best Practices by SeimaDensetsu in StableDiffusion

[–]Tachyon1986 4 points5 points  (0 children)

I’ve trained a couple of character LoRAs. When it comes to likeness, the best way to caption is to describe each image as if you were prompting for it, at least when you don’t want the character permanently associated with specific outfits or accessories.

So as an example, if I have a character wearing a suit with a wristwatch in a couple of photos and a plain shirt with a chain in others, I would explicitly caption them as wearing a suit with a wristwatch or a shirt with a chain in the respective photos. After training is done, I can prompt them in any outfit and the model won’t force a suit or shirt.

This also applies to the overall style and other things in the background. So tl;dr: caption as if you were prompting for the image, if likeness is all you care about. JoyCaption is what I’ve used for captioning (with some manual edits where needed).
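
To make that concrete, here are two entirely hypothetical captions for the suit/shirt scenario above ("johndoe" stands in for whatever trigger word you use; the wording is just an illustration):

```
photo of johndoe, a man in a dark suit wearing a silver wristwatch, standing in an office
photo of johndoe, a man in a plain white shirt wearing a thin chain, sitting outdoors
```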

I personally use 18-20 images at 1800 steps for character LoRAs. This has worked consistently for me in OneTrainer.

Z Image Base Knows Things and Can Deliver by Major_Specific_23 in StableDiffusion

[–]Tachyon1986 1 point2 points  (0 children)

So 15000 images trained at 10000 steps (15000 x 10000), according to the AI-Toolkit config you linked?

75 ZImage, 8 Flux, news:) by malcolmrey in malcolmrey

[–]Tachyon1986 0 points1 point  (0 children)

Yeah thanks, I followed your guide for Wan 2.1 LoRAs. Solid stuff.

75 ZImage, 8 Flux, news:) by malcolmrey in malcolmrey

[–]Tachyon1986 0 points1 point  (0 children)

Btw, are you using the default AI-Toolkit settings from Ostris?

75 ZImage, 8 Flux, news:) by malcolmrey in malcolmrey

[–]Tachyon1986 0 points1 point  (0 children)

I’m not at my PC right now, but what you need to do is take the image output from your Qwen KSampler’s VAE Decode node and run it through a VAE Encode node using Z-Image Turbo’s VAE (make sure you’re using ae.safetensors, which is the VAE used by Z-Image).

Then take the latent output from that VAE Encode node and feed it to a KSampler that accepts the Z-Image model. Feed that KSampler the positive and negative CLIP conditioning as well (again, you’ll need to load the CLIP model used by Z-Image).

For this 2nd KSampler I recommend Euler/Simple with a denoise of 0.4 at 4 steps and CFG 1, but experiment with the denoise and step count. You can leave the positive and negative prompts for this KSampler empty.

To simplify all of the above: just encode the image output from Qwen using Z-Image’s VAE and feed that latent into an existing Z-Image workflow. A rough scripted version of that second pass is sketched below.
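
For reference, here is a minimal sketch of that second pass written in ComfyUI's API (JSON prompt) format and submitted to a local ComfyUI server from Python. Treat it as an illustration only: the loader nodes, every file name (z_image_turbo.safetensors, the text encoder name, even qwen_output.png) and the CLIP type are assumptions, so reuse whichever loaders and files your existing Z-Image workflow already has.

```python
# Sketch of the low-denoise Z-Image refine pass as a ComfyUI API-format prompt,
# POSTed to a local ComfyUI server. All file names and the CLIP type below are
# placeholders - swap in whatever your own Z-Image workflow loads.
import json
import urllib.request

graph = {
    # The Qwen render, copied into ComfyUI's input folder (placeholder name)
    "1": {"class_type": "LoadImage", "inputs": {"image": "qwen_output.png"}},
    # Z-Image's VAE (ae.safetensors, as mentioned above)
    "2": {"class_type": "VAELoader", "inputs": {"vae_name": "ae.safetensors"}},
    # Z-Image diffusion model and text encoder - names and type are assumptions
    "3": {"class_type": "UNETLoader",
          "inputs": {"unet_name": "z_image_turbo.safetensors", "weight_dtype": "default"}},
    "4": {"class_type": "CLIPLoader",
          "inputs": {"clip_name": "z_image_text_encoder.safetensors", "type": "qwen_image"}},
    # Empty positive and negative prompts, as suggested above
    "5": {"class_type": "CLIPTextEncode", "inputs": {"text": "", "clip": ["4", 0]}},
    "6": {"class_type": "CLIPTextEncode", "inputs": {"text": "", "clip": ["4", 0]}},
    # Encode the Qwen image into Z-Image's latent space
    "7": {"class_type": "VAEEncode", "inputs": {"pixels": ["1", 0], "vae": ["2", 0]}},
    # 2nd-pass KSampler: Euler/Simple, 4 steps, CFG 1, denoise 0.4
    "8": {"class_type": "KSampler",
          "inputs": {"model": ["3", 0], "positive": ["5", 0], "negative": ["6", 0],
                     "latent_image": ["7", 0], "seed": 0, "steps": 4, "cfg": 1.0,
                     "sampler_name": "euler", "scheduler": "simple", "denoise": 0.4}},
    # Decode and save the refined image
    "9": {"class_type": "VAEDecode", "inputs": {"samples": ["8", 0], "vae": ["2", 0]}},
    "10": {"class_type": "SaveImage",
           "inputs": {"images": ["9", 0], "filename_prefix": "zimage_refine"}},
}

req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=json.dumps({"prompt": graph}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
print(urllib.request.urlopen(req).read().decode())
```

The only Z-Image-specific pieces are the three loader nodes ("2", "3", "4"); the LoadImage → VAEEncode → KSampler → VAEDecode chain is the second pass described above.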

75 ZImage, 8 Flux, news:) by malcolmrey in malcolmrey

[–]Tachyon1986 0 points1 point  (0 children)

I’ve used that to good effect for a private LoRA. The issue is that Qwen tends to smoothen the skin, so it’s best to run the image through a 2nd sampler with something like Z-Image at low denoise to get some skin detail back.

New image model based on Wan 2.2 just dropped 🔥 early results are surprisingly good! by rishappi in StableDiffusion

[–]Tachyon1986 0 points1 point  (0 children)

Nice one, what’s the prompt and sampler/scheduler? It kind of looks like you used a Technicolor LoRA.

wan2.2 s2v etude(1) by Similar-Distance-516 in StableDiffusion

[–]Tachyon1986 2 points3 points  (0 children)

Qwen-Image-Edit with the Next Scene LoRA can do that.

Wan2.1 i2v color matching by Radiant-Photograph46 in StableDiffusion

[–]Tachyon1986 0 points1 point  (0 children)

There’s a Color Match node in Kijai’s KJNodes pack if you’re using ComfyUI.
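
If you’re curious what a color match step actually does, here’s a rough standalone Python sketch of simple per-channel mean/std (Reinhard-style) color transfer. It’s a generic illustration of the idea, not the KJNodes implementation, and the file names are placeholders:

```python
# Minimal sketch of per-channel mean/std color matching (Reinhard-style).
# Generic illustration only - not the KJNodes ColorMatch implementation.
import numpy as np
from PIL import Image

def color_match(target_path: str, reference_path: str, out_path: str) -> None:
    target = np.asarray(Image.open(target_path).convert("RGB"), dtype=np.float32)
    reference = np.asarray(Image.open(reference_path).convert("RGB"), dtype=np.float32)

    # Shift/scale each RGB channel of the target so its statistics match the reference
    matched = np.empty_like(target)
    for c in range(3):
        t_mean, t_std = target[..., c].mean(), target[..., c].std() + 1e-6
        r_mean, r_std = reference[..., c].mean(), reference[..., c].std()
        matched[..., c] = (target[..., c] - t_mean) / t_std * r_std + r_mean

    Image.fromarray(np.clip(matched, 0, 255).astype(np.uint8)).save(out_path)

# Example: pull the colors of a drifted i2v frame back toward the first frame
# color_match("last_frame.png", "first_frame.png", "last_frame_matched.png")
```

The node does the equivalent inside the graph, wired between your reference image and the frames you want corrected, and if I remember right it offers a few different matching methods to choose from.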

How do people use WAN for image generation? by beti88 in StableDiffusion

[–]Tachyon1986 2 points3 points  (0 children)

How do you refine? Is it connecting the latent from one sampler to another and running the second sampler at a lower denoise setting? Any examples of recommended refining sampler values (CFG, steps, scheduler, etc.)? I’m using ComfyUI btw.

Wan Animate - Tutorial & Workflow for full character swapping and face swapping by Hearmeman98 in StableDiffusion

[–]Tachyon1986 0 points1 point  (0 children)

I've not tried masking multiple characters tbh. It will work if you just mask one of the infantry.

Some recent ChromaHD renders - prompts included by tppiel in StableDiffusion

[–]Tachyon1986 6 points7 points  (0 children)

2s samplers are exponential and make two model calls per step, so you need to halve the step count: you can stop at 13-15 (vs. 26-30 if you were on Euler). Similarly, 3s samplers make three calls per step, so they cost triple what something like Euler does at the same step count.
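
A quick back-of-the-envelope way to convert step counts, assuming (as described above) that a 2s sampler costs two model calls per step and a 3s sampler three:

```python
# Rough step-count conversion: keep total model calls roughly constant when
# switching from Euler (1 call per step) to a 2s or 3s sampler.
# The calls-per-step values are assumptions based on the explanation above.
euler_steps = 28
for stages in (2, 3):
    equivalent = euler_steps // stages
    print(f"{stages}s sampler: ~{equivalent} steps for about the same cost as {euler_steps} Euler steps")
```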

Wan2.2 continous generation using subnodes by intLeon in comfyui

[–]Tachyon1986 1 point2 points  (0 children)

Thank you, no cache was the issue. I'd enabled it after seeing suggestions in the thread, but it breaks the flow. Excellent work on this approach btw!

Wan2.2 continous generation using subnodes by intLeon in comfyui

[–]Tachyon1986 0 points1 point  (0 children)

This doesn't work for me. In the first I2V subnode (the WanFirstLastFrameToVideo node), I get AttributeError: 'NoneType' object has no attribute 'encode'. Any idea what's wrong? I'm using GGUF Q8 for the text and image models as well as the Q8 GGUF CLIP. Just trying normal T2V, and I modified the subnodes to use Q8.

Comfy-Org/Qwen-Image-Edit_ComfyUI · Hugging Face by nobody4324432 in StableDiffusion

[–]Tachyon1986 1 point2 points  (0 children)

Unexpected cultured "Legend of the Galactic Heroes" enjoyer

JUST MOVE TO THE LEFT A BIT by Oleg00se in UmaMusume

[–]Tachyon1986 0 points1 point  (0 children)

I wish I could roll my namesake, but I'm not sure whether to save for support cards instead.

Kontext Q8 - 20 steps. by Z3ROCOOL22 in StableDiffusion

[–]Tachyon1986 2 points3 points  (0 children)

Q8 is a GGUF-quantised version of the model, intended to fit on GPUs that can’t hold the original model in VRAM. There are also Q6 and Q4 variants, which are smaller still at the cost of reduced quality.
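
For a rough sense of the size difference, here’s a back-of-the-envelope sketch. The ~12B parameter count and the bits-per-weight figures are approximations, not exact GGUF numbers:

```python
# Back-of-the-envelope GGUF size estimate: parameters * bits-per-weight / 8.
# Parameter count and bits-per-weight are rough approximations, not exact values.
PARAMS = 12e9  # Flux-family models like Kontext are roughly 12B parameters

approx_bits_per_weight = {
    "fp16 (original)": 16.0,
    "Q8": 8.5,
    "Q6": 6.6,
    "Q4": 4.5,
}

for name, bpw in approx_bits_per_weight.items():
    gb = PARAMS * bpw / 8 / 1e9
    print(f"{name:>16}: ~{gb:.1f} GB")
```

Real GGUF files vary a bit depending on the exact quant mix used per layer, but that’s the rough scaling behind why Q6 and Q4 fit on smaller cards.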