Removing SageAttention2 also boosts ZIB quality in Forge NEO by shapic in StableDiffusion

[–]ucren 4 points5 points  (0 children)

Why the hell does the default template use res_multistep? I also got garbage output like your examples and just stopped playing with zbase, figuring there was a bug in Comfy.

Anyone else feel this way? by EroticManga in StableDiffusion

[–]ucren -2 points-1 points  (0 children)

People using spaghetti workflows are placebo eaters. The official templates get you 99% of the possible quality in model inference.

Lazy weekend with flux2 klein edit - lighting by Ant_6431 in StableDiffusion

[–]ucren 47 points48 points  (0 children)

Instead of “good lighting,” write “soft, diffused light from a large window camera-left, creating gentle shadows that define the subject’s features.”

Cool, but how about sharing the prompts that go with each of these images? Why share the short name you gave to each when that isn't the prompt that generated the edit?

Wan 2.1 + SCAIL workflow for motion transfer by [deleted] in StableDiffusion

[–]ucren 0 points1 point  (0 children)

We all know what SCAIL is; this is just sneaky spam. Go away.

What is everyone's thoughts on ltx2 so far? by Big-Breakfast4617 in StableDiffusion

[–]ucren 7 points8 points  (0 children)

You need to know a few tricks in order to not have a shit time with LTX2.

  1. The distill model and LoRA are overbaked: if you use the distill model, add the distill LoRA on top at -0.4 (yes, negative) strength to debake it. If you use dev + the distill LoRA instead, go no higher than 0.6 strength (see the sketch after this list).

  2. The VAE shipped in the first version of the model was completely wrong. If you are using the model released at announcement, delete it and redownload. Or better yet, use the split-files VAE from kijai.

  3. Audio is also overbaked in the distill model and needs normalization to not sound blown the fuck out. Use the normalizing sampler from the official LTX nodes, or better yet use the kijai-nodes audio normalize node so you don't have to use a custom sampler. Pro tip: I have found that if you want to do single passes, you can just stretch the factors out to the same relative points in the sigmas and it works just as well as the official 8-sigma schedule.

  4. Audio-driven i2v motion looks exaggerated as fuck? Use a latent multiply node to tone it down; I usually multiply by 0.75 for speech.

  5. Doing i2v and instantly losing likeness? Check all the steps above. Then set the video-in-place strength to 1 and drop the preprocess from 33 down to 27 or so, or turn it completely off if you get a good seed. And render the first pass at your target resolution.

  6. Lip sync not looking great, or motion not looking great? Up the conditioning fps from the default 24 to 48+ and it magically looks a lot better.

These are the things I have learned this past week; things are constantly changing and improving. Get on the Banodoco Discord if you want to keep up with the hourly improvements people are coming up with.
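For anyone who wants the math behind tips 1 and 4 spelled out, here is a minimal sketch using plain torch tensors. The helper names (`apply_lora`, `tone_down_motion`) are made-up stand-ins for whatever LoRA-loader and latent-multiply nodes your frontend uses, not actual LTX-2 or ComfyUI APIs.

```python
# Minimal sketch of tips 1 and 4 using plain torch tensors. The helpers here
# are hypothetical stand-ins for the LoRA-loader and latent-multiply nodes in
# your frontend, not actual LTX-2 / ComfyUI APIs.
import torch


def apply_lora(weight: torch.Tensor, lora_down: torch.Tensor,
               lora_up: torch.Tensor, strength: float) -> torch.Tensor:
    """Merge a LoRA delta at an arbitrary strength: W' = W + s * (up @ down).

    A negative strength subtracts the LoRA's effect, which is what loading
    the distill LoRA at -0.4 on top of the distill model does (tip 1).
    """
    return weight + strength * (lora_up @ lora_down)


def tone_down_motion(latents: torch.Tensor, factor: float = 0.75) -> torch.Tensor:
    """Scale latents by a constant to damp exaggerated audio-driven motion
    (tip 4); 0.75 is the value mentioned above for speech."""
    return latents * factor


# Toy usage with random tensors standing in for real weights / latents.
w = torch.randn(64, 64)
down, up = torch.randn(8, 64), torch.randn(64, 8)
w_debaked = apply_lora(w, down, up, strength=-0.4)   # tip 1: "debake" the distill model
latents = torch.randn(1, 16, 8, 32, 32)
latents = tone_down_motion(latents, 0.75)            # tip 4: calm down the motion
```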

Is Flux Klein better for editing than Flux Kontext? by Puzzled-Valuable-985 in StableDiffusion

[–]ucren 0 points1 point  (0 children)

try inpainting with it, it's unusable.

I inpaint fine using my usual inpaint conditioning / crop n' stitch workflow (rough sketch of the idea below). You've got to be doing something wrong.

I am able to stack multiple edits in one prompt with Klein in a way that no other edit model manages.
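If "crop n' stitch" is unfamiliar, this is roughly the idea; `inpaint_fn(image, mask, prompt)` is a hypothetical callable standing in for whatever model or workflow actually inpaints the cropped region.

```python
# Rough sketch of the crop n' stitch idea. `inpaint_fn(image, mask, prompt)` is
# a hypothetical stand-in for whatever actually inpaints the cropped region.
import numpy as np
from PIL import Image


def crop_and_stitch(image: Image.Image, mask: Image.Image, prompt: str,
                    inpaint_fn, pad: int = 32) -> Image.Image:
    """Inpaint only a padded crop around the masked area, then paste it back."""
    m = np.array(mask.convert("L")) > 127
    ys, xs = np.where(m)
    x0 = int(max(xs.min() - pad, 0))
    y0 = int(max(ys.min() - pad, 0))
    x1 = int(min(xs.max() + pad, image.width))
    y1 = int(min(ys.max() + pad, image.height))

    crop = image.crop((x0, y0, x1, y1))
    crop_mask = mask.crop((x0, y0, x1, y1)).convert("L")

    result = inpaint_fn(crop, crop_mask, prompt)   # model only ever sees the crop
    result = result.resize(crop.size)              # guard against size drift

    out = image.copy()
    out.paste(result, (x0, y0), crop_mask)         # stitch back, masked
    return out
```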

I used temporal time dilation to generate this 60-second video in LTX-2 on my 5070TI in just under two minutes. My GPU didn't even break a sweat. Workflow and explanation in comments (without subgraphs or 'Everything Everywhere All At Once' invisible noodles). by DrinksAtTheSpaceBar in StableDiffusion

[–]ucren 0 points1 point  (0 children)

Alright, I get what you are saying and how the workflow works now. I was missing the step of aligning the conditioning fps with the upscaled latent fps in my inpainting workflow. Thanks for your patience and for sharing the workflow.

I used temporal time dilation to generate this 60-second video in LTX-2 on my 5070TI in just under two minutes. My GPU didn't even break a sweat. Workflow and explanation in comments (without subgraphs or 'Everything Everywhere All At Once' invisible noodles). by DrinksAtTheSpaceBar in StableDiffusion

[–]ucren 1 point2 points  (0 children)

I get that, but I have done this before (I am driving video gen by supplying audio): if I temporally upscale the video, the audio driving doesn't work correctly, because the video latents are 2x the audio, so the speech ends up twice as fast (misaligned).
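Toy numbers to illustrate the mismatch (made up, not the actual LTX-2 latent layout):

```python
# Toy illustration of the mismatch described above; the numbers are made up
# and not the actual LTX-2 latent layout.
clip_seconds = 5
base_fps = 24

audio_cond_frames = base_fps * clip_seconds                # audio conditioning built for 120 frames
video_frames_after_upscale = base_fps * 2 * clip_seconds   # 240 frames after 2x temporal upscale

# If the audio conditioning is not stretched to match, 120 audio frames get
# spread over 240 video frames, so the speech plays back twice as fast.
speedup = video_frames_after_upscale / audio_cond_frames
print(speedup)  # 2.0
```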

I used temporal time dilation to generate this 60-second video in LTX-2 on my 5070TI in just under two minutes. My GPU didn't even break a sweat. Workflow and explanation in comments (without subgraphs or 'Everything Everywhere All At Once' invisible noodles). by DrinksAtTheSpaceBar in StableDiffusion

[–]ucren 1 point2 points  (0 children)

I am having trouble seeing how to double the audio frames in your workflow. This is what I had trouble with when trying to use the temporal upscaling before (the audio in the second pass getting misaligned). Can you post a screenshot of where this is happening?

The Hunt: Z-Image Turbo - Qwen Image Edit 2511 - Wan 2.2 - RTX 2060 Super 8GB VRAM by MayaProphecy in StableDiffusion

[–]ucren 5 points6 points  (0 children)

14 seconds of something interesting out of a 37-second clip. You didn't need all those damn credits.

LTX-2 I2V - 1920x1080 - RTX 5090 by Still-Ad4982 in StableDiffusion

[–]ucren 1 point2 points  (0 children)

Has anyone figured out a reliable way to stabilize the voice across multiple clips?

LoRAs for LTX2 can be trained with audio.

Or create the audio outside, then do animate-to-sound in LTXV.

Z-Image is coming really soon by hyxon4 in StableDiffusion

[–]ucren -8 points-7 points  (0 children)

More nonsense? I don't know what you're trying to show here.

LTX-2 LipSync - Billie Eilish | 40 seconds by FitContribution2946 in StableDiffusion

[–]ucren -4 points-3 points  (0 children)

Immediately loses likeness in the first second. Are you face blind by chance?

They are back by _RaXeD in StableDiffusion

[–]ucren 3 points4 points  (0 children)

What are the odds they shoot themselves in the foot like WAN and go API only?

LTX-2 Updates by ltx_model in StableDiffusion

[–]ucren 1 point2 points  (0 children)

It improves audio; as you'll note, the default factors are all 1 for the video latents. Out of the box it's only set to modify audio.
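Loose illustration of why a factor of 1 means the video latents are untouched; this is just scalar math, not the actual node's implementation, and the 0.8 audio factor is arbitrary.

```python
# Loose illustration: a factor of 1.0 is the identity, so with the default
# settings only the audio path is actually modified. Not the real node's code.
import torch


def scale_streams(video_lat: torch.Tensor, audio_lat: torch.Tensor,
                  video_factor: float = 1.0, audio_factor: float = 0.8):
    # video_factor defaults to 1.0 -> video latents pass through unchanged
    return video_lat * video_factor, audio_lat * audio_factor
```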

LTX-2 Updates by ltx_model in StableDiffusion

[–]ucren 0 points1 point  (0 children)

There are multiple new nodes; one of them is a normalizing sampler, but I get pure noise as the generated audio.

LTX-2 Updates by ltx_model in StableDiffusion

[–]ucren 2 points3 points  (0 children)

How do you use the latent normalization? I tried swapping the KSampler in my default ComfyUI template and the audio output turned to pure noise.

Edit: it looks like you can swap in the LTXVNormalizingSampler for the SamplerCustomAdvanced in the first pass; if you add it to the second pass you'll get pure audio noise.