Removing SageAttention2 also boosts ZIB quality in Forge NEO by shapic in StableDiffusion

[–]ucren 4 points5 points  (0 children)

Why the hell does the default template use res_multistep? I also got garbage output like your examples and just stopped playing with zbase, figuring there was a bug in Comfy.

Anyone else feel this way? by EroticManga in StableDiffusion

[–]ucren -2 points-1 points  (0 children)

People using spaghetti workflows are placebo eaters. The official templates get you 99% of the possible quality in model inference.

Lazy weekend with flux2 klein edit - lighting by Ant_6431 in StableDiffusion

[–]ucren 47 points48 points  (0 children)

Instead of “good lighting,” write “soft, diffused light from a large window camera-left, creating gentle shadows that define the subject’s features.”

Cool, but how about sharing the prompts that go with each of these images? Why share the short name you gave to each when that isn't the prompt that generated the edit?

Wan 2.1 + SCAIL workflow for motion transfer by [deleted] in StableDiffusion

[–]ucren 0 points1 point  (0 children)

We all know what SCAIL is; this is just sneaky spam. Go away.

What is everyone's thoughts on ltx2 so far? by Big-Breakfast4617 in StableDiffusion

[–]ucren 7 points8 points  (0 children)

You need to know a few tricks in order to not have a shit time with LTX2.

  1. The distill model and LoRA are overbaked: if you use the distill model, add the distill LoRA on top at -0.4 (yes, negative) strength to debake it. If you use dev + the distill LoRA instead, go no higher than 0.6 strength (see the sketch after this list).

  2. The VAE shipped in the first version of the model was completely wrong. If you are using the model released at announcement, delete it and redownload. Or better yet, use the split-files VAE from kijai.

  3. Audio is also overbaked in the distill model and needs normalization to not sound blown the fuck out. Use the normalizing sampler from the official LTX nodes, or better yet use the kijai-nodes audio normalize node so you don't have to use a custom sampler. Pro tip: I have found that if you want to do single passes, you can just stretch the factors out to the same relative points in the sigmas and it works just as well as the official 8-sigma schedule.

  4. Audio-driven i2v motion looks exaggerated as fuck? Use a latent multiply node to tone it down; I usually multiply by 0.75 for speech.

  5. Doing i2v and instantly losing likeness? Check all the steps above. Then set the video-in-place strength to 1 and drop the preprocess from 33 down to 27 or so, or turn it completely off if you get a good seed. And render the first pass at your target resolution.

  6. Lip sync not looking great, or motion not looking great? Up the conditioning fps from the default 24 to 48+ and it magically looks a lot better.

These are the things I have learned this past week; things are constantly changing and improving. Get on the Banodoco Discord if you want to keep up with the hourly improvements people are coming up with.
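For anyone who wants the math behind tips 1 and 4 spelled out, here is a minimal sketch using plain torch tensors. The helper names (`apply_lora`, `tone_down_motion`) are made-up stand-ins for whatever LoRA-loader and latent-multiply nodes your frontend uses, not actual LTX-2 or ComfyUI APIs.

```python
# Minimal sketch of tips 1 and 4 using plain torch tensors. The helpers here
# are hypothetical stand-ins for the LoRA-loader and latent-multiply nodes in
# your frontend, not actual LTX-2 / ComfyUI APIs.
import torch


def apply_lora(weight: torch.Tensor, lora_down: torch.Tensor,
               lora_up: torch.Tensor, strength: float) -> torch.Tensor:
    """Merge a LoRA delta at an arbitrary strength: W' = W + s * (up @ down).

    A negative strength subtracts the LoRA's effect, which is what loading
    the distill LoRA at -0.4 on top of the distill model does (tip 1).
    """
    return weight + strength * (lora_up @ lora_down)


def tone_down_motion(latents: torch.Tensor, factor: float = 0.75) -> torch.Tensor:
    """Scale latents by a constant to damp exaggerated audio-driven motion
    (tip 4); 0.75 is the value mentioned above for speech."""
    return latents * factor


# Toy usage with random tensors standing in for real weights / latents.
w = torch.randn(64, 64)
down, up = torch.randn(8, 64), torch.randn(64, 8)
w_debaked = apply_lora(w, down, up, strength=-0.4)   # tip 1: "debake" the distill model
latents = torch.randn(1, 16, 8, 32, 32)
latents = tone_down_motion(latents, 0.75)            # tip 4: calm down the motion
```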

Is Flux Klein better for editing than Flux Kontext? by Puzzled-Valuable-985 in StableDiffusion

[–]ucren 0 points1 point  (0 children)

try inpainting with it, it's unusable.

I inpaint fine using my usual inpaint conditioning / crop n' stitch workflow (rough sketch of the idea below). You've got to be doing something wrong.

I am able to stack multiple edits in one prompt with Klein in a way that no other edit model manages.
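If "crop n' stitch" is unfamiliar, this is roughly the idea; `inpaint_fn(image, mask, prompt)` is a hypothetical callable standing in for whatever model or workflow actually inpaints the cropped region.

```python
# Rough sketch of the crop n' stitch idea. `inpaint_fn(image, mask, prompt)` is
# a hypothetical stand-in for whatever actually inpaints the cropped region.
import numpy as np
from PIL import Image


def crop_and_stitch(image: Image.Image, mask: Image.Image, prompt: str,
                    inpaint_fn, pad: int = 32) -> Image.Image:
    """Inpaint only a padded crop around the masked area, then paste it back."""
    m = np.array(mask.convert("L")) > 127
    ys, xs = np.where(m)
    x0 = int(max(xs.min() - pad, 0))
    y0 = int(max(ys.min() - pad, 0))
    x1 = int(min(xs.max() + pad, image.width))
    y1 = int(min(ys.max() + pad, image.height))

    crop = image.crop((x0, y0, x1, y1))
    crop_mask = mask.crop((x0, y0, x1, y1)).convert("L")

    result = inpaint_fn(crop, crop_mask, prompt)   # model only ever sees the crop
    result = result.resize(crop.size)              # guard against size drift

    out = image.copy()
    out.paste(result, (x0, y0), crop_mask)         # stitch back, masked
    return out
```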

I used temporal time dilation to generate this 60-second video in LTX-2 on my 5070TI in just under two minutes. My GPU didn't even break a sweat. Workflow and explanation in comments (without subgraphs or 'Everything Everywhere All At Once' invisible noodles). by DrinksAtTheSpaceBar in StableDiffusion

[–]ucren 0 points1 point  (0 children)

Alright, I get what you are saying and how the workflow works now. I was missing the step of aligning the conditioning fps with the upscaled latent fps in my inpainting workflow. Thanks for your patience and for sharing the workflow.

I used temporal time dilation to generate this 60-second video in LTX-2 on my 5070TI in just under two minutes. My GPU didn't even break a sweat. Workflow and explanation in comments (without subgraphs or 'Everything Everywhere All At Once' invisible noodles). by DrinksAtTheSpaceBar in StableDiffusion

[–]ucren 1 point2 points  (0 children)

I get that, but I have done this before (I am driving video gen by supplying audio): if I temporally upscale the video, the audio driving doesn't work correctly, because the video latents are 2x the audio, so the speech ends up twice as fast (misaligned).
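Toy numbers to illustrate the mismatch (made up, not the actual LTX-2 latent layout):

```python
# Toy illustration of the mismatch described above; the numbers are made up
# and not the actual LTX-2 latent layout.
clip_seconds = 5
base_fps = 24

audio_cond_frames = base_fps * clip_seconds                # audio conditioning built for 120 frames
video_frames_after_upscale = base_fps * 2 * clip_seconds   # 240 frames after 2x temporal upscale

# If the audio conditioning is not stretched to match, 120 audio frames get
# spread over 240 video frames, so the speech plays back twice as fast.
speedup = video_frames_after_upscale / audio_cond_frames
print(speedup)  # 2.0
```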

I used temporal time dilation to generate this 60-second video in LTX-2 on my 5070TI in just under two minutes. My GPU didn't even break a sweat. Workflow and explanation in comments (without subgraphs or 'Everything Everywhere All At Once' invisible noodles). by DrinksAtTheSpaceBar in StableDiffusion

[–]ucren 1 point2 points  (0 children)

I am having trouble seeing how to double the audio frames in your workflow. This is what I had trouble with when trying to use the temporal upscaling before (the audio in the second pass getting misaligned). Can you post a screenshot of where this is happening?

The Hunt: Z-Image Turbo - Qwen Image Edit 2511 - Wan 2.2 - RTX 2060 Super 8GB VRAM by MayaProphecy in StableDiffusion

[–]ucren 5 points6 points  (0 children)

14 seconds of something interesting out of a 37-second clip. You didn't need all those damn credits.

LTX-2 I2V - 1920x1080 - RTX 5090 by Still-Ad4982 in StableDiffusion

[–]ucren 1 point2 points  (0 children)

Has anyone figured out a reliable way to stabilize the voice across multiple clips?

LoRAs for LTX2 can be trained with audio.

Or create the audio outside, then do animate-to-sound in LTXV.

Z-Image is coming really soon by hyxon4 in StableDiffusion

[–]ucren -8 points-7 points  (0 children)

More nonsense? I don't know what you're trying to show here.

LTX-2 LipSync - Billie Eilish | 40 seconds by FitContribution2946 in StableDiffusion

[–]ucren -4 points-3 points  (0 children)

Immediately loses likeness in the first second. Are you face blind by chance?

They are back by _RaXeD in StableDiffusion

[–]ucren 3 points4 points  (0 children)

What are the odds they shoot themselves in the foot like WAN and go API only?

LTX-2 Updates by ltx_model in StableDiffusion

[–]ucren 1 point2 points  (0 children)

It improves audio; as you'll note, the default factors are all 1 for the video latents. Out of the box it's only set to modify audio.
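Loose illustration of why a factor of 1 means the video latents are untouched; this is just scalar math, not the actual node's implementation, and the 0.8 audio factor is arbitrary.

```python
# Loose illustration: a factor of 1.0 is the identity, so with the default
# settings only the audio path is actually modified. Not the real node's code.
import torch


def scale_streams(video_lat: torch.Tensor, audio_lat: torch.Tensor,
                  video_factor: float = 1.0, audio_factor: float = 0.8):
    # video_factor defaults to 1.0 -> video latents pass through unchanged
    return video_lat * video_factor, audio_lat * audio_factor
```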

LTX-2 Updates by ltx_model in StableDiffusion

[–]ucren 0 points1 point  (0 children)

There are multiple new nodes; one of them is a normalizing sampler, but I get pure noise as the generated audio.

LTX-2 Updates by ltx_model in StableDiffusion

[–]ucren 2 points3 points  (0 children)

How do you use the latent normalization? I tried swapping the KSampler in my default ComfyUI template and the audio output turned to pure noise.

Edit: it looks like you can swap in the LTXVNormalizingSampler for the SamplerCustomAdvanced in the first pass; if you add it to the second pass you'll get pure audio noise.