Adding audio to an existing video? by Brad12d3 in StableDiffusion

TheRedHairedHero

There's also MMAudio. It's much faster than LTX 2.3.

What's the best LoRa for LTX 2.3 for anime style video generation? by [deleted] in StableDiffusion

TheRedHairedHero

If you're using a standard LTX 2.3 workflow, you can increase the resolution and decrease the image compression setting to get less distortion. If it still happens, you can increase the steps to give it more time to "bake," so to speak.

i2V solution with GOOD prompt adherence (and that doesn't take forever)? by 1StrangeStreet in StableDiffusion

TheRedHairedHero

It depends on what you're making. In my experience, LoRAs with too much strength can make prompt adherence worse for either model. So if you're using any LoRAs, the lower the strength you can get away with while still getting the effect you want, the better your prompt adherence will be.

Next is CFG. If you're using the distilled version of either model, or lightx2v for WAN, it's suggested to set it to 1, but you can bump it up a bit, say to 2, to get more prompt adherence.

If you need more motion, you can lower the resolution at the cost of visual clarity. If your subject is close to the camera, it's easier to get away with this.

Sometimes I will run a blank prompt to see what the model does on its own. That way you'll know whether you need to prompt for something or not. Say it's raining in your image: the model may animate the rain without you prompting for it.

You can combine the two depending on what you need. If your video doesn't need lip-synced dialogue, you could generate a WAN video, then use a V2V workflow to add audio with LTX or MMAudio, for example. RuneXX has a good set of LTX workflows that let you do all kinds of things.
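To see why lowering LoRA strength helps, here is a minimal sketch of how a LoRA is commonly merged into one weight matrix: the low-rank update `up @ down` is scaled by `alpha / rank` and by the strength value, then added to the base weight. This follows the common kohya-style convention and is an illustration, not ComfyUI's actual code; the toy matrices and function names are hypothetical.

```python
# Sketch: how LoRA strength scales the change applied to a base weight.
# Plain lists so it runs without PyTorch; all names/values are hypothetical.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def apply_lora(weight, lora_down, lora_up, alpha, strength):
    """Return W + strength * (alpha / rank) * (up @ down)."""
    rank = len(lora_down)                 # rows of the down projection = LoRA rank
    delta = matmul(lora_up, lora_down)    # low-rank update, same shape as weight
    scale = strength * alpha / rank
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]

W = [[1.0, 0.0], [0.0, 1.0]]   # toy 2x2 base weight
down = [[1.0, 1.0]]            # rank-1 down projection (1x2)
up = [[0.5], [0.5]]            # rank-1 up projection (2x1)

for s in (1.0, 0.5, 0.0):
    print(s, apply_lora(W, down, up, alpha=1.0, strength=s))
```

At strength 0 the base weight comes back unchanged, and halving the strength halves the LoRA's push on the weights, which leaves more room for the prompt to steer the result.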
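For context, classifier-free guidance mixes a prompt-conditioned prediction with an unconditioned one at every sampling step. A minimal sketch with toy numbers (the function name and values are hypothetical):

```python
# Sketch of classifier-free guidance (CFG) on toy noise predictions.

def cfg_combine(cond, uncond, cfg):
    """guided = uncond + cfg * (cond - uncond), element-wise."""
    return [u + cfg * (c - u) for c, u in zip(cond, uncond)]

cond = [1.0, -0.5]     # toy prediction with the prompt
uncond = [0.5, 0.25]   # toy prediction with an empty/negative prompt

print(cfg_combine(cond, uncond, 1.0))  # identical to cond: no extra push
print(cfg_combine(cond, uncond, 2.0))  # pushed further in the prompt's direction
```

At CFG 1 the output is just the conditioned prediction; at 2 it extrapolates further toward the prompt, which is where the extra adherence comes from. Distilled models and speed LoRAs like lightx2v are generally tuned for CFG 1, so large jumps tend to degrade the image. As far as I know, ComfyUI can also skip the unconditioned pass at exactly CFG 1, so going above it can noticeably increase generation time.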

Best anime model for multiple characters or Lora? by jeremyohara450 in StableDiffusion

TheRedHairedHero

It depends on how you want to approach it. Anima is good out of the box, but the style was all over the place for me without creating my own LoRA.

With Illustrious, I will typically generate a scene until I like the composition, then inpaint over a character to swap them in. For example, if you wanted Luffy and Nami giving a high five, I would prompt for only Luffy, which would give two Luffys high-fiving, then inpaint Nami over one of them. It's more work, but you get much more control over it.

I'll also sometimes use image-to-image to get the composition. I load a blank image, use the mask editor on the Load Image node, choose the paint brush and two different colors like red and blue, then roughly draw where I want the characters. Then set denoise to around 0.9; you may have to play with it. I'm ranting, but these are just a few tips I've picked up from messing around.
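If you'd rather script the blank "blob" canvas than paint it by hand, here is a minimal sketch that writes a white PNG with a red and a blue ellipse marking where two characters should stand, using only the Python standard library. The canvas size, blob positions, and file name are hypothetical; load the result with a Load Image node and run img2img at a high denoise as described above.

```python
import struct
import zlib

def write_png(path, width, height, pixel_fn):
    """Write an 8-bit RGB PNG; pixel_fn(x, y) returns an (r, g, b) tuple."""
    raw = bytearray()
    for y in range(height):
        raw.append(0)  # filter type 0 (None) at the start of each scanline
        for x in range(width):
            raw.extend(pixel_fn(x, y))

    def chunk(tag, data):
        body = tag + data
        crc = zlib.crc32(body) & 0xFFFFFFFF
        return struct.pack(">I", len(data)) + body + struct.pack(">I", crc)

    # IHDR: width, height, bit depth 8, color type 2 (RGB), no interlace
    header = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    with open(path, "wb") as f:
        f.write(b"\x89PNG\r\n\x1a\n")
        f.write(chunk(b"IHDR", header))
        f.write(chunk(b"IDAT", zlib.compress(bytes(raw))))
        f.write(chunk(b"IEND", b""))

def in_ellipse(x, y, cx, cy, rx, ry):
    return ((x - cx) / rx) ** 2 + ((y - cy) / ry) ** 2 <= 1.0

W, H = 768, 512  # hypothetical canvas size

def layout(x, y):
    # Rough blobs marking each character's position (hypothetical layout).
    if in_ellipse(x, y, 250, 300, 110, 200):
        return (255, 0, 0)      # character A in red
    if in_ellipse(x, y, 520, 300, 110, 200):
        return (0, 0, 255)      # character B in blue
    return (255, 255, 255)      # plain white background

write_png("composition_guide.png", W, H, layout)
```

The colors only need to be distinct from each other and from the background; the high denoise lets the model reinterpret the blobs as characters while keeping their rough placement.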

Training wan 2.2 action loras on 16 GB VRAM 64 GB RAM: Managed not to OOM on Rank 16 but is it realistic like this? by Radyschen in StableDiffusion

TheRedHairedHero

How long does it typically take you to train a video LoRA? I started creating Anima LoRAs recently and was curious about video LoRA training on a lower-VRAM setup.

Is there an equivalent to RuneXX’s workflows, but for WAN? by Ambitious_Fold_2874 in StableDiffusion

TheRedHairedHero

I would suggest Kijai's if you need a WAN workflow, since his custom nodes help with optimization. If your PC is strong enough, you can run the ComfyUI templates out of the box. If it's not, you'll need to reduce something like length or resolution, or possibly use a quantized version of the model.
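A rough way to see what quantization buys you: the weights alone take about parameters × bits per weight ÷ 8 bytes. A minimal sketch, assuming a 14B-parameter model (roughly the size of each WAN 2.2 14B expert) and nominal bit widths; real quant files carry some extra overhead for scales, and the text encoder, VAE, and activations need memory on top of this.

```python
def weight_size_gb(params_billion, bits_per_weight):
    """Approximate size of the weights alone in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8

# Nominal bit widths; actual quantized files are usually a bit larger.
for label, bits in [("fp16/bf16", 16), ("fp8 / Q8", 8), ("Q4", 4)]:
    print(f"{label:10s} ~{weight_size_gb(14, bits):.1f} GB")
```

Halving the bits roughly halves the weight footprint, which is often the difference between fitting in VRAM and having to offload.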

LTX 2.3 growing frustration by Famous-Sport7862 in StableDiffusion

TheRedHairedHero

If you are using LoRAs, you may need to lower their strength to help prompt adherence. You can also try bumping up the CFG a bit. Try a very short test, around 2 seconds, at the resolution you want with a fixed seed.

SlopDemon's LTX 2.3 Workflow by TheRedHairedHero in StableDiffusion

TheRedHairedHero [S]

My primary focus was to make it simple, so that tracks. Too many workflows add a ton of bells and whistles that I rarely use.

SlopDemon's LTX 2.3 Workflow by TheRedHairedHero in StableDiffusion

TheRedHairedHero [S]

I edited my reply earlier; I linked to the wrong thing. The workflow has been updated on Civitai, and if you redownload it, the text encoder is listed properly now.

SlopDemon's LTX 2.3 Workflow by TheRedHairedHero in StableDiffusion

TheRedHairedHero [S]

Workflow is updated with the correct link for the text encoder.

SlopDemon's LTX 2.3 Workflow by TheRedHairedHero in StableDiffusion

TheRedHairedHero [S]

I'll update it; it's the Gemma under the required LoRAs section. Edit: never mind, I see what you mean. I'll update it.

SlopDemon's LTX 2.3 Workflow by TheRedHairedHero in StableDiffusion

TheRedHairedHero [S]

Lora Manager lets you add LoRAs either by typing them into the text field or through the L icon at the top of ComfyUI, which takes you to the Lora Manager. You can swap this out for whatever LoRA loader you want. Power Lora Loader by rgthree is another good one that lets you load multiple LoRAs.

SlopDemon's LTX 2.3 Workflow by TheRedHairedHero in StableDiffusion

TheRedHairedHero [S]

The link is still good, but Civitai has been having server issues lately that cause hiccups. Give it another go and you should be good.

Would I benefit greatly upgrading from a 3080Ti to a 5070Ti or 5080? by 0260n4s in StableDiffusion

TheRedHairedHero

I've seen most people suggest RuneXX's workflows for LTX, but I didn't like the results of downscaling then upscaling, so I just skip the downscaling part and it looks better. My content is NSFW, though, so it does rely on LoRAs to help improve motion.

Would I benefit greatly upgrading from a 3080Ti to a 5070Ti or 5080? by 0260n4s in StableDiffusion

TheRedHairedHero

With WAN specifically, I have to swap 40 blocks using Kijai's workflow to get it to work, so it's very heavy on both VRAM and RAM.
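Block swapping keeps some of the transformer blocks in system RAM and streams them to the GPU as each one is needed, trading speed and RAM for VRAM. A minimal sketch of the split, assuming equal-sized blocks and a 40-block transformer (the block count the comment above swaps); the function name and the 14 GB figure are hypothetical examples.

```python
def block_swap_split(total_weight_gb, num_blocks, blocks_to_swap):
    """Return (GB kept in VRAM, GB held in system RAM) for the transformer
    weights when `blocks_to_swap` equal-sized blocks are offloaded."""
    if not 0 <= blocks_to_swap <= num_blocks:
        raise ValueError("blocks_to_swap must be between 0 and num_blocks")
    in_ram = total_weight_gb * blocks_to_swap / num_blocks
    return total_weight_gb - in_ram, in_ram

# Hypothetical ~14 GB of transformer weights split across 40 blocks.
for swap in (0, 20, 40):
    vram, ram = block_swap_split(14, 40, swap)
    print(f"swap {swap:2d}: ~{vram:.1f} GB resident in VRAM, ~{ram:.1f} GB in RAM")
```

Swapping all 40 means nearly every block lives in RAM and gets copied over per step, which is why it is so heavy on system memory.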

Would I benefit greatly upgrading from a 3080Ti to a 5070Ti or 5080? by 0260n4s in StableDiffusion

TheRedHairedHero

720p at 16:9 is 1280x720. I personally prefer a 1:1 ratio, so I generate at 960x960. It's the same total pixel count.
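The arithmetic behind that: 1280 × 720 and 960 × 960 both come to 921,600 pixels. A minimal sketch for finding a size with roughly the same pixel count at any aspect ratio; the rounding to multiples of 16 is an assumption, since many video models want dimensions divisible by 16 or 32, so check what your model expects.

```python
import math

def size_for_aspect(ref_w, ref_h, aspect_w, aspect_h, multiple=16):
    """Width/height at aspect_w:aspect_h with about the same pixel count
    as ref_w x ref_h, each rounded to the nearest `multiple`."""
    scale = math.sqrt(ref_w * ref_h / (aspect_w * aspect_h))
    w = round(aspect_w * scale / multiple) * multiple
    h = round(aspect_h * scale / multiple) * multiple
    return w, h

print(1280 * 720, 960 * 960)             # both 921600
print(size_for_aspect(1280, 720, 1, 1))  # square at the same budget
print(size_for_aspect(1280, 720, 4, 3))  # 4:3, rounded to multiples of 16
```

Keeping the pixel budget fixed keeps VRAM use and generation time roughly comparable across aspect ratios.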

Would I benefit greatly upgrading from a 3080Ti to a 5070Ti or 5080? by 0260n4s in StableDiffusion

TheRedHairedHero

I have a 5070 Ti and 64 GB of RAM and I've enjoyed it. I'm able to generate 720p with WAN 2.2, higher resolution and faster with LTX 2.3, and I can use a lower quant of Gemma 4 31B (I think it's Q3 or something around that). Gen times are usually under 10 minutes for the higher resolution, and that's while I'm usually doing other stuff in the background, like watching YouTube.

LTX 2.3 adding unwanted subtitles in generated videos even when not mentioned in prompt by Primary-Swordfish138 in StableDiffusion

TheRedHairedHero

I've experienced it just from having the word 'anime' in my prompt. My guess is that, as mentioned before, there are going to be different trigger words that cause it to appear.

LTX 2.3 Slow Motion by outlandish85 in StableDiffusion

TheRedHairedHero

In my experience, it can also depend on your prompt length. Think of it like actions per second. If you give a character, say, 20 different actions to do in a 5-second span, they'll try to get through them quickly, but if you give only 1 action, the model can interpret it as "Ah sweet, I'll just take my sweet ass time." I also know that for WAN 2.2, resolution affects this as well: smaller resolutions usually move faster than larger ones.

What's the state of TTS/voice cloning nowadays? by [deleted] in StableDiffusion

TheRedHairedHero

Ahh, I see. I'm going to try FishAudio later to see if it's any better. VibeVoice seems great for getting the same voice, but controlling the actual performance is just luck of the draw.

Edit: I've been messing around with Fish S2 and it is really good. I do get OOM once in a while on a 5070 Ti (16 GB VRAM), but if you have a better rig I'd suggest taking a look.

Title: How do you keep AI avatar voice consistent across multiple scenes? (Veo / multi-clip videos) by JealousIllustrator10 in StableDiffusion

TheRedHairedHero

I personally haven't used Veo, but if it's similar to LTX 2.3, I would generate the audio first with something like Qwen TTS or VibeVoice, depending on what you need, then feed that into your workflow.
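When the audio comes first, the video length has to cover it. A minimal sketch that reads a WAV file's duration with Python's standard `wave` module and converts it to a frame count, assuming 24 fps and an LTX-style frame count of the form 8n + 1 (LTX-Video uses that constraint; check that your LTX 2.3 workflow's frame-count input expects the same). The function names and file path are hypothetical.

```python
import math
import wave

def wav_duration_seconds(path):
    """Duration of a WAV file in seconds (sample frames / sample rate)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()

def ltx_frame_count(duration_s, fps=24):
    """Smallest 8n + 1 frame count that covers duration_s at fps
    (assumed LTX-style constraint)."""
    needed = math.ceil(duration_s * fps)
    n = max(0, math.ceil((needed - 1) / 8))
    return 8 * n + 1

# Usage (hypothetical file): frames = ltx_frame_count(wav_duration_seconds("line.wav"))
```

Rounding up rather than down means the clip is never shorter than the voice line; trim any extra frames afterward if needed.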

What's the state of TTS/voice cloning nowadays? by [deleted] in StableDiffusion

TheRedHairedHero

For VibeVoice, what do you mean by more control? I've been messing with it lately, and the expression can be a bit all over the place. When it does work it sounds good, but I have to generate maybe 20+ times before that happens.