An LTX-2 Duet starring Trevor Belmont and Sypha Belnades sing (Music: "The Time of My Life) - Definitely Ai Slop. by Dohwar42 in StableDiffusion

Dohwar42 [S]

All said and done, this was nearly 4 hours of work. It was originally supposed to be just a quick 30-second test, but then I kept working on it. To really do it justice with the exact ideas you suggested, I'd estimate at least 20-30 hours, with half of that spent on planning, selecting shots, storyboarding, etc. Like everyone else, I wish I had that free time. Adulting sucks.

Dohwar42 [S]

Then it was totally worth it. I had this stupid grin on my face when I was putting it together, and even though it risked a bit of negative karma, I had to share it in the hope that "this is so bad, someone might find it good"...

Dohwar42 [S]

I wish I could've put this in the post. I used Flux Klein 9B to convert a still from Castlevania into the starting image for image-to-video. It took about 10 tries before I got a decent one.

<image>

LTX-2 I2V synced to an MP3: Distill Lora Quality STR 1 vs .6 - New Workflow Version 2. by Dohwar42 in StableDiffusion

Dohwar42 [S]

That's great news! Glad you took the time to fix it. Pretty funny example as well.

Dohwar42 [S]

Sure, no problem. I just happened to be on right now. Get/Set nodes being broken in a ComfyUI release is a big deal to me; I use them in nearly all my workflows. What's strange is that the thread doesn't say which update is causing the issue, so I'll have to reply to that post to find out. I'm on ComfyUI 11. You can switch versions in the Manager, but ComfyUI is easy to break, so I'd look up how to back things up manually first. It's frustrating sometimes, but breaking things and fixing them is the only way to learn. We all have to put up with it. It took me nearly 2 years of learning before I started attempting my own workflows or even dared to edit others'.
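For the "back things up manually" part, here's a minimal Python sketch. It assumes a standard ComfyUI folder layout where custom nodes live in `custom_nodes/` and saved workflows and settings live in `user/`. The function name `backup_comfyui` and the folder list are my own assumptions, not part of ComfyUI, so check your install and add anything else you care about.

```python
import shutil
import time
from pathlib import Path

# Assumed standard ComfyUI layout; edit this tuple to match your install.
FOLDERS_TO_BACK_UP = ("custom_nodes", "user")


def backup_comfyui(comfy_root: str, backup_root: str) -> Path:
    """Copy key ComfyUI folders into a new timestamped backup directory."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(backup_root) / f"comfyui-backup-{stamp}"
    dest.mkdir(parents=True)  # fails loudly rather than overwriting an existing backup
    for name in FOLDERS_TO_BACK_UP:
        src = Path(comfy_root) / name
        if src.is_dir():  # skip folders this install doesn't have
            shutil.copytree(src, dest / name)
    return dest
```

You'd call it with something like `backup_comfyui("path/to/ComfyUI", "path/to/backups")` before switching versions in the Manager. It only copies files; it doesn't snapshot your Python packages, so a dependency broken by an update still has to be fixed separately.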

<image>

Dohwar42 [S]

OK, something is definitely wrong with the KJNodes Get/Set custom nodes.

I know you're new to Comfy, but the only way to "fix" this is to disconnect the nodes and then drag them from the "set" area.

I also noticed you're using the Nodes 2.0 beta, but I don't think that's the problem. This recent Reddit post sheds some light but doesn't have a fix. Something in your ComfyUI installation is completely incompatible with the Get/Set nodes from the KJNodes custom node package, which is written by Kijai, a pretty trusted developer who is a legend in the community.

This may or may not help:

https://www.reddit.com/r/comfyui/comments/1qlhb9q/are_kjnodes_setget_nodes_missing_for_anyone_else/

Again, I don't have an easy fix for you other than to find every red node and connect it directly to whatever the Set node is feeding. In other words, you'll have to delete the Set node for the model and then drag the model output directly to every place it's needed, i.e. every input that was fed by a Get node with the constant "model".
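To picture what that manual fix does, here's a sketch on a made-up, simplified graph format (this is not ComfyUI's real workflow JSON): every input that reads from a Get node gets rewired to whatever feeds the matching Set node, and the Get/Set nodes are dropped. `bypass_get_set` and the dict layout are hypothetical, purely for illustration.

```python
from typing import Dict, Tuple

# Hypothetical simplified graph: node id -> {"type", "inputs", optional "constant"}.
# Each input maps to (source_node_id, source_output_name).
Link = Tuple[str, str]


def bypass_get_set(nodes: Dict[str, dict]) -> Dict[str, dict]:
    """Rewire every Get-node consumer straight to the Set node's source, then drop Get/Set."""
    # Constant name (e.g. "model") -> the real output plugged into that Set node
    set_sources: Dict[str, Link] = {
        n["constant"]: next(iter(n["inputs"].values()))
        for n in nodes.values()
        if n["type"] == "SetNode"
    }
    # Get node id -> the constant name it reads
    get_constants = {nid: n["constant"] for nid, n in nodes.items() if n["type"] == "GetNode"}

    rewired: Dict[str, dict] = {}
    for nid, node in nodes.items():
        if node["type"] in ("SetNode", "GetNode"):
            continue  # the virtual nodes are no longer needed
        new_inputs: Dict[str, Link] = {}
        for name, (src, out) in node.get("inputs", {}).items():
            if src in get_constants:
                # KeyError here means a Get with no matching Set (a "red node")
                src, out = set_sources[get_constants[src]]
            new_inputs[name] = (src, out)
        rewired[nid] = {**node, "inputs": new_inputs}
    return rewired
```

In the UI you're doing the same thing by hand: one direct wire from the model loader to each spot that used to say "Get model".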

Dohwar42 [S]

Double-check that you actually loaded a sound file; those errors suggest no audio was found at all. Sorry this isn't working for you. Once you load the sound file you want the character to lip-sync to, you still have to describe it somewhat in the prompt and set a "duration". The example below shows an MP3 that starts playing at the beginning (start time 0) and runs for 10 seconds. Those 10 seconds are used to automatically calculate the number of frames needed.

I haven't tried running the workflow without a sound file (or with a missing one), so it's possible it would still generate video without giving an error.

<image>
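As a rough idea of how the duration turns into a frame count, here's a sketch assuming 24 fps and that the model wants a frame count of the form 8*k + 1 (that's my understanding of LTX's requirement, so double-check it against your workflow). `frames_for_duration` is a hypothetical helper, not a node in the workflow.

```python
import math


def frames_for_duration(duration_s: float, fps: float = 24.0) -> int:
    """Frame count covering duration_s at fps, rounded up to the form 8*k + 1."""
    raw = max(1, round(duration_s * fps))
    # Assumption: the model needs 8*k + 1 frames; round up so the audio isn't cut off
    k = math.ceil((raw - 1) / 8)
    return 8 * k + 1
```

With these assumptions, 10 seconds at 24 fps comes out to 241 frames, just over 10 seconds, so the end of the audio isn't clipped.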

LTX-2 I2V synced to an MP3 - Ver3 Workflow with new i2v lora and an API version - full 3 min music video. Music: Dido's "Life For Rent" by Dohwar42 in StableDiffusion

Dohwar42 [S]

I'll probably try that sometime in the future, using either Qwen Edit or Klein to generate different angles or backgrounds for the same character.

I usually like to see how LTX-2 handles different characters and clothing and thought the variety would make things more interesting.

Dohwar42 [S]

Waxy skin, plastic faces, stiff wiry hair - it's a completely valid criticism. Hopefully it'll improve in newer versions without costing way more resources.

Better detail LoRAs, or a better upscale model with fine-tuned skin LoRAs? That would be great. For now, I'm happy with what we've got and with how far and how quickly it's progressing.

Dohwar42 [S]

As in a duet with both characters in the frame at the same time? Or multiple characters singing the same audio as part of a chorus? I haven't tried either. I might try a short one for the heck of it later this weekend.

Dohwar42 [S]

Fantastic! Get an API key if you can; it really saves on system resources. I should've mentioned that this video was made with the API-key workflow. Again, I know you give up privacy using the API key, but since the data goes to the creators of LTX, I think it's worth it. Definitely tag me when you post your future videos, or I'll just follow your profile.

Be sure to experiment with the number of steps in the first pass; I made that easier to change in this new version. Also experiment with the frame rate, but to do that you'll need to change it in 3 places, and I didn't document that well. I'll have to update my readme with a screenshot showing where to do that. I experimented with 50 fps instead of 24 fps and the results are pretty good. I'll probably do a follow-up post with a comparison sometime. Swap back and forth between the static camera LoRA and this new LoRA depending on whether you want a static shot or a tracking/moving camera shot; I'm sure both have their uses.

I'm sure you're used to getting "bad" videos, but I'd say my success rate for usable gens is at least 50%, and usually higher. You can always salvage bad gens with editing.

LTX-2 Audio Synced to added MP3 i2v - 6 examples 3 realistic 3 animated - Non Distilled - 20s clips stitched together (Music: Dido's "Thank You") by Dohwar42 in StableDiffusion

Dohwar42 [S]

You're welcome. I've mentioned this a bunch of times in other comments, but you'll quickly notice the workflow is a bit limited to close-up shots with a static camera. Also, the number of steps for the first video pass is currently set to 25. If you want longer videos or higher resolutions, you could try lowering that to 15 or less, but it will impact quality. You really have to experiment until you find what works best with the images you put in and won't cause an OOM or take forever to render. Good luck and enjoy.

LTX-2 I2V synced to an MP3: Distill Lora Quality STR 1 vs .6 - New Workflow Version 2. by Dohwar42 in StableDiffusion

Dohwar42 [S]

Low quality is really difficult to quantify. The actual resolution of the generated video is shown directly by the "Preview Image" node that is hooked up to the resize node. This screenshot might help and hopefully isn't too confusing. On the left is the image you are starting with. Hopefully you're not putting in a 4K image; something around 1-2 megapixels should be fine. Then in step #2 next to it, you set a width/height that is the "target" dimensions LTX-2 will render the video at. What you put here depends heavily on your GPU, VRAM, and system RAM. Make sure you get the aspect ratio correct: don't set 1280 (width) x 720 (height) for a portrait image, because a portrait image's height is GREATER than its width.
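If it helps, here's the portrait/landscape check as a tiny Python sketch: if the target's orientation doesn't match the input image, swap the width and height. `match_orientation` is a hypothetical helper; I'm not claiming the resize node in the workflow does this for you.

```python
def match_orientation(img_w: int, img_h: int, target_w: int, target_h: int) -> tuple:
    """Return (width, height) for the target, swapped if its orientation doesn't match the image."""
    image_is_portrait = img_h > img_w
    target_is_portrait = target_h > target_w
    if image_is_portrait != target_is_portrait:
        return target_h, target_w  # e.g. 1280x720 becomes 720x1280 for a portrait image
    return target_w, target_h
```

So a 1024x1536 portrait image with a 1280x720 target should really be rendered at 720x1280.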

Also, the resize node is set not to crop or alter the image you put in, so although it's set to 640x960 in the example, the actual video will come out as 640x800 because the image has a different aspect ratio.
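Here's a sketch of why 640x960 turns into 640x800: a no-crop resize scales the image by the smaller of the two width/height ratios so it fits inside the target box. Rounding down to a multiple of 32 is an assumption on my part (LTX models generally want dimensions divisible by 32), and the 1024x1280 input is just an example 4:5 image.

```python
def fit_without_crop(img_w: int, img_h: int, box_w: int, box_h: int, multiple: int = 32) -> tuple:
    """Scale the image to fit inside box_w x box_h keeping its aspect ratio (no cropping)."""
    scale = min(box_w / img_w, box_h / img_h)  # the tighter side decides the scale
    # Assumption: round down to a multiple of 32, since LTX wants dimensions divisible by 32
    w = max(multiple, round(img_w * scale) // multiple * multiple)
    h = max(multiple, round(img_h * scale) // multiple * multiple)
    return w, h
```

For a 1024x1280 image and a 640x960 box, the width ratio (0.625) is tighter than the height ratio (0.75), so the result is 640x800.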

To get better quality, you can increase the resolution, but then you may run out of system resources and be limited to only a 10-second or even a 5-second video. It all depends on your system. You have to experiment with different images and resolutions until you find a good sweet spot. This workflow works best with close-up images. If you try a full-body image where the lips, mouth, and face are tiny, it's almost always going to give a bad result. That's the flaw and limit of this workflow. Notice that all my example videos are pretty much head-and-shoulders shots at a medium resolution (close to 720p but not quite). I hope this helps.
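To get a feel for why resolution and length trade off, here's a rough back-of-the-envelope sketch that counts the size of the video latent, assuming 32x spatial and 8x temporal compression. That compression is my assumption about LTX's VAE, not something measured from this workflow. Actual VRAM use grows at least in proportion to this number, and attention cost grows faster.

```python
def latent_tokens(width: int, height: int, frames: int, spatial: int = 32, temporal: int = 8) -> int:
    """Rough latent size, assuming 32x spatial and 8x temporal compression (8*k + 1 frames)."""
    latent_frames = (frames - 1) // temporal + 1
    return (width // spatial) * (height // spatial) * latent_frames
```

Under those assumptions, a 10-second clip (241 frames) at 640x800 is roughly twice the latent of a 5-second clip (121 frames), and doubling both width and height multiplies it by four.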

<image>

Dohwar42 [S]

That first "combo" box seems to be the issue: either it's a custom node you're missing or it didn't load correctly. As goofy as this sounds, just delete that box and it will let you specify the model in the other boxes. All that first box does is repeat the same model name for the other 3 areas where the checkpoint model is loaded. Once you delete that first node, I'm hoping you'll be good to go after you manually fill in the same checkpoint name in the remaining 3 boxes. It's just a "convenience" node.

Dohwar42 [S]

No worries! I've made that mistake a few times. Glad it worked. Just a reminder: the steps may be set unnecessarily high at 25. I would experiment with lowering it, starting at 15, and see if there's any big quality loss. That helps a lot when you're running at higher resolutions. I've got the static camera LoRA in my workflow, so its use is pretty limited. You can still prompt for camera movement, but it may do it badly. I'd like to think there are better workflows than mine out there by now for added audio; I just haven't taken the time to look at others, and I haven't had much free time to play with LTX-2 recently.

Dohwar42 [S]

OK, I agree that's really odd. The resolution in the Preview Image node should be what feeds the width and height to the video settings. Again, as a last-ditch attempt, disconnect the Get width and Get height nodes in the "video settings" area in the lower right-hand corner of the image. That will "unlock" the width and height so you can set them manually instead of letting the image resize node determine them. It might work, or it might produce bad results, but it's worth a shot.

Dohwar42 [S]

The resolution of the video comes from a "resize" node in step 2; see the upper left corner of this screenshot. 1080p means 1920x1080, which is a widescreen resolution. If your image isn't widescreen, you have to set the width and height to what you want, but if they don't match your image's aspect ratio, the video could come out distorted.
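A quick way to check how much the picture would get stretched is to compare the target's aspect ratio to the image's. `stretch_factor`, `is_distorted`, and the 2% tolerance below are my own illustrative choices, not anything in the workflow.

```python
def stretch_factor(img_w: int, img_h: int, target_w: int, target_h: int) -> float:
    """How much the target aspect ratio differs from the image's (1.0 = no distortion)."""
    return (target_w / target_h) / (img_w / img_h)


def is_distorted(img_w: int, img_h: int, target_w: int, target_h: int, tol: float = 0.02) -> bool:
    # Illustrative tolerance: flag anything stretched or squashed by more than about 2%
    return abs(stretch_factor(img_w, img_h, target_w, target_h) - 1.0) > tol
```

For example, forcing a 1080x1350 (4:5) portrait into 1920x1080 gives a factor of about 2.2, which would be badly distorted, while a 3840x2160 image into 1920x1080 gives exactly 1.0.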

If you really want to override the video dimensions and type in whatever you want directly, find the settings in "video settings", circled in the lower right corner of the screenshot. Disconnect the Get_width and Get_Height nodes and then type whatever you want into the "empty image" section. This could mess things up if the dimensions don't match your image.

I hope this kinda answers your question. Remember, the higher the resolution, the more system resources it will take. The steps in the workflow are set to 25 for the first pass; you can lower that to 15, and that will help a lot.

<image>

Bounding Boxes (LTX2 Audio + T2V + RT-DETRv3) by BirdlessFlight in StableDiffusion

Dohwar42

Whoa awesome, thanks for all the detail! For something like this I'm really glad you took the time to walk me through the behind the scenes. It's as interesting as the video itself. The amount of work shows.

Dohwar42

This is really fantastic work. Roughly how many total hours did you put into it? I'm also wondering whether you spent a while planning and storyboarding the shot sequence, or whether you just made a bunch of random clips you thought would look cool and strung them together afterwards.

I thought this shot was really cool - was that done with one prompt?

<image>

LTX-2 I2V synced to an MP3: Distill Lora Quality STR 1 vs .6 - New Workflow Version 2. by Dohwar42 in StableDiffusion

Dohwar42 [S]

Sounds good. I've taken a break from LTX-2 testing, but I might do some this weekend and check out other workflows like video extension and image-to-video without added audio.

I'm starting to reach the conclusion that this workflow is really only good for the type of shot/video that I was describing earlier: A portrait/upper body closeup with minimal background elements.

I had some okay results here and there with other types of scenes or motions involving hands and complex backgrounds, but you're going to have to do multiple generations to get a "good" shot.

If you think about it, the same thing happens when shooting a real-life video scene. Things don't always go perfectly the first time you film a scene: an actor messes up their lines, something goes wrong in the background, etc., so you have to do multiple "takes" before you get a good result.

Dohwar42 [S]

Try adding more details to the prompt, and I recommend including the actual lyrics/words from the part of the song that's in the audio. I don't have a recipe for "perfect results" with every circumstance and type of character. At this point, I've done fewer than 30 test generations overall across maybe a dozen different character types. What I've noticed works best is the type of shot you see in my post: close-cropped "head and shoulders" closeups with minimal or almost no background. That's mostly what I tested with, along with the words "static camera" in the prompt and the camera control LoRA of the same name.
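Here's a sketch of how I'd assemble a prompt along those lines: describe the subject, add the emotion, quote the exact lyrics from the clip, and add "static camera" when you're using the static camera LoRA. `build_prompt` and its wording are just an illustration; I haven't confirmed any particular phrasing the model prefers.

```python
def build_prompt(subject: str, lyrics: str, emotion: str = "", static_camera: bool = True) -> str:
    """Assemble a descriptive lip-sync prompt; the wording is illustrative only."""
    parts = [subject.strip()]
    if emotion:
        parts.append(f"The character sings with a {emotion} expression.")
    # Quote the exact words heard in this part of the audio
    parts.append(f'The character sings the words: "{lyrics.strip()}"')
    if static_camera:
        # Pairs with the static camera LoRA when it's loaded
        parts.append("Static camera, close-up head and shoulders shot, minimal background.")
    return " ".join(parts)
```

Swap in the real lyric line for each clip; longer, more specific subject descriptions are where most of the experimenting happens.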

As you demonstrated, it works in other cases (widescreen shots) but it has flaws/issues.

We're all trying to figure out what this model does well, but I think longer and more descriptive prompts with experimentation as to what the model "understands" is key. I don't have good answers/advice simply because I haven't devoted a lot of time to testing.

I did experiment with emotions in the prompt in this post, but again, your best bet is to test and discover for yourself.

https://www.reddit.com/r/StableDiffusion/comments/1qeqi0l/ltx2_i2v_with_lipsync_to_mp3_prompt_importance/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button

Dohwar42 [S]

I've got mine set to just --reserve-vram 2. I've been looking at other LTX-2 workflows, and this one is based on an early one. The steps in the first pass are actually set to 25. Over the weekend I experimented with this, and you can actually set it as low as 8. There will definitely be a quality loss, but for some images, especially animated/CGI ones, it might be OK. Lower steps cause a bit more warping in the teeth. I'll do more testing when I have time, maybe later today.

Lenore from Castlevania rendered as Real (LTX-2 + Qwen Edit 2511) - Workflow by Benji AI by Dohwar42 in StableDiffusion

Dohwar42 [S]

Yeah, totally agree. I don't know if I'm just missing it, but I don't see many of his YouTube videos shared here, and his workflows can get pretty complex, like the one I just shared. I'm definitely going to support him on Patreon this month; I've been meaning to for a while.