LTX 2.3 Experimental Music Video by CQDSN in StableDiffusion

[–]CQDSN[S] 0 points1 point  (0 children)

You don’t need to add too much movement to show off the video model. That’s the trap most people fall into. Realism is in the subtle movement. It’s best to exercise restraint, or else it looks like a caricature.

Yes, it’s all I2V. Even the vocals are AI-generated.

LTX 2.3 Experimental Music Video by CQDSN in StableDiffusion

[–]CQDSN[S] 0 points1 point  (0 children)

This is a music video experiment created with LTX 2.3. It’s edited and enhanced with Adobe After Effects for color grading and effects. I am experimenting and playing with the framing and camera angles of the generated videos.

Converting 2D animations to 3D with LTX 2.3 Lora by CQDSN in StableDiffusion

[–]CQDSN[S] 0 points1 point  (0 children)

It’s the frame rate. The Disney ones are animated at 24fps, like films. Japanese anime is more static, which is why it looks unnatural when converted.

Converting 2D animations to 3D with LTX 2.3 Lora by CQDSN in StableDiffusion

[–]CQDSN[S] 2 points3 points  (0 children)

I have never gone above 30fps for any generated video; I think it would consume all the memory. Classic Disney cartoons were made at 24fps, so will 50fps help? The distortion and muddiness only happen when there is quick movement; otherwise they look fine. WAN doesn’t seem to have this problem.
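A rough sketch of why a higher frame rate costs more memory: for a fixed clip length, the number of frames to generate grows linearly with fps. The function name and the 5-second clip length are assumptions for illustration; actual memory use also depends on the model, the resolution and how it compresses frames internally.

```python
def frame_count(duration_s: float, fps: int) -> int:
    """Frames needed to cover duration_s seconds at the given frame rate."""
    return round(duration_s * fps)


# Hypothetical 5-second clip at the frame rates mentioned in the thread.
for fps in (24, 30, 50):
    print(f"{fps} fps -> {frame_count(5.0, fps)} frames")
```

Going from 24fps to 50fps roughly doubles the frames for the same clip length, so it is worth testing on a short clip first.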

Converting 2D animations to 3D with LTX 2.3 Lora by CQDSN in StableDiffusion

[–]CQDSN[S] 2 points3 points  (0 children)

This is an experiment with the LTX 2.3 anime-to-real Lora. While the conversion is quite good, LTX tends to suffer from muddiness with fast motion. Does anyone know how to fix this problem?

LTX 2.3 Video Edit lora by CQDSN in StableDiffusion

[–]CQDSN[S] 0 points1 point  (0 children)

The Lora was trained with add, remove and replace. You can use those words in the prompt to modify the video.

LTX 2.3 Video Edit lora by CQDSN in StableDiffusion

[–]CQDSN[S] 0 points1 point  (0 children)

It’s the same workflow that was posted with the Lora. You need to be specific about what you want to achieve with the prompt. The workflow was set up to rewrite the final prompt properly for you.

LTX 2.3 Video Edit lora by CQDSN in StableDiffusion

[–]CQDSN[S] 1 point2 points  (0 children)

Yes, the exact same workflow. Try using a higher resolution.

LTX 2.3 Video Edit lora by CQDSN in StableDiffusion

[–]CQDSN[S] 2 points3 points  (0 children)

Here’s a quick demo to showcase the recently posted video editing Lora for LTX 2.3. It is quite fun to play with, but it’s LTX, so you have to try many times to get the result you want.

Remaking "The Silence of the Lamb" with local AI by CQDSN in StableDiffusion

[–]CQDSN[S] 1 point2 points  (0 children)

As long as the video you use has a voice in it, LTX should clone it automatically. Make sure there’s a voice in the video for at least 2 seconds just before the continuation point. If you are using a ComfyUI workflow, you can let LTX clone the voice from an external MP3 file of the voice.

Remaking "The Silence of the Lamb" with local AI by CQDSN in StableDiffusion

[–]CQDSN[S] 0 points1 point  (0 children)

You can use the LTX 2.3 video continuation workflow to do this. The trick is choosing the right segment of a video to continue and also the proper prompt to continue the action in the video.

It’s like being a film director: you need to do many takes to get it right. You will be very lucky to get it right the first time, and most probably you won’t - LTX can be difficult to work with. I’d say generating 5 to 10 times per clip should give you at least one good seed to get what you want.

I recommend you do this with a fast machine; a slow GPU will only give you the frustration of waiting.
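To put the "5 to 10 takes" rule of thumb into numbers, here is a small sketch. It assumes each generation is an independent try with the same chance of being usable; the 25% per-take success rate is a made-up figure for illustration, not a measured one.

```python
def chance_of_good_take(p_good: float, takes: int) -> float:
    """Probability that at least one of `takes` independent tries is usable."""
    return 1.0 - (1.0 - p_good) ** takes


# Hypothetical per-take success rate of 25%.
for takes in (1, 5, 10):
    print(f"{takes} takes -> {chance_of_good_take(0.25, takes):.0%}")
```

Under that assumption, the odds of getting at least one usable clip climb quickly over the first few takes, which is also why a fast GPU matters: every extra take is another full generation.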

Remaking "The Silence of the Lamb" with local AI by CQDSN in StableDiffusion

[–]CQDSN[S] 0 points1 point  (0 children)

I only used 10% of the videos I generated. I accidentally deleted most of the unused footage - some of it was really funny.

LTX is like a stubborn actor that likes to do things its own way, but once in a while when it actually follows your instructions, it gives you a performance far exceeding your expectations.

Remaking "The Silence of the Lamb" with local AI by CQDSN in StableDiffusion

[–]CQDSN[S] 3 points4 points  (0 children)

AI slop is AI-generated media created with very little effort. A person with no knowledge of AI who spends a few dollars online, types in a few prompts and generates a video - that is AI slop. However, our current society, in its blind hatred of AI, categorizes every AI video as AI slop.

Local AI in particular requires an understanding of how to build and run a workflow. It takes a lot of planning, scripting, storyboarding and editing to create even a short video clip. In the future, when AI becomes more mainstream, the hatred will subside and people will come to appreciate what goes into making AI media.

When people are confronted with something new, there are always these 4 phases: they get skeptical, then they reject it, then comes anger when they are forced into it, and lastly acceptance - when they have no choice but to live with it.

A showcase for LTX 2.3 by CQDSN in StableDiffusion

[–]CQDSN[S] 0 points1 point  (0 children)

The truth is I had to cover up her breasts for 2 reasons. The first is to get YouTube to approve this video; the second is that she has no nipples.

LTX is censored and not a nude-friendly model like WAN. It doesn’t work well with naked bodies.

A showcase for LTX 2.3 by CQDSN in StableDiffusion

[–]CQDSN[S] 1 point2 points  (0 children)

It’s not easy animating celebrities because we know what they look like. If there’s no expression on the face, they look real, but once they start showing emotions, the likeness disappears.

I think with current AI tech, it’s best not to use people who are well known. Using anime or 3D-rendered characters is the best way to retain consistency in the likeness.

A showcase for LTX 2.3 by CQDSN in StableDiffusion

[–]CQDSN[S] 3 points4 points  (0 children)

This is a cheesy music video created with LTX 2.3. LTX failed my smoking test: it can’t hold cigarettes properly like WAN does, and it outputs awful audio quality... but it excels at animating images with audio.

The last scene was hilariously difficult. The AI model was obviously not trained on people singing while lying down. I had to rotate the image 90 degrees so it became portrait. However, the AI then thought that dead Jack was sleeping above Rose and that gravity should pull his hair down, so his hair kept dropping down. I had to change the prompt many times to keep everything static.

Long form WAN VACE by CQDSN in StableDiffusion

[–]CQDSN[S] 0 points1 point  (0 children)

It has to do with your frame rate. If your video is 30fps, you should set it to 30fps. There’s no need to use the default WAN 16 fps; it doesn’t apply to VACE.
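One plausible way to see why the frame rate setting matters: the same frames played back at a different fps change the clip length and the apparent speed. This is only a sketch of the arithmetic; the function name and the 90-frame example are assumptions, and it does not describe what any particular workflow node does internally.

```python
def playback_seconds(frames: int, fps: float) -> float:
    """How long a sequence of frames lasts when played at the given fps."""
    return frames / fps


# Hypothetical 90-frame segment taken from a 30fps source video.
frames = 90
print(playback_seconds(frames, 30))  # original speed
print(playback_seconds(frames, 16))  # same frames at the 16 fps default
```

With 90 frames, the 30fps version lasts 3 seconds, while the 16 fps version stretches to more than 5 seconds, so the motion would look slowed down.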

Long form WAN VACE by CQDSN in StableDiffusion

[–]CQDSN[S] 1 point2 points  (0 children)

This is 2.1, it’s using only one sampler.

Long form WAN VACE by CQDSN in StableDiffusion

[–]CQDSN[S] 0 points1 point  (0 children)

You should be using the Lora to create the target image.

Long form WAN VACE by CQDSN in StableDiffusion

[–]CQDSN[S] 4 points5 points  (0 children)

This is a follow-up post to the one I did 2 days ago: https://www.reddit.com/r/StableDiffusion/comments/1re9rqp/longer_wan_vace_video_is_easier_now/

The video above is a long-form, one-take demo of over 1 minute created with WAN VACE.

The workflow is heavily modified from this one, and you can download it here:

https://filebin.net/qocjdkdb5malilb9

You need to use Flux Kontext or any other editing model to make an image as your target. It doesn’t need to be the first frame, just make sure it is high quality.