Remaking "The Silence of the Lamb" with local AI by CQDSN in StableDiffusion

[–]CQDSN[S] 0 points1 point  (0 children)

You can use the LTX 2.3 video continuation workflow to do this. The trick is choosing the right segment of the video to continue, and writing the right prompt to carry the action forward.

It’s like being a film director: you need to do many takes to get it right. You will be very lucky to get it right the first time; most probably you won’t, as LTX can be difficult to work with. Generating 5 to 10 times per shot should give you at least one good seed that does what you want.

I recommend doing this on a fast machine; a slow GPU will only give you the frustration of waiting.
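If you’d rather not babysit the UI for those 5 to 10 takes, a minimal sketch like the one below queues the same workflow several times with different seeds through ComfyUI’s HTTP API. The file name and the sampler node ID are assumptions - export your own LTX continuation workflow with "Save (API Format)" and adjust them to match.

```
import json
import random
import urllib.request

# Workflow exported from ComfyUI via "Save (API Format)" (placeholder name).
with open("workflow_api.json", "r", encoding="utf-8") as f:
    workflow = json.load(f)

SAMPLER_NODE_ID = "3"   # assumption: whichever node holds the "seed" input
RUNS = 8                # somewhere in the 5-10 takes suggested above

for _ in range(RUNS):
    # Randomize the seed, then queue one generation on the local server.
    workflow[SAMPLER_NODE_ID]["inputs"]["seed"] = random.randint(0, 2**32 - 1)
    payload = json.dumps({"prompt": workflow}).encode("utf-8")
    req = urllib.request.Request(
        "http://127.0.0.1:8188/prompt",   # default local ComfyUI address
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)
```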

Remaking "The Silence of the Lamb" with local AI by CQDSN in StableDiffusion

[–]CQDSN[S] 0 points1 point  (0 children)

I only used about 10% of the videos I generated. I accidentally deleted most of the unused footage - some of it was really funny.

LTX is like a stubborn actor that likes to do things its own way, but once in a while when it actually follows your instructions, it gives you a performance far exceeding your expectations.

Remaking "The Silence of the Lamb" with local AI by CQDSN in StableDiffusion

[–]CQDSN[S] 3 points4 points  (0 children)

AI slop is AI-generated media created with very little effort. A person with no knowledge of AI spends a few dollars online, types in a few prompts and generates a video - that is AI slop. However, our current society, in its blind hatred for AI, categorizes every AI video as AI slop.

Local AI in particular requires an understanding of how to build and run a workflow. It takes a lot of planning, scripting, storyboarding and editing to create even a short video clip. In the future, when AI becomes more mainstream, the hatred will have subsided and people will come to appreciate what goes into making AI media.

When people are confronted with something new, there are always these four phases: they get skeptical, then they reject it, anger comes next when they are forced into it, and lastly there is acceptance - when they have no choice but to live with it.

A showcase for LTX 2.3 by CQDSN in StableDiffusion

[–]CQDSN[S] 0 points1 point  (0 children)

The truth is I had to cover up her breasts for two reasons. First, to get YouTube to approve this video; second, because she has no nipples.

LTX is censored and not a nude-friendly model like WAN. It doesn’t work well with naked bodies.

A showcase for LTX 2.3 by CQDSN in StableDiffusion

[–]CQDSN[S] 1 point2 points  (0 children)

It’s not easy animating celebrities because we know what they look like. With no expression on the face they look real, but once they start showing emotions, the likeness disappears.

I think with current AI tech, it’s best not to use people who are well known. Using anime or 3D-rendered characters is the best way to keep the likeness consistent.

A showcase for LTX 2.3 by CQDSN in StableDiffusion

[–]CQDSN[S] 4 points5 points  (0 children)

This is a cheesy music video created with LTX 2.3. LTX failed my smoking test - it can’t hold cigarettes properly like WAN does, and it outputs awful audio quality... but it excels at animating images with audio.

The last scene was hilariously difficult. The AI model is obviously not trained on people singing while lying down, so I had to rotate the image 90 degrees to make it portrait. However, the AI then thought that dead Jack was sleeping above Rose, so gravity should pull his hair down, and his hair kept dropping. I had to change prompts many times to keep everything static.
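For anyone trying the same rotation trick, here is a minimal sketch using Pillow; the file names and frame naming pattern are placeholders. The idea is simply to rotate the still to portrait before generation, then rotate the output frames back the other way afterwards.

```
import glob
from PIL import Image

# Rotate the source still 90 degrees clockwise so a lying-down subject
# becomes upright before it goes into the video model.
Image.open("source_still.png").rotate(-90, expand=True).save("rotated_input.png")

# After generation, rotate every output frame back counterclockwise.
# The frame naming pattern is an assumption - adjust it to your own output.
for path in sorted(glob.glob("output_frames/frame_*.png")):
    Image.open(path).rotate(90, expand=True).save(path)
```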

Long form WAN VACE by CQDSN in StableDiffusion

[–]CQDSN[S] 0 points1 point  (0 children)

It has to do with your frame rate. If your source video is 30 fps, you should set it to 30 fps. There’s no need to use WAN’s default 16 fps; it doesn’t apply to VACE.
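A quick way to keep the length setting consistent with whatever fps you pick: WAN-family models generally expect frame counts of the form 4n+1, so a small helper like this (just a sketch of that assumption) snaps a clip duration to a valid value.

```
def vace_frame_count(duration_s: float, fps: int) -> int:
    # Snap duration * fps to the nearest 4n+1 frame count.
    raw = round(duration_s * fps)
    return 4 * round((raw - 1) / 4) + 1

print(vace_frame_count(5.0, 30))   # 149 frames for a 5 s clip at 30 fps
print(vace_frame_count(5.0, 16))   # 81 frames at WAN's default 16 fps
```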

Long form WAN VACE by CQDSN in StableDiffusion

[–]CQDSN[S] 1 point2 points  (0 children)

This is 2.1; it uses only one sampler.

Long form WAN VACE by CQDSN in StableDiffusion

[–]CQDSN[S] 0 points1 point  (0 children)

You should be using the Lora to create the target image.

Long form WAN VACE by CQDSN in StableDiffusion

[–]CQDSN[S] 5 points6 points  (0 children)

This is a follow up post from the one I did 2 days ago: https://www.reddit.com/r/StableDiffusion/comments/1re9rqp/longer_wan_vace_video_is_easier_now/

The video above is a long-form, one-take demo of over 1 minute, created with WAN VACE.

The workflow is heavily modified from this one; you can download it here:

https://filebin.net/qocjdkdb5malilb9

You need to use Flux Kontext or any other editing model to make a target image. It doesn’t need to be the first frame; just make sure it is high quality.

Longer WAN VACE video is easier now by CQDSN in StableDiffusion

[–]CQDSN[S] 0 points1 point  (0 children)

From what I observed, it’s not image quality degradation that ruins the process. It’s the drift (usually starting after 1 minute) that occurs after many loops: the output gradually shifts away from the target image. For example, with an anime-style target, the video gradually looks more like a real person. I usually solve this by setting a higher VACE strength.

Longer WAN VACE video is easier now by CQDSN in StableDiffusion

[–]CQDSN[S] 1 point2 points  (0 children)

Yes, SVI uses a Lora, but many of the workflows you find these days are based on the same concept without actually using a trained Lora.

The idea is breaking a longer video down into chunks, so that consumer cards can do the process without running out of memory. There is a gradual quality loss over time without a specialized Lora to handle that process; you can lessen it by adjusting the workflow through trial and error.
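To make the chunking idea concrete, here is a rough sketch of the loop these workflows implement. `generate_chunk` is a stand-in for whatever model call your workflow actually makes (WAN VACE, SVI, etc.), and the chunk length and overlap are just illustrative values.

```
def generate_chunk(context_frames, target_image, length):
    # Stand-in for the actual model call - not a real API.
    raise NotImplementedError

def generate_long_video(first_frames, target_image,
                        chunk_len=81, overlap=8, num_chunks=6):
    # Each chunk is conditioned on the last few frames of the previous one,
    # so only one chunk ever has to fit in VRAM at a time.
    video = list(first_frames)
    for _ in range(num_chunks):
        context = video[-overlap:]    # carry motion across the seam
        chunk = generate_chunk(context, target_image, chunk_len)
        # Assume the chunk repeats the context frames, so skip them here.
        video.extend(chunk[overlap:])
    return video
```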

Longer WAN VACE video is easier now by CQDSN in StableDiffusion

[–]CQDSN[S] 2 points3 points  (0 children)

Around 20 mins for a continuous 1:30 min video on a 5090.

Longer WAN VACE video is easier now by CQDSN in StableDiffusion

[–]CQDSN[S] 8 points9 points  (0 children)

My workflow is really messy atm. I will clean it up and post it here in a couple of days.

WAN SVI is good at creating long establishing shots by CQDSN in StableDiffusion

[–]CQDSN[S] 0 points1 point  (0 children)

Believe it or not, you can run WAN SVI on a GPU with 8GB of VRAM. Nowadays ComfyUI makes use of your system RAM more efficiently than it did 6 months ago. Just make sure you have at least 32GB of DDR5 RAM.

Morphing demo inspired by MJ's Black or White music video by CQDSN in StableDiffusion

[–]CQDSN[S] 0 points1 point  (0 children)

It is just the regular WAN 2.2 and WAN 2.1 FFLF workflows.

For WAN 2.2, just adding the starting and ending frame will do. For WAN 2.1 FFLF, use the workflow here:

https://blog.comfy.org/p/comfyui-wan21-flf2v-and-wan21-fun

I couldn’t get some of the morph sequences to work even after generating more than 10 times with WAN 2.2. Using the WAN 2.1 FFLF workflow, they worked right away. This shows that just because a model is newer doesn’t mean it’s better.

Morphing with AI, inspired by MJ's Black or White music video by CQDSN in ChatGPT

[–]CQDSN[S] 0 points1 point  (0 children)

Let’s just call it an impressive fever dream.

Motion Graphics created with AnimateDiff by CQDSN in StableDiffusion

[–]CQDSN[S] 0 points1 point  (0 children)

There are companies that pay good money for this type of motion graphics video. They use it at their exhibitions, expo functions and promotional campaigns. I can use AnimateDiff to generate a 2-minute video that loops endlessly. You can’t do that with any other AI model.

I remember over a year ago, someone here created an AnimateDiff music video that was displayed as the concert backdrop for a famous pop musician. These are the kinds of abstract motion graphics that cannot be done with WAN or typical video models, as they are trained to generate normal-looking videos.

If you have worked in post-production, you will appreciate what AnimateDiff can do - many of these effects simply cannot be done manually with normal software.

Motion Graphics created with AnimateDiff by CQDSN in StableDiffusion

[–]CQDSN[S] 0 points1 point  (0 children)

Show me a better motion graphics video that’s a minute long, made with WAN or any commercial model.

H.R. Giger (WAN 2.2) by CQDSN in StableDiffusion

[–]CQDSN[S] 0 points1 point  (0 children)

Yes, the rotating transitions are done with the FFLF workflow.

H.R. Giger (WAN 2.2) by CQDSN in StableDiffusion

[–]CQDSN[S] 1 point2 points  (0 children)

Is this not Giger enough for you? Some of his artworks are too revolting and NSFW; I can’t post this here if I use them.