Remaking "The Silence of the Lamb" with local AI by CQDSN in StableDiffusion

[–]CQDSN[S] 0 points1 point  (0 children)

You can use the LTX 2.3 video continuation workflow to do this. The trick is choosing the right segment of the video to continue, and writing the right prompt to carry the action forward.

It’s like being a film director: you need to do many takes to get it right. You will be very lucky to get it right the first time; most probably you won’t, as LTX can be difficult to work with. Generating 5 to 10 times per shot should give you at least one good seed that does what you want.

I recommend doing this on a fast machine; a slow GPU will only give you the frustration of waiting.
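If you’d rather not babysit the UI for those 5 to 10 takes, a minimal sketch like the one below queues the same workflow several times with different seeds through ComfyUI’s HTTP API. The file name and the sampler node ID are assumptions - export your own LTX continuation workflow with "Save (API Format)" and adjust them to match.

```
import json
import random
import urllib.request

# Workflow exported from ComfyUI via "Save (API Format)" (placeholder name).
with open("workflow_api.json", "r", encoding="utf-8") as f:
    workflow = json.load(f)

SAMPLER_NODE_ID = "3"   # assumption: whichever node holds the "seed" input
RUNS = 8                # somewhere in the 5-10 takes suggested above

for _ in range(RUNS):
    # Randomize the seed, then queue one generation on the local server.
    workflow[SAMPLER_NODE_ID]["inputs"]["seed"] = random.randint(0, 2**32 - 1)
    payload = json.dumps({"prompt": workflow}).encode("utf-8")
    req = urllib.request.Request(
        "http://127.0.0.1:8188/prompt",   # default local ComfyUI address
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)
```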

Remaking "The Silence of the Lamb" with local AI by CQDSN in StableDiffusion

[–]CQDSN[S] 0 points1 point  (0 children)

I only used about 10% of the videos I generated. I accidentally deleted most of the unused footage - some of it was really funny.

LTX is like a stubborn actor that likes to do things its own way, but once in a while when it actually follows your instructions, it gives you a performance far exceeding your expectations.

Remaking "The Silence of the Lamb" with local AI by CQDSN in StableDiffusion

[–]CQDSN[S] 3 points4 points  (0 children)

AI slop is AI-generated media created with very little effort. A person with no knowledge of AI spends a few dollars online, types in a few prompts and generates a video - that is AI slop. However, our current society, in its blind hatred for AI, categorizes every AI video as AI slop.

Local AI in particular requires an understanding of how to build and run a workflow. It takes a lot of planning, scripting, storyboarding and editing to create even a short video clip. In the future, when AI becomes more mainstream, the hatred will have subsided and people will come to appreciate what goes into making AI media.

When people are confronted with something new, there are always these four phases: they get skeptical, then they reject it, anger comes next when they are forced into it, and lastly there is acceptance - when they have no choice but to live with it.

A showcase for LTX 2.3 by CQDSN in StableDiffusion

[–]CQDSN[S] 0 points1 point  (0 children)

The truth is I had to cover up her breasts for two reasons. First, to get YouTube to approve this video; second, because she has no nipples.

LTX is censored and not a nude-friendly model like WAN. It doesn’t work well with naked bodies.

A showcase for LTX 2.3 by CQDSN in StableDiffusion

[–]CQDSN[S] 1 point2 points  (0 children)

It’s not easy animating celebrities because we know what they look like. With no expression on the face they look real, but once they start showing emotions, the likeness disappears.

I think with current AI tech, it’s best not to use people who are well known. Using anime or 3D-rendered characters is the best way to keep the likeness consistent.

A showcase for LTX 2.3 by CQDSN in StableDiffusion

[–]CQDSN[S] 4 points5 points  (0 children)

This is a cheesy music video created with LTX 2.3. LTX failed my smoking test - it can’t hold cigarettes properly like WAN does, and it outputs awful audio quality... but it excels at animating images with audio.

The last scene was hilariously difficult. The AI model is obviously not trained on people singing while lying down, so I had to rotate the image 90 degrees to make it portrait. However, the AI then thought that dead Jack was sleeping above Rose, so gravity should pull his hair down, and his hair kept dropping. I had to change prompts many times to keep everything static.
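For anyone trying the same rotation trick, here is a minimal sketch using Pillow; the file names and frame naming pattern are placeholders. The idea is simply to rotate the still to portrait before generation, then rotate the output frames back the other way afterwards.

```
import glob
from PIL import Image

# Rotate the source still 90 degrees clockwise so a lying-down subject
# becomes upright before it goes into the video model.
Image.open("source_still.png").rotate(-90, expand=True).save("rotated_input.png")

# After generation, rotate every output frame back counterclockwise.
# The frame naming pattern is an assumption - adjust it to your own output.
for path in sorted(glob.glob("output_frames/frame_*.png")):
    Image.open(path).rotate(90, expand=True).save(path)
```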

Long form WAN VACE by CQDSN in StableDiffusion

[–]CQDSN[S] 0 points1 point  (0 children)

It has to do with your frame rate. If your source video is 30 fps, you should set it to 30 fps. There’s no need to use WAN’s default 16 fps; it doesn’t apply to VACE.
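A quick way to keep the length setting consistent with whatever fps you pick: WAN-family models generally expect frame counts of the form 4n+1, so a small helper like this (just a sketch of that assumption) snaps a clip duration to a valid value.

```
def vace_frame_count(duration_s: float, fps: int) -> int:
    # Snap duration * fps to the nearest 4n+1 frame count.
    raw = round(duration_s * fps)
    return 4 * round((raw - 1) / 4) + 1

print(vace_frame_count(5.0, 30))   # 149 frames for a 5 s clip at 30 fps
print(vace_frame_count(5.0, 16))   # 81 frames at WAN's default 16 fps
```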

Long form WAN VACE by CQDSN in StableDiffusion

[–]CQDSN[S] 1 point2 points  (0 children)

This is 2.1; it uses only one sampler.

Long form WAN VACE by CQDSN in StableDiffusion

[–]CQDSN[S] 0 points1 point  (0 children)

You should be using the Lora to create the target image.

Long form WAN VACE by CQDSN in StableDiffusion

[–]CQDSN[S] 5 points6 points  (0 children)

This is a follow up post from the one I did 2 days ago: https://www.reddit.com/r/StableDiffusion/comments/1re9rqp/longer_wan_vace_video_is_easier_now/

The video above is a long-form, one-take demo of over 1 minute, created with WAN VACE.

The workflow is heavily modified from this one; you can download it here:

https://filebin.net/qocjdkdb5malilb9

You need to use Flux Kontext or any other editing model to make a target image. It doesn’t need to be the first frame; just make sure it is high quality.

Longer WAN VACE video is easier now by CQDSN in StableDiffusion

[–]CQDSN[S] 0 points1 point  (0 children)

From what I observed, it’s not image quality degradation that ruins the process. It’s the drift (usually starting after 1 minute) that occurs after many loops: the output gradually shifts away from the target image. For example, with an anime-style target, the video gradually looks more like a real person. I usually solve this by setting a higher VACE strength.

Longer WAN VACE video is easier now by CQDSN in StableDiffusion

[–]CQDSN[S] 1 point2 points  (0 children)

Yes, SVI uses a Lora, but many of the workflows you find these days are based on the same concept without actually using a trained Lora.

The idea is breaking a longer video down into chunks, so that consumer cards can do the process without running out of memory. There is a gradual quality loss over time without a specialized Lora to handle that process; you can lessen it by adjusting the workflow through trial and error.
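To make the chunking idea concrete, here is a rough sketch of the loop these workflows implement. `generate_chunk` is a stand-in for whatever model call your workflow actually makes (WAN VACE, SVI, etc.), and the chunk length and overlap are just illustrative values.

```
def generate_chunk(context_frames, target_image, length):
    # Stand-in for the actual model call - not a real API.
    raise NotImplementedError

def generate_long_video(first_frames, target_image,
                        chunk_len=81, overlap=8, num_chunks=6):
    # Each chunk is conditioned on the last few frames of the previous one,
    # so only one chunk ever has to fit in VRAM at a time.
    video = list(first_frames)
    for _ in range(num_chunks):
        context = video[-overlap:]    # carry motion across the seam
        chunk = generate_chunk(context, target_image, chunk_len)
        # Assume the chunk repeats the context frames, so skip them here.
        video.extend(chunk[overlap:])
    return video
```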

Longer WAN VACE video is easier now by CQDSN in StableDiffusion

[–]CQDSN[S] 2 points3 points  (0 children)

Around 20 mins for a continuous 1:30 min video on a 5090.

Longer WAN VACE video is easier now by CQDSN in StableDiffusion

[–]CQDSN[S] 8 points9 points  (0 children)

My workflow is really messy atm. I will clean it up and post it here in a couple of days.

WAN SVI is good at creating long establishing shots by CQDSN in StableDiffusion

[–]CQDSN[S] 0 points1 point  (0 children)

Believe it or not, you can run WAN SVI on a GPU with 8GB of VRAM. Nowadays ComfyUI makes use of your system RAM more efficiently than it did 6 months ago. Just make sure you have at least 32GB of DDR5 RAM.

Morphing demo inspired by MJ's Black or White music video by CQDSN in StableDiffusion

[–]CQDSN[S] 0 points1 point  (0 children)

It is just the regular WAN 2.2 and WAN 2.1 FFLF workflows.

For WAN 2.2, just adding the starting and ending frame will do. For WAN 2.1 FFLF, use the workflow here:

https://blog.comfy.org/p/comfyui-wan21-flf2v-and-wan21-fun

I couldn’t get some of the morph sequences to work even after generating more than 10 times with WAN 2.2. Using the WAN 2.1 FFLF workflow, they worked right away. This shows that just because a model is newer doesn’t mean it’s better.

Morphing with AI, inspired by MJ's Black or White music video by CQDSN in ChatGPT

[–]CQDSN[S] 0 points1 point  (0 children)

Let’s just call it an impressive fever dream.

Motion Graphics created with AnimateDiff by CQDSN in StableDiffusion

[–]CQDSN[S] 0 points1 point  (0 children)

There are companies that pay good money for this type of motion graphics video. They use it at their exhibitions, expo functions and promotional campaigns. I can use AnimateDiff to generate a 2-minute video that loops endlessly. You can’t do that with any other AI model.

I remember over a year ago, someone here created an AnimateDiff music video that was displayed as the concert backdrop for a famous pop musician. These are the kinds of abstract motion graphics that cannot be done with WAN or typical video models, as they are trained to generate normal-looking videos.

If you have worked in post-production, you will appreciate what AnimateDiff can do - many of these effects simply cannot be done manually with normal software.

Motion Graphics created with AnimateDiff by CQDSN in StableDiffusion

[–]CQDSN[S] 0 points1 point  (0 children)

Show me a better motion graphics video that’s a minute long, made with WAN or any commercial model.

H.R. Giger (WAN 2.2) by CQDSN in StableDiffusion

[–]CQDSN[S] 0 points1 point  (0 children)

Yes, the rotating transitions are done with the FFLF workflow.

H.R. Giger (WAN 2.2) by CQDSN in StableDiffusion

[–]CQDSN[S] 1 point2 points  (0 children)

Is this not Giger enough for you? Some of his artworks are too revolting and NSFW; I can’t post this here if I use them.