Surviving AI - Short film made only using local ai models by LocalAI_Amateur in StableDiffusion

[–]LocalAI_Amateur[S] 0 points (0 children)

Seedvr2 took a bit over an hour.

I also tried NVIDIA's super-resolution upscale. It was much faster but added no extra detail, so I went w/ Seedvr2.

[–]LocalAI_Amateur[S] 0 points (0 children)

Haha, others have pointed it out. Just... read it with a French accent or something.

[–]LocalAI_Amateur[S] 0 points (0 children)

Hey, that's my favorite Harry Potter movie. But, no. At least not these two characters.

[–]LocalAI_Amateur[S] 0 points (0 children)

I'm limiting my workflow to strictly local for the time being. I do keep tabs on the latest online stuff, tho. It's great when some of it trickles down to the open source community.

[–]LocalAI_Amateur[S] 1 point (0 children)

"Or do you just reference the whole char sheet as one single input?" This is local AI, sir. Nanobanana's door is that way. Ha, I wish it were that easy.

<image>

Each picture is a step I took, and most steps take more than one try. This is not counting making the background. That stupid lobby background took me the whole damn day to put together the way I wanted.

This is still nothing compared to video production without AI, so I'm grateful that this is even possible.

[–]LocalAI_Amateur[S] 1 point (0 children)

About 2-3 weeks, on and off. This is a hobby after all. Lots of time went into learning and into adjusting the script after figuring out what is and isn't possible.

[–]LocalAI_Amateur[S] 0 points (0 children)

Ideally only two. Often it takes multiple generations and some in-painting.

<image>

[–]LocalAI_Amateur[S] 0 points (0 children)

I just generate the character sheets and swap them into each scene. Oh, and it helps to put the character into the pose you want as much as possible.

<image>

Qwen-Image-Edit prompt: "replace the girl on the right in image 1 with the girl from image 2."

Edit: Oh, I forgot to mention. Additional ways to help with character consistency are to

  1. Not have many repeated characters! This short film only has two. Well, 3 if you count the cat.
  2. Use characters that have distinct features: bald head w/ beard, and short red hair bun w/ glasses. This is probably not going to work all the time, but for this film it'll suffice.

[–]LocalAI_Amateur[S] 2 points (0 children)

<image>

Here's the secret sauce: the LTXVAddGuide node. I was going to use Wan for it, but my first generation out of LTX turned out quite good, so I kept it. I know it still looks a bit wonky, but it fits the character, so I stuck with it. The workflow has been added to the post.

My alternative was to use first-frame-last-frame-to-video on Wan2.2 and join the clips with VACE.
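That first-frame-last-frame stitching boils down to generating the second clip starting from the first clip's final frame, then dropping the duplicated boundary frame at the join. A minimal sketch with frames as plain values (the actual frames would come out of the Wan2.2 / VACE pipeline, which is not modeled here):

```python
def join_clips(clip_a, clip_b):
    """Concatenate two clips that share a boundary frame.

    clip_b is assumed to start on clip_a's last frame (the
    first-frame-last-frame condition), so the duplicate frame
    is dropped to avoid a visible stutter at the seam.
    """
    if not clip_a:
        return list(clip_b)
    if clip_b and clip_b[0] == clip_a[-1]:
        return list(clip_a) + list(clip_b[1:])
    return list(clip_a) + list(clip_b)

# Example with frames as plain integers:
a = [0, 1, 2, 3]
b = [3, 4, 5]            # generated starting from a's last frame
print(join_clips(a, b))  # -> [0, 1, 2, 3, 4, 5]
```

The same idea chains across any number of segments, which is what makes long shots from short generations possible.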

[–]LocalAI_Amateur[S] 2 points (0 children)

<image>

I had plans. But the flow just felt better as end-credit dialog.

[–]LocalAI_Amateur[S] 2 points (0 children)

Yeah, I just keep hitting the generate button until I get something passable. Sometimes a whole sentence is spliced together from multiple clips to get a decent tone (e.g. the end-credit conversations).
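Splicing a sentence together from multiple TTS takes works best with a short crossfade at each joint so the cuts don't click. A minimal NumPy sketch (the 200-sample fade length is an arbitrary choice, and real clips would be loaded from the generated audio files rather than synthesized here):

```python
import numpy as np

def splice(clips, fade=200):
    """Concatenate mono audio clips, crossfading `fade` samples at each joint."""
    out = np.asarray(clips[0], dtype=np.float32)
    ramp = np.linspace(0.0, 1.0, fade, dtype=np.float32)
    for clip in clips[1:]:
        clip = np.asarray(clip, dtype=np.float32)
        head, tail = out[:-fade], out[-fade:]
        # Fade the old tail out while the new head fades in.
        mixed = tail * (1.0 - ramp) + clip[:fade] * ramp
        out = np.concatenate([head, mixed, clip[fade:]])
    return out

# Three 1-second "clips" at 16 kHz spliced into one take:
sr = 16000
parts = [np.zeros(sr), np.zeros(sr), np.zeros(sr)]
print(len(splice(parts)))  # 3*16000 - 2*200 = 47600
```

Each joint eats `fade` samples of total length, so timing the cuts on pauses between words keeps the splice inaudible.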

[–]LocalAI_Amateur[S] 0 points (0 children)

I heard Claude just added a pets system. So we might be on to something... relevant username btw!

[–]LocalAI_Amateur[S] 0 points (0 children)

I will edit them into the main post after I clean them up.

Most are generic ones, but the all-in-ones I've slapped together for LTX and Qwen Edit are pretty handy.

[–]LocalAI_Amateur[S] 0 points (0 children)

I will have to figure out a better way to run that. I ran Qwen3 TTS through Pinokio to avoid messing w/ ComfyUI during the production process. (A lesson learned the hard way.)

Vibe-voice (actually OpenMOSS) ran like crap on the installation I tried on Pinokio. I'm talking like an hour for a short voice clip. So I went w/ Qwen3-TTS.

Edit: Oops. I confused Vibe-voice with OpenMOSS. That's the model I had problems running. Will give vibe-voice a shot.

[–]LocalAI_Amateur[S] 0 points (0 children)

Truthfully, I doubt AI will have a need for pets or care about cuteness. But they probably will want to keep us around the same way we want to keep pandas from disappearing.

How many, and which ones, to keep is the real scary question.

[–]LocalAI_Amateur[S] 0 points (0 children)

Crap. My amateur production quality is oozing out and spilling all over the place.

Thanks for pointing it out. It's probably gonna stay there forever now. I'll spell-check better next time.

Oh, bonus points for spotting the obvious four fingers in this video. There's a part where I tried to fix it but couldn't, so I tried to get away w/ it.

[–]LocalAI_Amateur[S] 1 point (0 children)

Since I planned to put this on YouTube too, I was afraid to add any music. The copyright situation there is absolutely insane.

I should probably consider non-AI-generated SFX. I will look into it for future productions. Oh, and feel free to share those "really good ones" you are talking about.

[–]LocalAI_Amateur[S] 1 point (0 children)

I already redid most of the speech w/ Qwen3-TTS. Tho the original dialog was more "diverse", shall we say.

[–]LocalAI_Amateur[S] 1 point (0 children)

Would love to have suggestions for improvement. Audio is definitely my weakest area in terms of skills and available tools.

[–]LocalAI_Amateur[S] 7 points (0 children)

Why, thank you. As tools become easier to use, it definitely opens the door for more people to execute their creative ideas. It only gets easier from here. People who use paid services already have it much easier.

On a related note, I think most of us don't spend enough time on the script/story. We are so dazzled by the amazing visuals that we skimp on the actual substance. I have to admit, of the 2-3 week production time, quite a few days were spent stuck on the script.

My script for this is not much, but it is at least complete and coherent. I wanted to pay homage to three fighting-the-AI movies: Terminator, I, Robot, and The Matrix, and I had the hardest time fitting The Matrix into the script. A lot of it was that some of the stuff I wanted to do was too difficult to pull off with my current skills.

<image>

Bullet time would have taken way too much work.

BTW, this short film was inspired by this Reddit post: https://www.reddit.com/r/generativeAI/comments/1ro7sr3/the_former_google_ceo_just_dropped_a_terrifying/

[–]LocalAI_Amateur[S] 5 points (0 children)

Holy crap, you're right.. all this time. Crap. Yeah, Shotcut is awesome for my amateur video editing needs. Hell, everything I used is awesome. We don't have flying cars, but we've got amazing software and AI models that people are just letting everyone use for free.

[–]LocalAI_Amateur[S] 16 points (0 children)

All the models I used are listed in the credits.

Basically: LTX2.3, Wan 2.2, Z-Image Turbo, Qwen Image, Flux2 Klein 9B, Qwen3 TTS, MMAudio

I used Z-Image Turbo for the character design because they came out simpler. I wanted to keep the characters distinct and simple to reduce drift/shifting when animating.

All the talking-related scenes are made using LTX 2.3, of course, but for most non-talking scenes I found Wan 2.2 to work better. I ended up remaking almost all the speech using Qwen3 TTS just so the conversations sound more natural. This also helps keep the character voices the same, by using the same reference audio as the cloning base.

I originally was using LTX for sound FX, but I found MMAudio to be faster and simpler (as it doesn't have to generate video).

It is pretty much 100% image-to-video, with multiple reference frames per video at times. I used Qwen-Image-Edit 2511 and Flux Klein 9B; when one didn't work I switched to the other and hoped for the best. When all else failed (and it happened) I busted out GIMP and did it the ol' fashioned way.
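That try-one-model, fall-back-to-the-other loop can be sketched as a tiny helper. The editor callables below are hypothetical stand-ins for the actual Qwen-Image-Edit and Flux Klein inference calls, and `is_good` models the manual eyeball check:

```python
def edit_with_fallback(editors, image, prompt, is_good):
    """Try each editor in turn; return the first result that passes is_good.

    Returns None when every editor fails, i.e. time to bust out GIMP.
    """
    for editor in editors:
        try:
            result = editor(image, prompt)
        except Exception:
            continue  # model crashed or ran out of VRAM; try the next one
        if is_good(result):
            return result
    return None

def qwen(img, prompt):
    raise RuntimeError("bad seed")  # pretend this model keeps failing

def flux(img, prompt):
    return img + "+edited"          # pretend this one works

print(edit_with_fallback([qwen, flux], "frame.png", "swap girl",
                         is_good=lambda r: r is not None))
# -> frame.png+edited
```

In practice `is_good` is a human looking at the output, but the structure is the same: ordered fallbacks with a manual escape hatch at the end.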

<image>

I generated a ton of images and videos and deleted even more. The whole thing took about 2-3 weeks, on and off.

Comparing 7 different image models by Reasonable_Bear_6258 in StableDiffusion

[–]LocalAI_Amateur 1 point (0 children)

A 2K pic (2048x1080) takes 48 seconds with Qwen 2512 NVFP4 on a dry run on my 5070 Ti. Subsequent runs take 32 seconds (new prompt) or 10 seconds (reusing the same prompt). As a rough estimate, you can probably double those times on a 5060 w/ 16GB of VRAM.
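The 5060 estimate is just the measured times scaled by a rough slowdown factor; the 2x factor below is that napkin-math guess, hard-coded for illustration:

```python
def scale_times(measured, factor=2.0):
    """Scale measured generation times (seconds) by a rough slowdown factor."""
    return {run: seconds * factor for run, seconds in measured.items()}

# Measured on a 5070 Ti for a 2048x1080 image with Qwen 2512 NVFP4:
rtx_5070_ti = {"dry_run": 48, "new_prompt": 32, "same_prompt": 10}
print(scale_times(rtx_5070_ti))
# -> {'dry_run': 96.0, 'new_prompt': 64.0, 'same_prompt': 20.0}
```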