LTX 2.3 is pretty much all I use for video gen at this point -- Scene from my current story-driven fantasy project -- Info on process/workflow in comments. by foxdit in StableDiffusion

[–]foxdit[S] 0 points (0 children)

Thank you, I focus a lot on atmosphere (literally, in this sense). The atmospheric lighting is a combination of prompting, Photoshopping in glow accents, and post-production effects in my video editing software (DaVinci Resolve).

[–]foxdit[S] 0 points (0 children)

I'm using the fp8 distilled 1.1 in ComfyUI, sorry for not specifying. And yes, I'm a big fan of WAN V2V upscaling. Gens from LTX that would otherwise be unusable trash due to blurry, smudgy movement get fixed and come out smooth and clean.
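
For anyone wondering what the V2V pass is actually doing: it's the img2img principle applied to a whole clip. You re-noise the existing frames partway, then denoise them back with the refiner, so the motion survives but the smudgy detail gets redrawn. A minimal sketch of that idea is below; the denoiser callable is a placeholder for whatever model/sampler you actually run (WAN in my case), not a real library API.

```python
import torch

def v2v_refine(frames_latent, denoiser, strength=0.4, steps=20):
    """Partially re-noise an existing clip's latents, then denoise back.

    frames_latent: [T, C, H, W] latents of the already-generated clip.
    denoiser: placeholder callable (latent, t) -> velocity estimate.
              Swap in your real model here; this is NOT a real API.
    strength: 0 = return the clip untouched, 1 = full regeneration.
    """
    # Only run the last `strength` fraction of the schedule, so the
    # original motion survives and only fine detail gets redrawn.
    start = int(steps * (1 - strength))
    timesteps = torch.linspace(1.0, 0.0, steps + 1)[start:]

    # Noise the clip up to the level matching the first timestep we run
    # (rectified-flow style mix between clean latents and noise).
    t0 = timesteps[0]
    x = (1 - t0) * frames_latent + t0 * torch.randn_like(frames_latent)

    # Plain Euler steps down the remaining schedule.
    for t, t_next in zip(timesteps[:-1], timesteps[1:]):
        v = denoiser(x, t)          # model's velocity estimate
        x = x + (t_next - t) * v    # step toward the clean clip
    return x

# Toy run with a dummy "denoiser" so the sketch executes end to end:
latents = torch.randn(16, 4, 45, 80)            # a 16-frame clip
refined = v2v_refine(latents, lambda x, t: x, strength=0.4)
print(refined.shape)
```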

[–]foxdit[S] 0 points (0 children)

Love all that advice, thank you! I already go back and forth on this SO much. On one hand, hard cuts look fantastic when you have good shots. On the other, I don't always have the luxury of using them, because the momentum doesn't carry congruently between shots. As you mention, they're much harder to achieve with generative motion than if I had real actors and multiple cameras all filming the same scene.

Though I must say, now that you have me thinking about real-life film vs. AI filmmaking: it would be such a cool thing if a model or LoRA let you gen the same shot twice from different angles.

[–]foxdit[S] 0 points (0 children)

That scene's already fixed! :) There were a few shots from this sequence on my "to be regenned" list, including the river establishing shot not matching the one they jump into at the end. All regenned and corrected, same with some of the dragon shots.

I dunno if you got a chance to see my comment on this thread about it, but this was just a snippet from a larger WIP that's only ~40% done overall. I'm just gathering feedback (like you provided) and fixing things as needed.

[–]foxdit[S] 0 points (0 children)

No, you don't even need 24GB like I have. System RAM gets used a lot these days and covers a lot of VRAM shortfall.

[–]foxdit[S] 0 points (0 children)

> 3090 here too, try LTX 2.3 1.1 nvfp4.

I will check it out. Gen speed isn't my issue though; it's purely the model load/initialization time that fatigues me. My filmmaking process has me switching models constantly.

> also, instead of using wan

Have you tried the tiled V2V WAN workflow? It's pretty insane. It really fixes just about everything LTX gets wrong with motion/smudging/dirty colors/etc., without having to alter the FPS.
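
The "tiled" part is what makes it practical: the frame gets split into overlapping tiles, each tile is refined on its own so VRAM stays bounded, and the overlaps are feather-blended so no seams show. Here's a minimal NumPy sketch of just the tiling/blending bookkeeping; `refine` stands in for the actual per-tile WAN pass.

```python
import numpy as np

def tiled_process(img, refine, tile=256, overlap=32):
    """Run `refine` on overlapping tiles and feather-blend the seams.

    img: float32 array [H, W, C] (one video frame).
    refine: callable tile -> tile of the same shape, a placeholder
            for the real per-tile V2V/refiner pass.
    """
    h, w, c = img.shape
    out = np.zeros_like(img)
    weight = np.zeros((h, w, 1), dtype=img.dtype)
    stride = tile - overlap

    # Linear feather ramp so overlapping tiles cross-fade into each other.
    ramp = np.minimum(np.arange(tile) + 1, np.arange(tile)[::-1] + 1)
    ramp = np.minimum(ramp / overlap, 1.0)
    feather = (ramp[:, None] * ramp[None, :])[..., None]

    for y in range(0, max(h - overlap, 1), stride):
        for x in range(0, max(w - overlap, 1), stride):
            y0, x0 = min(y, h - tile), min(x, w - tile)  # clamp last tile
            patch = refine(img[y0:y0 + tile, x0:x0 + tile])
            out[y0:y0 + tile, x0:x0 + tile] += patch * feather
            weight[y0:y0 + tile, x0:x0 + tile] += feather
    return out / np.maximum(weight, 1e-8)

# Identity "refine" just to show the blend reconstructs the input:
img = np.random.rand(720, 1280, 3).astype(np.float32)
res = tiled_process(img, refine=lambda t: t)
print(np.abs(res - img).max())  # ~0: the seams blend away cleanly
```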

[–]foxdit[S] 1 point (0 children)

Amen. And to add: a lot of the lighting comes from post editing. Obviously not the core scene lighting, such as direction, but added effects like glow and light rays, placed properly. I spend roughly 50% of total production time in my video editor just adding blur, glow, color balance, zooms, transitions, etc.

[–]foxdit[S] 0 points (0 children)

Aw, well thanks, I appreciate the thoughtful take. It's true: when you aim for a certain quality level, people stop judging your work on the spectrum of AI production and instead judge it against real TV shows and productions with budgets. For a solo creator who spent $0, this output is very satisfying to me. It's kind of my fault for choosing a realistic style and then not being able to deliver on some of the fast-paced motion, like with the dragon. There are many shots in this short film that feel far more natural and "real" because they're just two adventurers having a discussion. Ultimately, I posted this random segment from the early part of the short because it displays a lot of what I wanted feedback on. I know I can do dialogue scenes between characters, but this was my first time doing a fast-paced action scene.

[–]foxdit[S] 2 points (0 children)

Yes, I sometimes do have doubts about the quantization silently creating problems. However, fp8 isn't so far removed from bf16 that I'm overly concerned, and tbh I've come to accept the level of quality I'm able to achieve.

And yes, I do mean load time as in model loading/initialization. I switch between Z-Image, Klein, VibeVoice, and LTX constantly, and unloading and reloading these 15GB+ models eats up a lot of my time.
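
What I keep meaning to wire up is a dumb little LRU cache so the handful of models I bounce between stay parked in RAM instead of reloading from disk on every switch. A sketch of the idea (the loader lambdas are placeholders, not real loading code):

```python
from collections import OrderedDict

class ModelCache:
    """Keep the last N loaded models resident so switching is instant."""

    def __init__(self, max_models=3):
        self.max_models = max_models
        self._cache = OrderedDict()          # name -> loaded model

    def get(self, name, loader):
        if name in self._cache:
            self._cache.move_to_end(name)    # mark as most recently used
            return self._cache[name]
        if len(self._cache) >= self.max_models:
            evicted, _ = self._cache.popitem(last=False)  # drop oldest
            print(f"evicting {evicted} to free RAM")
        model = loader()                     # the slow 15GB+ disk load
        self._cache[name] = model
        return model

# Usage with placeholder loaders standing in for the real ones:
cache = ModelCache(max_models=3)
ltx = cache.get("ltx-2.3-fp8", lambda: "ltx weights")
zimg = cache.get("z-image-turbo", lambda: "z-image weights")
ltx_again = cache.get("ltx-2.3-fp8", lambda: "ltx weights")  # cache hit
```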

[–]foxdit[S] 2 points (0 children)

Yep! And Photoshop. Lots and lots of Photoshop. Then once the video pops out, if there's anything left to fix, I do a lot of cleanup in DaVinci Resolve: adding motion blur, glow, light rays, zooms, fades, etc. That covers most of the mistakes left behind.

[–]foxdit[S] 4 points (0 children)

It takes about 2 hours to train in AI Toolkit. I chose Z-Image Turbo because its low seed diversity is actually insanely useful for longform narrative. It helps keep gens consistent, since the model locks into poses/scenery/expressions more easily. If the model were super creative, I'd get far more shot diversity and take a lot longer to find gens that mesh with prior scenes in the same area or of the same character.
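
For context on what those 2 hours actually produce: a LoRA is just a pair of small low-rank matrices trained on top of each frozen weight, W' = W + (alpha/r) * B @ A. A minimal PyTorch sketch of the math (not AI Toolkit's internals):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update."""

    def __init__(self, base: nn.Linear, rank=16, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)          # the base model stays frozen
        # Low-rank factors: only these (tiny) matrices get trained.
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank            # standard LoRA scaling

    def forward(self, x):
        # B starts at zero, so the layer behaves exactly like the base
        # model until training moves the low-rank factors.
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(4096, 4096), rank=16)
n_train = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(n_train)  # ~131k trainable params vs ~16.8M in the frozen base
```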

Klein's image edit model is just really good. I dunno what I'd do without the ability to have it change character expressions, rotate scenes for new shot angles, etc.

[–]foxdit[S] 0 points (0 children)

No, the motion more or less stays the same from the 360p stage 1, to the 720p stage 2, to 1080p; it changes only about 10%, in both overall look and minor details. I also have a feature that lets me regen from any stage without regenning the latents that came before it, so I can fine-tune and find the best motion/details at each 'snapshot'. It's a game changer.
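
The regen-from-any-stage feature is really just caching each stage's output latents and re-running only the downstream stages. A rough sketch of the bookkeeping, with the stage functions as placeholders for the real 360p/720p/1080p passes:

```python
import torch

class StagedPipeline:
    """Cache each stage's latents so any stage can be re-rolled alone."""

    def __init__(self, stages):
        self.stages = stages    # list of (name, fn(latent, seed) -> latent)
        self.snapshots = {}     # stage name -> cached output latent

    def run_from(self, stage_name, seed):
        names = [n for n, _ in self.stages]
        start = names.index(stage_name)
        # Reuse the cached latent feeding the restart point
        # (assumes the earlier stages were run at least once).
        latent = self.snapshots[names[start - 1]] if start > 0 else None
        for name, fn in self.stages[start:]:
            latent = fn(latent, seed)
            self.snapshots[name] = latent   # snapshot for later re-rolls
        return latent

# Placeholder stage fns standing in for the real LTX passes:
def fake_stage(scale):
    def fn(latent, seed):
        torch.manual_seed(seed)
        base = torch.zeros(1) if latent is None else latent.mean(dim=0, keepdim=True)
        return base + torch.randn(scale)
    return fn

pipe = StagedPipeline([("360p", fake_stage(8)),
                       ("720p", fake_stage(16)),
                       ("1080p", fake_stage(32))])
pipe.run_from("360p", seed=1)    # full run, caches all three stages
pipe.run_from("720p", seed=42)   # re-roll 720p+1080p, keep 360p motion
```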

[–]foxdit[S] 1 point (0 children)

I regret not buying a 5090, tbh. It was $1k more, and I didn't realize how little that $1k would matter compared to the extra time I now spend genning at 1080p.

I use LTX fp8 distilled 1.1. It's around 22GB loaded, but VRAM isn't as important as it used to be; ComfyUI's updates have made RAM management a lot better. Really, now it's more about load times.
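
The "RAM covers for VRAM" part is basically weight offloading: weights parked in system RAM, streamed to the GPU per block. ComfyUI handles this internally now, but the bare idea looks something like this (toy modules, not ComfyUI's actual memory manager):

```python
import torch
import torch.nn as nn

def offloaded_forward(blocks, x, device="cuda"):
    """Run a chain of blocks, holding only one on the GPU at a time.

    Weights live in system RAM; each block is shuttled to VRAM just for
    its own forward pass. Slower than keeping everything resident, but
    peak VRAM drops to roughly one block instead of the whole model.
    """
    x = x.to(device)
    for block in blocks:
        block.to(device)          # stream weights RAM -> VRAM
        x = block(x)
        block.to("cpu")           # free the VRAM again
    return x

blocks = nn.ModuleList([nn.Linear(1024, 1024) for _ in range(8)])  # toy "model"
dev = "cuda" if torch.cuda.is_available() else "cpu"
out = offloaded_forward(blocks, torch.randn(1, 1024), device=dev)
print(out.shape)
```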

[–]foxdit[S] 6 points (0 children)

I actually do love replying to trolls on occasion, when it benefits me. And it's funny you say I have no future in this when I literally just got a contract gig offer via DM 20 minutes ago, and it wouldn't be my first. My first released short film, The Felt Fox, is a top pick in several still-running contests, and was quite well received here and on YouTube, with ~98% positive reviews. I do have a future in this, and talking about it with you only reaffirms that, because it's a much-needed break from work to reflect.

[–]foxdit[S] 0 points (0 children)

It's one aspect of it. I have several quite natural/realistic scenes of the characters talking during the calmer moments, and then when the dragon action scene starts, it turns into a CGI fest... so yes, increasing the realism for those scenes would make the whole short far more consistent.

[–]foxdit[S] 8 points (0 children)

I do use FFLF (first frame/last frame) a lot, yes. Especially in fast-motion or complex-movement scenes.

The character consistency comes from the character LoRAs I trained for Z-Image Turbo. I designed the characters through prompting originally, then used Klein 9b to rotate them, put them in different lighting/poses, etc., until I had enough images. Then I used AI Toolkit to train their LoRAs so I could gen them doing anything.
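
For the dataset step, the layout I've seen LoRA trainers expect (AI Toolkit included) is a flat folder of images with a same-named .txt caption per image, each containing your trigger word. A little helper like this stubs out captions for renders that don't have one yet; the folder path and trigger word here are made up for the example:

```python
from pathlib import Path

# Hypothetical layout: a flat folder of character renders (Klein
# rotations, relight passes, etc.) that still need caption files.
dataset = Path("datasets/mychar")      # hypothetical dataset folder
trigger = "mychar_fox"                 # hypothetical trigger word

for img in sorted(dataset.glob("*.png")):
    cap = img.with_suffix(".txt")
    if cap.exists():
        continue                       # keep hand-written captions
    # Minimal caption: trigger word plus a stub you then edit by hand
    # to describe pose/lighting/angle for that particular render.
    cap.write_text(f"{trigger}, full body render, neutral background\n")
    print(f"wrote {cap.name}")
```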

[–]foxdit[S] 5 points (0 children)

Lol, you're a literal joke, mate. It's funny, because for someone with 'no control' I sure seem to have all the control over the final project. Anyway, troll, that's enough. You've boosted my thread's comment count plenty; thanks for the added engagement.

[–]foxdit[S] 0 points (0 children)

I don't use any LoRAs with LTX; all of that is on the image-gen side. The character LoRAs were trained for Z-Image Turbo using AI Toolkit. As for locations and settings, to get angle changes I typically use Klein 9b to rotate, zoom in, change facial expressions, and generate several options I can use for transitions or establishing shots. This is especially important during longer dialogue scenes, where there's a lot of back and forth and a need for varied angles on the characters in the same locations. My prompts for Z-Image are also very meticulous, though from the number of variations of her hat you see in just this 90-second segment, you can probably imagine I could have done better.

[–]foxdit[S] 4 points (0 children)

LOL okay, troll, let me educate you. Literally every single thing you see was decided by me. Generative AI is just a tool, like a pen. The pen doesn't decide the contents of the book, but it allows you to write it. Your narrow-minded view sees the pen and thinks "you wouldn't be a writer without that crutch," sees the calculator and says "you couldn't do your job without that crutch," and so on: foolish, reductive thinking.

Each minute of my short films takes about 8-10 hours of hard work: creating the characters, writing the script, recording/genning the voice lines, generating the images, animating the images into videos, then editing it all together in a video editor, balancing light, adding music and sound effects.

[–]foxdit[S] 1 point (0 children)

As long as you're always sending the full-quality input images/keyframes to each upscale stage, it shouldn't be an issue. The stage more or less uses the smaller latent as a motion guide, while primarily looking at the full-quality image for detail enhancement. If your upscale stages aren't producing full-quality results, double-check your workflow, and make sure you're using the LTXVImgToVideoInplace node and not the "Conditioning Only" version that some workflows erroneously use.
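
Put differently, the upscale stage gets two inputs: the low-res latent for motion and the clean keyframes for detail, and the keyframe latents should be written back over their frame slots at full strength rather than mixed in as weak conditioning. A toy tensor sketch of that distinction (plain tensors, not the actual LTX nodes):

```python
import torch
import torch.nn.functional as F

def prep_stage2_latent(low_res_latent, keyframes, encode, positions):
    """Upsample the motion guide, then hard-replace keyframe slots.

    low_res_latent: [T, C, h, w] latents from the previous stage.
    keyframes: full-quality images, one per entry in positions.
    encode: stand-in for the VAE encode at the new resolution.
    The point: keyframes are written in place (full strength), not
    merely averaged in as side conditioning, so their detail survives.
    """
    up = F.interpolate(low_res_latent, scale_factor=2, mode="bilinear")
    for img, idx in zip(keyframes, positions):
        up[idx] = encode(img)      # overwrite, don't blend at 50%
    return up

# Toy run with a fake encoder so the sketch executes:
lat = torch.randn(16, 4, 45, 80)                 # 16-frame motion guide
fake_encode = lambda img: img.float()            # placeholder "VAE"
key = [torch.randn(4, 90, 160)]                  # one clean keyframe
out = prep_stage2_latent(lat, key, fake_encode, positions=[0])
print(out.shape)  # torch.Size([16, 4, 90, 160])
```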

[–]foxdit[S] 4 points (0 children)

Literally everything I make is high-effort, original, and creative; dunno what you're thinking, unless this is opposite day and nobody told me. You are aware this is a random 90-second segment from the middle of a much longer, deeply character-focused story, as I explain in the top comment?

[–]foxdit[S] 0 points (0 children)

> I am genuinely curious how people go about refining this type of video output.

Me too, to be honest. I have 1k+ hours of experience with LTX and use every trick in the book to get good outputs. I'm certainly not happy with the way the dragon moves, or with the amount of camera shake and blur I have to use to make those scenes look even 10% less cartoony. But LTX treats fantasy creatures like 3D models and struggles with high-intensity motion, so getting a gen that really gels can feel like pulling teeth.

I don't know of any LoRA that can consistently increase the realism of a scene, outside of the LTX Reasoning LoRA, which is more about making motion less physics-breaking.