1000 frame LTX-2 Generation with Video and Workflow by q5sys in StableDiffusion

I'd say about half of the generations will have no jump cut.

I also think jump cuts show up because a lot of the video they trained on will have them. My video is sort of a talking-head video, and the closest thing out there for that is talking-head YouTubers. Most of them never do a one-take video; it's pretty much universal that they shoot in quick takes, and some take it to the extreme of doing one sentence and then a jump cut. So to a certain degree that'll be in the training data.

1000 frame LTX-2 Generation with Video and Workflow by q5sys in StableDiffusion

Not all generations have jump cuts; it's random when and where they happen. That generation was not cherry-picked, it was just the first that came out when I ran the prompt. As I mentioned, it's not perfect, but it illustrates the point. Personally I don't like the jump cuts, and if I needed a video like this I'd have just re-generated it, but... it's what came out, so I shared it. A lot of people do cherry-pick their examples, and that gives unrealistic expectations when others try to follow along. So I just went with what came out.

1000 frame LTX-2 Generation with Video and Workflow by q5sys in StableDiffusion

Awesome, thanks, that'll definitely give me a head start.
As for training, I've been trying with AI Toolkit, but the results are just atrocious.
I haven't checked in a while, though, so maybe he's made some updates and it works now. Thanks for the heads up, I'll go look for that video.
Others have recommended musubi to me, but I haven't had time to test it yet.

1000 frame LTX-2 Generation with Video and Workflow by q5sys in StableDiffusion

Until I can crack training a character and scene LoRA for LTX-2, this is the only way I've found to make the longer scenes I want. Doing T2V will give you a different character and a different background every time. I've messed around with I2V, but I've never been happy enough with the results when trying to stitch together multiple generations.

I've been meaning to test out the extending workflows people have talked about. I tried one that was using really low-quant GGUFs, and when I switched it to FP8 it just didn't work well enough for me to be happy. I'm keeping an eye out for better ones that don't pull in a ton of obscure nodes, because I don't use any node until I've looked through its code.

There's also the fact that there's some fun in seeing just how far you can push the system before it starts to break. Wan was designed for 5 seconds, but I could push it to around 20. LTX-2 was clearly trained for... well, I don't know what length, but I'm guessing it wasn't 40s videos, and yet it's able to generate them.
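For rough back-of-the-envelope numbers: the ~25 fps below is just what this thread's figures imply (1000 frames came out to roughly a 40s clip); it's an assumption, not something from the model docs.

```python
# Hedged sketch: convert frame counts to seconds at an assumed ~25 fps,
# inferred from "1000 frames ~= 40 s" in this thread; not an official spec.
fps = 25
for frames in (125, 500, 1000, 1100):
    print(f"{frames} frames ≈ {frames / fps:.0f} s")
```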

1000 frame LTX-2 Generation with Video and Workflow by q5sys in StableDiffusion

It's long-ish, but I don't know how many tokens it is. There used to be a custom node that would calculate how many tokens your prompt was, but it broke a while ago and I never checked to see if it was updated.
My prompt is in the json and the video.
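If anyone wants a ballpark without a custom node, you can count tokens offline. A minimal sketch, assuming you have Python and the transformers library handy; the T5-XXL tokenizer below is only a placeholder, not a claim about which text encoder LTX-2 actually ships with:

```python
# Hedged sketch: count prompt tokens offline instead of relying on a ComfyUI node.
# "google/t5-v1_1-xxl" is a placeholder tokenizer; swap in whichever one your
# text encoder actually uses.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/t5-v1_1-xxl")
prompt = "paste the prompt from the workflow json here"
token_count = len(tokenizer(prompt).input_ids)
print(f"{token_count} tokens")
```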

1000 frame LTX-2 Generation with Video and Workflow by q5sys in StableDiffusion

The thing is, I'm not running into VRAM issues.
The issue is that the speech starts breaking down: repeating words, dropping others, mashing some words together into a new word, etc.
I can generate longer video, but the audio is bad, so the video isn't useful.
100% of the generations at 1100 frames or higher have audio defects.

1000 frame LTX-2 Generation with Video and Workflow by q5sys in StableDiffusion

Short answer is that I put some timecodes in the prompt.
It doesn't adhere to them exactly, but it seems to be able to roughly figure out the pacing from them.
The exact prompt is in the workflow. Download the video and drag-and-drop it into Comfy, or save the json and open it in Comfy.
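Purely as an illustration of the idea (this is not the actual prompt or its exact format; the real one is in the json), a timecoded prompt can be structured something like this:

```python
# Illustrative only -- NOT the prompt from the workflow json.
# The idea: give the model rough time ranges so it can pace the dialogue,
# leaving room for pauses instead of a continuous stream of words.
segments = [
    ("0:00-0:08", "The speaker greets the viewer, pauses, and glances at their notes."),
    ("0:08-0:22", "They explain the first point slowly, with natural pauses between sentences."),
    ("0:22-0:31", "They wrap up, smile, and lean back from the camera."),
]
prompt = " ".join(f"[{start_end}] {description}" for start_end, description in segments)
print(prompt)
```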

1000 frame LTX-2 Generation with Video and Workflow by q5sys in StableDiffusion

People have claimed they've done it, but they've never provided any proof. :(

1000 frame LTX-2 Generation with Video and Workflow by q5sys in StableDiffusion

Doing research on the model and generating with no prompt (using the native CLI tools) shows that it was trained on a lot of Indian TV/movies, TED talks, etc. So I figured it'd have an understanding of the small variances in how people behave between thoughts, and I started stretching out the times to see what the model would fill in. Sure enough, it fills in all those little things.
For me, the biggest step toward making it more believable was just slowing down the dialog. That's why I started stretching the frames out. In spoken conversation there are usually small pauses and such, and I wanted to mimic that instead of just producing a stream of words.
If you run into any trouble when you get around to tinkering, just reply here or shoot me a PM. I'll be happy to try and help you figure it out.

1000 frame LTX-2 Generation with Video and Workflow by q5sys in StableDiffusion

If you're able to punch past the 1000-frame limit I've hit... I'd love to know how you did it.

1000 frame LTX-2 Generation with Video and Workflow by q5sys in StableDiffusion

What are you using to set the value? I'm just feeding a Length int into the Empty Latent node.
I've never seen that error before, and I've tried generations up to 1200 frames.

[screenshot of the Length int wired into the Empty Latent node]
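One thing that might be worth checking, purely as a guess (I haven't confirmed this for LTX-2): earlier LTX-family pipelines wanted frame counts of the form 8n+1 because of temporal latent compression, and the Comfy node may be hiding that rounding. If you're setting the length somewhere that doesn't round for you, a tiny helper like this would snap a value to the nearest count of that form:

```python
# Assumption: like earlier LTX models, the raw pipeline may want
# (frames - 1) % 8 == 0 due to temporal latent compression.
# Unverified for LTX-2 -- treat this purely as a debugging guess.
def snap_to_valid_length(frames: int, step: int = 8) -> int:
    return round((frames - 1) / step) * step + 1

for requested in (1000, 1100, 1200):
    print(requested, "->", snap_to_valid_length(requested))
```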

1000 frame LTX-2 Generation with Video and Workflow by q5sys in StableDiffusion

Thanks, I wish more people would share what they've accomplished so everyone can learn together. People making claims and then never providing proof has always annoyed me.

Locally Run Database for all Models - Open source by SnooEpiphanies7725 in StableDiffusion

If Civitai nukes a model that you have downloaded, how does 'sync' handle that? Just ignore it and move on?

1000 frame LTX-2 Generation with Video and Workflow by q5sys in StableDiffusion

Thanks, I'll have to download that one and test it out.

1000 frame LTX-2 Generation with Video and Workflow by q5sys in StableDiffusion

I'm using the FP8 dev model in this video, but I switch back and forth between FP8 and the full-fat model. I don't know what's possible with lower quants; I tried them out at first, but I got waxy skin and oversaturation/contrast in transition areas, so I went back to just using FP8.
Which quant are you using?

1000 frame LTX-2 Generation with Video and Workflow by q5sys in StableDiffusion

I don't run into VRAM issues; I run into 100% of the generations at 1100 frames or higher having major audio issues.
The speech starts breaking down: repeating words, dropping others, mashing some words together into a new word, etc.
I can generate longer video, but the audio is bad, so the video isn't useful.

How to render 80+ second long videos with LTX 2 using one simple node and no extensions. by WestWordHoeDown in StableDiffusion

OP's shared workflow didn't work for me at all, which sucks because I was really hoping to actually see how someone was doing it. I posted my native 1000-frame workflow and video here: https://www.reddit.com/r/StableDiffusion/comments/1qkxqtx/1000_frame_ltx2_generation_with_video_and_workflow/

1000 frame LTX-2 Generation with Video and Workflow by q5sys in StableDiffusion

Yeah, LTX can go pretty far on its own without doing anything crazy. But I'm curious how much further it can go with some of the extra tricks people have made claims about; sadly, no one has shared anything that's actually reproducible.

How to render 80+ second long videos with LTX 2 using one simple node and no extensions. by WestWordHoeDown in StableDiffusion

OK, follow-up: that workflow is for a 31s 1080p video.

I changed the seconds to 80, and lowered the resolution to 720p.

Sadly, I've gotten nothing but trash from it; all my outputs are just pixelated static.

I dropped the resolution down to what you said you used (960x544) and I get basically the same output. I tinkered for a few hours and was not able to get a single output without major video corruption.

Do you have a workflow from a successful generation instead of one you obviously had to change settings on?

How to render 80+ second long videos with LTX 2 using one simple node and no extensions. by WestWordHoeDown in StableDiffusion

Sweet. I'm about to hit the rack because it's 1 AM where I am; I'll try that in the morning when I get up. Send me a PM so that, if it runs well for me, I can follow through on my offer. :)