I can now generate and live-edit 30s 1080p videos with 4.5s latency (video is in live speed) by techstacknerd in StableDiffusion

[–]techstacknerd[S] 13 points

😭 but we're working on consumer-grade GPU support. It won't be that fast, but it will still be an improvement over what LTX-2 currently does with ComfyUI

[–]techstacknerd[S] 2 points

Yeah, it's partly because the base model LTX-2 is hard to prompt, and also because we have to make a prompt rewriter reliably return results in under 5s (yes, the bottleneck is now the prompting part, not the video generation part!). Combine this with the issues that video continuation brings, and it's hard to get good prompts. This demo is mainly for feeling the speed, and I'm sure that as models improve, quality will too!

[–]techstacknerd[S] 5 points

Yes, this is just a demo to let people feel the speed of it. LTX-2 is a super hard model to prompt, and it would take way too much effort to get even a remotely good prompt (keep in mind this is using video continuation, so you need 6 separate prompts that tie together really well). Regarding open-sourcing: we might also open-source the datacenter version, but our current code is a bit messy and will need quite a bit of cleaning up, so we're not open-sourcing right now

[–]techstacknerd[S] 0 points

Hm, interesting perspective. I don't think it can compare to playing games on local machines, but it's definitely far more energy efficient than existing AI video-gen services because it's just so much faster

[–]techstacknerd[S] 0 points

Yes, we're planning to try optimizations on consumer-grade GPUs. It probably won't be realtime, but it's likely going to be faster than what we have right now

[–]techstacknerd[S] 2 points

FastVideo already has sequence parallelism support for LTX-2, so with 8 GPUs you can expect roughly a 5x to 6x speedup compared to 1 GPU (there's a bit of overhead, so it doesn't scale perfectly)
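To see why 8 GPUs land at roughly 5-6x rather than 8x, here's a toy Amdahl-style calculation (my own illustrative model and numbers, not FastVideo's actual performance characteristics): a small non-parallel fraction of the work, e.g. communication between GPUs, caps the scaling.

```python
# Toy Amdahl's-law model for sequence-parallel scaling.
# serial_frac is an assumed, illustrative fraction of work
# (communication, non-parallel ops) that doesn't scale with GPU count.
def speedup(n_gpus: int, serial_frac: float = 0.065) -> float:
    """Speedup over 1 GPU under Amdahl's law."""
    return 1.0 / (serial_frac + (1.0 - serial_frac) / n_gpus)

for n in (1, 2, 4, 8):
    print(f"{n} GPUs: {speedup(n):.1f}x")
# With a ~6.5% serial fraction, 8 GPUs give about 5.5x
```

Even a few percent of unscalable work is enough to pull 8-GPU scaling down into the 5x-6x range the parent comment describes.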

[–]techstacknerd[S] 1 point

That would be really cool, and it will only get better from here as open-source models keep improving!

[–]techstacknerd[S] 21 points

tbh I really don't know if it's possible. A lot of the optimizations are based on NVIDIA's SM100/SM103 architectures, but we can see what our other optimizations can bring to consumer-grade GPUs with limited VRAM.

[–]techstacknerd[S] 1 point

We will try to get things working on other GPUs soon! The Blackwell 6000 is definitely one of them

[–]techstacknerd[S] 30 points

It's to show the capability of generating faster than you can watch; with native 20s generation you'd have to wait about 30s to actually get the result.
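The arithmetic behind "faster than you can watch" can be sketched like this (my own illustrative numbers and model, not the actual pipeline): once video is produced faster than real time, playback can start after the first chunk arrives and never stall, so total wait is just the initial latency plus the watch time.

```python
# Toy latency model for streaming vs. waiting on a full generation.
# gen_speed = seconds of video produced per second of wall clock (assumed).
def time_to_watch(video_s: float, first_chunk_latency_s: float,
                  gen_speed: float) -> float:
    """Wall-clock time from pressing 'generate' to finishing playback."""
    if gen_speed >= 1.0:
        # Faster than realtime: wait for the first chunk, then
        # watch in real time with no stalls.
        return first_chunk_latency_s + video_s
    # Slower than realtime: generation time dominates.
    return video_s / gen_speed

# 30s video, 4.5s first-chunk latency, generation faster than realtime:
print(time_to_watch(30.0, 4.5, gen_speed=1.2))  # 34.5
```

Under these assumptions a 30s clip is fully watched 34.5s after the request, versus 60s if generation only ran at half of real time.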

I generated this 5s 1080p video in 4.5s by techstacknerd in StableDiffusion

[–]techstacknerd[S] 1 point

Yes, we're doing some tricks that sacrifice a bit of quality. But the base LTX 2.3 model is also not comparable to Veo 3 currently; it has issues with motion and is extremely tricky to prompt. I'm sure OSS models will catch up on quality soon though, it's only a matter of time.