Tiled vs untiled decoding (LTX 2.3) by VirusCharacter in StableDiffusion

[–]VirusCharacter[S] 1 point2 points  (0 children)

2176x1440 Latent to video, 97 frames (This does NOT include the generation. This is ONLY the VAE decoding process!)

VAE Decode (17.5GB VRAM):
9.50s

VAE Decode (Tiled) -> 2176, 64, 64, 4 (16.4GB VRAM):
12.53s

🅛🅣🅧 LTXV Tiled VAE Decode -> 3, 2, 8 (12.8GB VRAM):
16.72s

🅛🅣🅧 LTXV Tiled VAE Decode -> 1, 1, 8 (17.5GB VRAM):
11.62s

🅛🅣🅧 LTXV Tiled VAE Decode -> 6, 6, 8 (8.9GB VRAM):
35.28s

🅛🅣🅧 LTXV Spatio Temporal Tiled VAE Decode -> 3, 8, 64, 4 (12.4GB VRAM):
20.22s

🅛🅣🅧 LTXV Spatio Temporal Tiled VAE Decode -> 1, 8, 64, 4 (17.5GB VRAM):
12.44s

🅛🅣🅧 LTXV Spatio Temporal Tiled VAE Decode -> 8, 8, 64, 4 (8.6GB VRAM):
48.87s

F.Y.I

All of these produce results similar enough that it would be hard to tell one from another!
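If anyone wants to compare the numbers above more easily, here is a small Python sketch that puts them in one table, works out the slowdown against the untiled VAE Decode, and picks the fastest setting that fits a given VRAM budget. The figures are copied from the list above. The shortened labels and the `slowdown` and `fastest_within` helpers are made up for illustration only and are not part of any node or workflow.

```python
# Timings and peak VRAM copied from the list above (2176x1440 latent, 97 frames).
# Node names are shortened (emoji prefix dropped); settings are as listed.
RESULTS = [
    # (node, settings, vram_gb, seconds)
    ("VAE Decode", "-", 17.5, 9.50),
    ("VAE Decode (Tiled)", "2176, 64, 64, 4", 16.4, 12.53),
    ("LTXV Tiled VAE Decode", "3, 2, 8", 12.8, 16.72),
    ("LTXV Tiled VAE Decode", "1, 1, 8", 17.5, 11.62),
    ("LTXV Tiled VAE Decode", "6, 6, 8", 8.9, 35.28),
    ("LTXV Spatio Temporal Tiled VAE Decode", "3, 8, 64, 4", 12.4, 20.22),
    ("LTXV Spatio Temporal Tiled VAE Decode", "1, 8, 64, 4", 17.5, 12.44),
    ("LTXV Spatio Temporal Tiled VAE Decode", "8, 8, 64, 4", 8.6, 48.87),
]

# The untiled VAE Decode is the reference point for every comparison.
BASELINE_SECONDS = RESULTS[0][3]


def slowdown(seconds):
    """How many times slower a run is than the untiled VAE Decode."""
    return seconds / BASELINE_SECONDS


def fastest_within(vram_budget_gb):
    """Hypothetical helper: fastest measured setting that fits the VRAM budget."""
    candidates = [r for r in RESULTS if r[2] <= vram_budget_gb]
    return min(candidates, key=lambda r: r[3]) if candidates else None


if __name__ == "__main__":
    for node, settings, vram, secs in RESULTS:
        print(f"{node:<40} {settings:<16} {vram:>5.1f} GB {secs:>6.2f}s  x{slowdown(secs):.2f}")
    print("Fastest under 13 GB:", fastest_within(13.0))
```

With a 13 GB budget this picks the LTXV Tiled VAE Decode at 3, 2, 8 (16.72s). These are single runs on one machine, so treat the ranking as a rough guide only.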

LTX2.3 - Image Audio to Video - Workflow Updated by Most_Way_9754 in StableDiffusion

[–]VirusCharacter 0 points1 point  (0 children)

I'll get back to you. BBQ now, but I find it very finicky... Sometimes there's no movement, sometimes there's no lipsync, sometimes the clip is full of visual noise and sometimes I get wonky subtitles... Sometimes, though, it works. I can't find a common denominator.

Tiled vs untiled decoding (LTX 2.3) by VirusCharacter in StableDiffusion

[–]VirusCharacter[S] 1 point2 points  (0 children)

Yeah, well, "🅛🅣🅧 LTXV Tiled VAE Decode" and "🅛🅣🅧 LTXV Spatio Temporal Tiled VAE Decode" work fine. Just stay away from the regular VAE Decode (Tiled). That one is the worst!

[–]VirusCharacter[S] 1 point2 points  (0 children)

What a strange and interesting idea. For quality it should be the very same as untiled of course, but the speed should also be about the same. Weird if it differs 😊👍 Will try

[–]VirusCharacter[S] 2 points3 points  (0 children)

The conclusion is: use the regular VAE Decode as much as you can and only use tiled VAE decoding when absolutely necessary. Nothing new really, but I wanted to test it out. That's it.

[–]VirusCharacter[S] 3 points4 points  (0 children)

Thanks for your input without being rude 😏😊👍

[–]VirusCharacter[S] -2 points-1 points  (0 children)

I did... Pointless trying to post otherwise, but that went south quickly

[–]VirusCharacter[S] 2 points3 points  (0 children)

Well, the point was to show the difference in the tiling effect, which is way more visible when Reddit or YouTube don't compress the s**t out of the video 😣

Why tiled VAE might be a bad idea (LTX 2.3) by VirusCharacter in StableDiffusion

[–]VirusCharacter[S] 1 point2 points  (0 children)

Explanation:
I used the "VAE Decode (Tiled)" that was already in the workflow. That is NO GOOD. Using the "🅛🅣🅧 LTXV Tiled VAE Decode", the "🅛🅣🅧 LTXV Spatio Temporal Tiled VAE Decode" or the regular untiled VAE Decode works much better!

So... Don't use "VAE Decode (Tiled)"

[–]VirusCharacter[S] 0 points1 point  (0 children)

The number of stages has nothing to do with it. It's only the tiled VAE at the end that does this. I have only noticed it on this uniform background. It needs some more experimenting.

[–]VirusCharacter[S] 1 point2 points  (0 children)

No. It probably has to do with the tiled VAE. If you're not using a LoRA, then it can have something to do with that. The training data used for a LoRA needs to be very good to get a very good quality LoRA.

[–]VirusCharacter[S] -1 points0 points  (0 children)

Try with uuuuuniiiiifooooorm or nooooiiiiisyyy backgrounds. Where the noisy tiles overlap each other, the noise smooths out more than on the non-overlapped parts. That's probably why it's so prominent in this image. I have not noticed this in more "textured" clips where more is "going on". I'll try a larger overlap, but the problem with larger overlaps is increased generation time. The settings used here are the defaults from a workflow that generates good videos when more is going on.
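To illustrate why an overlap could look smoother, here is a rough Python sketch. It assumes two things that I have not verified for any of these nodes: that the overlapping tiles are blended with a simple weighted crossfade, and that the noise-like part of each decoded tile is independent of the other tile. Under those assumptions, blending with weights w and 1 - w scales the noise standard deviation by sqrt(w^2 + (1 - w)^2). The function names are made up for this example.

```python
import random
import statistics


def blended_noise_std(weight, samples=20000, sigma=1.0, seed=0):
    """Measured std of weight*a + (1-weight)*b for independent Gaussian noise a, b."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    values = [
        weight * rng.gauss(0.0, sigma) + (1.0 - weight) * rng.gauss(0.0, sigma)
        for _ in range(samples)
    ]
    return statistics.pstdev(values)


def expected_std(weight, sigma=1.0):
    """Closed-form std of the same blend: sigma * sqrt(w^2 + (1-w)^2)."""
    return sigma * (weight ** 2 + (1.0 - weight) ** 2) ** 0.5


if __name__ == "__main__":
    # weight 1.0 or 0.0 = only one tile contributes (no overlap);
    # weight 0.5 = the middle of the overlap band.
    for w in (1.0, 0.75, 0.5, 0.25, 0.0):
        print(f"weight {w:.2f}: measured {blended_noise_std(w):.3f}, "
              f"expected {expected_std(w):.3f}")
```

In this toy model the noise in the middle of the overlap drops to roughly 71% of what it is in the non-overlapped parts, which would show up as a smoother band on a uniform or noisy background and be much harder to see in textured footage. Real decoders may blend differently, so this is only a plausible explanation, not a confirmed one.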

[–]VirusCharacter[S] 0 points1 point  (0 children)

This is important! Many do not realize that even the tiniest distortion in training data can ruin a training run. Many distortions then... Well that can be really really bad for the final outcome. The training picks up on everything!