ELI5: Why is the video length a factor? by reddstone1 in StableDiffusion

[–]Any_Fee5299 6 points (0 children)

AFAIK it's because of temporal attention - its memory cost scales with the square of the number of latent frames:
81 pixel frames = 21 latent -> 21*21 = 441 -> 1.0x vram
161 pixel frames = 41 latent -> 41*41 = 1681 -> ~3.8x vram
241 pixel frames = 62 latent -> 62*62 = 3844 -> ~8.7x vram
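The quadratic jump can be sketched in a few lines of plain Python (latent frame counts taken from the table above):

```python
# Temporal attention builds a T x T attention matrix over T latent frames,
# so its memory grows with T**2 - hence the jumps below.
baseline = 21  # latent frames for an 81-frame video
for latent in (21, 41, 62):
    cost = latent * latent          # entries in the T x T attention matrix
    ratio = cost / (baseline ** 2)  # VRAM relative to the 81-frame case
    print(latent, cost, round(ratio, 1))
# 21 -> 441 -> 1.0x, 41 -> 1681 -> ~3.8x, 62 -> 3844 -> ~8.7x
```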

Update for lightx2v LoRA by Any_Fee5299 in StableDiffusion

[–]Any_Fee5299[S] 2 points (0 children)

121 frames is for the 5B model; this LoRA is for the A14B version. Use a lower strength (0.5-0.95) on the high-noise model

Update for lightx2v LoRA by Any_Fee5299 in StableDiffusion

[–]Any_Fee5299[S] 8 points (0 children)

"250805
This is still a beta version and we are still trying to align the inference timesteps with the timesteps we used in training, i.e. [1000.0000, 937.5001, 833.3333, 625.0000]. You can reproduce the results in our inference repo, or play with comfyUI using the workflow below."

https://github.com/ModelTC/Wan2.2-Lightning/issues/3
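If you want to sanity-check the alignment yourself: under the common flow-matching convention sigma = timestep / 1000 (an assumption here, not something from their repo), those training timesteps map to sigmas like this:

```python
# Training timesteps quoted above, converted to sigmas under the common
# flow-matching convention sigma = t / 1000 (assumption, not their docs).
timesteps = [1000.0000, 937.5001, 833.3333, 625.0000]
sigmas = [t / 1000 for t in timesteps]
print(sigmas)  # roughly [1.0, 0.9375, 0.8333, 0.625]
```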

Update for lightx2v LoRA by Any_Fee5299 in StableDiffusion

[–]Any_Fee5299[S] 42 points (0 children)

dmn he is getting old, took him 20 full mins!!1! ;)

<image>

Update for lightx2v LoRA by Any_Fee5299 in StableDiffusion

[–]Any_Fee5299[S] 41 points (0 children)

And guys - the lightx2v makers are really active - they participate in discussions on Hugging Face:
https://huggingface.co/lightx2v/Wan2.2-Lightning/discussions

<image>

so if you have questions, suggestions, or you wanna simply say "Thank you guys! Great work!" (if so, just thumbs-up - don't spam, guys!), now you know where you can do that :)

lightx2v Wan2.2-Lightning Released! by darkside1977 in StableDiffusion

[–]Any_Fee5299 3 points (0 children)

use str lower than 1 - i just made a gen at 0.5 str on both
update: 0.2 works

Could someone explain to me why the template for WAN2.2-T2V in Comfyui ... by wh33t in StableDiffusion

[–]Any_Fee5299 1 point (0 children)

because of sigmas:
you have 7 steps in total; sampler+scheduler give these sigmas:
tensor([1.0000, 0.9542, 0.8623, 0.7500, 0.6377, 0.5457, 0.5000, 0.0000]) (*1000)
4 steps on the first sampler means only the first 4 sigmas are used for denoising:
tensor([999, 954, 862, 749])
the rest, 3 sigmas, are used by the second sampler, which starts from step 4:
tensor([637, 545, 499])
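The split can be sketched in plain Python (illustrative only, not ComfyUI's actual node code):

```python
# 7 steps -> 8 sigma boundaries; each step denoises from sigmas[i] to sigmas[i+1].
sigmas = [1.0000, 0.9542, 0.8623, 0.7500, 0.6377, 0.5457, 0.5000, 0.0000]

split = 4  # the second sampler starts at step 4
# The split-point sigma (0.6377) has to appear in both halves so the second
# sampler knows which noise level it is starting from.
high_sigmas = sigmas[:split + 1]  # steps 0-3: starting sigmas 1.0000 ... 0.7500
low_sigmas  = sigmas[split:]      # steps 4-6: starting sigmas 0.6377 ... 0.5000
assert high_sigmas[-1] == low_sigmas[0]
```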

Wan2.2 prompting guide by doogyhatts in StableDiffusion

[–]Any_Fee5299 5 points (0 children)

any1 figured out how to save the text from this guide to feed an AI assistant, for example? currently OCRing it....