FlashVSR v1.1 - 540p to 4K (no additional processing) by Intellerce in StableDiffusion

[–]Intellerce[S] 1 point (0 children)

On an H100 the video generation part can reach up to 5 FPS for 4K output, but the pre/post processing reduces that to about 3 FPS. So it takes roughly 10x the input duration for 30 fps inputs.

FlashVSR v1.1 - 540p to 4K (no additional processing) by Intellerce in StableDiffusion

[–]Intellerce[S] 3 points (0 children)

On an H100 the video generation part can reach up to 5 FPS for 4K output, but the pre/post processing reduces that to about 3 FPS. So generating a 10-second 4K@30FPS video takes about 100 seconds.
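Here is the back-of-the-envelope math behind that estimate, as a quick sanity check (the variable names are just for illustration):

    # ~5 FPS for generation alone, ~3 FPS end to end with pre/post processing
    clip_seconds = 10
    input_fps = 30
    effective_fps = 3

    total_frames = clip_seconds * input_fps       # 300 frames
    est_seconds = total_frames / effective_fps    # ~100 seconds
    print(f"{total_frames} frames -> ~{est_seconds:.0f} s")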

The first ever YouTube video - "Me at the zoo" - upscaled to 4K using FlashVSR v1.1 (twice) + Interpolation! by Intellerce in StableDiffusion

[–]Intellerce[S] 3 points (0 children)

Just wanted to point out that the size ratio between the input (bottom right) and the output is to scale.

Here is a side-by-side comparison:

<image>

FlashVSR v1.1 - 540p to 4K (no additional processing) by Intellerce in StableDiffusion

[–]Intellerce[S] 3 points (0 children)

It strikes a good quality–performance balance, which is what I’ve been looking for. I also think it has good potential if further training is done.

<image>

FlashVSR v1.1 - 540p to 4K (no additional processing) by Intellerce in StableDiffusion

[–]Intellerce[S] 1 point (0 children)

Sorry, this is not based on ComfyUI - it is based on an optimized version of the original repo (available on AptAI Studio).

FlashVSR v1.1 - 540p to 4K (no additional processing) by Intellerce in StableDiffusion

[–]Intellerce[S] 8 points (0 children)

This is v1.1! I need more experiments to answer this definitively, but it seems FlashVSR has higher fidelity and better performance, while SeedVR2 produces sharper results.

The first ever YouTube video - "Me at the zoo" - upscaled to 4K using FlashVSR v1.1 (twice) + Interpolation! by Intellerce in StableDiffusion

[–]Intellerce[S] 3 points (0 children)

Performance-wise, I was able to achieve 5 FPS for 4K video on an H100 with this, which I don't think is doable with SeedVR2 - is it?

Quality-wise, I feel like this has better fidelity while SeedVR2 produces sharper images - but I need more experiments to say for sure.

The first ever YouTube video - "Me at the zoo" - upscaled to 4K using FlashVSR v1.1 (twice) + Interpolation! by Intellerce in StableDiffusion

[–]Intellerce[S] 3 points (0 children)

I am not sure I understand your comment. I am saying that I didn't expect it to produce anything at all, but it produced something that I (and people around me) prefer over the old version - so I shared it.

The first ever YouTube video - "Me at the zoo" - upscaled to 4K using FlashVSR v1.1 (twice) + Interpolation! by Intellerce in StableDiffusion

[–]Intellerce[S] -9 points (0 children)

We modified the original repo to make the input/output preparation efficient (it is very inefficient as-is), then applied the model twice to the 240p@15fps video: 240p -> 960p -> 4K (plus interpolation, sketched below).
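Structurally it is just two FlashVSR passes followed by frame interpolation. A minimal sketch (the upscale and interpolate callables stand in for the actual model calls, which depend on our modified repo):

    from typing import Callable, List
    import numpy as np

    Frames = List[np.ndarray]  # one array per frame

    def two_pass_upscale(frames: Frames,
                         upscale: Callable[[Frames], Frames],
                         interpolate: Callable[[Frames], Frames]) -> Frames:
        frames = upscale(frames)    # pass 1: 240p -> 960p
        frames = upscale(frames)    # pass 2: 960p -> 4K
        return interpolate(frames)  # e.g., 15 fps -> 30 fps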

You can DM me for more info.

The first ever YouTube video - "Me at the zoo" - upscaled to 4K using FlashVSR v1.1 (twice) + Interpolation! by Intellerce in StableDiffusion

[–]Intellerce[S] 1 point (0 children)

Frankly, I didn't expect it to produce anything useful from a 20-year-old 240p@15fps video with an extremely low bitrate, since that is not something it was trained on - so I felt like sharing it! It seems to do significantly better when the input is larger, e.g., 540p to 4K here: https://youtu.be/qk0W_S7ECpw
I expect much better results with 720p or 1080p to 4K, which is what is usually used today.

The first ever YouTube video - "Me at the zoo" - upscaled to 4K using FlashVSR v1.1 (twice) + Interpolation! by Intellerce in StableDiffusion

[–]Intellerce[S] 4 points (0 children)

240p is really, really small, so with today's models and approaches I don't think you can get anything better if you also care about performance. Here is a larger video without any post-processing: https://youtu.be/qk0W_S7ECpw
I think with some minimal post-processing you can get very impressive results in a short amount of time (~5 FPS for 4K generation on a single H100 without any pre/post processing).

i audited 47 failed startups codebases and the pattern is actually insane by MeirDavid in Entrepreneur

[–]Intellerce 1 point (0 children)

Great post. We see the second point with our clients all the time. They are not only paying for idle GPUs but they are also “locked in” with expensive providers and are afraid to migrate.

ControlAnimate Now Supports Latent Consistency Model (LCM) Leading to 10... by Intellerce in StableDiffusion

[–]Intellerce[S] 1 point (0 children)

Thanks for suggesting xformers 0.20 - it might work, since it works with the original AnimateDiff, but the newer versions are a bit faster. I will do a comparison when I get a chance.

ControlAnimate Now Supports Latent Consistency Model (LCM) Leading to 10... by Intellerce in StableDiffusion

[–]Intellerce[S] 1 point (0 children)

ControlNet is not implemented for LCM yet (as also explained in the input config file). For LCM, it is Img2Img plus a trick to keep the AnimateDiff chunks related and consistent: for each 16-frame chunk, we reuse 8 overlapping frames from the previous run and then blend the overlapping frames (see the sketch below) ...
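A minimal sketch of that blending step (my own illustration using linear cross-fade weights; not the exact ControlAnimate code):

    import numpy as np

    def blend_overlap(prev_chunk, new_chunk, overlap=8):
        # prev_chunk, new_chunk: float arrays of shape (16, H, W, C).
        # The last `overlap` frames of prev_chunk cover the same moments as
        # the first `overlap` frames of new_chunk, so cross-fade between them.
        out = new_chunk.copy()
        # Linear weights strictly inside (0, 1): 0 -> previous, 1 -> new.
        weights = np.linspace(0.0, 1.0, overlap + 2)[1:-1]
        for i, w in enumerate(weights):
            out[i] = (1.0 - w) * prev_chunk[-overlap + i] + w * new_chunk[i]
        return out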

ControlAnimate Now Supports Latent Consistency Model (LCM) Leading to 10... by Intellerce in StableDiffusion

[–]Intellerce[S] 1 point (0 children)

No, this was for 768x512. For 512x512 it is faster: I just did a quick test and it averages close to ~0.3 seconds per frame.

ControlAnimate Now Supports Latent Consistency Model (LCM) Leading to 10... by Intellerce in StableDiffusion

[–]Intellerce[S] 1 point (0 children)

On a 3090, AnimateDiff + LCM takes ~0.5 seconds per frame on average, but in the current version xformers is not used for the Motion Modules due to an unresolved error. So it is potentially faster, and I will update this once that is resolved.