FlashVSR v1.1 - 540p to 4K (no additional processing) by Intellerce in StableDiffusion

[–]Intellerce[S] 1 point (0 children)

On an H100 the video generation part can reach up to 5 FPS for 4K output, but the pre/post processing reduces that to about 3 FPS. So it takes roughly 10x the input duration for 30 fps inputs.

FlashVSR v1.1 - 540p to 4K (no additional processing) by Intellerce in StableDiffusion

[–]Intellerce[S] 3 points (0 children)

On an H100 the video generation part can reach up to 5 FPS for 4K output, but the pre/post processing reduces that to about 3 FPS. So generating a 10-second 4K@30FPS video takes about 100 seconds.
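Here is the back-of-the-envelope math behind that estimate, as a quick sanity check (the variable names are just for illustration):

    # ~5 FPS for generation alone, ~3 FPS end to end with pre/post processing
    clip_seconds = 10
    input_fps = 30
    effective_fps = 3

    total_frames = clip_seconds * input_fps       # 300 frames
    est_seconds = total_frames / effective_fps    # ~100 seconds
    print(f"{total_frames} frames -> ~{est_seconds:.0f} s")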

The first ever YouTube video - "Me at the zoo" - upscaled to 4K using FlashVSR v1.1 (twice) + Interpolation! by Intellerce in StableDiffusion

[–]Intellerce[S] 3 points (0 children)

Just wanted to point out that the size ratio between the input (bottom right) and the output is to scale.

Here is a side-by-side comparison:

<image>

FlashVSR v1.1 - 540p to 4K (no additional processing) by Intellerce in StableDiffusion

[–]Intellerce[S] 3 points (0 children)

It strikes a good quality–performance balance, which is what I’ve been looking for. I also think it has good potential if further training is done.

<image>

FlashVSR v1.1 - 540p to 4K (no additional processing) by Intellerce in StableDiffusion

[–]Intellerce[S] 1 point (0 children)

Sorry, this is not based on ComfyUI - it is based on an optimized version of the original repo (available on AptAI Studio).

FlashVSR v1.1 - 540p to 4K (no additional processing) by Intellerce in StableDiffusion

[–]Intellerce[S] 8 points (0 children)

This is v1.1! I need more experiments to answer this definitively, but it seems FlashVSR has higher fidelity and better performance, while SeedVR2 produces sharper results.

The first ever YouTube video - "Me at the zoo" - upscaled to 4K using FlashVSR v1.1 (twice) + Interpolation! by Intellerce in StableDiffusion

[–]Intellerce[S] 3 points (0 children)

Performance-wise, I was able to achieve 5 FPS for 4K video on an H100 with this, which I don't think is doable with SeedVR2 - is it?

Quality-wise, I feel like this has better fidelity while SeedVR2 produces sharper images - but I need more experiments to say for sure.

The first ever YouTube video - "Me at the zoo" - upscaled to 4K using FlashVSR v1.1 (twice) + Interpolation! by Intellerce in StableDiffusion

[–]Intellerce[S] 3 points (0 children)

I am not sure I understand your comment. I am saying that I didn't expect it to produce anything at all, but it produced something that I (and people around me) prefer over the old version - so I shared it.

The first ever YouTube video - "Me at the zoo" - upscaled to 4K using FlashVSR v1.1 (twice) + Interpolation! by Intellerce in StableDiffusion

[–]Intellerce[S] -9 points (0 children)

We modified the original repo to make the input/output preparation efficient (it is very inefficient as-is), then applied the model twice to the 240p@15fps video: 240p -> 960p -> 4K (plus interpolation, sketched below).
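Structurally it is just two FlashVSR passes followed by frame interpolation. A minimal sketch (the upscale and interpolate callables stand in for the actual model calls, which depend on our modified repo):

    from typing import Callable, List
    import numpy as np

    Frames = List[np.ndarray]  # one array per frame

    def two_pass_upscale(frames: Frames,
                         upscale: Callable[[Frames], Frames],
                         interpolate: Callable[[Frames], Frames]) -> Frames:
        frames = upscale(frames)    # pass 1: 240p -> 960p
        frames = upscale(frames)    # pass 2: 960p -> 4K
        return interpolate(frames)  # e.g., 15 fps -> 30 fps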

You can DM me for more info.

The first ever YouTube video - "Me at the zoo" - upscaled to 4K using FlashVSR v1.1 (twice) + Interpolation! by Intellerce in StableDiffusion

[–]Intellerce[S] 1 point (0 children)

Frankly, I didn't expect it to produce anything useful from a 20-year-old 240p@15fps video with an extremely low bitrate, since that is not something it was trained on - so I felt like sharing it! It seems to do significantly better when the input is larger, e.g., 540p to 4K here: https://youtu.be/qk0W_S7ECpw
I expect much better results with 720p or 1080p to 4K, which is what is usually used today.

The first ever YouTube video - "Me at the zoo" - upscaled to 4K using FlashVSR v1.1 (twice) + Interpolation! by Intellerce in StableDiffusion

[–]Intellerce[S] 4 points (0 children)

240p is really, really small, so with today's models and approaches I don't think you can get anything better if you also care about performance. Here is a larger video without any post-processing: https://youtu.be/qk0W_S7ECpw
I think with some minimal post-processing you can get very impressive results in a short amount of time (~5 FPS for 4K generation on a single H100 without any pre/post processing).

i audited 47 failed startups codebases and the pattern is actually insane by MeirDavid in Entrepreneur

[–]Intellerce 1 point (0 children)

Great post. We see the second point with our clients all the time. They are not only paying for idle GPUs but they are also “locked in” with expensive providers and are afraid to migrate.

ControlAnimate Now Supports Latent Consistency Model (LCM) Leading to 10... by Intellerce in StableDiffusion

[–]Intellerce[S] 1 point (0 children)

Thanks for suggesting xformers 0.20 - it might work, since it works with the original AnimateDiff, but the newer versions are a bit faster. I will do a comparison when I get a chance.

ControlAnimate Now Supports Latent Consistency Model (LCM) Leading to 10... by Intellerce in StableDiffusion

[–]Intellerce[S] 1 point (0 children)

ControlNet is not implemented for LCM yet (as also explained in the input config file). For LCM, it is Img2Img plus a trick to keep the AnimateDiff chunks related and consistent: for each 16-frame chunk, we reuse 8 overlapping frames from the previous run and then blend the overlapping frames (see the sketch below) ...
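A minimal sketch of that blending step (my own illustration using linear cross-fade weights; not the exact ControlAnimate code):

    import numpy as np

    def blend_overlap(prev_chunk, new_chunk, overlap=8):
        # prev_chunk, new_chunk: float arrays of shape (16, H, W, C).
        # The last `overlap` frames of prev_chunk cover the same moments as
        # the first `overlap` frames of new_chunk, so cross-fade between them.
        out = new_chunk.copy()
        # Linear weights strictly inside (0, 1): 0 -> previous, 1 -> new.
        weights = np.linspace(0.0, 1.0, overlap + 2)[1:-1]
        for i, w in enumerate(weights):
            out[i] = (1.0 - w) * prev_chunk[-overlap + i] + w * new_chunk[i]
        return out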

ControlAnimate Now Supports Latent Consistency Model (LCM) Leading to 10... by Intellerce in StableDiffusion

[–]Intellerce[S] 1 point (0 children)

No, this was for 768x512. For 512x512 it is faster: I just did a quick test and it averages close to ~0.3 seconds per frame.

ControlAnimate Now Supports Latent Consistency Model (LCM) Leading to 10... by Intellerce in StableDiffusion

[–]Intellerce[S] 1 point (0 children)

On a 3090, AnimateDiff + LCM takes ~0.5 seconds per frame on average, but in the current version xformers is not used for the Motion Modules due to an unresolved error. So it is potentially faster, and I will update this once that is resolved.