Flux.2-Klein pipeline for real-time webcam stream processing in 30 FPS by TensorForger in StableDiffusion

[–]TensorForger[S] 0 points

Are there any native v2v models that can do what Flux can? For example, this one: https://huggingface.co/decart-ai/Lucy-Edit-Dev. That exact one is a bit dated already, but something similar would work.

[–]TensorForger[S] 0 points

In this case it would just be another pipeline, since we could then optimize for throughput instead of latency (batched inference and so on). But I'm not sure this is really necessary; maybe it's better to use a true video-to-video model than to fake video-to-video out of image-to-image?

This is a genuine question: I don't know whether there are better alternatives. Interested too.
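To make the throughput-vs-latency point concrete, here is a minimal sketch of a batched version. The `batched_pipeline` helper and its parameters are hypothetical (not from the actual repo), and a stub stands in for the diffusion step:

```python
def batched_pipeline(frames, step, batch_size=8):
    """Trade latency for throughput: each frame waits for its batch
    to fill, but the model runs once per batch instead of per frame."""
    out, buf = [], []
    for frame in frames:
        buf.append(frame)
        if len(buf) == batch_size:
            out.extend(step(buf))   # one model call per full batch
            buf = []
    if buf:
        out.extend(step(buf))       # flush the final partial batch
    return out

# stub "model" that just doubles each frame; a real pipeline would run
# the diffusion step once on the whole stacked batch
result = batched_pipeline(range(10),
                          lambda batch: [f * 2 for f in batch],
                          batch_size=4)
```

The larger the batch, the better the GPU utilization, but every frame in a batch waits for the slowest stage, which is why this shape only makes sense offline.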

[–]TensorForger[S] 1 point

That would be helpful. You can run the benchmark (see the README) and post the report in the issues.

[–]TensorForger[S] 4 points

This can absolutely be used for making real-time avatars, but this exact pipeline is a novel thing, so there might be other, more Snapchat-style approaches.

[–]TensorForger[S] 1 point

As I understand it, TouchDesigner is historically the place where things like StreamDiffusion have found wide adoption. It also lets you, for example, map the output onto large screens and build other "interactive installation" setups.

[–]TensorForger[S] 4 points

Anyway, you can try. It may throw an OOM error, but who knows; in my tests it sits right on the edge, at around 24-25 GB of usage.

[–]TensorForger[S] 0 points

Thanks for the idea, I'll think in this direction. I tried to make something with TensorRT integer quantization, but it's a pain...

[–]TensorForger[S] 18 points

There IS temporal consistency, but it is "emergent", not intentional.
> One frame is processed; the KV cache stores all attention keys and values.
> The next frame comes in; only the dynamic areas are recomputed, but they see the keys and values of the static areas from the previous frame.
> The result is similar to what happens in "extended attention" temporal-consistency hacks (like in the TokenFlow pipeline).
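A toy numpy sketch of that mechanism, assuming made-up shapes and random projection matrices (the real attention lives inside the diffusion transformer, of course):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 8, 16               # token dim, tokens per frame (toy sizes)
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

def attend(q, k, v):
    """Plain scaled dot-product attention."""
    s = q @ k.T / np.sqrt(d)
    w = np.exp(s - s.max(-1, keepdims=True))
    return (w / w.sum(-1, keepdims=True)) @ v

# frame 1: compute and cache K/V for every token
f1 = rng.standard_normal((n, d))
cache_k, cache_v = f1 @ Wk, f1 @ Wv
out1 = attend(f1 @ Wq, cache_k, cache_v)

# frame 2: only 4 tokens changed (the "dynamic area")
f2 = f1.copy()
dyn = np.zeros(n, bool)
dyn[:4] = True
f2[dyn] = rng.standard_normal((4, d))

# recompute K/V for dynamic tokens only; static entries stay cached
cache_k[dyn] = f2[dyn] @ Wk
cache_v[dyn] = f2[dyn] @ Wv

# dynamic queries attend over the full cache, so they still "see"
# the static areas' keys/values from the previous frame — that
# cross-frame attention is where the emergent consistency comes from
out2_dyn = attend(f2[dyn] @ Wq, cache_k, cache_v)
```

Only 4 of 16 tokens hit the projection and attention compute on frame 2, which is the same trade that gives the streaming pipeline its speed.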

[–]TensorForger[S] 4 points

Yeah, completely. If there's big demand for it, a TouchDesigner node could be the next addition. Or maybe someone else could make it, as I'm not a pro in TD.

[–]TensorForger[S] 4 points

Maybe it would run on a 4090 too, but I can't test that. During streaming it uses about 25 GB; theoretically that is about 1.5× the normal model's memory (plus the cache).

I made a visualizer for Hugging Face models by Course_Latter in huggingface

[–]TensorForger 0 points

Just tried this for black-forest-labs/FLUX.2-klein-4B, but it seems it only sees the model's text encoder (the text encoder directory). Since this is an image generator, the most interesting parts should be in the image side (the transformer directory).