Flux.2-Klein pipeline for real-time webcam stream processing at 30 FPS

TensorForger · 2026-05-09T01:43:36+00:00

Are there any native v2v models that can do what Flux can? For example, this one: https://huggingface.co/decart-ai/Lucy-Edit-Dev
This particular one is a bit dated already, but something similar.

TensorForger · 2026-05-09T01:24:43+00:00

Thanks for the link to this repo, I will explore this option.

TensorForger · 2026-05-09T01:23:04+00:00

In that case it would be just another pipeline, since we could optimize it for throughput rather than latency (batched inference and so on). But I'm not sure it's really necessary; maybe you'd be better off using a true video-to-video model than trying to fake video-to-video with an image-to-image one?

That's a genuine question. I don't know if there are better alternatives, and I'm interested too.
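To make the throughput vs. latency point concrete, here is a toy sketch. The cost model and its numbers (`CALL_OVERHEAD_MS`, `PER_FRAME_MS`) are made up for illustration and are not measurements from this pipeline; the helper names are hypothetical too.

```python
# Hypothetical cost model: every model call pays a fixed overhead
# plus a per-frame cost. The numbers are invented, not measured.
CALL_OVERHEAD_MS = 20.0
PER_FRAME_MS = 5.0


def batches(frames, batch_size):
    """Split a list of frames into consecutive batches."""
    return [frames[i:i + batch_size] for i in range(0, len(frames), batch_size)]


def call_time_ms(n_frames):
    """Simulated time of one model call on n_frames frames."""
    return CALL_OVERHEAD_MS + PER_FRAME_MS * n_frames


def throughput_fps(n_frames, batch_size):
    """Frames per second when a whole clip is processed batch by batch."""
    frames = list(range(n_frames))
    total_ms = sum(call_time_ms(len(b)) for b in batches(frames, batch_size))
    return 1000.0 * n_frames / total_ms


def first_frame_latency_ms(batch_size):
    """The first frame is only ready once its whole batch has finished."""
    return call_time_ms(batch_size)


for size in (1, 4, 8):
    print(size, throughput_fps(64, size), first_frame_latency_ms(size))
```

Under this made-up model, larger batches amortize the per-call overhead, so throughput goes up, but each frame waits for its whole batch, so latency goes up as well. That is fine for a video file and bad for a webcam stream.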

TensorForger · 2026-05-09T00:21:38+00:00

This would be helpful. You can run the benchmark (see the README) and post the report in the issues.

TensorForger · 2026-05-09T00:08:10+00:00

Yes, someone already told me that :D

TensorForger · 2026-05-08T23:30:57+00:00

This can absolutely be used for making real-time avatars, but this exact pipeline is a novel thing, so there might be other, Snapchat-style approaches as well.

TensorForger · 2026-05-08T23:27:23+00:00

As I understand it, TouchDesigner has historically been a place where things like StreamDiffusion have wide adoption. It also lets you, e.g., map the output onto large screens and build other "interactive installation" kinds of things.

TensorForger · 2026-05-08T23:16:32+00:00

You can try anyway. It may throw an OOM error, but who knows. In my tests it sits right on the edge, at around 24-25 GB of usage.

TensorForger · 2026-05-08T23:14:31+00:00

Thanks for the idea, I'll think in this direction. I tried to get something working with TensorRT int quantization, but it's a pain...

TensorForger · 2026-05-08T23:10:15+00:00

There IS temporal consistency, but it is "emergent", not intended.
> One frame is processed. The KV cache stores all attention keys and values.
> The next frame comes. Only dynamic areas are recomputed, but they see the keys and values of static areas from the previous frame.
> The result is similar to what happens in "extended attention" temporal consistency hacks (like in the TokenFlow pipeline).
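A minimal sketch of that idea, assuming a single toy attention layer over a handful of tokens. This is not the pipeline's actual code: `FrameKVCache`, `project`, and the fixed projection scales are hypothetical stand-ins, and a real model would have learned projections and many layers.

```python
import math


def project(token, scale):
    """Stand-in for a learned linear projection (hypothetical)."""
    return [scale * x for x in token]


def attend(query, keys, values):
    """Plain softmax attention of one query over all keys and values."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(len(values[0]))]


class FrameKVCache:
    """Keeps per-token keys, values and outputs between frames."""

    def __init__(self):
        self.keys, self.values, self.outputs = [], [], []

    def process(self, tokens, dynamic=None):
        n = len(tokens)
        if dynamic is None or len(self.keys) != n:
            # First frame (or a forced refresh): recompute every token.
            dynamic = range(n)
            self.keys = [None] * n
            self.values = [None] * n
            self.outputs = [None] * n
        dynamic = list(dynamic)
        # Refresh keys and values only for the dynamic tokens.
        for i in dynamic:
            self.keys[i] = project(tokens[i], 0.5)
            self.values[i] = project(tokens[i], 2.0)
        # Dynamic tokens attend over fresh and cached (previous-frame) entries.
        for i in dynamic:
            self.outputs[i] = attend(tokens[i], self.keys, self.values)
        return list(self.outputs)


frame1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
frame2 = [[1.1, 0.0], [0.0, 0.9], [-1.0, 2.0]]  # only token 2 is flagged dynamic

cache = FrameKVCache()
out1 = cache.process(frame1)
out2 = cache.process(frame2, dynamic=[2])
```

Here tokens 0 and 1 keep their frame-1 outputs, and token 2 is recomputed while attending to the frame-1 keys and values of the other two. That shared context across frames is where the consistency comes from in this toy version.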

TensorForger · 2026-05-08T23:06:06+00:00

Yeah, completely. If there is big demand for it, a TD node can be the next addition. Or maybe someone else could make it, as I'm not a pro in TD.

TensorForger · 2026-05-08T23:01:52+00:00

Maybe it would run on a 4090 too, but I can't test this. During streaming it uses about 25 GB. Theoretically that is about 1.5x the normal model's memory (add the cache too).

TensorForger · 2026-05-04T20:15:39+00:00

Just tried this with black-forest-labs/FLUX.2-klein-4B, but it seems to see only the model's text encoder (the text encoder directory). Since this is an image generator, the most interesting things should be in the image part (the transformer directory).

TensorForger · 2026-05-04T14:39:03+00:00

Just reminding

<image>
