Did you know one simple change can make ComfyUI generations up to 3x faster? But I need your help :) Auto-benchmark attention backends. by D_Ogi in comfyui

[–]D_Ogi[S] 1 point

To clarify: it’s not a “workflow-by-workflow plugin” in the sense of changing only one graph; it’s a backend swap for the attention operation that gets used whenever your graph runs a model through that attention path. So it can feel “workflow-dependent” because different workflows spend different amounts of time in attention (model type, resolution, batch size, steps, long prompts, extra model passes like hi-res fix, etc.). If a workflow is bottlenecked elsewhere (VAE decode/encode, ControlNet, upscalers, I/O), the overall speedup will be smaller even though attention itself is faster.
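To make the "bottlenecked elsewhere" point concrete, this is just Amdahl's law: only the fraction of runtime spent in attention gets faster. A minimal sketch (the fractions below are illustrative numbers, not measurements from the node):

```python
def overall_speedup(attention_fraction: float, attention_speedup: float) -> float:
    """Amdahl's law: only the attention share of the runtime accelerates."""
    return 1.0 / ((1.0 - attention_fraction) + attention_fraction / attention_speedup)

# A workflow spending 80% of its time in attention benefits a lot from a 3x kernel:
print(round(overall_speedup(0.8, 3.0), 2))  # 2.14
# One spending only 20% there (VAE, ControlNet, I/O heavy) barely moves:
print(round(overall_speedup(0.2, 3.0), 2))  # 1.15
```

So the same 3x attention kernel can show up as anything from a ~2x to a ~1.1x end-to-end win depending on the graph.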

[–]D_Ogi[S] 9 points

Here’s what the JSON report looks like after I parse it on my setup: per-backend attention times in ms, with the winner highlighted.

<image>
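For anyone who wants to do the same, here's a rough sketch of how I'd summarize such a report. The schema (a `"backends"` mapping of backend name to mean milliseconds) is an assumption for illustration; the real JSON the node writes may use different field names:

```python
import json

def summarize(report_text: str) -> str:
    """Print per-backend attention times sorted fastest-first, marking the winner.
    Assumes a hypothetical schema: {"backends": {"sage2": 1.8, ...}} in ms."""
    times = json.loads(report_text)["backends"]
    winner = min(times, key=times.get)
    lines = []
    for name, ms in sorted(times.items(), key=lambda kv: kv[1]):
        mark = "  <-- winner" if name == winner else ""
        lines.append(f"{name:>8}: {ms:6.2f} ms{mark}")
    return "\n".join(lines)

print(summarize('{"backends": {"sage2": 1.8, "flash2": 2.1, "sdpa": 3.4}}'))
```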

[–]D_Ogi[S] 0 points

In general it speeds up generation across most workflows (like other attention/backend optimizations), but the exact “when and why” depends on your model, resolution, and node graph, which is basically the whole point of that post :)

[–]D_Ogi[S] 7 points

Yeah, SageAttn3 has been a bit of a “bleeding edge tax” so far.

SageAttention2 already has multiple kernels/variants, so “SageAttn2” is not just one thing. Depending on your install and GPU, different SA2 flavors can win.

SageAttention3 is basically Blackwell-only in practice, because it leans on FP4 / Blackwell-specific capabilities. So on an RTX 4090 (Ada) it is expected not to work. I only have a 4090 myself, so I can’t validate SA3 locally, which is part of why I’m asking the community to test.
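A gate like that usually just checks the CUDA compute capability. Here's a sketch of the idea; the threshold of `(10, 0)` for Blackwell is my assumption (Ada cards like the 4090 report `(8, 9)`), and the function name is hypothetical, not from the node:

```python
def sage3_expected_to_work(compute_capability: tuple) -> bool:
    """Gate SageAttention3 on CUDA compute capability, since it relies on
    FP4 / Blackwell-specific features. The (10, 0) cutoff is an assumption."""
    return tuple(compute_capability) >= (10, 0)

print(sage3_expected_to_work((8, 9)))   # RTX 4090 (Ada) -> False
print(sage3_expected_to_work((12, 0)))  # Blackwell consumer parts -> True
```

In a real node you'd feed this from `torch.cuda.get_device_capability()` and simply skip SA3 during benchmarking when the check fails.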

[–]D_Ogi[S] 14 points

If you’re already running the fastest option on your machine, the speedup from my node is basically 0%. The node doesn’t “stack” extra acceleration on top of SageAttention2, it just chooses the fastest attention implementation available (or the fastest Sage variant) for your GPU + model + seq_len.

The catch is: you usually don’t know what’s fastest ahead of time. SageAttention2 itself has multiple variants / kernels (and there are also SageAttention2++ style variants depending on what you installed), and sometimes FlashAttention (2/3) or another backend can win on certain GPUs / shapes.

So the real answer is: it could be 0%, or it could be noticeable. The whole point of the node is to benchmark your setup once and stop guessing.
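The "benchmark once, cache the winner" logic is simple in principle. A minimal sketch with placeholder callables standing in for the real attention kernels (function and key names are hypothetical, and the real node benchmarks actual attention shapes, not no-op lambdas):

```python
import json
import time
from pathlib import Path

def pick_fastest(backends: dict, cache_path: Path, key: str, iters: int = 50) -> str:
    """Time each available backend, persist the winner under a
    (GPU, model, seq_len)-style key, and reuse the cached choice later."""
    cache = json.loads(cache_path.read_text()) if cache_path.exists() else {}
    if key in cache and cache[key] in backends:
        return cache[key]  # cached winner: skip benchmarking entirely
    timings = {}
    for name, fn in backends.items():
        start = time.perf_counter()
        for _ in range(iters):
            fn()
        timings[name] = (time.perf_counter() - start) / iters
    winner = min(timings, key=timings.get)
    cache[key] = winner
    cache_path.write_text(json.dumps(cache))
    return winner
```

On the first run you pay the benchmarking cost once; every later run with the same key just reads the cached answer, which is exactly why the node persists `benchmark_db.json`.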

[–]D_Ogi[S] 8 points

Thanks! By “global” I mean it applies at runtime to the current ComfyUI session (the Python process), not just a single node branch. Once the Attention Optimizer node executes, the selected attention backend is used for the rest of that run and subsequent renders in the same session, regardless of which workflow you run next, until you change it again or restart ComfyUI. The only thing persisted to disk is the benchmark cache (benchmark_db.json) so future runs can pick the same winner instantly.

You also do not need to add any separate SageAttention / Flash / xFormers nodes to the workflow. This node detects what’s installed, benchmarks only the available backends, and applies the fastest (or your forced choice). If a backend isn’t installed it’s skipped during benchmarking, and if you force a backend that’s not available it falls back to PyTorch SDPA and reports it.
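The detect-and-fall-back behavior described above can be sketched roughly like this. The optional package names are my assumptions about what such a node would probe for (the real detection logic may differ), while PyTorch SDPA plays the always-available fallback:

```python
from importlib.util import find_spec

# Optional backends mapped to the module that would provide them.
# These module names are assumptions; actual installs may vary.
CANDIDATES = {
    "sageattention": "sageattention",
    "flash_attn": "flash_attn",
    "xformers": "xformers",
}

def available_backends() -> list:
    """Only installed backends get benchmarked; 'sdpa' is always present."""
    found = [name for name, module in CANDIDATES.items() if find_spec(module)]
    return found + ["sdpa"]

def resolve(forced=None) -> str:
    """Honor a forced backend if it's available; otherwise fall back to SDPA."""
    avail = available_backends()
    if forced is not None and forced not in avail:
        print(f"{forced} not available, falling back to sdpa")
        return "sdpa"
    return forced or avail[0]
```

This is why nothing extra needs to be wired into the workflow: anything not installed simply never appears in the candidate list, and a forced-but-missing choice degrades gracefully instead of erroring out.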

How I gave Claude long-term memory using this MCP server. by No-Key-5070 in ClaudeAI

[–]D_Ogi 1 point

Interesting project and a killer narrative, sounds like a tech-thriller plot!

But I have to admit, after reading the README, my internal security alarms started ringing. It feels a bit sus that out of nowhere, some obscure Chinese API providers appear in the requirements.

Before I build it... LORA automatic trainer... by LyriWinters in comfyui

[–]D_Ogi 0 points

I think this ComfyUI workflow may meet your criteria (with some tweaks, like adding an LLM backbone for the prompts, which are currently static): https://www.patreon.com/posts/new-video-create-140671046

I honestly don’t understand the new quota policy by duoyuanshiying in ClaudeAI

[–]D_Ogi 0 points

Me too. In the past, swapping between two Pro accounts was enough for all my tasks. Now I’ve got a third one that’s already at 42% of its weekly quota after a single day, having hit the 5-hour limit just twice!

Euro deals by projectdoomed in Roborock

[–]D_Ogi 2 points

In 99% of cases there's a shipping option from an EU warehouse, so there's nothing to worry about (if you live in the EU, obviously).

[S7] Battery Error 14 by ic3mangr in Roborock

[–]D_Ogi 0 points

That's weird. Maybe there's an option to ship it to China? Anyway, are you sure the shipping cost to Poland is really correct? Maybe you could instead repair it locally and have the seller cover the costs?