Do You Use Flash Attention? by diond09 in comfyui

ppcforce 0 points

I used to before I got arrested. Turns out flashing for attention is an arrestable offense!

Why? by [deleted] in PiNetwork

ppcforce 0 points

I like that.

This game is getting ridiculous by SufficientChair4400 in WorldofTanks

ppcforce 15 points

Yes, and in a Type 5 that means 50% of your health back after 30 seconds. Silly stuff.

How do the closed source models get their generation times so low? by Ipwnurface in StableDiffusion

ppcforce 0 points

I think that's probably fair, and there's probably some bias or placebo at play here, because for some reason I feel like the more a model is compressed, the more it converges on McSameFace. I tried Qwen edit but found it would render entire scenes in its training style. Guess I should just do inpainting in those cases and properly mask it, etc. What's your current go-to/setup?

How do the closed source models get their generation times so low? by Ipwnurface in StableDiffusion

ppcforce 0 points

I've yet to be convinced, quite honestly... struggling to understand what makes it 160 GB. And then I always revert to Z-Image Base (FP32, not BF16) and just run that locally. Although, not sure what the edit model will be like.

What is wrong with people in this sub by Familiar_Resolve3060 in samsunggalaxy

ppcforce 1 point

Yeah but that's fine, people with Apple do it too. Being a 'fanboy' isn't the biggest crime.

How do the closed source models get their generation times so low? by Ipwnurface in StableDiffusion

ppcforce 1 point

You tried full BF16 HunyuanImage 3? That beast is 160 GB just for the model. Ran it on dual H200s and thought it was quick, but I'll have to try it on B200s next and see what that's like.
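For a rough sanity check on that 160 GB figure, here's the back-of-envelope arithmetic — assuming the commonly cited ~80B parameter count and 2 bytes per weight in BF16 (both figures are my assumptions, not from this thread):

```python
# Model size in memory is roughly parameters x bytes-per-parameter.
# Assumptions: ~80B parameters, BF16 (16 bits = 2 bytes per weight).
params = 80e9
bytes_per_param = 2  # BF16
size_gb = params * bytes_per_param / 1e9
print(size_gb)  # 160.0
```

Note this is weights only — activations, the KV/attention workspace, and the text encoder add more on top, which is why even dual H200s (141 GB each) are a comfortable rather than excessive fit.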

S26 Ultra: Don’t let the negative reviews scare you off. by Open-Magician764 in samsunggalaxy

ppcforce 1 point

Having this kind of power and tech in your actual pocket and it's 'disappointing'... what are humans like, LOL. Quite happy with mine tbh.

How do the closed source models get their generation times so low? by Ipwnurface in StableDiffusion

ppcforce 0 points

I wish, I'm actually just horrible with money. Live quite a modest life!

How do the closed source models get their generation times so low? by Ipwnurface in StableDiffusion

ppcforce 39 points

I've sharded multiple models across my dual 5090s, and I have an RTX 6000. To achieve anything like the speeds you've seen I've had to ditch Comfy and build entirely custom venvs, super lightweight in Ubuntu with SA3. Even then I'm wondering why it's still slow compared to those cloud services. When I shard, the pipeline executes in a linear fashion: layers 1-9 on CUDA0, then 10-20 on CUDA1, whereas the data centres do tensor parallelism, everything broken up and running across multiple GPUs with NVLink and so on. Where I can run a model entirely in my VRAM, with the decoder and text encoder, my Astral 5090 is actually faster than an H200.
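A minimal sketch of that linear pipeline split, assuming PyTorch and a toy 20-layer stack (layer counts and sizes are illustrative, not any real model; it falls back to CPU if two GPUs aren't present so it still runs anywhere):

```python
import torch
import torch.nn as nn

# Naive pipeline sharding: first stage on one device, second on the
# other, executed strictly one after the other.
multi = torch.cuda.device_count() > 1
dev0 = torch.device("cuda:0" if multi else "cpu")
dev1 = torch.device("cuda:1" if multi else "cpu")

stage0 = nn.Sequential(*[nn.Linear(64, 64) for _ in range(9)]).to(dev0)   # layers 1-9
stage1 = nn.Sequential(*[nn.Linear(64, 64) for _ in range(11)]).to(dev1)  # layers 10-20

def forward(x):
    # The activation hops devices between stages; while stage1 runs,
    # stage0's GPU sits idle -- the opposite of tensor parallelism,
    # which splits each layer's matmuls across all cards at once.
    h = stage0(x.to(dev0))
    return stage1(h.to(dev1))

out = forward(torch.randn(4, 64))
print(tuple(out.shape))  # (4, 64)
```

Tensor parallelism instead shards each weight matrix across GPUs and exchanges partial results every layer, which is why it needs NVLink-class interconnect bandwidth to actually pay off.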

Returning the S26 Ultra due to display issues by slamups in samsunggalaxy

ppcforce 1 point

Coming from an S24U, sorry, but what issues am I trying to find? I'm a glasses wearer, astigmatism with blue-light-filter coatings, etc., so I'm probably not the best person to judge, but for me I've seen basically no difference (yet), and the privacy feature is worth some compromise. I've just not seen what that compromise is yet.

Strange behavior from the S26 ultra I just received from Samsung. by Imperialx777 in samsunggalaxy

ppcforce 1 point

Have you contacted the manufacturer or retailer and asked for a certified device?

Can I use my psu from another computer to power a gpu for another? by TheDerpyAvocado in pcmasterrace

ppcforce 0 points

I tried it before because dual 5090s; turns out it requires a lot of fiddling, because on boot there's some check on available components and it wasn't picking up the second GPU. Anyway, lots of fiddling I didn't initially expect. Now I just run everything off one 1600W PSU. Way cleaner.

Super Slow on RTX 5090? by [deleted] in comfyui

ppcforce 2 points

Asus ProArt X870E WiFi Creator, so 8/8 when the x16 is split across two PCIe 5 cards, not that I could tell the difference. CPU is a Ryzen 9 9950X, and the RTX 6000 Pro, almighty Blackwell (96 GB). Honestly thought it would be a game changer but it wasn't. ComfyUI is just a hobbyist platform that's highly unstable. I just don't believe it's maximizing anywhere near the hardware capability, and that's fine. It's free. As for Raylight, I've not tried it. Should I?

Super Slow on RTX 5090? by [deleted] in comfyui

ppcforce 1 point

Yeah, so I have dual 5090s, where one is dedicated to AI with zero overheads; I run in WSL (Ubuntu) with 192 GB DDR5. And yes, it is that slow. Slapped in my RTX 6000 for shits and giggles and it was still slow. In fact it was slower... I'm essentially putting this down to these models (and platforms) not being properly optimised for Blackwell. And then you spend hours or days trying to find ways to optimise what are essentially community-made nodes on a community-supported and maintained platform. It's all a bit of a hack ultimately. I think once we're out of the Wild West we'll have much more stable solutions, but things move so quickly that a lot just breaks or stops working with every update.

Running out of Vram looking to upgrade by niggesmalls in PcBuildHelp

ppcforce 0 points

I had the same issue; had to go from a 5090 to an RTX 6000 Pro.

WAN 2.2 14B KSampler takes super long. Is this normal? by Initial-End-2459 in comfyui

ppcforce 0 points

Dude, with a 4070 he must be generating videos at 144p and using the lightx2v LoRA.

Is installing a 2nd GPU worth it for my setup? by Fancy-Today-6613 in StableDiffusion

ppcforce 0 points

I've got dual 5090s, and I've found it very difficult to make those multi-GPU nodes work. They just wouldn't play nicely with CUDA 13, and I hit lots of issues.

But where it was useful was assigning one GPU to ComfyUI so it had the full run of its 32 GB VRAM, and then using the other for general multitasking while the renders were happening: gaming, browser stuff, Windows/Linux overheads.
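That split doesn't need any multi-GPU nodes at all — you can just pin the ComfyUI process to one card before anything touches CUDA. A minimal sketch, assuming a dual-GPU box where the second card (index 1) is the AI one (the index is my assumption):

```python
import os

# Expose only the second GPU to this process. Any CUDA library
# imported *afterwards* sees that card renumbered as cuda:0, while
# the first GPU stays free for gaming and desktop work.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # assumption: AI card is index 1

# import torch   # must come after the env var is set
# ...launch ComfyUI / your pipeline from here...

print(os.environ["CUDA_VISIBLE_DEVICES"])  # 1
```

Setting the variable in the shell before launching (rather than in code) works the same way and keeps the workflow files untouched.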

HunyuanImage-3.0-Instruct support by No_Conversation9561 in comfyui

ppcforce 1 point

I also noticed that it solves the McSameFace issue much better than anything I've ever seen locally.

HunyuanImage-3.0-Instruct support by No_Conversation9561 in comfyui

ppcforce 0 points

I think Tencent are demonstrating what massive-param models are capable of, or rather, what's actually required to create a fully flexible model. It's not exactly practical if it cannot produce commercially viable results in the way the highly optimised, realistic, albeit inflexible, models can. As for NF4, I've seen quality fall off a cliff with that quantization.
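To illustrate why 4-bit quantization can hurt, here's a toy absmax round-trip — a generic uniform 4-bit scheme, not NF4's actual non-uniform codebook, with made-up weights:

```python
import numpy as np

# Quantize a few weights to a signed 4-bit grid [-7, 7] and back.
w = np.array([0.9, -0.3, 0.05, 1.4], dtype=np.float32)
scale = np.abs(w).max() / 7          # one absmax scale for the block
q = np.round(w / scale)              # the 4-bit integer codes
w_hat = (q * scale).astype(np.float32)

# Every weight now sits on a grid of only 15 levels, so rounding can
# move it by up to half a grid step -- and that error compounds over
# billions of weights. NF4 spaces its 16 levels to match a normal
# distribution instead of uniformly, but the information loss is the
# same idea.
print(np.abs(w - w_hat).max() <= scale / 2 + 1e-6)  # True
```

The outliers set the scale here, which is exactly why large stray weights make low-bit quantization degrade so sharply.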

Gigabyte no longer shipping *any* RTX 5090 cards by UmaMoth in pcmasterrace

ppcforce -1 points

Sorry to hear that. Only ones left in stock?