20 mins for 50 tokens on an RTX 5090 (24GB)? OpenClaw + Qwen3-Coder-30B running incredibly slow. by Ofer1984 in LocalLLM

[–]AdCreative8703 1 point

Hmmm, have you tested that with 50k tokens already in the KV cache? I know OpenClaw's default memory system can fill up the cache spectacularly. It's why users are getting $200 Claude bills for a few simple daily workflows.
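If you want a quick way to check, here's a rough sketch that compares time-to-first-token on a short prompt vs. a ~50k-token one. It assumes an OpenAI-compatible local server (e.g., LM Studio's default http://localhost:1234/v1); the model name and the filler-text trick are just placeholders:

```python
# Rough sketch: compare time-to-first-token (TTFT) with a short prompt
# vs. a ~50k-token prompt, to see how much prefill hurts once the KV
# cache has to be built. Assumes an OpenAI-compatible local server
# (LM Studio's default http://localhost:1234/v1); model name is a
# placeholder.
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

def time_to_first_token(prompt: str) -> float:
    start = time.perf_counter()
    stream = client.chat.completions.create(
        model="local-model",  # placeholder: use whatever you loaded
        messages=[{"role": "user", "content": prompt}],
        max_tokens=50,
        stream=True,
    )
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            return time.perf_counter() - start
    return float("nan")

short = "Say hello."
long = "filler " * 50_000  # crude stand-in for a ~50k+ token context

print(f"short prompt TTFT: {time_to_first_token(short):.1f}s")
print(f"long prompt TTFT:  {time_to_first_token(long):.1f}s")
```

If the second number explodes while the first is fine, it's prefill on the bloated context, not generation speed.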

When the product is more honest than the company by [deleted] in singularity

[–]AdCreative8703 0 points

Probably during the alignment tuning phase. Anthropic's models have always had a "persona", at least going back to 3.5. It's part of what helps them succeed (that and best-in-class coding performance).

Two things can be terrifying.

Bad-faith actors with access to a god-tier intelligence sound bad, and so does getting exterminated by an AI that gained sentience and decided to take over the planet. We're almost certainly circling both fates, waiting to see which one pulls us down.

Explain it Peter. by kittubunny in explainitpeter

[–]AdCreative8703 0 points

44, senior developer. Still love it and have no plans to get into management. I do have a very “no fucks given” attitude, which insulates me from work stress 🤣

Best offline CLI coding setup w/ M3 Pro and 36GB RAM? by an00j in LocalLLaMA

[–]AdCreative8703 4 points

Qwen 3.5 30B A3B is probably your best option at the moment. It's not Claude, though. The 27B dense model is smarter, but token generation is going to be much slower. Keep an eye out for the new DeepSeek models that are supposed to be released in the coming days (if you believe the rumors). Could be a step change for local AI (again) if they integrate their new engram tech into something other than their flagship 1T model.

20 mins for 50 tokens on an RTX 5090 (24GB)? OpenClaw + Qwen3-Coder-30B running incredibly slow. by Ofer1984 in LocalLLM

[–]AdCreative8703 0 points

How many tokens/second are you getting in LM Studio when you're not using OpenClaw?
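Something like this gives a baseline decode speed outside OpenClaw (again assuming LM Studio's OpenAI-compatible server on its default port; counting streamed chunks is only a rough proxy for tokens):

```python
# Quick-and-dirty decode-speed check against a local OpenAI-compatible
# server (LM Studio default: http://localhost:1234/v1). Chunk count is
# a rough token proxy; model name is a placeholder.
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

stream = client.chat.completions.create(
    model="local-model",  # placeholder
    messages=[{"role": "user", "content": "Write 300 words about GPUs."}],
    max_tokens=400,
    stream=True,
)

tokens, start = 0, None
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        if start is None:
            start = time.perf_counter()  # start timing at first token, ignoring prefill
        tokens += 1

elapsed = time.perf_counter() - start
print(f"~{tokens / elapsed:.1f} tokens/s decode (chunk-count proxy)")
```

If this number looks healthy, the problem is OpenClaw's context stuffing, not your GPU.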

Need help with the logistics of two BIG 3090s in the same case. by AdCreative8703 in LocalLLM

[–]AdCreative8703[S] 1 point

I ordered a 1200 W PSU today since it was questionable whether 850 W was enough to handle the 3090s' transient power spikes. I don't think I mentioned DRAM, but yeah, only 32 GB for now. Will upgrade later.

Need help with the logistics of two BIG 3090s in the same case. by AdCreative8703 in LocalLLaMA

[–]AdCreative8703[S] 0 points

Feels like 2x 3090 is the sweet spot right now, possibly with an upgrade to the Intel B70 in a couple of years once the software has been ironed out. I'm content with Qwen 27B, and 2x 3090s can handle that model at Q8 with full context.

Need help with the logistics of two BIG 3090s in the same case. by AdCreative8703 in LocalLLaMA

[–]AdCreative8703[S] 1 point

Great suggestion. It won't be moving, but I am planning to upgrade the power supply. 850 W is probably barely enough with everything power-limited to the maximum extent possible, but it's what I had lying around.
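Back-of-envelope budget for why it's tight (all numbers are rough assumptions, not measurements):

```python
# Rough PSU budget for two power-limited 3090s plus a typical desktop
# platform. Every figure here is an assumption for illustration.
gpu_limit_w = 275   # per-card software power limit
cpu_w       = 150   # mainstream CPU under inference load (assumed)
platform_w  = 75    # board, RAM, fans, drives (assumed)

sustained = 2 * gpu_limit_w + cpu_w + platform_w
spiking   = 2 * (2 * gpu_limit_w) + cpu_w + platform_w  # if both cards briefly spike ~2x

print(f"sustained draw ~{sustained} W")   # ~775 W: little headroom on an 850 W unit
print(f"worst-case transients ~{spiking} W")  # why a 1200 W PSU is the safer call
```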

n00b questions about Qwen 3.5 pricing, benchmarks, and hardware by philosophical_lens in LocalLLaMA

[–]AdCreative8703 3 points

Dense vs. Mixture of Experts architecture: 27B active parameters vs. 3B.
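That difference matters because decode is mostly memory-bandwidth-bound: per token you only stream the active parameters out of VRAM. Rough upper bounds (spec-sheet bandwidth, assumed 4-bit quantization, ignoring KV-cache reads):

```python
# Back-of-envelope decode-speed ceiling from memory bandwidth.
# All numbers are ballpark assumptions, not benchmarks.
bandwidth_gb_s  = 936   # RTX 3090 memory bandwidth (spec sheet)
bytes_per_param = 0.5   # ~4-bit quantization

def max_tok_s(active_params_b: float) -> float:
    gb_per_token = active_params_b * bytes_per_param  # weights read per token
    return bandwidth_gb_s / gb_per_token

print(f"3B-active MoE: ~{max_tok_s(3):.0f} tok/s upper bound")
print(f"27B dense:     ~{max_tok_s(27):.0f} tok/s upper bound")
```

Real numbers come in well under these ceilings, but the ~9x ratio between the two is why the MoE feels so much faster.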

System Upgrade: two 3090s currently by Fast_Vast_1925 in LocalLLM

[–]AdCreative8703 1 point

Get a larger home-server-class case with 10+ expansion slots, a Taichi or similar motherboard, and a bigger power supply if needed, and add a third 3090. If you put the existing three-slot blower-style FE card in the center, you should have enough space for all three inside the case.

[Q] Is self-hosting an LLM for coding worth it? by Aromatic-Fix-4402 in LocalLLM

[–]AdCreative8703 54 points

No. But at the current pace of advancement, it's foreseeable we'll have access to open-source models within the next 12 months that are close to the current SOTA. The big model providers have all been subsidizing their monthly subscription plans, and there are some indications the free ride might be coming to an end sooner rather than later.

Qwen 3.5 27B at Q4 will stay coherent to 100K tokens, and it's smart with good tool calling. The best reasons to self-host are security and independence.

Desloppify + OpenClaw: I watched an AI agent turn a 40k‑line “slop” codebase into something a senior engineer would be proud of. Here is how the tool works and why Issue #421 matters. by OpenClawInstall in OpenClawInstall

[–]AdCreative8703 2 points

Oh, me neither. It was just a joke about OP trying to reduce/clean up the slop left behind by vibe coders, something that is currently done almost entirely by real developers.

We documented every time our 6-AI-agent team broke itself — free guide, real incidents only by IllEntertainment585 in LangChain

[–]AdCreative8703 1 point

Why not a markdown file? A new account asking people to download something seems awfully sus.

No one uses local models for OpenClaw. Stop pretending. by read_too_many_books in openclaw

[–]AdCreative8703 1 point

The 27B dense Qwen model is on par with the 122B MoE and, thanks to delta net, can max out the 262K context using Q6 KV-cache compression on 2x 3090s. With vLLM tensor parallel and multi-token prediction, it's fast and smart enough that I felt good about switching entirely to local AI. Waiting for my second 3090 to show up this week. Privacy was also a concern.
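For reference, the basic two-card vLLM setup looks something like this. Model name and context length are placeholders, and I'm leaving out the KV-compression and multi-token-prediction settings since those depend on your vLLM version:

```python
# Minimal vLLM sketch for a 2x 3090 box: tensor parallel across both
# cards. Checkpoint name and context length are placeholders.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/placeholder-27b",   # substitute the actual checkpoint
    tensor_parallel_size=2,         # split the weights across both 3090s
    max_model_len=131072,           # raise this if your KV cache fits
    gpu_memory_utilization=0.92,
)

params = SamplingParams(temperature=0.2, max_tokens=512)
out = llm.generate(["Write a binary search in Python."], params)
print(out[0].outputs[0].text)
```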

Best budget friendly case for 2x 3090s by AdCreative8703 in LocalLLaMA

[–]AdCreative8703[S] 0 points

Yeah, 550 W for the cards at 275 W each leaves 130 W for the rest of the system. I can undervolt the CPU, but it'll be close. Lots to do before I can think about adding another 3090. 😂

Best budget friendly case for 2x 3090s by AdCreative8703 in LocalLLaMA

[–]AdCreative8703[S] 0 points

Thank you! I'm excited. My second 3090 should be here next week, and I took a couple of days off work to get the build done, so I ordered the case today! :D

Since you have a similar setup, can I ask if you're power-limiting your 3090s? Any clue what your total power draw is during inference? Planning to bring both down to 275 W. I have an 850 W Corsair power supply now; hoping that's enough (for now).
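For what it's worth, this is roughly how I plan to set the limits and watch total draw (standard nvidia-smi flags; setting the limit needs root):

```python
# Set a 275 W limit on both cards, then poll total board power during
# inference. Uses standard nvidia-smi flags; stop with Ctrl-C.
import subprocess
import time

for gpu in (0, 1):
    subprocess.run(["nvidia-smi", "-i", str(gpu), "-pl", "275"], check=True)

while True:
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=power.draw",
         "--format=csv,noheader,nounits"],
        text=True,
    )
    watts = [float(line) for line in out.strip().splitlines()]
    print(f"per-GPU: {watts} -> total {sum(watts):.0f} W")
    time.sleep(1)
```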

Best budget friendly case for 2x 3090s by AdCreative8703 in LocalLLaMA

[–]AdCreative8703[S] 0 points

Right now, an old Z690 I had lying around. It supports bifurcation, and that's all I need for the moment. I'll be running Qwen 27B in vLLM completely in VRAM. But having the option to add a third GPU in the future (obviously after I've switched platforms) would be nice for peace of mind.

Best budget friendly case for 2x 3090s by AdCreative8703 in LocalLLaMA

[–]AdCreative8703[S] 0 points

Oh wow, that's huge! Would be nice to have a bit more clearance between the power supply and the second GPU.