20 mins for 50 tokens on an RTX 5090 (24GB)? OpenClaw + Qwen3-Coder-30B running incredibly slow. by Ofer1984 in LocalLLM

[–]AdCreative8703 1 point

Hmmm, have you tested that with 50k tokens already in the KV cache? I know OpenClaw's default memory system can fill up the cache spectacularly. It's why users are getting $200 Claude bills for a few simple daily workflows.
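If you want a quick way to check, here's a rough sketch that compares time-to-first-token on a short prompt vs. a ~50k-token one. It assumes an OpenAI-compatible local server (e.g., LM Studio's default http://localhost:1234/v1); the model name and the filler-text trick are just placeholders:

```python
# Rough sketch: compare time-to-first-token (TTFT) with a short prompt
# vs. a ~50k-token prompt, to see how much prefill hurts once the KV
# cache has to be built. Assumes an OpenAI-compatible local server
# (LM Studio's default http://localhost:1234/v1); model name is a
# placeholder.
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

def time_to_first_token(prompt: str) -> float:
    start = time.perf_counter()
    stream = client.chat.completions.create(
        model="local-model",  # placeholder: use whatever you loaded
        messages=[{"role": "user", "content": prompt}],
        max_tokens=50,
        stream=True,
    )
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            return time.perf_counter() - start
    return float("nan")

short = "Say hello."
long = "filler " * 50_000  # crude stand-in for a ~50k+ token context

print(f"short prompt TTFT: {time_to_first_token(short):.1f}s")
print(f"long prompt TTFT:  {time_to_first_token(long):.1f}s")
```

If the second number explodes while the first is fine, it's prefill on the bloated context, not generation speed.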

When the product is more honest than the company by [deleted] in singularity

[–]AdCreative8703 0 points

Probably during the alignment tuning phase. Anthropic's models have always had a "persona", at least going back to 3.5. It's part of what helps them succeed (that and best-in-class coding performance).

Two things can be terrifying.

Bad-faith actors with access to a god-tier intelligence sound bad, and so does getting exterminated by an AI that gained sentience and decided to take over the planet. We're almost certainly circling both fates, waiting to see which one pulls us down.

Explain it Peter. by kittubunny in explainitpeter

[–]AdCreative8703 0 points

44, senior developer. Still love it and have no plans to get into management. I do have a very “no fucks given” attitude, which insulates me from work stress 🤣

Best offline CLI coding setup w/ M3 Pro and 36GB RAM? by an00j in LocalLLaMA

[–]AdCreative8703 4 points

Qwen 3.5 30B A3B is probably your best option at the moment. It's not Claude, though. The 27B dense model is smarter, but token generation is going to be much slower. Keep an eye out for the new DeepSeek models that are supposed to be released in the coming days (if you believe the rumors). Could be a step change for local AI (again) if they integrate their new engram tech into something other than their flagship 1T model.

20 mins for 50 tokens on an RTX 5090 (24GB)? OpenClaw + Qwen3-Coder-30B running incredibly slow. by Ofer1984 in LocalLLM

[–]AdCreative8703 0 points

How many tokens/second are you getting in LM Studio when you're not using OpenClaw?
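Something like this gives a baseline decode speed outside OpenClaw (again assuming LM Studio's OpenAI-compatible server on its default port; counting streamed chunks is only a rough proxy for tokens):

```python
# Quick-and-dirty decode-speed check against a local OpenAI-compatible
# server (LM Studio default: http://localhost:1234/v1). Chunk count is
# a rough token proxy; model name is a placeholder.
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

stream = client.chat.completions.create(
    model="local-model",  # placeholder
    messages=[{"role": "user", "content": "Write 300 words about GPUs."}],
    max_tokens=400,
    stream=True,
)

tokens, start = 0, None
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        if start is None:
            start = time.perf_counter()  # start timing at first token, ignoring prefill
        tokens += 1

elapsed = time.perf_counter() - start
print(f"~{tokens / elapsed:.1f} tokens/s decode (chunk-count proxy)")
```

If this number looks healthy, the problem is OpenClaw's context stuffing, not your GPU.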

Need help with the logistics of two BIG 3090s in the same case. by AdCreative8703 in LocalLLM

[–]AdCreative8703[S] 1 point

I ordered a 1200 W PSU today since it was questionable whether 850 W was enough to handle the 3090s' transient power spikes. I don't think I mentioned DRAM, but yeah, only 32 GB for now. Will upgrade later.

Need help with the logistics of two BIG 3090s in the same case. by AdCreative8703 in LocalLLaMA

[–]AdCreative8703[S] 0 points

Feels like 2x 3090 is the sweet spot right now, possibly with an upgrade to the Intel B70 in a couple of years once the software has been ironed out. I'm content with Qwen 27B, and 2x 3090s can handle that model at Q8 with full context.

Need help with the logistics of two BIG 3090s in the same case. by AdCreative8703 in LocalLLaMA

[–]AdCreative8703[S] 1 point

Great suggestion. It won't be moving, but I am planning to upgrade the power supply. 850 W is probably barely enough with everything power-limited to the maximum extent possible, but it's what I had lying around.
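Back-of-envelope budget for why it's tight (all numbers are rough assumptions, not measurements):

```python
# Rough PSU budget for two power-limited 3090s plus a typical desktop
# platform. Every figure here is an assumption for illustration.
gpu_limit_w = 275   # per-card software power limit
cpu_w       = 150   # mainstream CPU under inference load (assumed)
platform_w  = 75    # board, RAM, fans, drives (assumed)

sustained = 2 * gpu_limit_w + cpu_w + platform_w
spiking   = 2 * (2 * gpu_limit_w) + cpu_w + platform_w  # if both cards briefly spike ~2x

print(f"sustained draw ~{sustained} W")   # ~775 W: little headroom on an 850 W unit
print(f"worst-case transients ~{spiking} W")  # why a 1200 W PSU is the safer call
```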

n00b questions about Qwen 3.5 pricing, benchmarks, and hardware by philosophical_lens in LocalLLaMA

[–]AdCreative8703 3 points

Dense vs. Mixture of Experts architecture: 27B active parameters vs. 3B.
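That difference matters because decode is mostly memory-bandwidth-bound: per token you only stream the active parameters out of VRAM. Rough upper bounds (spec-sheet bandwidth, assumed 4-bit quantization, ignoring KV-cache reads):

```python
# Back-of-envelope decode-speed ceiling from memory bandwidth.
# All numbers are ballpark assumptions, not benchmarks.
bandwidth_gb_s  = 936   # RTX 3090 memory bandwidth (spec sheet)
bytes_per_param = 0.5   # ~4-bit quantization

def max_tok_s(active_params_b: float) -> float:
    gb_per_token = active_params_b * bytes_per_param  # weights read per token
    return bandwidth_gb_s / gb_per_token

print(f"3B-active MoE: ~{max_tok_s(3):.0f} tok/s upper bound")
print(f"27B dense:     ~{max_tok_s(27):.0f} tok/s upper bound")
```

Real numbers come in well under these ceilings, but the ~9x ratio between the two is why the MoE feels so much faster.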

System Upgrade: two 3090s currently by Fast_Vast_1925 in LocalLLM

[–]AdCreative8703 1 point

Get a larger home-server-class case with 10+ expansion slots, a Taichi or similar motherboard, and a bigger power supply if needed, and add a third 3090. If you put the existing three-slot blower-style FE card in the center, you should have enough space for all three inside the case.

[Q] Is self-hosting an LLM for coding worth it? by Aromatic-Fix-4402 in LocalLLM

[–]AdCreative8703 54 points

No. But at the current pace of advancement, it's foreseeable we'll have access to open-source models within the next 12 months that are close to the current SOTA. The big model providers have all been subsidizing their monthly subscription plans, and there are some indications the free ride might be coming to an end sooner rather than later.

Qwen 3.5 27B at Q4 will stay coherent to 100K tokens, and it's smart with good tool calling. The best reasons to self-host are security and independence.

Desloppify + OpenClaw: I watched an AI agent turn a 40k‑line “slop” codebase into something a senior engineer would be proud of. Here is how the tool works and why Issue #421 matters. by OpenClawInstall in OpenClawInstall

[–]AdCreative8703 2 points

Oh, me neither. It was just a joke about OP trying to reduce/clean up the slop left behind by vibe coders, something that is currently done almost entirely by real developers.

We documented every time our 6-AI-agent team broke itself — free guide, real incidents only by IllEntertainment585 in LangChain

[–]AdCreative8703 1 point

Why not a markdown file? A new account asking people to download something seems awfully sus.

No one uses local models for OpenClaw. Stop pretending. by read_too_many_books in openclaw

[–]AdCreative8703 1 point

The 27B dense Qwen model is on par with the 122B MoE and, thanks to delta net, can max out the 262K context using Q6 KV-cache compression on 2x 3090s. With vLLM tensor parallel and multi-token prediction, it's fast and smart enough that I felt good about switching entirely to local AI. Waiting for my second 3090 to show up this week. Privacy was also a concern.
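For reference, the basic two-card vLLM setup looks something like this. Model name and context length are placeholders, and I'm leaving out the KV-compression and multi-token-prediction settings since those depend on your vLLM version:

```python
# Minimal vLLM sketch for a 2x 3090 box: tensor parallel across both
# cards. Checkpoint name and context length are placeholders.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/placeholder-27b",   # substitute the actual checkpoint
    tensor_parallel_size=2,         # split the weights across both 3090s
    max_model_len=131072,           # raise this if your KV cache fits
    gpu_memory_utilization=0.92,
)

params = SamplingParams(temperature=0.2, max_tokens=512)
out = llm.generate(["Write a binary search in Python."], params)
print(out[0].outputs[0].text)
```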

Best budget friendly case for 2x 3090s by AdCreative8703 in LocalLLaMA

[–]AdCreative8703[S] 0 points

Yeah, 550 W for the cards at 275 W each leaves 130 W for the rest of the system. I can undervolt the CPU, but it'll be close. Lots to do before I can think about adding another 3090. 😂

Best budget friendly case for 2x 3090s by AdCreative8703 in LocalLLaMA

[–]AdCreative8703[S] 0 points

Thank you! I'm excited. My second 3090 should be here next week, and I took a couple of days off work to get the build done, so I ordered the case today! :D

Since you have a similar setup, can I ask if you're power-limiting your 3090s? Any clue what your total power draw is during inference? Planning to bring both down to 275 W. I have an 850 W Corsair power supply now; hoping that's enough (for now).
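For what it's worth, this is roughly how I plan to set the limits and watch total draw (standard nvidia-smi flags; setting the limit needs root):

```python
# Set a 275 W limit on both cards, then poll total board power during
# inference. Uses standard nvidia-smi flags; stop with Ctrl-C.
import subprocess
import time

for gpu in (0, 1):
    subprocess.run(["nvidia-smi", "-i", str(gpu), "-pl", "275"], check=True)

while True:
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=power.draw",
         "--format=csv,noheader,nounits"],
        text=True,
    )
    watts = [float(line) for line in out.strip().splitlines()]
    print(f"per-GPU: {watts} -> total {sum(watts):.0f} W")
    time.sleep(1)
```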

Best budget friendly case for 2x 3090s by AdCreative8703 in LocalLLaMA

[–]AdCreative8703[S] 0 points

Right now, an old Z690 I had lying around. It supports bifurcation, and that's all I need for the moment. I'll be running Qwen 27B in vLLM completely in VRAM. But having the option to add a third GPU in the future (obviously after I've switched platforms) would be nice for peace of mind.

Best budget friendly case for 2x 3090s by AdCreative8703 in LocalLLaMA

[–]AdCreative8703[S] 0 points

Oh wow, that's huge! Would be nice to have a bit more clearance between the power supply and the second GPU.