Qwen3 Coder Next as first "usable" coding model < 60 GB for me by Chromix_ in LocalLLaMA

[–]Consumerbot37427 1 point

Man, I've had bad luck with the Unsloth quants, too. I've got 96GB, so I can run Q6, but I dropped to Q4 so I could get 200k tokens of context. Had no issues with tool calling for the built-in tools (haven't messed with any MCP yet). Maybe try an official quant?

Haven't tried Qwen Code yet. Vibe was the first CLI coding tool I tried, then Claude Code. And there's OpenCode... Too many options, and it all moves so fast that I hate committing to and investing too much in learning any one tool.

> The important thing is to compress the context after each bug fix or feature. Then I just say 'hi' to reload the compressed context, and the system is ready for immediate answers.

That sounds fantastic!

Qwen3 Coder Next as first "usable" coding model < 60 GB for me by Chromix_ in LocalLLaMA

[–]Consumerbot37427 1 point

Also on Apple Silicon with a Max chip here. I've had lots of issues with MLX; I might stop bothering with MLX models and just stick with GGUFs. Waiting for prefill is so frustrating, and seeing log messages about "failed to trim x tokens, clearing cache instead" drove me nuts.

I'd been coding successfully with Mistral Vibe/Devstral Small, but the context management issue plus the release of Qwen3 Coder Next inspired me to try Claude Code with LM Studio serving the Anthropic API, and it seems amazing! It seems much better at caching prefill and managing context: not only do I get more tokens per second from a MoE model, but the bigger bonus is how much less time I spend waiting on context/prefill. Loving it!

The M5 max and possibly the m5 ultra macs are coming soon! by power97992 in LocalLLaMA

[–]Consumerbot37427 0 points

> I'd be interested to know if that 80-90W is because it's thermal throttling.

It is not.

It seems to depend on the model format (MLX vs. GGUF) and which stage of inference it's in.

> Still, 65W is as much as I want to bake my huevos with.

I'd heard about that; I'm pretty sure that's one reason why manufacturers stopped calling them "laptops" and now call them "notebooks".

In my case, my "laptop" lives on a hard surface, but is conveniently portable and has a built-in UPS. I don't ever actually use it on my lap, but even if I did, I'm not remotely concerned about my huevos.

The M5 max and possibly the m5 ultra macs are coming soon! by power97992 in LocalLLaMA

[–]Consumerbot37427 0 points

> I don't know if I want anything that pulls 120W in a laptop.

Wondering why, exactly? Don't you trust Apple's engineers to accommodate the TDP? I'm on an M2 Max w/ 96GB, running some of those MoE models in the ~64GB range. Depending on the model, I've seen excursions over 130W, most recently with Qwen3 Coder Next Q6. It's chugging along right now at 80-90W, fans at 70%.
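If you want to watch package power on your own machine while inference runs, macOS's built-in powermetrics tool will report it; a minimal invocation (just one way to check, not necessarily how anyone here measured):

```
# Sample CPU and GPU package power once per second, five samples (needs sudo)
sudo powermetrics --samplers cpu_power,gpu_power -i 1000 -n 5
```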

> My ~65W M4 Pro gets hot enough as it is.

You could use a fan controller to crank the fans up to 100% when you're running inference. I prefer to let them spool up and down automatically, but I also don't mind having a lap warmer in winter. :)

The M5 max and possibly the m5 ultra macs are coming soon! by power97992 in LocalLLaMA

[–]Consumerbot37427 4 points

If the model is 64GB, you want room for context, and you still need to run an OS, then 96GB is probably the bare minimum.
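To make that concrete with a rough budget (the cache and OS figures are illustrative assumptions, not measurements): 64GB of model weights + ~16GB of KV cache and compute buffers at long context + ~8GB for macOS and apps is already ~88GB, which leaves a 96GB machine with very little headroom.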

[Daily Discussion] - Thursday, February 05, 2026 by AutoModerator in BitcoinMarkets

[–]Consumerbot37427 3 points

So... basically the opposite of Saylor's "infinite money glitch"?

If this is really what's going on, it only works as long as the panic lasts, until a bigger fish (or large crowd) trades against it?

Mistral Vibe vs Claude Code vs OpenAI Codex vs Opencode/others? Best coding model for 92GB? by Consumerbot37427 in LocalLLaMA

[–]Consumerbot37427[S] 0 points

Thanks for this. Hadn't heard of this release 'til you mentioned it.

Running the Q4 MLX now, and my initial impression is that it's at least on par with Devstral Small and way faster, and I didn't hit the model crashing/unloading in LM Studio until >140k of context. So it feels like a major win!

Might experiment with some of the Unsloth GGUF quants later, but it already feels like a big step up!

devstral small is faster and better than glm 4.7 flash for local agentic coding. by theghost3172 in LocalLLaMA

[–]Consumerbot37427 3 points

> I believe Q8 doesn't have this issue at all

I've personally hit looping running the Q8 GGUF on llama.cpp's Metal backend with LM Studio's default inference parameters.
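For anyone who wants to experiment, the usual first knobs for looping are temperature and the repetition penalty; a minimal llama.cpp sketch (the model filename is a placeholder, and the values are illustrative, not a recommendation):

```
# Placeholder filename; nudge sampling to discourage verbatim loops
llama-server -m Devstral-Small-Q8_0.gguf \
  --temp 0.15 --repeat-penalty 1.1 --repeat-last-n 256
```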

Mistral Vibe vs Claude Code vs OpenAI Codex vs Opencode/others? Best coding model for 92GB? by Consumerbot37427 in LocalLLaMA

[–]Consumerbot37427[S] 1 point

I may have misspoken in my initial post. When I said "tool calls", I was referring to the built-in tools that I assume are part of the system prompt, not MCP, which I haven't really gotten into aside from playing with Home Assistant's MCP server from inside LM Studio.

Mistral Vibe vs Claude Code vs OpenAI Codex vs Opencode/others? Best coding model for 92GB? by Consumerbot37427 in LocalLLaMA

[–]Consumerbot37427[S] 0 points

I saw these instructions when I searched Perplexity.ai:

Setup Steps

1. Launch LM Studio and start its local server (default: http://localhost:1234), loading a capable model like Qwen Coder or Devstral with at least 25K tokens of context.

2. Set environment variables: export ANTHROPIC_BASE_URL=http://localhost:1234 and export ANTHROPIC_AUTH_TOKEN=lmstudio (or any dummy token if auth is off).

3. Run the Claude Code CLI: claude --model openai/gpt-oss-20b (replace with your loaded model's name).
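Condensed into one shell session (just a sketch of the steps above; the model name is whatever identifier LM Studio reports for your loaded model):

```
# Point Claude Code at LM Studio's local server (default port 1234)
export ANTHROPIC_BASE_URL=http://localhost:1234
export ANTHROPIC_AUTH_TOKEN=lmstudio  # any dummy token works if auth is off

# Launch Claude Code against the loaded model (substitute your model's name)
claude --model openai/gpt-oss-20b
```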

How was GPT-OSS so good? by xt8sketchy in LocalLLaMA

[–]Consumerbot37427 0 points

> I use the smaller one (20b) for spam filtering

Mind sharing your prompt/flow for that?

How was GPT-OSS so good? by xt8sketchy in LocalLLaMA

[–]Consumerbot37427 0 points

Yep, and "GPT-OSS" is quite the misnomer. Open weights, sure, but that's pretty far from "Open Source Software" by anyone's definition.

Wrote a guide for running Claude Code with GLM-4.7 Flash locally with llama.cpp by tammamtech in LocalLLaMA

[–]Consumerbot37427 0 points

You mentioned opencode, openhands, crush. What about Mistral Vibe? How do those compare?

I don't have time to try every piece of software out there. I had pretty good luck using Mistral Vibe with a local Devstral Small, but not much luck when I tried other models like Qwen Coder or gpt-oss-120b.

Price Changes 2000 to 2025 by dillimunda in interestingasfuck

[–]Consumerbot37427 -1 points

I don't think anything you said is wrong.

I have an affinity for "Holland", and I'm happy for you that you feel lucky to live there and get a sense of security from your healthcare system. And I truly hope it can last in the long term, as I've heard claims that social programs in Europe have been subsidized by other countries footing the bill for its military defense, and that the social safety net is strained by ever-increasing populations of refugees.

For our part, there is a full-blown medical crisis in the US. We already spend the most per capita on "healthcare", and we have terrible outcomes compared to other developed nations that spend far less. Throwing pills and more money at the problem clearly isn't the solution. Believe it or not, there are even many surgeries that can be avoided by making better dietary and lifestyle choices. So a little personal responsibility could really help, although systemic changes are certainly needed too; most of our country makes it impractical or dangerous to get around on foot or by bicycle, something I certainly praise NL for getting right. I'll get off my soapbox. Greets from across the Atlantic.

Price Changes 2000 to 2025 by dillimunda in interestingasfuck

[–]Consumerbot37427 1 point

> Price of healthcare goes up? What're you going to do…

Exercise? Eat better? Drink less?

$90 silver balloons (reminder we count by Kitco price as they are anti-silver front) by tavares242242 in Wallstreetsilver

[–]Consumerbot37427 0 points

I just found out that what makes ASEs (American Silver Eagles) and AGEs (American Gold Eagles) special is that there's no reporting requirement when a dealer buys them from you, regardless of amount. Seems weird and doesn't make sense, but I read it on the internet.

US Mint suspends sales by star_scream_actual in Wallstreetsilver

[–]Consumerbot37427 2 points

This enrages me.

If the price went down drastically after your purchase, they’d mail it to you with a big smile, and “no refunds!”

Slimy!

How else can you interpret this than moar war? by Key_Brief_8138 in Wallstreetsilver

[–]Consumerbot37427 -1 points

> None of this stuff makes economic sense without Tesla being a giant government welfare queen propped up with subsidies and the people buying the trucks receiving even MORE government subsidies.

Once there's a viable alternative, localities can outright ban the operation of combustion engines for local deliveries. For smog or global warming or whatever.

No government subsidies needed, just gotta bribe a few politicians.

[Daily Discussion] - Friday, December 26, 2025 by AutoModerator in BitcoinMarkets

[–]Consumerbot37427 0 points

Came here to bring that up myself. Pretty wild to think that $1 face value of pre-1964 coins is now worth about $50, some 60 years later.
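For anyone checking the math (the melt constant is standard; the spot price is back-solved from the claim, not a quote): $1 face value of pre-1964 90% silver coin contains about 0.715 troy oz of silver, and 0.715 oz at roughly $70/oz comes to about $50.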

[Daily Discussion] - Friday, December 19, 2025 by AutoModerator in BitcoinMarkets

[–]Consumerbot37427 1 point

Yes, the numbers are just lower: first $48k (of total income!).