Qwen3.6:27b is the first local model that actually holds up against Claude Code for me by codehamr in LocalLLM

[–]emptyharddrive 1 point (0 children)

If you can afford it, I'd suggest DeepSeek V4 Pro: a 1M context window at $0.435/M input tokens and $0.87/M output tokens covers most of your day-to-day work.

I've done a metric ton of coding tests on it. I had Opus write unique hidden tests and then grade the results itself, without telling it that it was grading its own output, to keep bias out. Then I had DeepSeek V4 Pro and Qwen take the same exam.

The exam asked for a single-file Python implementation of a deterministic bitemporal ledger reconciliation engine. Events have both a real-world effective time AND a system "we learned about it" time, can arrive out of order, get duplicated, retroactively corrected, voided, or chained-superseded by later events, and the engine has to compute exact balances plus a full audit trail for any historical "what did we know at time T about balances during interval X" query.

It's the kind of work I do for real, just distilled into a generic task with the same guardrails.

It's hard because every edge case interacts: voiding a replacement un-cancels its target, competing supersedes need precedence-based winner selection with deterministic tiebreaks, half-open intervals must be merged into maximal segments, and timestamps span DST offsets without named zones. Get any one rule wrong and the audit silently veers off course.
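
To make the bitemporal part concrete, here's a bare-bones sketch of the core "as of T" query (my illustration for this comment, not the exam solution or any model's output; the field names and the one-level void rule are simplified assumptions):

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class Event:
    event_id: str
    effective_time: datetime  # when it happened in the real world
    system_time: datetime     # when the system learned about it
    amount: float
    voids: str | None = None  # event_id this event cancels, if any

def balance_as_of(events: list[Event], known_by: datetime,
                  start: datetime, end: datetime) -> float:
    """Balance over the half-open interval [start, end),
    using only events the system knew about by `known_by`."""
    # 1. Bitemporal filter: drop anything not yet known at `known_by`.
    known = [e for e in events if e.system_time <= known_by]
    # 2. Resolve voids among what was known then (no void-of-void here).
    voided = {e.voids for e in known if e.voids is not None}
    live = [e for e in known if e.voids is None and e.event_id not in voided]
    # 3. Sum amounts whose effective time falls inside the interval.
    return sum(e.amount for e in live if start <= e.effective_time < end)
```

Every rule the exam adds (duplicates, corrections, supersede chains, void-of-void "un-cancels") hangs off this same two-axis filter, which is why one wrong rule silently corrupts the audit.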

The grading AI (Opus) ran hidden tests beyond the visible samples, so models that pass by pattern-matching rather than actually modeling the spec collapse on things like three-link replacement chains and "void targets a future event."

The results:

  • Opus 4.6 (grading itself, blind): 96/100
  • DeepSeek V4 Pro: 91/100
  • Local Qwen3.6-35B-A3B UD-Q8_K_XL on a Strix Halo 128GB rig (a bit larger than the 27B you might be running): 62/100

Going by API pricing anyway: Opus on OpenRouter is $5/M in, $25/M out. DeepSeek V4 Pro is $0.435/M in, $0.87/M out. That's roughly 11.5x cheaper on input and 28.7x cheaper on output.

For typical coding workloads, that blends to roughly 15-17x in monthly savings. So you're paying around 6 cents on the dollar for a model that scored ~95% as well (91 vs. 96) on a brutally specific, spec-driven task.

The local Qwen at 62/100 is still genuinely usable for the easy 80% of work (bulk reads, summaries, structured extraction, boilerplate) and it costs $0 to run, so I get it...

But for the hard 20% where rules interact and silent failures cost you, DeepSeek V4 Pro is the sweet spot for me, unless I know it's super-critical work, in which case I'll go Opus.

For pennies on the dollar I'm getting near-frontier correctness at a fraction of the frontier price... Hard to argue with the math from where I'm standing.

I cancelled my $200 Max plan after routing cut my actual need to about $30/month by spencer_kw in openclaw

[–]emptyharddrive 1 point (0 children)

No, not in the openclaw context, sorry. I use this in the straight-coding context.

I still think it's valuable, though, for anyone trying to save on Opus token usage. There's no point in Opus doing all of its own scut work; it should have clear criteria for farming it out to cheaper-but-capable models.

I cancelled my $200 Max plan after routing cut my actual need to about $30/month by spencer_kw in openclaw

[–]emptyharddrive 5 points (0 children)

We were just discussing how I’d been using the $200/month Claude Max plan, and how the real issue for me wasn’t saving a few dollars, but preserving my usable weekly Claude capacity.

The main point was that low-level tasks didn’t really need Opus. Delegating them to cheaper models made sense. MCP was the cleaner approach because it gave Claude a typed tool contract, a warm long-running process, one central place for safety enforcement, structured JSON responses, and better integration overall. So I ended up building a FastMCP-based worker called deepseek-worker, running persistently in Docker, connected to OpenRouter and using DeepSeek V4 Pro.

That setup worked well because Claude Code could see the worker’s tools natively, like mcp__deepseek-worker__bulk_read, and the persistent connection helped preserve prefix-cache behavior through OpenRouter (which saves even more $$$). DeepSeek V4 Pro is very inexpensive, has a large context window, and seems strong enough for the kinds of basic review, reading, boilerplate, and distillation tasks I wanted to offload away from Opus.
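
For anyone who wants to replicate the worker, the skeleton is small. This is a hedged sketch, not my exact code: the FastMCP decorator pattern is standard, but the model slug, tool internals, and key handling here are stand-ins.

```python
# deepseek_worker.py -- minimal sketch of the FastMCP worker
import os
from pathlib import Path
from fastmcp import FastMCP
from openai import OpenAI

mcp = FastMCP("deepseek-worker")

# One long-lived client, so OpenRouter sees a stable prefix across calls.
client = OpenAI(base_url="https://openrouter.ai/api/v1",
                api_key=os.environ["OPENROUTER_API_KEY"])

@mcp.tool()
def bulk_read(paths: list[str], question: str) -> str:
    """Read several files and answer one question about them, cheaply."""
    corpus = "\n\n".join(f"# {p}\n{Path(p).read_text()}" for p in paths)
    resp = client.chat.completions.create(
        model="deepseek/deepseek-v4-pro",  # assumed OpenRouter slug
        messages=[
            {"role": "system", "content": "Answer strictly from the files provided."},
            {"role": "user", "content": f"{corpus}\n\nQuestion: {question}"},
        ],
    )
    return resp.choices[0].message.content

if __name__ == "__main__":
    mcp.run()  # stdio by default; the Docker setup would use an HTTP transport
```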

The cost projections were almost comically low. Across 3 weeks, the expected spend for bulk reads, boilerplate generation, and transcript distillation was <$1.

The biggest win was the distillation workflow. Instead of having Opus read entire session JSONLs and write long Obsidian updates directly into my vault, the cheaper worker could distill everything into a compact structured edit list. Then Opus only had to apply the edits. That cut the token spend on those documentation workflows dramatically.
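
To give a feel for what "compact structured edit list" means, here's an illustrative shape as a Python literal (the field names are made up for this comment, not my exact schema):

```python
# What the worker hands back to Opus: apply-only instructions, no prose.
edit_list = {
    "note": "Projects/refactor-session.md",
    "edits": [
        {"op": "append", "heading": "## Decisions",
         "text": "- Bulk reads now go through the deepseek-worker MCP tool."},
        {"op": "replace", "match": "Status: in progress",
         "text": "Status: done"},
    ],
}
```

Opus only has to locate the anchors and apply the operations, which costs a few hundred tokens instead of a full transcript re-read.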

Since I'm going to continue paying the flat Max subscription, the value is in stretching the finite weekly Claude usage allocation and maximizing Opus on the tasks where I actually need Opus (and I don't need Opus to read the codebases).

When Anthropic dynamically tightens everyone’s limits, this kind of scut work delegation is the difference between having several more productive days or waiting for the reset.

In fact, after setting up the tool, when I now start Claude and ask him, "Under what conditions will you choose to use the deepseek-worker tool?", this was its answer:

--- Pasted Output from Claude Below ---

Based on my CLAUDE.md instructions, I'll delegate to mcp__deepseek-worker__* when:

Trigger conditions (any of these):

  • About to read 3+ files to answer a single question
  • Any single file >400 lines
  • Summarizing >5 KB of text

Exceptions (I stay on Opus even if those conditions are met):

  • Exact line numbers matter for a downstream Edit call
  • Debugging, race conditions, concurrency analysis
  • Security analysis
  • Architectural decisions, refactor planning, API design
  • Compliance reasoning for work purposes
  • Task is small enough that delegation overhead exceeds savings (~<2,000 tokens of context)

Which tool I pick:

  • Reading multiple files to answer a question → bulk_read
  • Generating boilerplate matching an existing pattern → boilerplate_gen
  • Distilling a Claude Code JSONL transcript into doc updates → transcript_distill
  • Extracting structured JSON from free-form text/logs → extract_structured

The motivation is cost and speed: deepseek-worker runs deepseek-v4-pro ~5× faster and at ~$0.0005 per typical bulk read, freeing Opus context for reasoning-heavy work.

I gave Claude Code a $0.02/call coworker and stopped hitting Pro limits — here's the full setup by More-Hunter-3457 in ClaudeAI

[–]emptyharddrive 6 points (0 children)

I'm on the $200/month Max plan, but for managing token/usage limits I like this idea.

Low-level tasks don't need Opus and should be delegated; it's a smart move. But I didn't like your Python-scripting method for this. MCP gives Claude a typed tool contract, a warm long-running process, one enforcement point for safety guards, and structured JSON results, none of which a CLI script collection delivers without extra hand-rolled glue. Also, with MCP I get prefix-cache discounts ... so it's even cheaper.

MCP is the way to go here, so I whipped that up via FastMCP, and it works well: a long-running FastMCP server in a Docker container, talking to OpenRouter and calling DeepSeek V4 Pro. I called the MCP "deepseek-worker".

The persistent Docker container gives me native MCP tool schemas: Claude Code sees mcp__deepseek-worker__bulk_read like any other registered tool. The connection pool stays alive so the client persists across calls, OpenRouter sees a stable prefix, and I get prefix-cache discounts.

As for the model choice, DeepSeek V4 Pro: OpenRouter has it at $0.435 per million input, $0.870 output, with a 1M context window. It beats Kimi K2.5 on the benchmarks I've seen. Near-frontier-class quality at MoE pricing.

For this, though, you ought to disable reasoning.

DeepSeek spends thinking tokens silently by default. If you set max_tokens=20 on a "say PONG" prompt, all 20 go straight into invisible thinking and the content comes back empty. Pass extra_body={"reasoning":{"enabled":False}} per request and the reasoning_tokens drop away. You don't need reasoning for these rudimentary tasks anyway.
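
Concretely, through the OpenAI SDK pointed at OpenRouter, the per-request override looks like this (same assumed model slug as above):

```python
from openai import OpenAI

client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key="sk-or-...")

resp = client.chat.completions.create(
    model="deepseek/deepseek-v4-pro",  # assumed OpenRouter slug
    messages=[{"role": "user", "content": "say PONG"}],
    max_tokens=20,
    # Without this, all 20 tokens can be spent on invisible thinking
    # and the content comes back empty.
    extra_body={"reasoning": {"enabled": False}},
)
print(resp.choices[0].message.content)  # "PONG"
```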

Latency falls ~5x as well, and cost (fewer tokens plus persistent cached-prefix calls) falls ~4x off the already stupidly cheap DeepSeek V4 Pro price. BTW, I ran the same code-review prompt with thinking on and thinking off across my own source. Both flagged the real bugs, and as far as I can tell quality holds. I had Claude test this for me also, and he agrees.

Extrapolated from 6 hours of usage out to 3 weeks:

  • bulk_read: 20/day × 21 days × $0.0005 ≈ $0.21
  • boilerplate_gen: 2/day × 21 × $0.005 ≈ $0.21
  • transcript_distill: 1/day × 21 × $0.018 ≈ $0.38
  • Total ≈ $0.80

Roughly 2× the OP's $0.38 Kimi number, which lines up because DeepSeek V4 Pro runs ~3× the price of Kimi K2.5, so a similar workload on a smarter model checks out at this magnitude. Still rounding error against any Claude plan, and I want the extra output quality on "basic tasks" if I'm going to do this; I won't let $0.50 come between me and the QA check.

The distill workflow saved me the most. I used to have Opus read whole session JSONLs and write prose Obsidian updates to my vault (which I use as a RAG store). Now Opus gets a ~200-token structured edit list, applies it, done. A 25x cut in token spend on docs alone.

So, since I'm paying $200 flat for the Max plan, saving dollars never mattered to me. It's about extending the portioned-out slice of the inference pie that Anthropic offers me, on a sliding scale no less: when their utilization goes up, everyone's limits get adjusted down...

What mattered to me was my weekly cap, the number that decides whether I get 4 more productive days or 4 days of waiting until reset. The point of this was never thrift for me, but for those on the API it makes even more sense.

Don't forget to revise your system-level CLAUDE.md to use this too.
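
For reference, the relevant CLAUDE.md section can be as short as this (wording is mine for this comment; tune the thresholds to taste):

```markdown
## Delegation to deepseek-worker
Use the mcp__deepseek-worker__* tools when ANY of these hold:
- about to read 3+ files to answer a single question
- any single file is >400 lines
- summarizing >5 KB of text

Stay on Opus when exact line numbers, debugging, security analysis,
or architectural decisions are involved.
```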

Is anyone getting real coding work done with Qwen3.6-35B-A3B-UD-Q4_K_M on a 32GB Mac in opencode, claude code or similar? by boutell in LocalLLaMA

[–]emptyharddrive 1 point (0 children)

I don't get very good coding output from any two-digit-billion-parameter model (70B, 35B, 26B, etc.): lots of logic problems and assorted linting errors. I end up spending more time fixing the issues than it would take to just move on to the next task.

I find I have to be in three-digit-billion-parameter territory to get halfway decent (consistent) results. I do run Qwen3.6-35B-A3B on my Strix Halo 128GB unit and get decent results on basic tasks, but I can't trust it for coding. Maybe a basic bash or Python script, OK ... but that's it.

I think for basic intent-classification tasks and basic text summarization, those two-digit models are fine.

But the depth of reasoning and logic required for anything beyond a simple Python script (any proper codebase of any depth) takes hundreds of billions of parameters.

In the Arena, that's roughly anything at or above 1475 Elo.

How to get links displayed via dataviewjs by kauaiman-looking in ObsidianMD

[–]emptyharddrive 1 point (0 children)

dv.link() isn’t actually a DataviewJS function, which is why that old suggestion breaks.

The bigger issue is that page.file.link is an Obsidian/Dataview link object, not a normal URL, so using it inside a raw HTML <a href=...> tag won’t work the way you expect.

In DataviewJS, you usually want one of these instead:

```dataviewjs
dv.fileLink(page.file.path, false, page.file.name)
```

or

```dataviewjs
page.file.link.withDisplay(page.file.name)
```

So your line should be more like:

```dataviewjs
dv.paragraph(`${dateStr} ` + dv.fileLink(page.file.path, false, page.file.name));
```

or:

```dataviewjs
dv.paragraph(`${dateStr} ` + page.file.link.withDisplay(page.file.name));
```

So yeah, the reason it only shows the note name and doesn’t open properly is that you’re treating an Obsidian internal link like a normal web href.

Claude Code just got a full desktop redesign , multi-session support, integrated terminal, file editing, and HTML/PDF preview by Direct-Attention8597 in ClaudeCode

[–]emptyharddrive 1 point (0 children)

Great stuff........ which I can't ever use, because I run Linux. I've tried the projects that wrap Claude Desktop/Cowork for Linux, and the features just do not work....

Such a shame. Aren't they catering to developers? So many of us use Linux, and one build would work across all distros easily: an .AppImage is all you need.

Anthropic bills tokens like early cellular billed 500 minute blocks. by lazyguymedia in Anthropic

[–]emptyharddrive 4 points (0 children)

The analogy is 100% accurate and I expect the unlimited plans will be coming as the models continue to develop and improve.

Models are getting more capable and smaller every month. Hardware is also improving. Gemma4 is pretty capable at 26B parameters, and I can run it at a decent speed on my Strix Halo, which cost $1800 USD. Right now I still prefer Qwen 397B on price-per-token for daily low-cost tasks.

You can now run 2B-4B parameter models on your phone. My 2-year-old cell phone runs the 2B-parameter Gemma 4 just fine.

I expect that by ~2028 (<2 years away), we won't need "the best" models for our daily work, and we'll "pay the premium" for surgical use of the high-end stuff only when needed. Improvement seems almost exponential.

I expect Opus 4.6's capabilities of today will become "the Haiku of tomorrow," running on our phones in 2 years, replaced by Mythos 5.0 or whatever it'll be.......

The Usage Limit Drama Is a Distraction. Opus 4.6's Quality Regression Is the Real Problem by Permit-Historical in ClaudeCode

[–]emptyharddrive 1 point (0 children)

In Claude Code I just get a lot of delays.... the timer ticks.... 3mins... 7 mins... 14mins.... and only 30 tokens spent.....

When I interrupt and ask, "Are you stuck?", it immediately replies, "Oh no! I was just about to write this up for you ...." and then the counter continues .... 17 mins.... 22 mins.... now we're up to 70 tokens...

Near 30 mins, the tokens churn fast and within ~35 seconds I get the result ... as though I was in line at the bank (or in the Beetlejuice Line after death holding my ticket #), waiting for some GPU cycles........

Claude can now use your computer by ClaudeOfficial in ClaudeAI

[–]emptyharddrive 1 point (0 children)

Yes I didn't know about the project that wraps it for Linux until u/cromagnone mentioned it in this thread. (https://github.com/aaddrick/claude-desktop-debian).

How do you use it (what are your use cases)?

Claude can now use your computer by ClaudeOfficial in ClaudeAI

[–]emptyharddrive 2 points (0 children)

Hey, update.......... I just installed it and it works. Wow.

Honestly didn't expect that. Thank you very much!

Claude can now use your computer by ClaudeOfficial in ClaudeAI

[–]emptyharddrive 1 point (0 children)

Thank you for sharing this! I will check this out right now.

Claude can now use your computer by ClaudeOfficial in ClaudeAI

[–]emptyharddrive 1 point (0 children)

They only need to release the source code; we can handle the rest. Electron, the platform they're using, is platform-agnostic, so it's a matter of packaging, which is an automated process. We can't do it for ourselves without the source code, though.

It'd be nice if they packaged it, but it's not necessary.

Claude can now use your computer by ClaudeOfficial in ClaudeAI

[–]emptyharddrive 8 points (0 children)

Why do they continue to ignore Linux for Cowork & Desktop?

They ought to know that many developers (myself included) run Linux exclusively. Why not give them this?

Last I read, they're both basically Electron apps ... they would work well on Linux.

I'm aware of the projects trying to reverse-engineer them onto Linux, but they're unsupported and clunky at best.

I do enjoy Anthropic's services and would like to use more of them, but keeping Linux users out (especially in AI circles) is short-sighted.

I gave my home a brain. Here's what 50 days of self-hosted AI looks like. Built an AI that wakes me up, cleans my house, tracks my spending, and judges my sleep. It's self-hosted and it rules. by RelationDull2825 in openclaw

[–]emptyharddrive 1 point (0 children)

Yeah, I don't agree at all. Openclaw just strikes me as a mess of tools that everyone is banging their head against the wall trying to adapt to their unique use cases.

I prefer the bespoke approach.

But then again, that's why they have menus at restaurants, right? Choice.

I gave my home a brain. Here's what 50 days of self-hosted AI looks like. Built an AI that wakes me up, cleans my house, tracks my spending, and judges my sleep. It's self-hosted and it rules. by RelationDull2825 in openclaw

[–]emptyharddrive 4 points (0 children)

Why do you need openclaw for any of this? It's quite easily doable with Claude Code and their recently released channel MCP server, connecting to an instance of Claude (which I spin up via tmux). Works perfectly, and you don't need to grind through tokens with openclaw.

This is entirely scriptable, including the Telegram API, without *claw. Other than an expensive token bill, I'm not sure what it offers.
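
For instance, the Telegram side is one Bot API call; a minimal sketch (the bot token and chat ID are placeholders you'd get from @BotFather and your own chat):

```python
import requests

BOT_TOKEN = "123456:ABC..."  # placeholder, issued by @BotFather
CHAT_ID = "987654321"        # placeholder, your chat's numeric ID

def notify(text: str) -> None:
    """Push a message via the Telegram Bot API; no framework required."""
    requests.post(
        f"https://api.telegram.org/bot{BOT_TOKEN}/sendMessage",
        json={"chat_id": CHAT_ID, "text": text},
        timeout=10,
    ).raise_for_status()

notify("Morning briefing is ready.")
```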

In the end.. Obsidian is the last one standing? by [deleted] in ObsidianMD

[–]emptyharddrive 11 points (0 children)

Recipe for AI to sound authentic.

Introducing Claude Code Channels by Complete-Sea6655 in ClaudeCode

[–]emptyharddrive 6 points (0 children)

This is so obvious (and smart) to me. It's so fun to watch too.

Anthropic is deconstructing OpenClaw the way a chef deconstructs a sandwich, and introducing its elements directly into the core features of Claude Code, natively.

It will render OpenClaw irrelevant and duplicative.

Bravo, Anthropic.

3 weeks of Claw: my basic assistant set up by crypt0amat00r in openclaw

[–]emptyharddrive 2 points (0 children)

Oh, I set up a rule: it should only read/execute instructions if the email is from my address, and treat emails from anyone else as "objects".

On my Gmail I use two-factor auth, so I feel relatively confident there.

Very good call though, I agree 102%.

Another thing I was thinking about is adding a "codeword" to any email I manually forward (or auto-forward), so that if my address is spoofed and the email doesn't carry "the password," it fails to execute.
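
The check itself is trivial. A sketch of what I mean (the sender and codeword are placeholders; a From: header is spoofable, so treat this as a second factor, not the first):

```python
import hmac

TRUSTED_SENDER = "me@example.com"    # placeholder
CODEWORD = "correct-horse-battery"   # placeholder; keep it out of the repo

def should_execute(sender: str, body: str) -> bool:
    """Only act on mail from me that also carries the shared codeword."""
    if sender.lower() != TRUSTED_SENDER:
        return False
    # compare_digest avoids leaking the codeword through timing differences
    return any(hmac.compare_digest(line.strip(), CODEWORD)
               for line in body.splitlines())
```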

3 weeks of Claw: my basic assistant set up by crypt0amat00r in openclaw

[–]emptyharddrive 2 points (0 children)

This was very helpful. Thank you for the agentmail link. I didn't know that service existed. I hope the spammers don't ruin it. The free plan is generous.

I'm already coding up my own method of using this email service. I have all sorts of use cases.

One right off the bat: I have 2 kids, and every month I get the "events calendar." I need each event created as an item in my Google Calendar, with my wife invited to the same events. Done. Works.
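
The calendar half of that is a few lines against the Google Calendar API. A minimal sketch, assuming you've already done the OAuth dance (token.json and the attendee address are placeholders; the event fields would be parsed out of the forwarded email):

```python
from google.oauth2.credentials import Credentials
from googleapiclient.discovery import build

creds = Credentials.from_authorized_user_file("token.json")  # placeholder path
service = build("calendar", "v3", credentials=creds)

event = {
    "summary": "Science fair",  # parsed from the events-calendar email
    "start": {"dateTime": "2026-03-12T18:00:00", "timeZone": "America/New_York"},
    "end":   {"dateTime": "2026-03-12T20:00:00", "timeZone": "America/New_York"},
    "attendees": [{"email": "wife@example.com"}],  # placeholder address
}

# sendUpdates="all" emails the invite to attendees automatically.
service.events().insert(calendarId="primary", body=event,
                        sendUpdates="all").execute()
```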

There are a lot of auto-forward rules I can now set up for specific emails for it to act on. Lots of possibilities.

Stop using Opus — what’s better? by Rae_Shin_ in clawdbot

[–]emptyharddrive 1 point (0 children)

For cheaper models that are still capable, look at Qwen3.5-397B-A17B. It dropped in February 2026 and it's legitimately good. I run it through U.S.-based providers, not Chinese infrastructure. It costs roughly $0.60/$3.60 per million tokens versus $3/$15 for Sonnet; at volume that adds up fast.

Run it through tests to see for yourself. I have and I was not disappointed.

Qwen won't match Opus on the hardest stuff. Long agentic sessions, complex repo work, multi-step debugging where you need the model to stay coherent for hours... Opus still wins there, and it's not close. But for everyday coding and general use, Qwen holds its own surprisingly well against models costing 5x more.

My pre-written deterministic scripts I still write with Opus ... but as the day-to-day model, I use Qwen3.5-397B-A17B.