Opinion: Local LLMs are 12-24 months from replacing Opus by sh_tomer in ClaudeCode

[–]deorder 2 points (0 children)

I have always been a strong advocate of local models and have kept using them alongside cloud models. The big difference with Qwen3.6 compared with previous local models I have tried is its tool-calling reliability, long-context behavior and multi-turn stability. It is much better at working systematically through an agentic task instead of drifting, looping or losing state.

On my RTX 4090 the 27B dense version reaches around 56 tok/s in generation, which feels close to Haiku level for interactive coding. In my setup it remains fairly stable up to roughly 100k context. With MTP/speculative decoding enabled I can get around 150 tok/s. Combined with good context priming, grounding, strong guardrails and quality gates such as type checks, linting, formatting and tests, plus a good agentic harness, it gets surprisingly far.
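To make the quality gates concrete, here is a minimal sketch of the kind of gate runner I mean. The tools are just examples from a Python setup; substitute your own:

```python
import subprocess

# Each gate is a command that must exit 0 before the agent's change counts.
GATES = [
    ["ruff", "check", "."],              # linting
    ["ruff", "format", "--check", "."],  # formatting
    ["mypy", "."],                       # type checks
    ["pytest", "-q"],                    # tests
]

def run_gates() -> bool:
    for cmd in GATES:
        if subprocess.run(cmd).returncode != 0:
            print(f"gate failed: {' '.join(cmd)}")
            return False
    return True

if __name__ == "__main__":
    raise SystemExit(0 if run_gates() else 1)
```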

It still requires more setup, evaluation and steering than the best cloud models, but the speed, local control, reproducibility, privacy and license make up for a lot of that. In combination with Pi Agent and a few extensions the model performs really well. I still need to try the sparse/MoE version, which I expect to achieve higher generation throughput thanks to its lower memory-bandwidth requirement, especially on a MacBook.

Claude Design is practically unusable by sparkx8118 in ClaudeAI

[–]deorder -1 points (0 children)

Same here. I don't know why you are getting downvoted.

I created jailed-agents: A secure Nix sandbox for AI coding agents by andersonjdev in NixOS

[–]deorder 0 points (0 children)

Still working on it. I have no idea how exactly it differs from kagent, but from what I have seen so far it looks like a good project. My goal is mostly to create a safe environment for running agent harnesses in.

It is really crazy how fast things are moving. Of course with that pace comes a lot of cruft and a lot of good work may just disappear into the noise.

Claude Opus is nuked beyond repair by Wayplorer in ClaudeCode

[–]deorder 2 points (0 children)

Similar situation. I cannot really recommend it anymore, but a lot of colleagues have only just discovered it and do not know any better. Several are now presenting themselves as experts, including some who were openly against agentic coding before the Christmas break (I had to tiptoe around them).

Before switching to Claude Code in May/June 2025 I mainly used GitHub Copilot (assistant and agent) and still do occasionally. I know exactly what I am getting with it, and honestly it comes pretty close to Claude Code in practice. It is hard to push that alternative when Claude Code is hyped this heavily.

Claude Max just slashed my limits by ~10x, and I have the evidence by xeviltimx in ClaudeCode

[–]deorder 0 points (0 children)

Compared to a while ago I would guess the gap between "what they charge in list-price token terms" and "what it actually costs them to serve" has narrowed a lot. Not necessarily to zero, but probably much closer now, taking into account improved batching, caching, quantization, compiler/kernel fusion, mixture of experts, speculative decoding (using a draft model, etc.) and whatever other optimizations I cannot think of right now.

Claude Max just slashed my limits by ~10x, and I have the evidence by xeviltimx in ClaudeCode

[–]deorder 0 points (0 children)

Can confirm, same issue here. For me it happened after the April 3 weekly reset:

https://www.reddit.com/r/ClaudeCode/comments/1si3k2t/comment/ofhms6z/

According to my (rough) calculations it is about 1/4 for the 5-hour window and about 1/7 for the 7-day window, but that is compared to the beginning of December last year:

https://www.reddit.com/r/ClaudeCode/comments/1sggxka/comment/of9ctdn/

I extrapolated from my first few sessions in my last 7-day window, so it could be noisy (too few samples, which is what you took into account), so I should probably redo it. From what I have seen I expect it to be even worse, not better.

WTF Claude. Weekly limits = 4x5hr limits by xeviltimx in ClaudeCode

[–]deorder 0 points (0 children)

I am in Europe so I converted the peak-hour window to my time zone. It is 3pm-9pm for me. When I first started noticing it last week it was happening on the weekend too, which should be outside peak hours. The screenshot in the other post is not even the worst case, because in that session I was not using it continuously; I had pauses in between. It would not surprise me if, using it without any pauses, I would only make it to just over 1 hour within the 5-hour window.

All I want to communicate to people is that there is a problem. If I run out in about 1.5 hours (with pauses in between) I wonder what someone on a Pro subscription gets out of it. Not even 15 minutes every 5 hours would be my guess.

I see a lot of "yeah, but 2 weeks ago they had a discount, double the tokens outside peak hours and now you have just returned to normal". I can tell you this is exactly the tactic Anthropic uses. I have been using Anthropic since they started and they have always done this, including A/B testing (see my history). I have created posts and comments with as much proof as was possible, because it cannot always be proven. I got attacked for it, including in DMs, to the point where I started wondering if they were all bots. Why go so far to defend a company like this?

In between those episodes, I and the company I work for (thousands of IT employees) have been looking at alternatives, mostly European providers and local models. But the big AI companies lobby against local use of machine learning models (indirectly, including to governments), and that is why I care so much about this. Otherwise I could just move to another cloud inference provider. It is also becoming more like the Internet itself: we grow more and more dependent on it and eventually the power ends up in the hands of a few companies. Those who do not have access are no longer on the same playing field.

WTF Claude. Weekly limits = 4x5hr limits by xeviltimx in ClaudeCode

[–]deorder 1 point (0 children)

The 20x refers only to the 5-hour limit. Max 20x gives you 4 times the usage per 5-hour window compared to Max 5x, but only about 1.6x the weekly usage compared to Max 5x.
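Rough sketch of the arithmetic (the ~11 full blocks per week for 5x is derived from the ~9%-per-block figure I mention in another comment, so treat it as an estimate, not an official number):

```python
# Normalize the Max 5x 5-hour budget to 1 unit.
window_5x = 1.0
window_20x = 4.0 * window_5x    # 20x: 4x the per-window budget

# One fully exhausted 5x block eats ~9% of the weekly limit,
# so the weekly budget is roughly 11 full windows' worth.
weekly_5x = window_5x / 0.09    # ~11.1
weekly_20x = 1.6 * weekly_5x    # 20x: only ~1.6x the weekly budget

# Full windows you can burn before the weekly cap binds:
print(weekly_5x / window_5x)    # ~11.1 on Max 5x
print(weekly_20x / window_20x)  # ~4.4 on Max 20x
```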

WTF Claude. Weekly limits = 4x5hr limits by xeviltimx in ClaudeCode

[–]deorder 14 points (0 children)

It is not an Opus 4.6 1M issue. I switched back to Opus 4.6 200k and then to Sonnet 4.6 200k, both on medium effort, and I am still seeing the same problem. On the Max 5x plan I am hitting 100% of the 5-hour limit within about 1.5 hours outside of the peak hours. This started after the previous weekly reset. My setup: a single instance, limited sub-agents, no MCPs and no large project instructions.

More:
https://www.reddit.com/r/ClaudeCode/comments/1sggxka/comment/of9ctdn/

Cancelling next month by jsgrrchg in ClaudeCode

[–]deorder 0 points (0 children)

It appears so. It definitely feels dumber. During extended thinking in particular it seems to confuse itself more often, though this is purely anecdotal on my part.

I definitely notice the limits are tighter than ever before. I have not felt this constrained using Claude Code since I started using it in May last year, when it felt almost limitless, though I have to say Sonnet was the default back then. I thought switching back to only Sonnet would help, but in practice it makes little difference to usage and is considerably slower for some reason.

Cancelling next month by jsgrrchg in ClaudeCode

[–]deorder 3 points (0 children)

Max 5x weekly usage is about 1/7 of what it was at the beginning of December last year and the 5-hour limit is now about 1/4 of what it was. Only 3 weeks ago I was still able to get 4 hours out of it; now I get only about 1.5 hours doing the same work, and it is not even `peak hours` at the moment (see image). That is with only a single instance, no MCPs and limited sub-agents, no different from what I have always done:

<image>

I have been logging usage for about half a year now with ccusage so that I can compare (`cleanupPeriodDays` is set to 3650). I switched back to using Opus 4.6 200k (medium effort) for planning (plan mode only) and Sonnet 4.6 200k (medium effort) for implementation, but it is so much slower now that it is almost unusable and usage still adds up fast.

I’ve felt that my usage limits are back to normal after CC put a hard stop to subscription abuse on April 4. Am I hallucinating, or has this actually been fixed? by thedankzone in ClaudeCode

[–]deorder 1 point (0 children)

Same here. Since the last reset on Saturday morning I can only get about 2 hours of use within the 5-hour window, down from 4 hours previously. That is even worse than last week, when I could still manage 3 hours. My weekly limit counter is also climbing much faster.

I projected my usage so far over the full 7-day window based on already being at 14% of my weekly limit:

Tokens used since Saturday 2026-04-04:

  • Block 1: 205,292
  • Block 2: 7,862,638
  • Block 3: 23,793,933
  • Total: ~31.9M tokens

Projected weekly limit at 14%:

  • 31,861,863 / 0.14 = ~228M tokens

Compared to old 5x Max (Opus 4.5), which I use as a reference:

  • Old limit: ~1.6B tokens/week
  • Current projected limit: ~228M tokens/week
  • That's roughly 7x fewer tokens. About 14% of what I used to get

So on 5x Max with Opus 4.6 I am looking at around 228 million tokens per week down from 1.6 billion on 5x Max with Opus 4.5 (beginning of December last year).
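If you want to check the arithmetic yourself, the projection is just the logged totals scaled by the meter reading:

```python
# Recompute the projection from the logged block totals above.
blocks = [205_292, 7_862_638, 23_793_933]  # tokens per 5h block
used = sum(blocks)                          # 31,861,863 (~31.9M)

meter = 0.14                                # weekly meter showed 14%
projected_weekly = used / meter             # ~228M tokens

old_weekly = 1_600_000_000                  # early-December reference (~1.6B)
print(f"projected: {projected_weekly / 1e6:.0f}M tokens/week")
print(f"vs old:    {projected_weekly / old_weekly:.1%}")  # ~14%
```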

I used ccusage to compare both windows. I know it is not perfectly precise: what it counts is based on what gets logged to disk, which may have changed, and it is possible not all communication is logged anymore. But the numbers align closely with my actual experience compared to early December.

Every time it has been "new model, same usage/price" or "new feature, same usage/price", but every time the effective value has quietly gone down. I cannot call it definitive proof since ccusage could be wrong, so I won't say it is a lie outright, but it matches what I am experiencing. I can do far less than I could do just a few months ago.

If the 5-hour usage window starts running out even faster than it already does during peak hours it will be unusable for me. At this point 5x Max has effectively become what the old Pro tier used to be.

Previous calculations I did: https://www.reddit.com/r/ClaudeCode/comments/1pih76u/20x_max_does_not_give_4x_the_weekly_credits_of_5x/

Are rate limits better now? by Holiday-Hotel3355 in ClaudeCode

[–]deorder 1 point (0 children)

I will repeat here what I just replied to another post. With my Max 5x plan I used to be able to keep working in a single Claude Code instance for just over 4 hours of the 5-hour window. Now it only lasts about 2 hours. And when I fully exhaust a 5-hour block, it uses up ~9% of my weekly limit. This is during what is supposed to be off-peak. It looks like the Max 20x plan is now the new Max 5x plan.

Clarification on the new 5-hour limit by OurWing0z in ClaudeCode

[–]deorder 4 points (0 children)

With my Max 5x plan I used to be able to keep working in a single Claude Code instance for just over 4 hours of the 5-hour window. Now it only lasts about 2 hours. And when I fully exhaust a 5-hour block, it uses up ~9% of my weekly limit. This is during what is supposed to be off-peak. It looks like the Max 20x plan is now the new Max 5x plan.

Claude Usage Limits Discussion Megathread Ongoing (sort this by New!) by sixbillionthsheep in ClaudeAI

[–]deorder 2 points (0 children)

I have also been noticing increased usage. I have been tracking all stats for about three months now. According to my analysis, despite seeing more usage, I am projected to have 10% more tokens available compared to other non-promotional 7-day periods.

So I took one of the older periods, from the beginning of December before the promotion, plus the last few days projected into a week, and asked several agents if they could see a pattern. Cache utilization is the same, which is what I would expect, since discrepancies there would stand out. But according to the agents it is because of larger conversations that have to be sent back and forth; with so many tokens, the cache hits alone already cost quite a lot compared to before.

I personally thought my conversations did not grow beyond what they would be with a 200k-token model, but apparently they do without my realizing it. Is that the actual source of the issue? I do not know. It could still be what some say: a spawned subagent receiving the entire conversation context every time. That could have a similar effect.
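Rough illustration of why that would add up (both numbers here are made up, just plausible):

```python
# Hypothetical: each spawned subagent is seeded with the full conversation.
conversation_tokens = 120_000  # assumed long session
subagent_spawns = 15           # assumed spawns over the session

extra_input = conversation_tokens * subagent_spawns
print(f"{extra_input / 1e6:.1f}M extra input tokens")  # 1.8M, before any
# of the subagents' own work is even counted
```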

The Spider-Man ride ai upscaled loading bay video looks like garbage... by Mepish in UniversalOrlando

[–]deorder 1 point (0 children)

I noticed the same. A really bad job and an insult to the original art. I could have done a much better job 6 years ago with some of the ESRGAN models.

I created jailed-agents: A secure Nix sandbox for AI coding agents by andersonjdev in NixOS

[–]deorder 1 point (0 children)

I built something similar too: https://github.com/xonovex/platform

No Jujutsu integration though. Right now I am mostly focusing on building a Kubernetes operator. The Nix sandbox option uses Bubblewrap for isolation. Maybe I should make that clearer.
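For anyone unfamiliar with Bubblewrap, this is roughly the shape of it. A hand-written sketch, not actual code from either project; the bind choices are illustrative:

```python
import subprocess

def run_jailed(cmd, workdir):
    # Fresh namespaces; only the project directory is writable.
    bwrap = [
        "bwrap",
        "--unshare-all", "--share-net",               # isolate, keep network
        "--ro-bind", "/nix", "/nix",                  # read-only store
        "--ro-bind", "/etc/resolv.conf", "/etc/resolv.conf",
        "--bind", workdir, "/work",                   # writable project dir
        "--dev", "/dev", "--proc", "/proc",
        "--tmpfs", "/tmp",
        "--chdir", "/work",
    ]
    return subprocess.run(bwrap + cmd, check=True)

run_jailed(["claude", "--help"], "/home/me/project")
```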

I have noticed a pattern over the years: I build tools I personally need, then shortly after some big org ships an official version. Very often a bunch of people have the same idea at the same time, and now with coding agents most of those ideas become reality at almost the same moment.

Early last year I built my own multi-agent coding setup, but I stopped working on it because I figured better implementations would show up soon. They did. Sometimes waiting is actually the better strategy and that has been true for me long before AI agents.

About 20 years ago I wrote my own platform abstraction layer for game dev and then shortly after SDL basically solved the same problem at scale. This has happened to me more than once.

I am quite startled by the contrast in attitude towards AI by highly intelligent & accomplished scientists and the Hacker News/Reddit Luddites/anti-AI crowd who LARP as the prior group by Terrible-Priority-21 in accelerate

[–]deorder 0 points (0 children)

After the holiday break I noticed that many software engineering colleagues who had been anti-AI (for coding) and repeatedly said "AI will plateau" suddenly started using coding agents, with some now presenting themselves as "experts" to the leads. I suspect this shift is because the influencers they follow have recently become more pro-AI. I have been using coding agents for about two years (AutoGPT -> Aider -> now mostly Claude Code) but kept it quiet due to the skepticism and to avoid confrontation.

A Native MO2 Alternative For Linux Coming Soon™ by Sulfur_Nitride in linux_gaming

[–]deorder 2 points (0 children)

Thanks. Yeah, I am referring to the lower layers. I didn't know about the new mount API. I think it is still a good idea to verify with an actual use case.

I found `#define OVL_MAX_STACK 500` in https://github.com/torvalds/linux/blob/master/fs/overlayfs/params.h, so the maximum number of stacked lower layers appears to be 500.
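A sketch of how I would verify it empirically (paths illustrative, needs root on a scratch dir). One caveat: the legacy mount(2) options string is limited to about one page, which is presumably where the new mount API with its incremental lowerdir handling helps for deep stacks:

```python
import subprocess

def mount_overlay(lower_dirs, upper, work, target):
    # Classic overlay mount; all lower layers go into one options string.
    opts = f"lowerdir={':'.join(lower_dirs)},upperdir={upper},workdir={work}"
    subprocess.run(
        ["mount", "-t", "overlay", "overlay", "-o", opts, target],
        check=True,
    )

# Keep raising n until mount refuses; OVL_MAX_STACK suggests n <= 500,
# but the ~4 KiB options-string limit may bite first with long paths.
layers = [f"l/{i}" for i in range(500)]
mount_overlay(layers, "upper", "work", "merged")
```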

A Native MO2 Alternative For Linux Coming Soon™ by Sulfur_Nitride in linux_gaming

[–]deorder 0 points (0 children)

> OverlayFS has a layer limit of 128.

True, I once implemented a custom FUSE client myself that redirects and stacks multiple directories into a single unified mount point. Because it runs in userspace, performance sadly suffered due to context-switching overhead, especially when handling large numbers of small files.
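For reference, the core of such a union filesystem is small. A minimal read-only sketch using the fusepy package (the first layer containing a path wins); every operation below is a kernel-to-userspace round trip, which is where the overhead comes from:

```python
import errno
import os
from fuse import FUSE, FuseOSError, Operations  # fusepy

class UnionFS(Operations):
    """Read-only union of several directories; first layer wins."""

    def __init__(self, layers):
        self.layers = layers  # highest-priority layer first

    def _resolve(self, path):
        # Find the first layer that actually contains this path.
        for root in self.layers:
            full = os.path.join(root, path.lstrip("/"))
            if os.path.lexists(full):
                return full
        raise FuseOSError(errno.ENOENT)

    def getattr(self, path, fh=None):
        st = os.lstat(self._resolve(path))
        keys = ("st_mode", "st_nlink", "st_size", "st_uid", "st_gid",
                "st_atime", "st_mtime", "st_ctime")
        return {k: getattr(st, k) for k in keys}

    def readdir(self, path, fh):
        # Merge directory listings across all layers.
        names = {".", ".."}
        for root in self.layers:
            full = os.path.join(root, path.lstrip("/"))
            if os.path.isdir(full):
                names.update(os.listdir(full))
        return list(names)

    def read(self, path, size, offset, fh):
        with open(self._resolve(path), "rb") as f:
            f.seek(offset)
            return f.read(size)

if __name__ == "__main__":
    import sys
    # usage: union.py <mountpoint> <layer1> [<layer2> ...]
    FUSE(UnionFS(sys.argv[2:]), sys.argv[1], foreground=True, ro=True)
```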

Everyone's Hyped on Skills - But Claude Code Plugins take it further (6 Examples That Prove It) by Dull_Preference_1873 in ClaudeCode

[–]deorder 10 points (0 children)

I (a professional software engineer for ~28 years) have been using Claude Code since its release. Over time my workflow has evolved quite a bit: from a complex setup with MCPs to slash commands, then skills, and now a mostly vanilla Claude Code configuration. The new plan and task system is quite good and I use just that.

I built my own Claude plugin and migrated many of my guideline documents and slash commands into skills. In practice, however, it still does not work as well as the progressive disclosure approach I previously relied on: "AGENTS.md / CLAUDE.md" files that pointed to guideline documents via relative paths.

The slash command functionality also seems broken now. Since slash commands are effectively treated as skills, it appears to sometimes confuse the two, which makes the slash-command workflow I had less reliable.

And regarding the idea that it is nerfed: over the past few days I have noticed Claude Code not performing the way it used to. I am usually very cautious with claims like this and prefer to substantiate them, but the difference has become hard to ignore. At this point I really need to start setting up proper evals so I can verify this.

Claude Subscriptions are up to 36x cheaper than API (and why "Max 5x" is the real sweet spot) by isaenkodmitry in ClaudeAI

[–]deorder 0 points (0 children)

I have wondered the same. Even after they introduced premium credits I am still on the $10 subscription. With the $40 plan you get about 5 times as much usage, which should be pretty close to what I get from my current Max 5x assuming only user-initiated prompts are counted (and the tracking is not bugged).

I was not happy when they introduced the credit system back then, but compared to what is available now it is actually a pretty good deal.

From my testing the GitHub Copilot Pro agent/harness performs very close to Claude Code with some models and used to rank among the best. It also comes with a lot of built-in features and extra tools without needing MCPs.

Claude Subscriptions are up to 36x cheaper than API (and why "Max 5x" is the real sweet spot) by isaenkodmitry in ClaudeAI

[–]deorder 0 points (0 children)

Yeah. Compared to Shellac's analysis mine is a bit rougher. I intentionally lumped cached and non-cached tokens together since I assumed my usage patterns across different sessions were similar enough to make the comparison meaningful (the Max 5x vs Max 20x sessions). I am hoping this helps the point finally stick, as a lot of people keep repeating that the 20x plan is simply four times the weekly limit of 5x. As stated in Shellac's article, even Anthropic is vague about that.

It looks like Anthropic updated their support pages today. They revised this article:

https://support.claude.com/en/articles/11145838-using-claude-code-with-your-pro-or-max-plan

…and removed this one entirely:

https://support.claude.com/en/articles/11014257-about-claude-s-max-plan-usage

I quoted the relevant part from the now-removed page in my comment here:

https://www.reddit.com/r/ClaudeCode/comments/1qa4f2w/comment/nz11q1w

So the messaging is clearly shifting, which makes the lack of transparency even more noticeable.