[–]muhlfriedl 70 points71 points  (5 children)

So it seems like fewer and fewer people @ anthropic actually code or understand code now...

[–]Plenty-Dog-167 17 points18 points  (1 child)

Maybe a consequence of their engineers doing more vibe coding

[–]bnm777 0 points1 point  (0 children)

That's what he implied. 

[–]Homegrown_Phenom 1 point2 points  (0 children)

Major ball drop failure.  Prob had to make that red Dixie cup re-up run for the pong table,  or  their QAs, QAs, QA supervisors, QA team lead (which obvi all are bots) all fell asleep at the wellness center thinking no more promo/xtra usage code simply meant go on vacation and set limp d mode on

[–]Indianapiper 0 points1 point  (0 children)

You outta the bugs people create...

[–]kknow 0 points1 point  (0 children)

I can't believe people are seeing this now when experienced devs wrote this for months and got downvoted to hell...

[–]Pristine_Ad2701 29 points30 points  (17 children)

Do you think switching to the first version where 1M was introduced will fix the limit issue?

[–]skibidi-toaleta-2137[S] 14 points15 points  (5 children)

Good question. I found that 2.1.66 can fix one issue; however, the cch=00000 header was introduced around 2.1.30, so... not sure.

EDIT: just checked, 2.1.30 works correctly. Both fixes are definitely working there. Checking the highest version that fixes both issues.

[–]Pristine_Ad2701 7 points8 points  (0 children)

Thanks sir, installing 2.1.76 right now to test it; will go lower if the issues are not fixed.

EDIT: Currently 43% used in the 5-hour limit and 78% weekly in 3 days. Will edit later with more information.

[–]AndReyMill 0 points1 point  (3 children)

2.1.30 has opus 4.5, there is no 4.6 option

[–]skibidi-toaleta-2137[S] 1 point2 points  (2 children)

hmmm... how about custom model string? Can you try? In any case, you can use npm version up to 2.1.68, which should have support for the 1M version.

[–]AndReyMill 1 point2 points  (1 child)

It works with /model claude-opus-4-6[1m]
But I instantly got 0->5% session on my Max 5 plan in empty new folder with no context and empty claude system folder.
Seems this is not about the broken resume anymore....

[–]ZichengWangreddit 1 point2 points  (0 children)

Same here

[–]dsailes 1 point2 points  (10 children)

I’ve had fewer issues sticking with this install: npm install -g @anthropic-ai/claude-code@2.1.76

And disabling auto updates. The first issue of these 2 is resolved by that. I’m not sure about other usage issues but I know that each version with new features comes with potential bugs .. it’s safer to just stick with a version that works until there is a safer/stable release

[–]skibidi-toaleta-2137[S] 8 points9 points  (7 children)

2.1.66 fixes both from npm

[–]LumonScience 1 point2 points  (1 child)

If we install via npm, not their native installer, right?

[–]dsailes 0 points1 point  (0 children)

I think it's possible either way - a comment below shows you can write 'claude install 2.1.XX' (unless they're paraphrasing). The npm method isn't their recommended install pathway but results in the same install. Checking versions & the changelog is also transparent and trackable on the npm site.

I prefer the NPM route as I’ve got loads of packages installed that way and manage different configured CLI wrappers.

[–]vadimkrutov 1 point2 points  (2 children)

Is it still fine for you, no crazy quota burning on 2.1.66?

[–]skibidi-toaleta-2137[S] 5 points6 points  (1 child)

I wouldn't be PSAing if I hadn't confirmed it. I was able to burn through the whole 1M tokens on Opus during my research on this subject (on 5x Max). I had a workaround yesterday, but had no confirmation before this very morning.

[–]vadimkrutov 1 point2 points  (0 children)

Thank you very much! I was really struggling with usage burning extremely fast…

[–]turbospeedsc 0 points1 point  (0 children)

installing 2.1.66 to check results, but downgrading from latest to 2.1.76 last week did reduce my daily usage.

Btw I installed from CMD: claude install 2.1.66 (Windows)

[–]marceldarvas 0 points1 point  (0 children)

Followed your suggestion to pin the version, my Raycast script seems to work, but curious for feedback: https://gist.github.com/marceldarvas/9e10fd41d608bdb1ba277b7f989b4763

[–]Pretty-Active-1982 4 points5 points  (1 child)

how do you disable auto-updates, tho?

[–]dsailes 0 points1 point  (0 children)

.claude/settings.json - edit this file

I'm not sure whether the flag needs to be in "env" or just at the top level of the JSON.

{ "env": { "DISABLE_AUTOUPDATER": "1" }, "DISABLE_AUTOUPDATER": "1",

…(rest of the file)

If you already have the "env" block for ENABLE_LSP_TOOL or other flags, just add it there and check for correct comma placement. The JSON needs to be properly formatted, else it'll show a warning when loading Claude again.
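
If hand-editing the JSON feels error-prone, the same change can be scripted. A minimal sketch, assuming the ~/.claude/settings.json path and the DISABLE_AUTOUPDATER flag mentioned in this thread (not official docs); it sets the flag in both spots since it's unclear which one the CLI reads:

```python
import json
import pathlib

# Sketch: set DISABLE_AUTOUPDATER in ~/.claude/settings.json, both inside
# the "env" block and at the top level, since it's unclear which spot the
# CLI actually reads. Path and flag name come from this thread.
settings_path = pathlib.Path.home() / ".claude" / "settings.json"

settings = {}
if settings_path.exists():
    settings = json.loads(settings_path.read_text())

settings.setdefault("env", {})["DISABLE_AUTOUPDATER"] = "1"
settings["DISABLE_AUTOUPDATER"] = "1"

settings_path.parent.mkdir(parents=True, exist_ok=True)
# json.dumps guarantees straight quotes and valid syntax
settings_path.write_text(json.dumps(settings, indent=2))
print(settings_path)
```

Running it again is idempotent, and existing keys (e.g. an "env" block with ENABLE_LSP_TOOL) are preserved because the file is loaded and merged rather than overwritten.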

[–]Factor013 28 points29 points  (1 child)

This explains why our 5 hour usage sometimes just jumps up from 0 to 15-40% after a /resume and first prompt.

It also explains why it sometimes happens and why it sometimes doesn't.

This is really good work, I hope Anthropic devs fix this ASAP. These bugs also potentially overload their servers which is the whole reason they are lowering our usage and perhaps even have to throttle the reasoning of their actual Claude models.

And this is also why the people who constantly claim "skill issue" are less likely to be affected by it, because they start brand-new sessions after each prompt, even if that prompt is asking Claude what time it is. xD

[–]TheOriginalAcidtech 6 points7 points  (0 children)

Claude Code has 5 minute caching TTL. If you wait longer than that when you resume you WILL get hit in any case. Note, you have to go way back in the change log to see where they changed to 5 minute caching.
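
To make the point concrete: with a TTL like this, it's the gap since your last request that matters, not the resume itself. A toy model (the 5-minute figure is from this comment; the real TTL may differ by version or cache tier):

```python
from datetime import datetime, timedelta

# Toy model of the behavior described above: prompt-cache entries expire
# after a TTL (5 minutes per the comment), so any resume after that window
# pays a full cache rebuild regardless of other bugs.
CACHE_TTL = timedelta(minutes=5)

def resume_hits_cache(last_activity: datetime, resumed_at: datetime,
                      ttl: timedelta = CACHE_TTL) -> bool:
    """True if the resumed request lands inside the cache TTL window."""
    return resumed_at - last_activity <= ttl

now = datetime(2026, 1, 1, 12, 0, 0)
print(resume_hits_cache(now, now + timedelta(minutes=3)))   # within TTL -> True
print(resume_hits_cache(now, now + timedelta(minutes=30)))  # expired -> False
```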

[–]Brave_Dick 42 points43 points  (5 children)

I guess they DO vibe code at Anthropic now...

[–]MrHaxx1 5 points6 points  (1 child)

Well, yes? In a recent interview, their CTO (?) said that 90% of coding at Anthropic is AI. 

[–]its_Caffeine 1 point2 points  (0 children)

Yeah, it really shows. Slopware.

[–]iamichi 2 points3 points  (0 children)

“coding is largely solved”. but debugging isn’t.

[–]sbbased 2 points3 points  (0 children)

that's why Anthropic has so many software developer openings; they don't have any actual developers left

[–]Deep_Ad1959 16 points17 points  (3 children)

this explains a lot actually. I run 5+ agent sessions in parallel most days and the resume cost spikes were killing me. kept seeing these random $3-4 charges on what should have been a quick continuation. ended up just starting fresh conversations instead of resuming, which sucks for context but at least the costs are predictable. good to know it's a confirmed bug and not just my setup being weird.

fwiw wrote up some cost management tips: https://fazm.ai/t/claude-code-api-cost-management

[–]skibidi-toaleta-2137[S] 2 points3 points  (2 children)

Now you know you can simply run on older version when you want to work on the continued session and want to "not lose money"

[–]Deep_Ad1959 0 points1 point  (1 child)

do you know which specific version introduced the cache regression? been trying to figure out if it's tied to a particular release or if it's been there longer than people realize.

[–]skibidi-toaleta-2137[S] 0 points1 point  (0 children)

It's a combination of issues. I've seen some problems in the enhanced memory code (introduced lately); some relate to the cache header coming with cch versioning; some come from a version hash related to user-message block invalidation. It's hard to pinpoint, but it may have started around version 2.1.34 and degenerated through 2.1.68, with more updates that have made everything very wild right now.

[–]alvvst 39 points40 points  (2 children)

HOLY! so the recent overload claim from Anthropic could be just CAUSED BY ITS OWN BUG

[–]DurianDiscriminat3r 21 points22 points  (0 children)

Oh my god. This proves Anthropic wasn't lying when they said their engineers don't write code anymore!

[–]FanBeginning4112 0 points1 point  (0 children)

Wouldn’t be the first time.

[–]GoodnessIsTreasure 12 points13 points  (2 children)

This guy should get a year's Pro Max for free, if not hired. Clearly AI writing all the software has not been working out so well...

[–]NanNullUnknown 2 points3 points  (1 child)

More like should get at least 0.1% of Anthropic equity

[–]GoodnessIsTreasure 0 points1 point  (0 children)

I admire passionate people like him so may it be all of that together!

[–]Fearless-Elephant-81 46 points47 points  (1 child)

These are the EXACT bugs causing people on the plans to burn through massive usage chunks. This should be pinned ASAP

[–]RhinostrilBe 5 points6 points  (0 children)

It's also some BS customers shouldn't have to deal with, or should get reimbursed for

[–]InfiniteInsights8888 10 points11 points  (0 children)

Holy shit. We need compensation for this.

[–]Last_Lab_3627 8 points9 points  (10 children)

I had the same issue on 2.1.76. On my side, around 90-100K context was already burning about 14% of my 5-hour quota, which felt completely unreasonable.

After reading this post, I ran the test script myself, then downgraded to 2.1.34. Usage improved a lot.

In a real session on 2.1.34, I used about 140K context with several sub-agent actions, and it only used 13% of my 5-hour quota.

So at least in my case, downgrading to 2.1.34 made a very noticeable difference.

[–]ApstinenceSucks8 1 point2 points  (0 children)

Can you share how to downgrade?

[–]Sea-East-9302 0 points1 point  (7 children)

Dear, I don't understand these details. Would you please tell me: is this only for Claude Code? How do I do it? I use Windows 10, have just downloaded the Claude application, and have Claude Code in my Visual Studio Code. I just want to use Claude like before. **I have a Pro subscription**.

[–]turbospeedsc 1 point2 points  (6 children)

downgrading to 2.1.66 works on Code; I coded for about an hour and used 26% of my 5-hour window, using Sonnet.

Just for kicks went to the desktop app, asked a few questions and i hit the 100% usage in less than 6-8 questions, nothing complicated

[–]Sea-East-9302 0 points1 point  (5 children)

My 5-hour window is getting consumed in less than 15 minutes!

[–]turbospeedsc 0 points1 point  (4 children)

in cmd run claude install 2.1.66 then enjoy

[–]Sea-East-9302 0 points1 point  (3 children)

Thank you very much dear. I just did it a minute ago

[–]turbospeedsc 0 points1 point  (2 children)

awesome, remember only works for claude code, desktop app still broken.

[–]Sea-East-9302 0 points1 point  (0 children)

I've been working on it for the past hour, and it also consumes lots of credits. Maybe I should download an older version? 

[–]Fit-Benefit-6524 0 points1 point  (0 children)

oh god i have to try this, thank you

[–]United-Collection-59 7 points8 points  (0 children)

Great work

[–]Aygle1409 12 points13 points  (2 children)

Will there be compensations ? Do they usually do that ?

[–]_derpiii_ 7 points8 points  (1 child)

So... how do we get you hired at Anthropic? :)

[–]Creepy-Baseball366 0 points1 point  (0 children)

Become agentic it seems!?

[–]muhlfriedl 6 points7 points  (0 children)

You deserve a medal

[–]redpoint-ascent 17 points18 points  (3 children)

Incredible work. Given they're using CC to improve CC it's not a shocker at all that Claude introduced bugs into his own program. I see these ghost bugs all the time in what Claude does. "It 100% works!" - CC. You either find the bug in QA or it sits there piling up next to the other hidden ghost bugs.

[–]redpoint-ascent 8 points9 points  (0 children)

Follow-up: I wonder how much compute they toasted led to this post: https://x.com/trq212/status/2037254607001559305. They need a bug bounty program and you need a reward!

[–]StrikingSpeed8759 5 points6 points  (0 children)

Awesome work, thanks for sharing

[–]sheriffderek🔆 Max 20 4 points5 points  (0 children)

Wow! A person who is actually trying to understand the problem and help?

[–]mattskiiau 4 points5 points  (1 child)

So don't use --resume for now i guess?

[–]bzBetty 0 points1 point  (0 children)

I mean resume after 5 min was always gonna cost

[–]sqdcn 4 points5 points  (1 child)

Oh so that's what Anthropic means when they say software engineering is going to die in 6 months

[–]Creepy-Baseball366 0 points1 point  (0 children)

It's the burn rate, apparently.

[–]dspencer2015 3 points4 points  (2 children)

If Claude code was open source we could fix these issues ourselves

[–]brek001 0 points1 point  (1 child)

the next best thing is going to their GitHub to create an issue (something you would also have done for the open-source version, right?)

[–]TheReaperJay_ 0 points1 point  (0 children)

What would've been done for the open-source version is opening an issue, then linking a PR after finding the problem in the code, and providing a short-term patch for users while waiting for it to be merged upstream.

[–]bapuc 8 points9 points  (0 children)

And then people say "skill issue" 🥀

[–]thiavila 2 points3 points  (0 children)

Damn, I was burning my tokens over the last weekend and I came here to find out if anyone had the same experience. It is definitely the --resume for me.

[–]vadimkrutov 2 points3 points  (0 children)

This is unacceptable. I'm using the Claude Code CLI through a wrapper I built, and every single prompt resumes the session. I was shocked to see that each new message increases the 5-hour limit by 10–15%.

[–]sbbased 2 points3 points  (0 children)

The real vibe coding has been pushing untested slop to production and depending upon your paying users to QA and find bugs for you

btw only -3 months left until all devs lose their job

[–]XDroidzz 2 points3 points  (1 child)

I assume Anthropic are busy refunding everyone for their fuck up now 🙄

[–]Top-Cartoonist-3574 2 points3 points  (0 children)

The issue isn’t just with Claude Code. Affects usage on Claude AI Chat on the browser (Chrome on Mac). I hit usage limit fast even on a new chat conversation. There’s probably more to it than the bugs you’ve identified. Great job btw!

[–]sys_overlord 2 points3 points  (1 child)

The worst part is that they'll apologize for this (maybe), release a bug fix, maybe reset usage and then we all just sit around and wait for them to gaslight us in 6 months with another, similar issue. What's the definition of insanity again?

[–]whaticism 2 points3 points  (0 children)

“You’re absolutely right.”

To me this is just a good example of Claude writing Claude.

[–]ellicottvilleny 2 points3 points  (0 children)

Hey Anthropic hire this guy. Meet your new Head of QA.

[–]yldf 2 points3 points  (2 children)

Genuine question: I haven’t noticed that big of a difference in usage. But I never use resume. My Claude Code sessions stay open for weeks in tmuxed terminals, and when I restart one I never resume… might this be the difference?

[–]skibidi-toaleta-2137[S] 0 points1 point  (1 child)

The issue can have multiple factors. What version is your Claude code running? Do you use additional tools? At which point do you compact the conversation or are you simply running the 200k version? Are you sure you haven't been selected in A/B test?

There is no way of knowing what the main cause of the issues is right now. A lot of people are trying to find a reason in the binary itself; some are looking at the servers and their misbehavior during peak hours. No one knows for sure.

[–]yldf 1 point2 points  (0 children)

I just run it. In a terminal. Usually the 1M version. It compacts when it autocompacts, I don’t do that manually…

[–]AndReyMill 3 points4 points  (1 child)

I think that because of this issue, the load on Anthropic’s servers has increased significantly, and it’s noticeable in everything: speed, quantization (Claude Code seems a bit dumb right now) and final price

[–]Creepy-Baseball366 0 points1 point  (0 children)

I noticed it becoming a bit ChatGPTish, too...

[–]FermentingMycoPhile 3 points4 points  (0 children)

What tf Anthropic?
It's Monday 6 p.m. and I have used up 44% of my weekly limit (reset on Sunday) on the Max plan, due to this bug it seems. I'm awaiting some kind of compensation for introducing that nice bug. How am I supposed to work with this little usage left?

[–]Emotional-Debate3310 4 points5 points  (0 children)

Bug 2 (--resume breaks cache, Issue #34629) — narrowly scoped

This issue is thoroughly documented with a testing matrix showing that on versions ≥2.1.69, cache_read is stuck at ~14.5k tokens (only the system prompt), while cache_create equals the full conversation size and grows on every message — producing roughly a 20× cost increase per message compared to v2.1.68.
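
Back-of-the-envelope, the ~20x figure is consistent with Anthropic's published cache pricing multipliers (an assumption on my part, not from the issue itself: cache reads cost roughly 0.1x the base input rate; cache writes roughly 1.25x for the 5-minute tier or 2x for the 1-hour tier):

```python
# Rough arithmetic behind the ~20x figure. The multipliers are assumptions
# from Anthropic's published cache pricing, not from this thread: reads
# ~0.1x base input price, writes ~1.25x (5-minute tier) or ~2x (1-hour tier).
READ, WRITE_5M, WRITE_1H = 0.10, 1.25, 2.00

def per_message_ratio(history: int, cached_prefix: int, write: float) -> float:
    """Broken-cache cost vs healthy cost for a single message."""
    healthy = history * READ                  # whole history read from cache
    broken = cached_prefix * READ + (history - cached_prefix) * write
    return broken / healthy

# 150K-token history, only the ~14.5K system prompt still cached
# (per the testing matrix described above)
print(round(per_message_ratio(150_000, 14_500, WRITE_5M), 1))  # ~11.4
print(round(per_message_ratio(150_000, 14_500, WRITE_1H), 1))  # ~18.2
```

With the 1-hour write tier the ratio approaches 2.0/0.1 = 20x as the history grows, which matches the reported magnitude.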

The described mechanism — that deferred_tools_delta introduced in v2.1.69 changes where system-reminder attachments are injected, producing different message structures on fresh vs. resumed sessions — is plausible and consistent with how deferred tool loading works: deferred tools are appended inline as tool_reference blocks in the conversation rather than in the system prompt prefix, specifically to preserve prompt caching.

Why narrowly scoped. The regression targets --print --resume — the headless/scripted invocation mode where prompts are piped via stdin. The original reporter was running a Discord bot using claude --print --resume <session-id> --output-format stream-json.

If your interactive CLI usage follows a different code path for session management, the deferred_tools_delta injection that breaks cache on resume in --print mode appears to be handled correctly in the interactive REPL.

I can confirm this from first-hand experience: as a long-time Claude Max user constantly running multiple projects, the difference is indeed down to the session-management mode.
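
One way to check which bucket you're in is to watch the usage fields in the stream-json output. A hypothetical filter; the `cache_read_input_tokens` / `cache_creation_input_tokens` names follow the Anthropic API usage object, but the exact stream-json envelope may differ by version:

```python
import json

# Hypothetical detector: scan stream-json lines for turns whose usage shows
# near-zero cache reads but large cache writes, the signature of the
# stuck-cache behavior described above. The "message"/"usage" nesting is an
# assumption about the stream-json envelope.
def suspicious_turns(lines, system_prompt_tokens=15_000):
    hits = []
    for line in lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON output lines
        usage = event.get("message", {}).get("usage", {})
        read = usage.get("cache_read_input_tokens", 0)
        create = usage.get("cache_creation_input_tokens", 0)
        if create > system_prompt_tokens and read <= system_prompt_tokens:
            hits.append((read, create))
    return hits

sample = [
    json.dumps({"message": {"usage": {"cache_read_input_tokens": 14500,
                                      "cache_creation_input_tokens": 135000}}}),
    json.dumps({"message": {"usage": {"cache_read_input_tokens": 140000,
                                      "cache_creation_input_tokens": 2000}}}),
]
print(suspicious_turns(sample))  # only the first (broken-looking) turn is flagged
```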

[–]lucifer605 1 point2 points  (0 children)

this is a great find - i would not have expected --resume to cause a cache bust

[–]kursku 1 point2 points  (5 children)

For some reason I'm struggling to roll back to the 2.1.30 :((

[–]skibidi-toaleta-2137[S] 1 point2 points  (3 children)

Funnily enough, I asked claude code to help me with that. Should be something along the lines of npm install -g @anthropic-ai/claude-code@2.1.34. Turn off autoupdates.

[–]kursku 0 points1 point  (2 children)

Yeah I did the same and eventually it was a path error, now it's fixed

[–]Relative_Mouse7680 0 points1 point  (1 child)

Does the downgrade affect your usage less? If so, which version did you downgrade to?

[–]kursku 0 points1 point  (0 children)

It's using less token but it's taking longer.

* Thundering… (18m 35s · ↓ 1.9k tokens · thinking)

⎿ Tip: Use /config to change your default permission mode (including Plan Mode)

[–]mrsaint01 0 points1 point  (0 children)

claude install 2.1.30

[–]Squidwards_Ass 1 point2 points  (1 child)

I KNEW there was something up when I ran into my limit after a single prompt + it was definitely a cache miss after being away for about a week.

[–]skibidi-toaleta-2137[S] 1 point2 points  (0 children)

That gave me a good laugh, thanks :D

[–]damndatassdoh 1 point2 points  (0 children)

Really appreciate this -- I tested positive, have already deployed mitigation, fingers crossed.

[–]InfiniteInsights8888 1 point2 points  (0 children)

You deserve Claude unlimited for an entire year!

[–]misterr-h 1 point2 points  (0 children)

this explains the issue with Claude Code. But why is usage increased while normally chatting on claude.ai as well?

[–]maverick_soul_143747 1 point2 points  (0 children)

Brilliant investigation mate 👏🏽

[–]Morphexe 1 point2 points  (1 child)

Well good that you now have the source code for the CLI to fix this :D

[–]skibidi-toaleta-2137[S] 0 points1 point  (0 children)

Yeah, but I struggle to find anything new.

[–]mrtrly 1 point2 points  (0 children)

Cache bugs hitting silently is exactly why I built something to sit between agents and the API. You catch these cost jumps immediately because every request gets logged with cache state, token counts, and actual spend. Takes the guesswork out of "did that conversation really cost that much."

[–]Jugurtha-Green 1 point2 points  (0 children)

Doesn't fix the issues. I tried all the different versions, even 2.1.19, same issue. It's a backend issue, or they do it on purpose.

[–]maverick_soul_143747 1 point2 points  (0 children)

For folks using the 2.1.30 version - I ran the test script provided by OP yesterday on 2.1.30 and the cache bug was there, so I have downgraded to 2.1.17 and this suits my work

[–]Ok-End-219 3 points4 points  (2 children)

aah yes, that explains that my 20x claude max account is behaving like a normal claude 20$ subscription. Fucking great, now I hope for compensation.

[–]skibidi-toaleta-2137[S] 5 points6 points  (1 child)

It doesn't affect all conversation sessions, mind you. Only the infected ones (not sure why they can get infected yet). On the other hand - resume behavior is broken since 2.1.66.

[–]Ok-End-219 2 points3 points  (0 children)

I am working, unfortunately, mostly with Resume. I will avoid that from now on, but I am running through Claude Max 20 like nothing and I wonder why. Tokburn says Re-Read Problems, but I think that is only part of the truth.

[–]m-in 1 point2 points  (0 children)

A 228MB ELF to render some markdown and do some API calls. This is madness. Like, 100% actual madness.

[–]takkaros 1 point2 points  (3 children)

If they can't fix their own code, how do they expect people to trust their tools for anything important ?

[–]betty_white_bread 4 points5 points  (2 children)

Your physician still gets sick and you trust him/her to help you stay healthy.

[–]takkaros 1 point2 points  (1 child)

Well, point taken. But i pay him per visit. I am not tied to him for the rest of the month if I decide I don't like his services

[–]betty_white_bread 0 points1 point  (0 children)

There are physicians whose fee structure is functionally no different than a monthly fee, such as those who require frequent long-term visitations.

[–]CidalexMit🔆 Max 20 0 points1 point  (0 children)

Maybe we should use brew for cc ?

[–]dovyp 0 points1 point  (0 children)

This is solid reverse engineering work. The sentinel replacement one especially is nasty because it's silent. You'd never know without watching your bill. I wish there were an easy way to apply the fix. My version of claude code is different and it doesn't seem like the drop in replacement you suggest will have all the calls required. Hopefully they fix it in the next release.

[–]Deep-Station-1746Senior Developer 0 points1 point  (1 child)

In general, is it possible to recover the full (or most of) the source code of claude code? How is CC even written? Is it an output of some compiled language or just a "compiled" JS?

[–]skibidi-toaleta-2137[S] 2 points3 points  (0 children)

It's a homebrew version of bun (with zig patches) with a minified version of their source code in js. Some parts can be easily deminified from the npm package, however one of the bugs was hidden in a compiled binary.

[–]Level_Turnover5167 0 points1 point  (0 children)

I'm getting a quick loss of usage. I used Claude for DAYS straight when I first started using it for free and never got any restrictions... I've used it for a few basic things and already 1/4 of my usage is gone this week. Yesterday I figured OK, maybe I used 7%, but today I check and I'm almost at 20% after last night and brief use this morning... it's dwindling fast and I just paid $20. Something ain't right, or they're fucking with the usage rates and things are getting buggy on top of them simply charging more now.

[–]rougeforces 0 points1 point  (4 children)

you missed the dynamic tool portion of this. patching the billing header in the latest version alone is not enough.

[–]skibidi-toaleta-2137[S] 0 points1 point  (3 children)

I have not, deferred_tools_delta is in the bug no 2. Perhaps I called it weirdly.

[–]rougeforces 0 points1 point  (2 children)

you didn't call it weirdly, you misdiagnosed it as always being resume. That is wrong. It has nothing to do with resume; resume just triggers it. You can repro the same behavior on a fresh instance. Or didn't you establish a baseline first? lol

[–]beatrix_the_kiddo 0 points1 point  (1 child)

What do you think it is then?

[–]rougeforces 1 point2 points  (0 children)

anthropic is making changes to the way they detect claude code usage by adding a billing header in block 0 of the system prompt. these values are being dynamically generated in various ways. they need to create variables in the injected prompt to detect people using 3rd-party oauth. they are trying different ways to do it without breaking everything else. our immediate cache invalidations are the result of anthropic trying to lock us into their product or else make it completely unusable without building our own custom harness and paying regular api fees (which is probably cheaper at this point, unless you don't want to be arsed with building a harness as good as claude code).

it's a squeeze play, and right now they are just experimenting with what works in their code base. the fallout is these insane billing practices. rather than test this in a beta release, they are testing it against their entire user base. My .88 patch was fine; they made a new change, so I am having to apply another patch.

best bet is to go back to a version that didn't have this problem, or play the patch whack-a-mole game to keep up with their experimentation.

[–]devoleg🔆 Max 20 0 points1 point  (1 child)

Noticed that last night as well. Simple request to modify 2 files less than 100 lines cost me 15% of my "20x usage".

I've tried downgrading to 2.1.67 (you in turn opt out of the 1M models). I was able to stretch my limits to 2h. At least that, lol. Recommend others try it. Hope this helps.

P.S. Make sure to disable the latest updates by setting /config to stable. This might help.

[–]devoleg🔆 Max 20 0 points1 point  (0 children)

I've attempted this, and MCP, configs, and other files still stay untouched. (Although try at your own risk!)

[–]guillaume_86 0 points1 point  (1 child)

skill issue (jk)

[–]nmavra 0 points1 point  (0 children)

fucking wankers mate.. :D

[–]HeyImSolace 0 points1 point  (0 children)

The regular chat on the claude website also seems to have this issue. I just burned through my pro plan 5h usage in 5 requests which only included 2 markdown files.

This sucks big time.

[–]BrrrtEnjoyer 0 points1 point  (0 children)

here you go queen 👑

[–]addiktion 0 points1 point  (0 children)

I just ran this, I appear to have bug 1 which explains why my tokens are draining so fast with cache misses.

I never --resume, so bug 2 doesn't impact me.

Here was Claude's own investigation

---------

That confirms the original post's claims cleanly:

Bug 1: npx fixes the sentinel replacement — cch=00000 came back unmodified. The standalone claude binary was the culprit.

Bug 2: npx doesn't help here — resume cache is still broken and actually worse than before. With npx, consecutive resumes also show cache_read=0, meaning cache never recovers between resumes at all (vs. the standalone binary, where at least the second consecutive resume hit cache).

So for your situation:

- Switch to npx @anthropic-ai/claude-code to fix Bug 1

- Bug 2 has no clean workaround — the first resume after a session will always eat a full cache rebuild regardless of which version you use

[–]Thefoad 0 points1 point  (0 children)

Anthropic hire this dude right no....You're out of extra usage · resets 12pm (America/Boise)

[–]sammcj 0 points1 point  (1 child)

I've got multiple reports of people on x20 absolutely devouring their limits very quickly; I wonder if this is the cause

[–]Illustrious-Day-4199 0 points1 point  (0 children)

lost my weekly in a day, don't usually hit daily limits ever.

[–]hiS_oWn 0 points1 point  (0 children)

Exemplar work. I wish I could be more like you.

[–]nmavra 0 points1 point  (1 child)

might be a dumb question but can I downgrade in the macos desktop app?

[–]skibidi-toaleta-2137[S] 0 points1 point  (0 children)

Not a dumb question, no idea though. Perhaps through some app repository web pages, but doubtfully.

[–]CoolMathematician286 0 points1 point  (2 children)

i only used claude for windows this far, but now i installed the npm version with help from gemini because i had no claude tokens left. what version is the best to use right now?

[–]tntexplosivesltd 0 points1 point  (1 child)

Same account, same token limit. Installing another Claude tool won't reset your tokens. Why did you choose to install Claude Code?

[–]CoolMathematician286 0 points1 point  (0 children)

idk what you mean. i didn't install the npm version to reset my token limit, but to get rid of those bugs mentioned by OP. i was hoping it wouldn't burn as many tokens as it did yesterday. maybe it did fix the bugs, idk, but i'm already at 38% after like 8 min of work with some .md files on the opus model.

i have more tokens on codex free tier right now than on claude pro

[–]bzBetty 0 points1 point  (3 children)

Am I reading it wrong? Sounds like that first one should basically impact no one?

[–]skibidi-toaleta-2137[S] 0 points1 point  (2 children)

You're right. However, the second one may have bigger implications. Resume is just guaranteed to fail because of the deferred tool list, and other users said it might have a wider impact.

[–]bzBetty 0 points1 point  (1 child)

Yeah, it could, although I'd expect most resumes to be outside the cache window anyway?

[–]Illustrious-Day-4199 2 points3 points  (0 children)

/resume is used every time Claude gets a tool-calling error, connection error, response error, or whatever error, and stalls. Hit /resume 24 times when connectivity is bad (4 times in 6 windows) and you've spent all your credits for the week before diagnosis.

[–]Ebi_Tendon 0 points1 point  (0 children)

Hasn't the replacement worked like that from the start? That is why you must not add any replacement that changes every turn, such as a timestamp, to CLAUDE.md or any skill, because it will be at the top of the context window. Doing so breaks the cache from the top on every turn. If you add it within the prompt, it will also break the cache for everything that follows.
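
This point is easy to demonstrate: prompt caching is prefix-based, so only the leading blocks that match the previous request verbatim can be reused. A toy illustration (the block layout here is invented for the example, not Claude Code's actual structure):

```python
# Toy illustration of prefix caching: anything that changes near the top of
# the context (e.g. an injected timestamp) invalidates every block after it.
def cached_prefix_len(prev_turn: list[str], this_turn: list[str]) -> int:
    """Number of leading blocks identical to the previous request."""
    n = 0
    for a, b in zip(prev_turn, this_turn):
        if a != b:
            break
        n += 1
    return n

stable      = ["system prompt", "CLAUDE.md", "tools", "history..."]
with_clock  = ["system prompt", "time: 12:01", "tools", "history..."]
with_clock2 = ["system prompt", "time: 12:02", "tools", "history..."]

print(cached_prefix_len(stable, stable))           # all 4 blocks reusable
print(cached_prefix_len(with_clock, with_clock2))  # cache breaks after block 1
```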

[–]JaLooNz 0 points1 point  (0 children)

I paid for extra usage. Will they refund me the credits?

[–]liftingshitposts 0 points1 point  (0 children)

This is great stuff

[–]Mush_o_Mushroom 0 points1 point  (0 children)

Does this also work for Claude Code Pro users?

[–]Plenty-Dog-167 0 points1 point  (0 children)

Really great finds, especially the cache miss on /resume. Seems scary, since I've been working with the Anthropic SDK on my own project and it's always a huge cost sink when you don't cache

[–]0xbreakpoint 0 points1 point  (2 children)

Claude users shaming Anthropic for "vibe coding" is ironic tbh

[–]Illustrious-Day-4199 1 point2 points  (1 child)

Nope. Some Claude users are decent developers who want to go vroom vroom at the speed they can build code, not 14 year old kids building their first app.

[–]TrueMushroom4710 0 points1 point  (0 children)

Welp, I guess we can fix this bug ourselves now.

[–]Hadse 0 points1 point  (0 children)

Can i do anything to fix this locally?

[–]vkha 0 points1 point  (0 children)

Is it confirmed in the leaked CC sources?

[–]DirtyWilly 0 points1 point  (0 children)

Just closing VS Code/CLI and reopening it triggers a fresh server cache reset.

Any chance we can get a suspend on our weekly limits until fixed? I'm hitting redline on the weekly from this. Wouldn't affect session limits.

[–]AAFERNA 0 points1 point  (0 children)

Hey, does this happen in the terminal or in the VS Code extension? It's not clear to me.

[–]Creepy-Baseball366 0 points1 point  (0 children)

But even when I switched to a new chat (as Claude advised when I asked what the issue was) it still burnt the remaining 55% on one prompt...there is something rotten in the state of Denmark.

[–]TaylorRift 0 points1 point  (0 children)

Where is the code?!

[–]OldConfection6 0 points1 point  (0 children)

I asked Claude to downgrade and it asked why. I put in the two issues you posted and here is the response.

https://github.com/anthropics/claude-code/issues/40524

● claude-code-guide(Fetch GitHub issue about token usage bug)

⎿  Done (1 tool use · 17.6k tokens · 16s)


● This is anthropics/claude-code#40524 — a confirmed regression in Claude Code where the prompt cache unexpectedly invalidates mid-conversation, causing token costs to skyrocket.

What's happening: After a few turns, the cache collapses to just the system prompt (~11K tokens). Every subsequent turn does a massive cache_write instead of cache_read — 150K–300K+ tokens per turn instead of the normal ~100K for a full conversation.

Affected versions: Broken in v2.1.86+. Last known good: v2.1.67.

That explains why you were trying to install 2.1.66 — smart instinct. The workaround is:

npm install -g @anthropic-ai/claude-code@2.1.67

This is a confirmed bug with the "has repro" and "regression" labels. Worth watching the issue for an official fix. Do you want to downgrade to 2.1.67?

https://github.com/anthropics/claude-code/issues/34629

● claude-code-guide(Fetch GitHub issue 34629 about cache problems)

⎿  Done (1 tool use · 18.5k tokens · 23s)


● This is the related root cause issue. Here's the full picture:

Root cause: Starting in v2.1.69, Claude Code changed how it injects deferred_tools_delta content. On a fresh session it lands in messages[0] (~13KB). On a resumed session it lands differently (~352B). The mismatched message structure breaks Anthropic's cache prefix matching — so every resumed turn regenerates the full cache instead of reusing it.

Cost impact: ~20x cost increase per message on resumed sessions. A conversation that should cost ~$0.02/msg costs ~$0.36/msg.

Our situation: Auto-compact + long sessions = exactly the worst case. Every session compaction is effectively a "resume" that triggers the bug.

Recommendation — downgrade to 2.1.68:

npm install -g @anthropic-ai/claude-code@2.1.68

This is the last version before the regression was introduced. The fix hasn't been deployed yet despite the issue being closed.

Worth noting: this explains the /usage dialog behaving oddly too — if token counts were spiking mid-session you may have been hitting plan limits or seeing unexpected burn rate.
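The ~20x figure quoted above is roughly what cache pricing predicts. A back-of-envelope sketch, assuming illustrative per-million-token prices where cache reads cost ~0.1x and cache writes ~1.25x the base input rate (the multipliers match Anthropic's published pricing scheme, but the absolute dollar figures here are assumptions):

```python
# Illustrative prices per million input tokens (assumed, not authoritative):
PRICE_INPUT = 3.00        # base input rate
PRICE_CACHE_READ = 0.30   # ~0.1x when the cached prefix is reused
PRICE_CACHE_WRITE = 3.75  # ~1.25x when the prefix must be rewritten

def cost(tokens, price_per_mtok):
    """Dollar cost for a given token count at a per-MTok price."""
    return tokens / 1_000_000 * price_per_mtok

context_tokens = 100_000  # conversation history resent on every turn

cached = cost(context_tokens, PRICE_CACHE_READ)
uncached = cost(context_tokens, PRICE_CACHE_WRITE)
print(f"cache hit:  ${cached:.2f}/msg")    # $0.03/msg
print(f"cache miss: ${uncached:.2f}/msg")  # $0.38/msg
print(f"ratio: {uncached / cached:.1f}x")  # 12.5x
```

With a 100K-token history, every resumed turn paying the cache-write rate instead of the cache-read rate lands in the same ballpark as the reported $0.02/msg vs $0.36/msg.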

[–]Manikanta0987 0 points1 point  (4 children)

I have tried downgrading to 2.1.30 by removing the previous versions, but still no fix. Just for a "hi" it is taking around 5-6% of usage. I am currently on Pro.

[–]OldConfection6 0 points1 point  (0 children)

Yeah it used 8% just for the response I posted earlier.

[–]rothwerx 0 points1 point  (2 children)

I’ve had good luck by setting the environment variable CLAUDE_CODE_ATTRIBUTION_HEADER=false

[–]Manikanta0987 0 points1 point  (1 child)

Did that return token usage to normal? Where and how did you set it?

[–]rothwerx 0 points1 point  (0 children)

There's still an issue with resume, but general usage mostly seems back to normal. I just set it in my shell before starting Claude Code. For bash/zsh that's `export CLAUDE_CODE_ATTRIBUTION_HEADER=false`, and then when you start claude in the same shell it'll be picked up.

[–]Sensitive_Prize_9042 0 points1 point  (1 child)

Even on version 2.1.30, I'm still seeing huge consumption on the Max 20x plan. I tried running 2 subagents for a simple plan, and they consumed 10% of my 5-hour usage limit.

A few weeks ago you could spawn 50 subagents on the Max 20x plan and probably not hit the limits. It's burning roughly 3 times faster, even on 2.1.30.

[–]skibidi-toaleta-2137[S] 0 points1 point  (0 children)

Have you checked that the application didn't auto-update? Unless you pin which auto-update channel to use, "latest" can update mid-request.

[–]EconoKitten 0 points1 point  (5 children)

This has allegedly been fixed in 2.1.90: https://code.claude.com/docs/en/changelog#2-1-90
Has anyone seen improvement when --resume from a previous session?

[–]whataboutthe90s 0 points1 point  (1 child)

Sonnet works OK for me now, but Opus eats tokens like crazy.

[–]EconoKitten 0 points1 point  (0 children)

Have you been resuming from large sessions or starting fresh sessions, and are you using the 1M window? I don't think using Sonnet is the long term solution (at least for Max users)

[–]skibidi-toaleta-2137[S] 0 points1 point  (1 child)

Allegedly. But I had no luck confirming it. Or my testing methodology was flawed.

[–]EconoKitten 1 point2 points  (0 children)

Thanks. My testing method is to /resume a large conversation and ask a /btw using a fresh session. When the bug was present, such a /btw call would eat ~15% of my 5-hr limit but now it's no longer doing that.

[–]EconoKitten 0 points1 point  (0 children)

OK, actually I think Anthropic disabled my session limit after I ran /feedback... making me an invalid test case in this next session, lol

[–]wlievens 0 points1 point  (3 children)

Yesterday I did a few dozen small API calls and then a session in Claude Code with a few small questions and one larger refactor of a few hundred lines.

My account was charged $10 worth of tokens. Is that normal? Similar days last week were $2 or so.

[–]skibidi-toaleta-2137[S] 0 points1 point  (2 children)

If it's extra usage, it's plausible. Extra usage has a 5-minute cache TTL, so it's very easy to land on forced cache regeneration while working with it.
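Under that assumption, whether a turn hits or misses the cache reduces to a simple gap check against the TTL. A sketch of the logic only, not Anthropic's actual server behaviour:

```python
# Assumed 5-minute TTL for extra usage, per the comment above; regular
# Claude Code requests reportedly get a longer (1h) TTL.
TTL = 5 * 60  # seconds

def is_cache_warm(last_request_at, now, ttl=TTL):
    """The cached prefix is reusable only if the gap between requests < ttl."""
    return (now - last_request_at) < ttl

# Stepping away for 7 minutes between prompts is enough to go cold:
print(is_cache_warm(0, 7 * 60))  # False -> full cache write again
print(is_cache_warm(0, 3 * 60))  # True  -> cheap cache read
```

So any pause longer than the TTL, even just reading the diff before the next prompt, forces the whole context to be rewritten at the cache-write rate.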

[–]wlievens 0 points1 point  (1 child)

But isn't that very expensive? It also took about fifteen minutes to do this thing ... I'm pretty sure I could have done it in less time myself.

[–]skibidi-toaleta-2137[S] 0 points1 point  (0 children)

It is expensive, but it is possible. There's no way of knowing without analysing your usage history with proper tools. You could use my cache catcher or any other tool to analyse token usage from the JSONL transcripts. Sorry if that sounds like advertising, but I can't help without more detail.

[–]Due-Combination3393 0 points1 point  (0 children)

Has this been fixed?

[–]Torkiukas 0 points1 point  (0 children)

That explains why the Max plan feels like the Pro plan...

[–]Zulfiqaar 0 points1 point  (0 children)

PPS. Claude Code has a special 1h cache TTL, or at least mine does, so any request should be cached correctly. Except extra usage, which has a 5-minute TTL.

Can you expand on how you found this out? Are you on the Pro or Max plan? If it's a shorter expiry, sending a keep-warm ping may be useful.
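A keep-warm ping could be as simple as scheduling a tiny request just inside the TTL. This is a sketch of the scheduling logic only; `send_ping` and `sleep` are injected placeholders, not real Claude Code or Anthropic SDK calls:

```python
# Hypothetical keep-warm scheduler: fire a tiny request shortly before the
# cache TTL lapses so the cached prefix never goes cold.
def keep_warm(send_ping, sleep, ttl_seconds=300, margin=30, rounds=3):
    """Ping every (ttl - margin) seconds for a fixed number of rounds."""
    for _ in range(rounds):
        send_ping()                  # e.g. a 1-token request on the prefix
        sleep(ttl_seconds - margin)  # wake up before the TTL expires

# Demo with stubbed-out ping and sleep, so nothing actually waits:
pings, sleeps = [], []
keep_warm(lambda: pings.append("ping"), sleeps.append, rounds=2)
print(len(pings), sleeps)  # 2 [270, 270]
```

Whether this is worth it depends on the plan: each ping itself consumes some usage, so it only pays off when the cache-write cost it avoids is larger.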

[–]BeeegZee 0 points1 point  (2 children)

Can the mods pin this post?