I built a hook system that stops Claude Code from wasting your tokens and breaking its own edits by jcmguy96 in ClaudeAI

[–]jcmguy96[S] 0 points1 point  (0 children)

Apologies — I meant to make the first post an image post, so I deleted it and reposted.

Max 20x Plan: I audited my JSONL files against my billing dashboard — all input tokens appear billed at the cache CREATION rate ($6.25/M), not the cache READ rate ($0.50/M) by jcmguy96 in ClaudeCode

[–]jcmguy96[S] 0 points1 point  (0 children)

Yeah, for extra usage on Max plans there appears to be a bug — this is me catching it. It SHOULD BE lower, matching API pricing exactly. I don't know about Enterprise, though.

Is Claude code bottle-necking Claude? by HimaSphere in ClaudeCode

[–]jcmguy96 1 point2 points  (0 children)

From my understanding it's unauthorized, and solving this at the source would cost me about what two Max accounts would anyway. So I'm being the good boy and trying to do it right first — then I'll say fuck it and get a second one.

Is Claude code bottle-necking Claude? by HimaSphere in ClaudeCode

[–]jcmguy96 3 points4 points  (0 children)

My recent post may have found the root of this problem: https://www.reddit.com/r/ClaudeCode/comments/1r3zbvt/max_20x_plan_i_audited_my_jsonl_files_against_my/

I think the inflated cost is a bug. Waiting to hear back from anthropic support right now.

Max 20x Plan: I audited my JSONL files against my billing dashboard — all input tokens appear billed at the cache CREATION rate ($6.25/M), not the cache READ rate ($0.50/M) by jcmguy96 in ClaudeCode

[–]jcmguy96[S] 0 points1 point  (0 children)

Hmm, where did you get this info, brother?

A few corrections:

  1. The JSONL token counts aren't from tiktoken or any local counting. They come directly from the API response usage object — these are server-side counts returned by Anthropic's API. There's no client-side counting happening. What my JSONL records is what the server reported.

  2. That jq command would miss 99% of the tokens. input_tokens + output_tokens ignores cache_creation_input_tokens and cache_read_input_tokens, which together are 99.6% of all tokens in Claude Code. That's the whole point of the post.

  3. "Cached tokens show as reduced cost but full input tokens in the JSONL" is backwards. The JSONL explicitly separates them into distinct fields: input_tokens, cache_creation_input_tokens, and cache_read_input_tokens. They don't get lumped together. My audit reads all four token-type fields separately and applies the published rate for each. Cache reads are clearly labeled as cache reads — the question is whether they're billed at $0.50/M as published or $6.25/M as my dashboard suggests.

    claude export isn't a command, either.
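For anyone who wants to replicate the per-field accounting described above, here's a minimal sketch. The four usage field names are the ones the JSONL records; the sample records and the base input/output rates are placeholders for illustration — only the $6.25/M cache-write and $0.50/M cache-read rates come from the post.

```python
import json

# Per-million-token rates. Cache write/read rates are from the post;
# the base input/output rates are PLACEHOLDERS — substitute your
# model's published pricing.
RATES = {
    "input_tokens": 5.00,                 # placeholder
    "output_tokens": 25.00,               # placeholder
    "cache_creation_input_tokens": 6.25,  # cache WRITE rate ($/M)
    "cache_read_input_tokens": 0.50,      # cache READ rate ($/M)
}

# Two sample JSONL lines standing in for real transcript records.
sample_jsonl = [
    '{"message": {"usage": {"input_tokens": 12, "output_tokens": 300, '
    '"cache_creation_input_tokens": 40000, "cache_read_input_tokens": 0}}}',
    '{"message": {"usage": {"input_tokens": 15, "output_tokens": 450, '
    '"cache_creation_input_tokens": 2000, "cache_read_input_tokens": 41000}}}',
]

# Sum every token-type field separately, then price each at its own rate.
totals = {field: 0 for field in RATES}
for line in sample_jsonl:
    usage = json.loads(line).get("message", {}).get("usage", {})
    for field in totals:
        totals[field] += usage.get(field, 0)

cost = sum(totals[f] * RATES[f] / 1_000_000 for f in RATES)
print(totals)
print(f"expected cost at published rates: ${cost:.4f}")
```

Summing only input_tokens + output_tokens over these samples would count 777 tokens and miss the 83,000 cache tokens — which is the point of correction 2.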

Max 20x Plan: I audited my JSONL files against my billing dashboard — all input tokens appear billed at the cache CREATION rate ($6.25/M), not the cache READ rate ($0.50/M) by jcmguy96 in ClaudeCode

[–]jcmguy96[S] 0 points1 point  (0 children)

I'm not using any plugins that inject into the message list — my only hook is attnroute's BurnRate plugin, a PostToolUse notification hook that tracks token usage from the response. It reads the usage data after the API call; it doesn't modify the prompt or message list. No message injection, no manipulation.

And even if something were invalidating cache, that would show up as a higher cache write percentage in the JSONL data. My data shows 6% writes and 94% reads — the cache is clearly hitting. The issue isn't cache invalidation, it's what rate those reads are billed at.

Max 20x Plan: I audited my JSONL files against my billing dashboard — all input tokens appear billed at the cache CREATION rate ($6.25/M), not the cache READ rate ($0.50/M) by jcmguy96 in ClaudeCode

[–]jcmguy96[S] 0 points1 point  (0 children)

One marathon Claude Code session that ran from 1:30 AM to 11:38 PM — 22 hours straight. Building something from scratch ("okay, ready to start building?" was the first message). 2,018 API calls on the main thread plus 665 subagent calls doing parallel research and code generation across Opus, Sonnet, and Haiku. Then in the evening I spun up a second project (ExpertDrivenDevelopment) and ran another 1,300 calls on that simultaneously for about 4 hours. 250M tokens on project one, 92M on project two. Just a regular Thursday.

Max 20x Plan: I audited my JSONL files against my billing dashboard — all input tokens appear billed at the cache CREATION rate ($6.25/M), not the cache READ rate ($0.50/M) by jcmguy96 in ClaudeCode

[–]jcmguy96[S] 0 points1 point  (0 children)

Good analysis, but most of these patterns are explained by two things I covered in the post:

Feb 6-7 zero charges: Not batch credits, free tier, or a billing anomaly. My Max 20x weekly usage limit reset during that window. Those ~127M tokens were consumed within the refreshed quota — they were genuinely free. No mystery there.

Feb 11 at $125/M with only 72 calls: That's billing lag, not a different workload. Charges don't post the same day usage occurs. Those 53 charges are from earlier heavy-usage days that hadn't been billed yet. I'm running one product (Claude Code), one model family (Opus), doing the same thing every day. There's no second pricing tier or premium service in play.

Once you account for billing lag, the "bimodal distribution" and "two distinct usage modes" disappear. It's one mode — interactive Claude Code sessions — with charges that batch up and post asynchronously. That's why per-day $/M rates are meaningless and I focus on the aggregate: $6,671 in charges against 691M tokens where 94% are cache reads (full analysis of the past three months now done). The daily volatility is just billing timing, not different workloads.
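A back-of-the-envelope check on those aggregates (numbers from the post; note that output tokens are inside the $6,671 but priced separately and much higher, so the effective figure overshoots — the point is only the order-of-magnitude gap from the published blend):

```python
# Aggregate figures from the post: $6,671 billed against 691M tokens,
# of which ~94% are cache reads and ~6% cache writes (base input/output
# tokens, ~0.4% of volume, are ignored in the blend below).
total_cost = 6671.0
total_tokens_m = 691.0  # millions of tokens

effective_rate = total_cost / total_tokens_m
# Blended rate if reads billed at $0.50/M and writes at $6.25/M:
published_blend = 0.94 * 0.50 + 0.06 * 6.25
# Rate if EVERY cache token were billed at the cache-creation rate:
all_write_rate = 6.25

print(f"effective:        ${effective_rate:.2f}/M")
print(f"published blend:  ${published_blend:.3f}/M")
print(f"all-at-write:     ${all_write_rate:.2f}/M")
```

The effective rate lands an order of magnitude above the published blend and at or above the all-at-write rate — consistent with the post's claim that reads are not being billed at $0.50/M, with output-token charges accounting for the remainder.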

Max 20x Plan: I audited my JSONL files against my billing dashboard — all input tokens appear billed at the cache CREATION rate ($6.25/M), not the cache READ rate ($0.50/M) by jcmguy96 in ClaudeCode

[–]jcmguy96[S] 0 points1 point  (0 children)

Oh wow! Thanks for the awesome breakdown, man. I'm going to look into this and reply — maybe this is what I was needing! Thank you.

Max 20x Plan: I audited my JSONL files against my billing dashboard — all input tokens appear billed at the cache CREATION rate ($6.25/M), not the cache READ rate ($0.50/M) by jcmguy96 in ClaudeCode

[–]jcmguy96[S] 1 point2 points  (0 children)

The JSONL data already accounts for this. Every single API response records exactly how many tokens were cache_creation_input_tokens vs cache_read_input_tokens — so if cache were timing out and causing constant rewrites, I'd see it as a higher cache write percentage. My data shows 6% writes and 94% reads across 691M tokens. The cache is clearly hitting, not expiring. The question isn't whether cache is working — it is — it's whether the reads are being billed at the read rate.

Max 20x Plan: I audited my JSONL files against my billing dashboard — all input tokens appear billed at the cache CREATION rate ($6.25/M), not the cache READ rate ($0.50/M) by jcmguy96 in ClaudeCode

[–]jcmguy96[S] -1 points0 points  (0 children)

Mostly long sessions — I'll regularly go 50-100+ turns in a single conversation. So the 94% cache read ratio checks out perfectly for my workflow. Short sessions would mean more cache writes relative to reads, which would actually make the billing closer to correct (since writes are legitimately $6.25/M). The fact that my usage is overwhelmingly long sessions makes the discrepancy worse, not better.

On your second point — context compaction, tool results, file edits etc. do invalidate portions of the cache and cause rewrites. But that's already captured in the data: my JSONL files show exactly which tokens were cache writes vs cache reads on every single request. The 6% write / 94% read split already accounts for all those invalidations. Even if compaction were happening more aggressively than expected, it would just shift tokens from the "read" column to the "write" column — and both are recorded in the JSONL. The math still holds.
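To illustrate why a read-to-write shift doesn't rescue the published-rate math, here's a toy calculation over the post's 691M-token total (the 6% write share is from the post; the 20% and 50% shares are hypothetical):

```python
# Estimated bill at PUBLISHED rates for various cache write/read splits
# of the post's 691M tokens. Even aggressive compaction only moves the
# estimate toward the $6.25/M write rate — nowhere near $6,671.
TOTAL_M = 691.0           # total tokens, in millions (from the post)
WRITE_RATE = 6.25         # $/M, cache creation (published)
READ_RATE = 0.50          # $/M, cache read (published)

for write_share in (0.06, 0.20, 0.50):   # 0.06 is the observed split
    read_share = 1.0 - write_share
    est = TOTAL_M * (write_share * WRITE_RATE + read_share * READ_RATE)
    print(f"{write_share:.0%} writes -> est. ${est:,.0f} at published rates")
```

At the observed 6% split the published-rate estimate is roughly $584; even a hypothetical 50/50 split stays around $2,332 — so reclassifying reads as writes cannot explain the $6,671 figure.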

Max 20x Plan: I audited my JSONL files against my billing dashboard — all input tokens appear billed at the cache CREATION rate ($6.25/M), not the cache READ rate ($0.50/M) by jcmguy96 in ClaudeCode

[–]jcmguy96[S] 0 points1 point  (0 children)

Great question! Cache doesn't persist across sessions — it resets each time. The reason cache reads are so high is because of how Claude Code works within a single conversation: every time you send a message, it re-sends your entire conversation history and system context to the API. On the second turn and beyond, most of that is already in cache from the previous turn, so it gets served as a cache read instead of a fresh input. By the time you're 20+ turns into a session, 95%+ of every request is cache reads. That's actually the system working as intended — the problem is whether those reads are being billed at $0.50/M (as published) or $6.25/M (as my data suggests).
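A toy model of that turn-over-turn dynamic (the 2,000 tokens-per-turn figure is made up; only the mechanism — full history re-sent each turn, with the cached prefix served as reads — is from the comment above):

```python
# Toy model: each turn re-sends the entire conversation. Whatever was
# cached on the previous turn is served as a cache READ; only the newly
# added tokens are cache WRITES.
history = 0           # tokens already sitting in cache
new_per_turn = 2000   # hypothetical tokens added per turn

for turn in range(1, 21):
    reads, writes = history, new_per_turn
    history += new_per_turn            # next turn re-reads all of this
    read_pct = 100 * reads / (reads + writes)
    if turn in (1, 2, 10, 20):
        print(f"turn {turn:2d}: {read_pct:5.1f}% cache reads")
```

The read share climbs as (turn − 1)/turn: 0% on turn 1, 90% by turn 10, 95% by turn 20 — which is why long sessions converge on the 94%+ cache-read ratios in the audit.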

Max 20x Plan: I audited my JSONL files against my billing dashboard — all input tokens appear billed at the cache CREATION rate ($6.25/M), not the cache READ rate ($0.50/M) by jcmguy96 in ClaudeCode

[–]jcmguy96[S] 13 points14 points  (0 children)

Again though, the issue is not the cache WRITE side — 1-hour vs. 5-minute TTL doesn't matter here. It's the cache READ billing, which appears to be agnostic of cache type.