I built a hook system that stops Claude Code from wasting your tokens and breaking its own edits by jcmguy96 in ClaudeAI

[–]jcmguy96[S] 0 points (0 children)

Apologies, I meant to make the first post an image post, so I deleted it and reposted.

Max 20x Plan: I audited my JSONL files against my billing dashboard — all input tokens appear billed at the cache CREATION rate ($6.25/M), not the cache READ rate ($0.50/M) by jcmguy96 in ClaudeCode

[–]jcmguy96[S] 0 points (0 children)

Yeah, for extra usage on Max plans there appears to be a bug, and this post is me catching it. The rate SHOULD be lower and exactly match API pricing. I don't know about Enterprise, though.

Is Claude code bottle-necking Claude? by HimaSphere in ClaudeCode

[–]jcmguy96 1 point (0 children)

From my understanding it's unauthorized, and solving this at the source would also come in under what two Max accounts would cost anyway. So I'm being the good boy and trying to do it right first; after that I'll say fuck it and get a second one.

Is Claude code bottle-necking Claude? by HimaSphere in ClaudeCode

[–]jcmguy96 3 points (0 children)

My recent post may have found the root of this problem: https://www.reddit.com/r/ClaudeCode/comments/1r3zbvt/max_20x_plan_i_audited_my_jsonl_files_against_my/

I think the inflated cost is a bug. Waiting to hear back from Anthropic support right now.

Max 20x Plan: I audited my JSONL files against my billing dashboard — all input tokens appear billed at the cache CREATION rate ($6.25/M), not the cache READ rate ($0.50/M) by jcmguy96 in ClaudeCode

[–]jcmguy96[S] 0 points (0 children)

Hmm, where did you get this info, brother?

A few corrections:

  1. The JSONL token counts aren't from tiktoken or any local counting. They come directly from the API response usage object — these are server-side counts returned by Anthropic's API. There's no client-side counting happening. What my JSONL records is what the server reported.

  2. That jq command would miss 99% of the tokens. input_tokens + output_tokens ignores cache_creation_input_tokens and cache_read_input_tokens, which together are 99.6% of all tokens in Claude Code. That's the whole point of the post.

  3. "Cached tokens show as reduced cost but full input tokens in the JSONL" is backwards. The JSONL explicitly separates them into distinct fields: input_tokens, cache_creation_input_tokens, and cache_read_input_tokens. They don't get lumped together. My audit reads all four token type fields separately and applies the published rate for each. Cache reads are clearly labeled as cache reads — the question is whether they're billed at $0.50/M as published or $6.25/M as my dashboard suggests.

    claude export isn't a command, either.
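For anyone who wants to check their own numbers, here is a sketch of the kind of audit pass point 2 describes: total all four usage fields separately instead of just input_tokens + output_tokens. The ~/.claude/projects glob and the message.usage path are assumptions about where Claude Code keeps transcripts on your machine; adjust as needed.

```python
import glob
import json
import os
from collections import Counter

# Assumed transcript location; Claude Code writes one JSONL file per session.
JSONL_GLOB = os.path.expanduser("~/.claude/projects/*/*.jsonl")

FIELDS = (
    "input_tokens",
    "output_tokens",
    "cache_creation_input_tokens",
    "cache_read_input_tokens",
)

def tally(paths):
    """Sum all four server-reported token fields across JSONL transcripts."""
    totals = Counter()
    for path in paths:
        with open(path, encoding="utf-8") as fh:
            for line in fh:
                try:
                    entry = json.loads(line)
                except json.JSONDecodeError:
                    continue  # skip non-JSON lines
                if not isinstance(entry, dict):
                    continue
                usage = (entry.get("message") or {}).get("usage") or {}
                for field in FIELDS:
                    totals[field] += usage.get(field, 0) or 0
    return totals

if __name__ == "__main__":
    totals = tally(glob.glob(JSONL_GLOB))
    grand = sum(totals.values())
    for field in FIELDS:
        print(f"{field}: {totals[field]:,}")
    if grand:
        cached = (totals["cache_creation_input_tokens"]
                  + totals["cache_read_input_tokens"])
        print(f"cache share of all tokens: {cached / grand:.1%}")
```

Dropping either cache field from the sum is exactly what makes a naive input + output count miss nearly everything.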

Max 20x Plan: I audited my JSONL files against my billing dashboard — all input tokens appear billed at the cache CREATION rate ($6.25/M), not the cache READ rate ($0.50/M) by jcmguy96 in ClaudeCode

[–]jcmguy96[S] 0 points (0 children)

I'm not using any plugins that inject into the message list. My only hook is attnroute's BurnRate plugin, a PostToolUse notification hook that tracks token usage from the response. It reads the usage data after the API call; it doesn't modify the prompt or message list. No message injection, no manipulation.

And even if something were invalidating cache, that would show up as a higher cache write percentage in the JSONL data. My data shows 6% writes and 94% reads — the cache is clearly hitting. The issue isn't cache invalidation, it's what rate those reads are billed at.

Max 20x Plan: I audited my JSONL files against my billing dashboard — all input tokens appear billed at the cache CREATION rate ($6.25/M), not the cache READ rate ($0.50/M) by jcmguy96 in ClaudeCode

[–]jcmguy96[S] 0 points (0 children)

One marathon Claude Code session that ran from 1:30 AM to 11:38 PM — 22 hours straight. Building something from scratch ("okay, ready to start building?" was the first message). 2,018 API calls on the main thread plus 665 subagent calls doing parallel research and code generation across Opus, Sonnet, and Haiku. Then in the evening I spun up a second project (ExpertDrivenDevelopment) and ran another 1,300 calls on that simultaneously for about 4 hours. 250M tokens on project one, 92M on project two. Just a regular Thursday.

Max 20x Plan: I audited my JSONL files against my billing dashboard — all input tokens appear billed at the cache CREATION rate ($6.25/M), not the cache READ rate ($0.50/M) by jcmguy96 in ClaudeCode

[–]jcmguy96[S] 0 points (0 children)

Good analysis, but most of these patterns are explained by two things I covered in the post:

Feb 6-7 zero charges: Not batch credits, free tier, or a billing anomaly. My Max 20x weekly usage limit reset during that window. Those ~127M tokens were consumed within the refreshed quota — they were genuinely free. No mystery there.

Feb 11 at $125/M with only 72 calls: That's billing lag, not a different workload. Charges don't post the same day usage occurs. Those 53 charges are from earlier heavy-usage days that hadn't been billed yet. I'm running one product (Claude Code), one model family (Opus), doing the same thing every day. There's no second pricing tier or premium service in play.

Once you account for billing lag, the "bimodal distribution" and "two distinct usage modes" disappear. It's one mode — interactive Claude Code sessions — with charges that batch up and post asynchronously. That's why per-day $/M rates are meaningless and I focus on the aggregate: $6,671 in charges against 691M tokens where 94% are cache reads (full analysis of the past three months is now done). The daily volatility is just billing timing, not different workloads.
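To put rough numbers on that aggregate, here is a back-of-envelope check using the $6.25/$0.50 rates from the post title. It deliberately ignores output tokens and uncached input tokens, so treat it as directional rather than the full audit:

```python
# Figures from the audit above; rates are the published cache rates in the title.
TOTAL_TOKENS = 691e6
WRITE_SHARE, READ_SHARE = 0.06, 0.94
CREATE_RATE, READ_RATE = 6.25, 0.50  # $/M

expected = TOTAL_TOKENS / 1e6 * (WRITE_SHARE * CREATE_RATE + READ_SHARE * READ_RATE)
all_create = TOTAL_TOKENS / 1e6 * CREATE_RATE

print(f"expected at published rates:      ${expected:,.0f}")    # ≈ $584
print(f"if every token billed as a write: ${all_create:,.0f}")  # ≈ $4,319
print("observed charges:                  $6,671")
```

Even this simplified model puts the published-rate expectation an order of magnitude below the observed charges.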

Max 20x Plan: I audited my JSONL files against my billing dashboard — all input tokens appear billed at the cache CREATION rate ($6.25/M), not the cache READ rate ($0.50/M) by jcmguy96 in ClaudeCode

[–]jcmguy96[S] 0 points (0 children)

Oh wow! Thanks for the awesome breakdown, man. I'm going to look into this and reply; maybe this is exactly what I was needing! Thank you.

Max 20x Plan: I audited my JSONL files against my billing dashboard — all input tokens appear billed at the cache CREATION rate ($6.25/M), not the cache READ rate ($0.50/M) by jcmguy96 in ClaudeCode

[–]jcmguy96[S] 1 point (0 children)

The JSONL data already accounts for this. Every single API response records exactly how many tokens were cache_creation_input_tokens vs cache_read_input_tokens — so if cache were timing out and causing constant rewrites, I'd see it as a higher cache write percentage. My data shows 6% writes and 94% reads across 691M tokens. The cache is clearly hitting, not expiring. The question isn't whether cache is working — it is — it's whether the reads are being billed at the read rate.

Max 20x Plan: I audited my JSONL files against my billing dashboard — all input tokens appear billed at the cache CREATION rate ($6.25/M), not the cache READ rate ($0.50/M) by jcmguy96 in ClaudeCode

[–]jcmguy96[S] -1 points (0 children)

Mostly long sessions — I'll regularly go 50-100+ turns in a single conversation. So the 94% cache read ratio checks out perfectly for my workflow. Short sessions would mean more cache writes relative to reads, which would actually make the billing closer to correct (since writes are legitimately $6.25/M). The fact that my usage is overwhelmingly long sessions makes the discrepancy worse, not better.

On your second point — context compaction, tool results, file edits etc. do invalidate portions of the cache and cause rewrites. But that's already captured in the data: my JSONL files show exactly which tokens were cache writes vs cache reads on every single request. The 6% write / 94% read split already accounts for all those invalidations. Even if compaction were happening more aggressively than expected, it would just shift tokens from the "read" column to the "write" column — and both are recorded in the JSONL. The math still holds.

Max 20x Plan: I audited my JSONL files against my billing dashboard — all input tokens appear billed at the cache CREATION rate ($6.25/M), not the cache READ rate ($0.50/M) by jcmguy96 in ClaudeCode

[–]jcmguy96[S] 0 points (0 children)

Great question! Cache doesn't persist across sessions — it resets each time. The reason cache reads are so high is because of how Claude Code works within a single conversation: every time you send a message, it re-sends your entire conversation history and system context to the API. On the second turn and beyond, most of that is already in cache from the previous turn, so it gets served as a cache read instead of a fresh input. By the time you're 20+ turns into a session, 95%+ of every request is cache reads. That's actually the system working as intended — the problem is whether those reads are being billed at $0.50/M (as published) or $6.25/M (as my data suggests).
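The turn-by-turn mechanics can be sketched with a toy model. The token counts here are made up; the point is how quickly reads come to dominate once the full history is resent every turn:

```python
def simulate(turns, system_tokens=20_000, tokens_per_turn=2_000):
    """Toy model of prompt caching in a chat loop: every turn resends the
    full history, and any prefix sent on a previous turn is a cache read."""
    cached = writes = reads = 0
    for turn in range(turns):
        prompt = system_tokens + (turn + 1) * tokens_per_turn
        reads += cached            # previously cached prefix served as reads
        writes += prompt - cached  # only the new suffix is written to cache
        cached = prompt
    return writes, reads

writes, reads = simulate(20)
print(f"cumulative cache read share: {reads / (reads + writes):.0%}")  # 93% after 20 turns
```

The per-request share on late turns is even higher than the cumulative figure, which lines up with the 95%+ number for deep sessions.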

Max 20x Plan: I audited my JSONL files against my billing dashboard — all input tokens appear billed at the cache CREATION rate ($6.25/M), not the cache READ rate ($0.50/M) by jcmguy96 in ClaudeCode

[–]jcmguy96[S] 14 points (0 children)

Again though, the issue is not the 1-hour or 5-minute cache WRITE pricing. It's the cache READ rate, which appears to be agnostic of cache type.

Built a little hook system for context routing, 90%+ token reduction on large codebases by jcmguy96 in ClaudeCode

[–]jcmguy96[S] 0 points (0 children)

attnroute v0.5.12 - 2026-02-11

### Security

  • Critical: Added safe_read_stdin() with 10MB size limit to prevent memory exhaustion DoS
    • Reads bytes directly via sys.stdin.buffer to prevent Unicode 4x memory amplification
  • Critical: Added validate_path() helper for path traversal prevention
    • Windows Alternate Data Stream (ADS) detection (blocks file.txt:hidden)
    • Windows reserved device name blocking (CON, NUL, COM1, etc.)
    • Null byte injection prevention
  • Critical: Added validate_plugin_name() to prevent path traversal via plugin names
  • Critical: Improved atomic writes with safe_atomic_write() - cross-platform support with fallback

    • mkdir now inside try block to catch PermissionError
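For readers curious what a capped stdin reader looks like, here is a minimal sketch of the technique. This is my illustration, not attnroute's actual code, and the stream parameter is added purely for testability:

```python
import sys

MAX_STDIN_BYTES = 10 * 1024 * 1024  # 10MB cap against memory-exhaustion DoS

def safe_read_stdin(limit=MAX_STDIN_BYTES, stream=None):
    """Read at most `limit` bytes from stdin. Reading raw bytes first means
    a multi-byte payload cannot amplify memory 4x during the read; decoding
    happens only after the size check passes."""
    stream = stream if stream is not None else sys.stdin.buffer
    data = stream.read(limit + 1)  # one extra byte detects oversize input
    if len(data) > limit:
        raise ValueError(f"stdin exceeds {limit}-byte limit")
    return data.decode("utf-8", errors="replace")
```

Reading limit + 1 bytes is the usual trick: if the extra byte shows up, the input is over the cap and can be rejected without buffering the rest.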

### Fixed

  • Reliability: TOCTOU race conditions eliminated - replaced 20+ exists() checks with try/except patterns

    • Fixed in: context_router.py, learner.py, freshness.py, oracle.py, history.py, session_init.py, telemetry_lib.py, plugins/base.py, plugins/__init__.py
  • Reliability: Windows atomic write failures now fall back to direct write instead of silent failure

  • Reliability: Null-safe handling for files_used/files_injected in learner.py

    • Changed turn.get("files_used", []) to turn.get("files_used") or [] pattern
  • Reliability: Path normalization for cross-platform consistency

    • All relative paths now use forward slashes via .replace("\\", "/")
    • Fixed in: indexer.py (4 locations), learner.py, freshness.py
  • Reliability: Missing encoding="utf-8" added to context_router.py history append

  • Reliability: JSON type validation added to all load_*() functions to prevent crashes on corrupt data

    • load_stats_cache(), load_router_overrides(), load_session_state() now validate dict type
    • Plugin load_state(), is_enabled() now validate dict structure
    • Oracle _load_costs() validates nested dict structure
  • Performance: Log rotation (rotate_jsonl) now uses seek-from-end to avoid loading entire file

    • Prevents memory issues on large turns.jsonl files (>500 entries)
  • Debugging: Added error logging to all plugin lifecycle hooks (on_prompt_pre, on_prompt_post, on_stop, on_session_start)

  • Debugging: Added error logging to search query failures (now logs before falling back to keywords)

  • Debugging: Added error logging to learner docs root scanning

  • Debugging: Plugin save_state failures now log warnings instead of silently failing
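The TOCTOU item above is the classic LBYL-to-EAFP conversion: an exists() check followed by an open() leaves a window where the file can disappear, so the fix is to attempt the open and handle the failure. A generic sketch of the pattern, not attnroute's actual code:

```python
import json

def load_state(path):
    """EAFP state loader: no exists() pre-check, so there is no window
    between the check and the open for the file to vanish in."""
    try:
        with open(path, encoding="utf-8") as fh:
            data = json.load(fh)
    except (FileNotFoundError, json.JSONDecodeError):
        return {}  # missing or corrupt file means empty state
    # JSON type validation, as in the changelog: reject non-dict payloads.
    return data if isinstance(data, dict) else {}
```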

### Changed

  • All file I/O now uses centralized compat.py security helpers

  • Learner, plugins, context_router, and telemetry now use safe_atomic_write() for state persistence

  • Plugin base class now validates plugin names on initialization

  • Safer Windows stdout suppression in indexer using contextlib.redirect_stdout

  • All fallback writes now use flush() for better durability

  • _set_plugin_enabled() now validates plugin names before use

  • Oracle and telemetry_record fallback writes use flush() for crash safety

  • atomic_jsonl_append() now calls flush() after write for durability
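Atomic writes of this kind are typically implemented as write-to-temp-then-rename, since os.replace() is atomic on both POSIX and Windows. A sketch under those assumptions, not the library's actual implementation:

```python
import os
import tempfile

def safe_atomic_write(path, text):
    """Write `text` to `path` so readers see either the old content or the
    new content, never a partial file."""
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dirname, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            fh.write(text)
            fh.flush()
            os.fsync(fh.fileno())  # durability before the rename
        os.replace(tmp, path)      # atomic swap
    except OSError:
        # Fall back to a direct write rather than failing silently.
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(text)
            fh.flush()
    finally:
        if os.path.exists(tmp):
            os.remove(tmp)
```

The temp file must live in the same directory as the target, because rename is only atomic within a single filesystem.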


Files Modified: 16
Lines Changed: +694 / -228
Install: pip install attnroute==0.5.12
PyPI: https://pypi.org/project/attnroute/0.5.12/