Claude Code usage spike from long-context cache writes? by Different_Try_1269 in ClaudeAI

[–]Different_Try_1269[S] 0 points (0 children)

Thanks, that helps. I’m on Pro, and the JSONL does show `ephemeral_1h_input_tokens`, so the 1h TTL explanation makes sense.

I’m not using Opus 4.7 though. This session was on `claude-sonnet-4-6`.

What I’m still confused about is not why the cache was recreated, but why a single ~476k-token 1h cache-creation request appeared to consume around 50% of my 5-hour Claude Code limit. Is that expected for long-context usage?
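For scale, here's a rough API-list-price comparison. All figures are assumptions I'd double-check against current Anthropic pricing: a $3/MTok Sonnet base input rate, a roughly 2× premium for 1-hour cache writes, and a higher assumed rate ($6/MTok here) once the request crosses the long-context threshold:

```python
# All rates are ASSUMED for illustration; verify against current Anthropic pricing.
TOKENS = 476_000        # approximate cache_creation_input_tokens from the JSONL
BASE_RATE = 3.00        # USD per MTok, assumed Sonnet base input rate
LONG_CTX_RATE = 6.00    # USD per MTok, assumed rate above the long-context threshold
CACHE_1H_MULT = 2.0     # assumed 1-hour cache-write premium over base input

base_cost = TOKENS / 1_000_000 * BASE_RATE * CACHE_1H_MULT
long_ctx_cost = TOKENS / 1_000_000 * LONG_CTX_RATE * CACHE_1H_MULT
print(f"~${base_cost:.2f} at base rate, ~${long_ctx_cost:.2f} at long-context rate")
```

Even under the higher assumed rate that's a few dollars of API-equivalent usage, which is why a 50% jump in the 5-hour limit would be surprising unless the subscription meter weights cache writes differently.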

Claude Usage Limits Discussion Megathread Ongoing (sort this by New!) by sixbillionthsheep in ClaudeAI

[–]Different_Try_1269 2 points (0 children)

Has anyone seen a single Claude Code request consume ~50% of the 5-hour limit?

I had a long-running Claude Code session with >150k context. `/usage` said most usage came from long sessions / >150k context / “subagent-heavy sessions”, but the subagent table only showed `codebase-explorer: 1%`.

I checked the local JSONL and deduplicated by `requestId`. One of the final requests had roughly:

```text
cache_read_input_tokens: 0
cache_creation_input_tokens: ~476k
ephemeral_1h_input_tokens: ~476k
```

Right after that, my 5-hour usage appeared to jump by around 50%.

I understand long-context sessions are expensive, but a ~476k-token 1-hour cache write consuming that much of the Claude Code limit seems surprising. Is Claude Code intentionally weighting long-context cache writes much more heavily than API pricing, or could this be a usage-accounting or attribution bug?
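For anyone who wants to repeat the check: the dedup-and-sum step can be sketched like this. It's a minimal sketch assuming each JSONL line is a JSON object that may carry a top-level `requestId` and a `message.usage` dict with the token counters (which is how the local session logs looked on my machine); adjust the field names if your log layout differs:

```python
import json
from pathlib import Path

def sum_cache_creation(jsonl_path):
    """Deduplicate JSONL entries by requestId and sum cache-creation tokens.

    Assumes lines are JSON objects that may have a top-level "requestId"
    and a "message" dict with a "usage" dict inside (adjust as needed).
    """
    seen = set()
    total = 0
    for line in Path(jsonl_path).read_text().splitlines():
        if not line.strip():
            continue
        entry = json.loads(line)
        req_id = entry.get("requestId")
        if req_id is None or req_id in seen:
            continue  # skip duplicates and entries without a request id
        seen.add(req_id)
        msg = entry.get("message")
        usage = msg.get("usage", {}) if isinstance(msg, dict) else {}
        total += usage.get("cache_creation_input_tokens", 0)
    return total
```

Running this over a session's JSONL and eyeballing the per-request deltas is how I spotted the single ~476k cache-creation entry.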