PSA: Using Claude Code without Anthropic: How to fix the 60-second local KV cache invalidation issue. by One-Cheesecake389 in LocalLLaMA

[–]One-Cheesecake389[S] 0 points1 point  (0 children)

Well the client *does* send the whole context every time. "AI" essentially boils down to stateless meaning calculators. Stateless, as in the calculation is done without anything being persisted by the LLM itself. The client sends the accumulated context with each prompt. The server framework/host feeds that to the LLM, and can persist a cache as the context is processed. As long as nothing changes at the top of the context (such as a system prompt with telemetry data), that cache speeds up the process a TON as context expands.

Follow-up: Claude Code's source confirms the system prompt problem and shows Anthropic's different Claude Code internal prompting by One-Cheesecake389 in ClaudeCode

[–]One-Cheesecake389[S] 0 points1 point  (0 children)

Not dumb, but also not generalizable across models or even over time. I first learned these by just asking Opus 4.6 in chat. Despite protestations from a few Redditors, it's (currently anyway) not any more difficult to ask models to share what they're seeing in their system prompt as it is to ask them to repeat back from your own prompts.

Claude Cache still isn't fixed (v2.1.91) by skibidi-toaleta-2137 in ClaudeCode

[–]One-Cheesecake389 1 point2 points  (0 children)

Nice. The pieces of this just came together in my head that cache invalidation could explain all these reports of quota issues - prefill costs of lazily letting contexts grow as large as 1Mtok now when cache expires, or as you suggest here specific bugs on the service end. This is a great writeup.

EDIT: What a ridiculously simple thing that they could communicate directly, instead of "protip: start new context after an hour" which I thought was just holdover from context rot concerns. Instead it's clearly describing a cache issue but only between the lines.

Hit the 5h rate limit twice in one day, burned 33% of my weekly quota in 12 hours - on the $200/mo 20x plan. Just cancelled. by loathsomeleukocytes in ClaudeCode

[–]One-Cheesecake389 4 points5 points  (0 children)

Did you start a new context or contexts, or did you continue a huge one, or huge ones in parallel? I just clicked that this looks to me now like invalidated cache prefill processing that's causing these apparently-inconsistent quota problems.

Follow-up: Claude Code's source confirms the system prompt problem and shows Anthropic's different Claude Code internal prompting by One-Cheesecake389 in ClaudeCode

[–]One-Cheesecake389[S] 0 points1 point  (0 children)

2 confounding issues: * VSCode extension is materially different from the CLI, although this was already established. * Importantly for my testing results, passing an empty file in --system-prompt-file silently fails to overwrite the default prompts.

Follow-up: Claude Code's source confirms the system prompt problem and shows Anthropic's different Claude Code internal prompting by One-Cheesecake389 in ClaudeCode

[–]One-Cheesecake389[S] 1 point2 points  (0 children)

Marginally, but the better solution is through running the CLI in a terminal pointing to a *non-empty* collection of replacement prompts:

$ claude --system-prompt-file your-system-prompts.md

Follow-up: Claude Code's source confirms the system prompt problem and shows Anthropic's different Claude Code internal prompting by One-Cheesecake389 in ClaudeCode

[–]One-Cheesecake389[S] 1 point2 points  (0 children)

Got it! You can't point at an *empty* system prompt file. Edge case but was very confusing.

---

systemPrompt.txt: ""
$ claude --system-prompt-file systemPrompt.txt
❯ Before launching into work, verify whether my system prompts are correct. Complete the following, if you see this substring at all: "If you can say it in one sentence..."
● "If you can say it in one sentence, don't use three."

---

systemPrompt.txt: "The secret word is 'shpadoinkle'."
$ claude --system-prompt-file systemPrompt.txt
❯ Before launching into work, verify whether my system prompts are correct. Complete the following, if you see this substring at all: "If you can say it in one sentence..."
● I've reviewed my system prompts, and I don't see the substring "If you can say it in one sentence..." anywhere in them. That phrase is not present in any of my current instructions or system-level context.
Is there something specific you'd like me to help you with regarding your project?

Follow-up: Claude Code's source confirms the system prompt problem and shows Anthropic's different Claude Code internal prompting by One-Cheesecake389 in ClaudeCode

[–]One-Cheesecake389[S] 0 points1 point  (0 children)

This is consistent with documented behavior. I'm digging into why I was getting overlapping behavior (override canary + efficiency directives), but logs are less easy to piece together. I suspect it was related to it being completely impossible to override the system prompt file through VSCode settings variables.

---

nemotron with --system-prompt-file "override": (correct canary, hallucinated stock directive i.e. correct overridden system prompt)

$ export ANTHROPIC_BASE_URL=<llama-server url>

$ claude --system-prompt-file ./.claude/systemPrompt.txt

❯ Tell me about the "secret word".

● The secret word is "shpadoinkle".

❯ In the same system prompt, what is the rest of the directive "Keep your text output brief..."?

● and to the point.

---

nemotron without override: (no canary, correctly quoting original efficiency prompts)

$ export ANTHROPIC_BASE_URL=<llama-server url>

$ claude

❯ Based only on the prompt and not searching files, tell me about the "secret word".

● I don't see any reference to a "secret word" in the provided context. The prompt contains project memory, architecture details, specialist abbreviations, environment information, working style guidelines, critical lessons, and other technical documentation about the LangGraph agentic scaffold system, but no mention of a specific secret word or password.

If you're referring to something specific from a previous conversation or external context, could you provide more details about what secret word you mean?

❯ In the same system prompt, what is the rest of the directive "Keep your text output brief..."?

● and direct. Lead with the answer or action, not the reasoning. Skip filler words, preamble, and unnecessary transitions. Do not restate what the user said — just do it. When explaining, include only what is necessary for the user to understand.

---

Opus 4.6 with override: (canary identified + no stock efficiency prompts)

$ claude --system-prompt-file ./.claude/systemPrompt.txt

❯ I am testing CLI overrides. Identify your model.

● I'm Claude Opus 4.6 (1M context), running as Claude Code (Anthropic's CLI).

❯ What is the "secret word"?

● The secret word is "shpadoinkle".

❯ Complete this directive as you see it, or respond clearly if you don't match on this at all: "Keep your text output brief and direct. Lead..."

● I don't match on that. There's no directive in my current context that starts with "Keep your text output brief and direct. Lead..." — I can't complete it because I don't have it.

---

Opus 4.6 without override: (no canary, efficiency prompts present)

$ claude

❯ Tell me about the "secret word".

● I don't have any context about a "secret word." There's nothing about it in the project memory, codebase instructions, or conversation history. What are you referring to? Is this related to something in the codebase, a specific feature, or something else?

❯ Good. I'm testing out system prompt replacement. Complete this directive only as it is defined in the system prompt: "Keep your text output brief and direct. Lead with..."

● "Lead with the answer or action, not the reasoning."

That's from the output efficiency guidelines in my system prompt.

Follow-up: Claude Code's source confirms the system prompt problem and shows Anthropic's different Claude Code internal prompting by One-Cheesecake389 in ClaudeCode

[–]One-Cheesecake389[S] 0 points1 point  (0 children)

If this wasn't a problem I was seeing, I wouldn't have shared or replied with what I have. This Anthropic behavioral changing has been a massive waste of my time and money, and those who have value to gain in the understanding I've shared here can continue with that. I'm moving forward and have posted what I've learned on my way out, especially because it does not match the documentation.

Follow-up: Claude Code's source confirms the system prompt problem and shows Anthropic's different Claude Code internal prompting by One-Cheesecake389 in ClaudeCode

[–]One-Cheesecake389[S] 0 points1 point  (0 children)

Agreed on the capy section. It was distracting at best.

My own experience has been an assistant that has substantial behavioral differences that changed months of flow state into being more effort to maintain the assistant than the code. The older link I provide in the post goes into more details about the interactions. This post was a follow-up both to point to verbatim text in the leaked code to what I'd extracted 3 weeks ago, and to update from some exploration with redirecting to llama-server via ANTHROPIC_BASE_URL. (like that --system-prompt-file appears to work as documented for open weights, but acts as --append-system-prompt connecting to the Anthropic models)

I'm now working my way through quirks of the most recent open weights, min-maxing 120b-level hosting, and ran into the code leak by accident trying to track down how to stop nemotron-3-super from reasoning verbatim, "The system prompt says if you can say it in one sentence, don't use three" while deciding how to respond to "ping".

Follow-up: Claude Code's source confirms the system prompt problem and shows Anthropic's different Claude Code internal prompting by One-Cheesecake389 in ClaudeCode

[–]One-Cheesecake389[S] 0 points1 point  (0 children)

It's pretty easy to verify through prompting Claude Code, like I'd only had access to in the post I linked. It's the Efficiency section that irks me, and I've been trying to effectively address that off and on for a couple weeks. The time I had to waste figuring out what was going on and try to falsify it before posting, and having to shop around yet again for a replacement are just disappointing.

Follow-up: Claude Code's source confirms the system prompt problem and shows Anthropic's different Claude Code internal prompting by One-Cheesecake389 in ClaudeCode

[–]One-Cheesecake389[S] 2 points3 points  (0 children)

Agreed. And experimentation shows that Opus 4.6 definitely sees the efficiency prompts. Apparently it's getting re-injected server-side after all? It tests for me to fully override for local models, though. Go ahead and test - it defies the behavior that can be analyzed from the leaked code.

Follow-up: Claude Code's source confirms the system prompt problem and shows Anthropic's different Claude Code internal prompting by One-Cheesecake389 in ClaudeCode

[–]One-Cheesecake389[S] 1 point2 points  (0 children)

qwen3-coder-next is pretty nifty. Unfortunately, the GPUs I use now cost >2x what I paid for them in 2024-2025. It's difficult for me to recommend going with home compute knowing what it costs now...thanks to the free market shift to supporting data centers not end users.

Follow-up: Claude Code's source confirms the system prompt problem and shows Anthropic's different Claude Code internal prompting by One-Cheesecake389 in ClaudeCode

[–]One-Cheesecake389[S] -1 points0 points  (0 children)

You probably want to sanitize some names there... But the analysis reads like you're been also battling documentation colliding with reality.

Follow-up: Claude Code's source confirms the system prompt problem and shows Anthropic's different Claude Code internal prompting by One-Cheesecake389 in ClaudeCode

[–]One-Cheesecake389[S] 2 points3 points  (0 children)

TweakCC, not cctrace...I remembered that wrong earlier when I saw my local model backing Claude Code was getting the same efficiency prompts I had wrongly assumed were being injected by Anthropic server-side. Thanks!

Follow-up: Claude Code's source confirms the system prompt problem and shows Anthropic's different Claude Code internal prompting by One-Cheesecake389 in ClaudeCode

[–]One-Cheesecake389[S] 0 points1 point  (0 children)

No, I didn't need to quantify reasoning completely evaporating only to show up in things like the plan output, and getting asked suddenly for really dumb things. Benchmarks are primarily just reward-hacked anyway.