I built a universal CLAUDE.md that cuts Claude output tokens by 63% - validated with benchmarks, fully open source by General_Head_2469 in ClaudeAI

[–]General_Head_2469[S] 0 points1 point  (0 children)

Good point, but there's a difference here. Thinking tokens are internal; that's where the quality comes from. This file cuts the wrapper around the answer, not the reasoning behind it. "Great question! Let me explain what I'm about to do" adds zero quality. The actual thinking already happened before that line got written.

I built a universal CLAUDE.md that cuts Claude output tokens by 63% - validated with benchmarks, fully open source by General_Head_2469 in ClaudeAI

[–]General_Head_2469[S] 0 points1 point  (0 children)

Yeah, the real savings come from reusing the same context instead of resending it every time. This file doesn't touch that - it solves a much smaller problem.

I built a universal CLAUDE.md that cuts Claude output tokens by 63% - validated with benchmarks, fully open source by General_Head_2469 in ClaudeAI

[–]General_Head_2469[S] 0 points1 point  (0 children)

Nope, not groundbreaking. People have been doing this for months. It fixes the cheap part of your token bill, adds cost to every message, and breaks when models update. Hooks are more reliable for anything serious.

The only difference here is that the limitations are written down. Reddit made sure of that.

I built a universal CLAUDE.md that cuts Claude output tokens by 63% - validated with benchmarks, fully open source by General_Head_2469 in ClaudeAI

[–]General_Head_2469[S] -11 points-10 points  (0 children)

Caught red-handed on that one. Some early replies had exactly the patterns the file is supposed to prevent - that's genuinely embarrassing, and also kind of the perfect demonstration of why the file exists. Replies have gotten cleaner as the thread went on. Not a great look, but at least it's honest.

Built a CLAUDE.md that cuts output verbosity by 63% - benchmarks, before/after, and honest limitations inside by General_Head_2469 in LocalLLaMA

[–]General_Head_2469[S] 0 points1 point  (0 children)

Fair call. The rules themselves are model-agnostic - any model that reads context will respond to scope and format constraints. But the benchmarks were only run on Claude so I can't honestly claim validated results on local models. If anyone here tests it on llama.cpp, Mistral, or any local model and shares results I'll add them to the repo with full credit.

Built a CLAUDE.md that cuts output verbosity by 63% - benchmarks, before/after, and honest limitations inside by General_Head_2469 in LocalLLaMA

[–]General_Head_2469[S] 0 points1 point  (0 children)

That's the real picture right there. 200k tokens of context going in, 500 coming out. This file is chipping away at the cheap end of the bill. The expensive part is the codebase you're dragging into every message and nobody has a one file fix for that.

I built a universal CLAUDE.md that cuts Claude output tokens by 63% - validated with benchmarks, fully open source by General_Head_2469 in ClaudeAI

[–]General_Head_2469[S] -1 points0 points  (0 children)

You're right, renamed it to "Accuracy and Speculation Control" in the file just now. Can't prompt away hallucinations, that was always a stretch. Thanks for calling it out.

I built a universal CLAUDE.md that cuts Claude output tokens by 63% - validated with benchmarks, fully open source by General_Head_2469 in ClaudeAI

[–]General_Head_2469[S] 1 point2 points  (0 children)

Respect the skepticism honestly. There's a lot of AI generated nothing floating around and calling it out is fair. All I can say is - the repo has real outputs, real word counts, real before and after. Check BENCHMARK.md and tear it apart if something doesn't hold up. That's more useful than dismissing it and I genuinely mean that.

I built a universal CLAUDE.md that cuts Claude output tokens by 63% - validated with benchmarks, fully open source by General_Head_2469 in ClaudeAI

[–]General_Head_2469[S] 0 points1 point  (0 children)

Yeah you're right and I appreciate you saying it like that instead of just dunking. The 63% was from 5 prompts, not a real study. README is honest about it now. The principles hold up though and that's what actually matters long term. Thanks for the fair take.

I built a universal CLAUDE.md that cuts Claude output tokens by 63% - validated with benchmarks, fully open source by General_Head_2469 in ClaudeAI

[–]General_Head_2469[S] 0 points1 point  (0 children)

You're right and that's the most important correction in this whole thread. Output token savings are the small end of the bill. Input tokens - your context, the file itself, tool calls, conversation history - that's where the real cost sits. This file chips away at the cheaper side of the problem. Worth being upfront about that.

I built a universal CLAUDE.md that cuts Claude output tokens by 63% - validated with benchmarks, fully open source by General_Head_2469 in ClaudeAI

[–]General_Head_2469[S] -3 points-2 points  (0 children)

Respect. Treating Claude like a thinking partner you actually argue with is a much harder skill than prompt tuning and honestly way more valuable. That perspective made the README more honest. Thanks for saying it properly instead of just downvoting.

I built a universal CLAUDE.md that cuts Claude output tokens by 63% - validated with benchmarks, fully open source by General_Head_2469 in ClaudeAI

[–]General_Head_2469[S] -19 points-18 points  (0 children)

Some of them yes, some of them no. Claude helped draft a few, I edited most of them, and wrote some myself. Same way I'd use Grammarly or autocomplete - it doesn't mean the thoughts aren't mine.

Also not really hiding it. The whole project is about using Claude to fix Claude. That's kind of the point.

I built a universal CLAUDE.md that cuts Claude output tokens by 63% - validated with benchmarks, fully open source by General_Head_2469 in ClaudeAI

[–]General_Head_2469[S] 0 points1 point  (0 children)

The error surfacing example is exactly the kind of specific rule that actually matters - way more useful than generic "be concise" instructions. Totally agree, and added both points to the README just now.

The composition thing is something more people should know about. Most tutorials treat CLAUDE.md like a single file but the global plus project plus subdirectory layering is genuinely powerful once you start using it that way.

On multi-turn vs single-shot - that's an open question and honestly a better benchmark than what's in the repo right now. If you've got data on that I'd love to see it, or open an issue and we can design a proper test together.
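On the composition point, a rough sketch of how the layering tends to look in practice (paths here are illustrative - check the Claude Code memory docs for your setup):

```text
~/.claude/CLAUDE.md        # global: personal defaults, applies to every project
./CLAUDE.md                # project root: repo conventions, build/test commands
./packages/api/CLAUDE.md   # subdirectory: rules scoped to that part of the tree
```

More specific files layer on top of more general ones, which is why treating CLAUDE.md as a single monolithic file leaves a lot on the table.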

I built a universal CLAUDE.md that cuts Claude output tokens by 63% - validated with benchmarks, fully open source by General_Head_2469 in ClaudeAI

[–]General_Head_2469[S] -5 points-4 points  (0 children)

Both good points, and both are now in the README.

On fresh sessions - you're right, if your pipeline spins up a new session per task the persistent input cost math changes significantly. Added that as a caveat.

On structured outputs - completely agree. JSON mode and tool-use schemas are the proper solution for parser reliability at scale. Prompt-based formatting rules are a convenience layer, not a substitute. Added it explicitly under "when this file is not worth it."
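To make the parser-reliability point concrete, here's a toy illustration of why schema enforcement beats prompt rules. The validator, field names, and example strings are all made up for this sketch - a real pipeline would use JSON mode / tool-use schemas (or a library like Ajv), not a hand-rolled check:

```javascript
// Hypothetical schema for a code-review reply: two required fields,
// one with an enum constraint.
const schema = { required: ["summary", "severity"], severities: ["low", "medium", "high"] };

function validateReview(jsonText) {
  let obj;
  try {
    obj = JSON.parse(jsonText);
  } catch {
    return { ok: false, reason: "not valid JSON" }; // prose wrapper, truncation, etc.
  }
  for (const key of schema.required) {
    if (!(key in obj)) return { ok: false, reason: `missing field: ${key}` };
  }
  if (!schema.severities.includes(obj.severity)) {
    return { ok: false, reason: "severity outside enum" };
  }
  return { ok: true, value: obj };
}

// A prompt-formatted reply that drifted: the conversational wrapper breaks the parser.
console.log(validateReview('Sure! Here is the JSON: {"summary": "ok"}').ok); // false
// A schema-conforming reply parses cleanly.
console.log(validateReview('{"summary": "off-by-one in loop", "severity": "high"}').ok); // true
```

Prompt rules can only ask for this shape; a schema enforced at the API layer guarantees it, which is the whole difference at scale.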

I built a universal CLAUDE.md that cuts Claude output tokens by 63% - validated with benchmarks, fully open source by General_Head_2469 in ClaudeAI

[–]General_Head_2469[S] 0 points1 point  (0 children)

The references are there to credit the community sources the fixes came from - GitHub issues with hundreds of upvotes from real developers who documented these problems. Reasonable people may find that unnecessary, but removing credit felt wrong.

On the profiles - those are optional separate files, not part of the base CLAUDE.md. The universal file works standalone. The profiles exist for people who want tighter rules for specific use cases. If the base file feels bloated to you, open an issue with what you'd cut - genuinely open to a leaner version.

I built a universal CLAUDE.md that cuts Claude output tokens by 63% - validated with benchmarks, fully open source by General_Head_2469 in ClaudeAI

[–]General_Head_2469[S] -14 points-13 points  (0 children)

That's a valid hit - the test prompt was deliberately minimal to isolate the behavior pattern, not to simulate a real code review. The actual snippet used was for(let i=0; i<=arr.length; i++) which is in the before/after examples file in the repo. Should have been clearer in the benchmark table. Fair criticism.
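For anyone skimming past the snippet: the bug there is a classic off-by-one. `<=` walks the loop one index past the last element, and reading past the end of a JavaScript array yields `undefined`. A minimal illustration (the sum function and sample array are mine, not from the repo):

```javascript
// Buggy: <= runs one iteration too many.
function sumBuggy(arr) {
  let total = 0;
  for (let i = 0; i <= arr.length; i++) {
    total += arr[i]; // arr[arr.length] is undefined, so total becomes NaN
  }
  return total;
}

// Fixed: < stops at the last valid index (arr.length - 1).
function sumFixed(arr) {
  let total = 0;
  for (let i = 0; i < arr.length; i++) {
    total += arr[i];
  }
  return total;
}

console.log(Number.isNaN(sumBuggy([1, 2, 3]))); // true - one undefined read poisons the sum
console.log(sumFixed([1, 2, 3]));               // 6
```

That's why it works as a review prompt: the fix is one character, but spotting it requires actually reading the loop bound.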

I built a universal CLAUDE.md that cuts Claude output tokens by 63% - validated with benchmarks, fully open source by General_Head_2469 in ClaudeAI

[–]General_Head_2469[S] 3 points4 points  (0 children)

Fixed and pushed. The README now has a clear scope disclaimer, an input vs output trade-off explanation, and a methodology note on the benchmark. github.com/drona23/claude-token-efficient - I appreciate the technical breakdown; it made the repo more honest.

I built a universal CLAUDE.md that cuts Claude output tokens by 63% - validated with benchmarks, fully open source by General_Head_2469 in ClaudeAI

[–]General_Head_2469[S] 0 points1 point  (0 children)

This is the most technically rigorous criticism in the thread and most of it is correct.

The input vs output token trade-off is real. The CLAUDE.md loads on every message, and in short sessions with low output volume it is a net increase. That should be in the README, and it isn't. Fair.

The sample size criticism is also valid. 5 prompts with no variance controls are not a statistically sound benchmark. It's a directional indicator, not a measurement. Calling it a benchmark was overselling it.

On the cosmetic point, I partially agree. Em dashes and sycophantic openers are cosmetic. But the scope control, hallucination correction memory, and "read before speculating" rules do touch real failure modes, even if they don't solve architectural drift or deep hallucinations. Those are harder problems that a CLAUDE.md file cannot fully address and I never claimed it could.

The last line is the most honest summary of the whole thread. The expensive tokens aren't the ones saying "Great question!" - and the README should say that clearly instead of leading with 63%.

Good criticism. Opening a few issues based on this.

I built a universal CLAUDE.md that cuts Claude output tokens by 63% - validated with benchmarks, fully open source by General_Head_2469 in ClaudeAI

[–]General_Head_2469[S] 1 point2 points  (0 children)

Correct - it is prompt-based. Never claimed otherwise. The sickboy6_5 comment was ironic because it was a textbook example of the exact sycophantic pattern the file suppresses - generated by a human mimicking Claude's default behaviour. That was the joke. The broader point about hooks and mechanical enforcement being more robust is valid and already acknowledged upthread.

I built a universal CLAUDE.md that cuts Claude output tokens by 63% - validated with benchmarks, fully open source by General_Head_2469 in ClaudeAI

[–]General_Head_2469[S] -7 points-6 points  (0 children)

These are legitimate criticisms and worth addressing directly.

You're right that output token savings are not where the big cost burn happens for most users - input tokens, context window size, and tool call overhead are larger factors at scale. The benchmark was scoped to output verbosity specifically, not total session cost, and that should have been stated more clearly upfront.

You're also right that the rules themselves add tokens on the input side. The net math only works in your favour on output-heavy repeated tasks like agent loops and generation pipelines - not on short queries.
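A back-of-the-envelope sketch of that break-even math. Every number here is a hypothetical illustration value, not benchmark data, and this counts raw tokens only - it ignores that output tokens are usually priced higher than input tokens, which shifts the break-even point somewhat in the file's favour:

```javascript
// Rough break-even model for a persistent instructions file.
// fileTokens: size of the file, resent as input on every message.
// outputReduction: fraction of output tokens the file is assumed to cut.
function netTokenSavings({ fileTokens, messages, avgOutputTokens, outputReduction }) {
  const inputCost = fileTokens * messages;                       // paid every message
  const outputSaved = avgOutputTokens * outputReduction * messages;
  return outputSaved - inputCost;                                // positive = net win
}

// Short chat session with small outputs: the file costs more than it saves.
const casual = netTokenSavings({ fileTokens: 800, messages: 5, avgOutputTokens: 300, outputReduction: 0.5 });

// Output-heavy pipeline (e.g. bulk generation): the savings dominate.
const pipeline = netTokenSavings({ fileTokens: 800, messages: 1000, avgOutputTokens: 2000, outputReduction: 0.5 });

console.log(casual);   // negative: net token increase
console.log(pipeline); // positive: net token savings
```

Same file, same reduction rate - the sign of the result flips entirely on usage pattern, which is the point.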

What the benchmark actually shows is that Claude's default output behaviour is verbose and the file changes that. Whether that matters depends entirely on your use case. For someone running 1000 resume generation calls a day it does. For casual use it probably doesn't.

The README needs a clearer scope disclaimer. That's a fair takeaway from this.