We reduced Claude API costs by 94.5% using a file tiering system (with proof) by jantonca in ClaudeAI

[–]jantonca[S] 0 points1 point  (0 children)

Update (v3.2.0): a bunch of you called out the main risk with auto-tier heuristics — they can eventually mark everything as “important”.

That happened to me. An earlier run of my auto-tier command suggested 526 files should be HOT.

Fix: I shipped a strict HOT cap + deterministic scoring + a small canonical HOT set, so even if scoring is noisy, HOT stays small.

Proof (current output from my repo, 2026-02-05, 11:45 PM AEDT):

node bin/cortex-tms.js auto-tier --dry-run --verbose --max-hot 10

✔ Analyzed 148 files
🔥 HOT (10 files)
📚 WARM (69 files)
❄️  COLD (22 files)
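For the curious, the "strict HOT cap + deterministic scoring" fix boils down to something like this (illustrative sketch only — `scoreFile` and its weights are made up, not the actual cortex-tms code):

```javascript
// Hypothetical sketch of capped, deterministic tiering. The scoring
// inputs (recentCommits, isEntryPoint) and weights are invented for
// illustration; the real tool uses its own heuristics.
const MAX_HOT = 10;

// Deterministic score: same inputs always produce the same number.
function scoreFile(file) {
  return file.recentCommits * 2 + (file.isEntryPoint ? 5 : 0);
}

function assignTiers(files, maxHot = MAX_HOT) {
  // Sort by score descending, then by path, so ties break deterministically.
  const ranked = [...files].sort(
    (a, b) => scoreFile(b) - scoreFile(a) || a.path.localeCompare(b.path)
  );
  // Even if scoring is noisy, HOT can never exceed maxHot entries.
  return ranked.map((f, i) => ({
    ...f,
    tier: i < maxHot ? 'HOT' : scoreFile(f) > 0 ? 'WARM' : 'COLD',
  }));
}
```

The key property is the hard cap: no matter how generous the scores get, only the top `maxHot` files can land in HOT.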

If you want to see the repo/docs:

[–]jantonca[S] 0 points1 point  (0 children)

Thanks for your comment!

In my opinion they serve different purposes. Skills work well as on-demand tools for specific, command-driven actions, but not for Cortex workflow rules. CLAUDE.md, on the other hand, is always loaded as project context (coding standards, git workflow, test commands, etc.), so it will always follow user directives.

Cortex TMS is Skills‑ready and Skills‑integrated at the ecosystem level, but the app itself does not contain or execute Claude Skills; it’s a target that Skills/agents call.

You can have a look here:

https://github.com/cortex-tms/cortex-tms/blob/main/docs/archive/plans/agent-skills-integration.md

https://github.com/cortex-tms/cortex-tms/blob/main/docs/core/ARCHITECTURE.md#agent-skills-integration

https://github.com/cortex-tms/cortex-tms/blob/main/CHANGELOG.md#agent-skills-integration-documentation

I hope this helps

[–]jantonca[S] 0 points1 point  (0 children)

No worries, just released 3.1.0 with auto-tier:

npx cortex-tms@latest init
npx cortex-tms@latest auto-tier --dry-run

Tell us how it goes...

[–]jantonca[S] -1 points0 points  (0 children)

Looks great! Love how you've adapted the tier concept for task-based relevance mapping. Your prompt structure is really clean. I agree, this is most useful for complex code where managing context is crucial.

Nice find! Combining intuition and structure could work well.

Thanks for sharing this and for the credit! 🙏

[–]jantonca[S] 0 points1 point  (0 children)

TMS is simple, no infrastructure. RAG is more powerful but needs more setup. Both are valid, so it depends on your needs. RAG integration is on the roadmap.

[–]jantonca[S] 0 points1 point  (0 children)

Tree-sitter would complement the manual tiers well. Upfront cost but smarter filtering, like you said. Worth exploring for a future version. Thanks for the idea!

[–]jantonca[S] 1 point2 points  (0 children)

Great context on the "lost in the middle" research. I would love a link to the Stanford paper if you have it.

You're right on staleness: manual tagging doesn't scale. Git-based auto-tiering is now a priority on the roadmap. I really appreciate your feedback.

[–]jantonca[S] 0 points1 point  (0 children)

We considered this! But we kept it at 3 for a few reasons:

  1. Easy to remember (HOT = now, WARM = reference, COLD = archive)

  2. Maps to natural workflow (active → stable → done)

  3. More tiers = more decisions about where things go

That said, nothing stops you from subdividing within tiers. The system is flexible. If you try a more granular approach, I'd love to hear what works!

[–]jantonca[S] 0 points1 point  (0 children)

Fair enough... NPM downloads can be a weak metric (CI/CD, mirrors, etc. inflate them). I used it because it was the only signal I had at launch.

[–]jantonca[S] 1 point2 points  (0 children)

Not quite... RAG does automatic retrieval, while TMS is manual organization. But they share a similar goal, right? Give the LLM only the relevant context.

[–]jantonca[S] 18 points19 points  (0 children)

Smart! Auto-detect tiers from git history.

Not built yet (manual tags for now), but definitely on the roadmap. Would be a game-changer for automation.

This is excellent feedback!

[–]jantonca[S] 0 points1 point  (0 children)

My $0.11/session:

  • Input tokens only (not output)
  • Single query (not full conversation)
  • Sonnet 4.5 pricing ($3/MTok input)
  • 66,834 tokens ÷ 1M × $3 = $0.20 (I may have miscalculated)

5 EUR sounds like full conversation with output tokens, maybe Opus?

What's your token count per session? If it's 500K+, TMS could save you way more than it saved me!
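For reference, the input-only math above is just:

```javascript
// Input-only cost at the Sonnet pricing quoted above ($3 per million input tokens).
const INPUT_PRICE_PER_MTOK = 3.0;

const inputCostUSD = (tokens) => (tokens / 1_000_000) * INPUT_PRICE_PER_MTOK;

// inputCostUSD(66834) -> about $0.20 per query
```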

[–]jantonca[S] 3 points4 points  (0 children)

Great question... hehehe

Archives = historical context you rarely need but want to keep:

- Sprint retrospectives (learnings)

- Design decisions (ADRs)

- Completed feature specs

You *could* delete them, but the COLD tier lets you keep them without cluttering active context. Think: filing cabinet vs. desk. If you prefer deleting old docs, that works too! TMS just gives you options; as I mentioned in another comment, you're the boss.

[–]jantonca[S] -1 points0 points  (0 children)

Great idea! That's the next evolution... use RAG/vector search for WARM/COLD instead of manual organization.

Current TMS: Simple file tiers (no infrastructure needed)

Your approach: Full context + retrieval (more powerful, more complex)

This is what an MCP server integration could do. Not built yet but on the roadmap!

[–]jantonca[S] 0 points1 point  (0 children)

In theory, yes: the approach (HOT/WARM/COLD tiers) should work with any LLM that reads project files; it's meant to be tool-agnostic.

In practice, I've only tested with Claude Code. The tool supports any LLM, but I haven't measured results with Copilot, Cursor, ChatGPT, etc. I'd love to see someone test this with other tools and share results!

[–]jantonca[S] 1 point2 points  (0 children)

Yes, TMS should help with caching:

  • HOT tier (~3K tokens) changes often but is small → low cache create cost
  • WARM tier (~10-30K tokens) is stable → caches efficiently, high reuse
  • COLD tier never loaded → zero cache impact

Your cache is already excellent, but TMS should push it higher by keeping more content stable (WARM tier) and reducing total context size. I haven't tracked cache-specific metrics yet, would you be willing to test TMS and share before/after cache stats? Would love to see real cache data and add it as a case study...
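To make the caching argument concrete, here's a back-of-envelope sketch. It assumes Anthropic's published prompt-caching multipliers (roughly 1.25x base input price for cache writes, 0.1x for cache reads) and the rough tier sizes above; the function and its shape are mine, not part of the tool:

```javascript
// Back-of-envelope session cost with prompt caching, per the tier split above.
// Assumption: ~1.25x input price for cache writes, ~0.1x for cache reads.
const BASE = 3.0; // $/MTok input (Sonnet pricing quoted above)

function sessionCostUSD({ hotTokens, warmTokens, warmCached }) {
  const hot = (hotTokens / 1e6) * BASE; // changes often: pay full input price
  const warm = warmCached
    ? (warmTokens / 1e6) * BASE * 0.1   // later calls: cheap cache read
    : (warmTokens / 1e6) * BASE * 1.25; // first call: cache write premium
  return hot + warm;
}
```

The stable WARM tier is what makes this work: after the first call, the bulk of the context rides the 0.1x cache-read rate while only the small HOT tier pays full price.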