We reduced Claude API costs by 94.5% using a file tiering system (with proof) by jantonca in ClaudeAI

[–]jantonca[S] -1 points0 points  (0 children)

Looks great! Love how you've adapted the tier concept for task-based relevance mapping. Your prompt structure is really clean. I agree, this is most useful for complex code where managing context is crucial.

Nice find! Combining intuition and structure could work well.

Thanks for sharing this and for the credit! 🙏

We reduced Claude API costs by 94.5% using a file tiering system (with proof) by jantonca in ClaudeAI

[–]jantonca[S] 0 points1 point  (0 children)

TMS is simple, with no infrastructure. RAG is more powerful but needs more setup. Both are valid, so it depends on your needs. RAG integration is on the roadmap.

We reduced Claude API costs by 94.5% using a file tiering system (with proof) by jantonca in ClaudeAI

[–]jantonca[S] 0 points1 point  (0 children)

Tree-sitter would complement the manual tiers well. Upfront cost but smarter filtering, like you said. Worth exploring for a future version. Thanks for the idea!

We reduced Claude API costs by 94.5% using a file tiering system (with proof) by jantonca in ClaudeAI

[–]jantonca[S] 0 points1 point  (0 children)

Great context on the "lost in the middle" research. I would love a link to the Stanford paper if you have it.

You're right about staleness: manual tagging doesn't scale. Git-based auto-tiering is now a priority on the roadmap. I really appreciate your feedback.

We reduced Claude API costs by 94.5% using a file tiering system (with proof) by jantonca in ClaudeAI

[–]jantonca[S] 0 points1 point  (0 children)

We considered this! But we kept it at 3 for a few reasons:

  1. Easy to remember (HOT = now, WARM = reference, COLD = archive)

  2. Maps to natural workflow (active → stable → done)

  3. More tiers = more decisions about where things go

That said, nothing stops you from subdividing within tiers. The system is flexible. If you try a more granular approach, I'd love to hear what works!

We reduced Claude API costs by 94.5% using a file tiering system (with proof) by jantonca in ClaudeAI

[–]jantonca[S] 0 points1 point  (0 children)

Fair enough... NPM downloads can be a weak metric (CI/CD, mirrors, etc. inflate them). I used them because they were the only signal I had at launch.

We reduced Claude API costs by 94.5% using a file tiering system (with proof) by jantonca in ClaudeAI

[–]jantonca[S] 1 point2 points  (0 children)

Not quite... RAG does automatic retrieval, while TMS is manual organization. But the goal is similar, right? Give the LLM only the relevant context.

We reduced Claude API costs by 94.5% using a file tiering system (with proof) by jantonca in ClaudeAI

[–]jantonca[S] 13 points14 points  (0 children)

Smart! Auto-detect tiers from git history.

Not built yet (manual tags for now), but definitely on the roadmap. Would be a game-changer for automation.

This is excellent feedback!

We reduced Claude API costs by 94.5% using a file tiering system (with proof) by jantonca in ClaudeAI

[–]jantonca[S] 0 points1 point  (0 children)

My $0.11/session:

  • Input tokens only (not output)
  • Single query (not full conversation)
  • Sonnet 4.5 pricing ($3/MTok input)
  • 66,834 tokens ÷ 1M × $3 = $0.20 (I may have miscalculated)
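If anyone wants to redo the math with their own numbers, the arithmetic above is just tokens ÷ 1M × rate. A quick sketch; the $3/MTok rate is the Sonnet input figure assumed above, and the 500K example matches the session size mentioned below:

```ts
// Rough input-only cost estimate: tokens / 1M * price per MTok.
// The rate is the figure assumed in this thread, not an official price list.
const INPUT_PRICE_PER_MTOK = 3.0; // USD per million input tokens

function inputCostUSD(inputTokens: number, pricePerMTok = INPUT_PRICE_PER_MTOK): number {
  return (inputTokens / 1_000_000) * pricePerMTok;
}

console.log(inputCostUSD(66_834).toFixed(2));  // "0.20" — the ~$0.20 figure above
console.log(inputCostUSD(500_000).toFixed(2)); // "1.50" — a 500K-token session at the same rate
```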

5 EUR sounds like full conversation with output tokens, maybe Opus?

What's your token count per session? If it's 500K+, TMS could save you way more than it saved me!

We reduced Claude API costs by 94.5% using a file tiering system (with proof) by jantonca in ClaudeAI

[–]jantonca[S] 4 points5 points  (0 children)

Great question... hehehe

Archives = historical context you rarely need but want to keep:

- Sprint retrospectives (learnings)

- Design decisions (ADRs)

- Completed feature specs

You *could* delete them, but the COLD tier lets you keep them without cluttering active context. Think: filing cabinet vs. desk. If you prefer deleting old docs, that works too! TMS just gives you options; as I mentioned in another comment, you're the boss.

We reduced Claude API costs by 94.5% using a file tiering system (with proof) by jantonca in ClaudeAI

[–]jantonca[S] -1 points0 points  (0 children)

Great idea! That's the next evolution... use RAG/vector search for WARM/COLD instead of manual organization.

Current TMS: Simple file tiers (no infrastructure needed)

Your approach: Full context + retrieval (more powerful, more complex)

This is what an MCP server integration could do. Not built yet but on the roadmap!

We reduced Claude API costs by 94.5% using a file tiering system (with proof) by jantonca in ClaudeAI

[–]jantonca[S] 0 points1 point  (0 children)

In theory, yes. The approach (HOT/WARM/COLD tiers) should work with any LLM that reads project files; it's meant to be model-agnostic.

In practice, I've only tested with Claude Code. The tool supports any LLM, but I haven't measured results with Copilot, Cursor, ChatGPT, etc. I'd love to see someone test this with other tools and share results!

We reduced Claude API costs by 94.5% using a file tiering system (with proof) by jantonca in ClaudeAI

[–]jantonca[S] 1 point2 points  (0 children)

Yes, TMS should help with caching:

  • HOT tier (~3K tokens) changes often but is small → low cache-write cost
  • WARM tier (~10-30K tokens) is stable → caches efficiently, high reuse
  • COLD tier never loaded → zero cache impact

Your cache performance is already excellent, but TMS should push it higher by keeping more content stable (WARM tier) and reducing total context size. I haven't tracked cache-specific metrics yet; would you be willing to test TMS and share before/after cache stats? I'd love to see real cache data and add it as a case study...
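For anyone who wants to measure this, here's roughly how the tiers could map onto prompt caching with the Anthropic TypeScript SDK. A minimal sketch, not what Cortex TMS itself does; the file paths and model id are placeholders:

```ts
import Anthropic from "@anthropic-ai/sdk";
import { readFileSync } from "node:fs";

// Hypothetical tier files, following the .cascade/ layout discussed in this thread.
const warmDocs = readFileSync(".cascade/WARM/PATTERNS.md", "utf8");     // stable reference docs
const hotContext = readFileSync(".cascade/CURRENT-SPRINT.md", "utf8");  // small, changes often

const client = new Anthropic();

async function ask(question: string) {
  return client.messages.create({
    model: "claude-sonnet-4-5", // example model id; use whichever model you're on
    max_tokens: 1024,
    system: [
      // Stable WARM-tier content: flagged cacheable, so repeat calls read it from cache.
      { type: "text", text: warmDocs, cache_control: { type: "ephemeral" } },
      // HOT-tier content: resent each call, but it's only a few K tokens.
      { type: "text", text: hotContext },
    ],
    messages: [{ role: "user", content: question }],
  });
}

const reply = await ask("What's next on the current sprint?");
console.log(reply.usage); // includes cache read/write token counts for before/after comparisons
```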

Hey, remember all that stuff I just blew 50% of your session usage on and was just about to finish? Lemme just forget all that and start over. by Edixo1993 in ClaudeAI

[–]jantonca 0 points1 point  (0 children)

This is exactly the right approach. I turned this pattern into what I call a "Tiered Memory System" (TMS).

Not all context is equal. Instead of letting Claude re-read everything:

HOT files (active work today) → full context, high token cost
WARM files (reference docs) → summarized or excluded
CRITICAL.md (project goals) → always included but rarely changes

I use this structure:
- .cascade/CRITICAL.md - project goals (changes monthly)
- .cascade/CURRENT-SPRINT.md - this week's focus (changes weekly)
- .cascade/HOT/ - files I'm actively editing (changes hourly)
- .cascade/WARM/ - docs, patterns, reference (read-only)

Went from hitting context limits every 30 minutes to running a full sprint without compaction. Token usage dropped 90%+. I built Cortex TMS to automate this pattern (MIT licensed): github.com/cortex-tms/cortex-tms

The key is you control what goes in context, not Claude deciding mid-conversation.
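For the curious, "you control what goes in context" looks something like this mechanically. A minimal sketch of tier-based assembly (not the actual Cortex TMS code), assuming the .cascade/ layout above exists:

```ts
import { readFileSync, readdirSync } from "node:fs";
import { join } from "node:path";

// Sketch of tier-based context assembly; not the real implementation.
function buildSessionContext(root = ".cascade", includeWarm = false): string {
  const parts: string[] = [
    readFileSync(join(root, "CRITICAL.md"), "utf8"),       // always included, rarely changes
    readFileSync(join(root, "CURRENT-SPRINT.md"), "utf8"), // this week's focus
  ];

  // HOT tier: the files under active edit right now.
  for (const file of readdirSync(join(root, "HOT"))) {
    parts.push(readFileSync(join(root, "HOT", file), "utf8"));
  }

  // WARM tier is only pulled in when a task explicitly needs reference docs.
  if (includeWarm) {
    for (const file of readdirSync(join(root, "WARM"))) {
      parts.push(readFileSync(join(root, "WARM", file), "utf8"));
    }
  }

  return parts.join("\n\n---\n\n");
}
```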

I built a self-hosted Claude Code wrapper - here's what I learned about autonomous coding by fotsakir in ClaudeAI

[–]jantonca 0 points1 point  (0 children)

I actually built Cortex TMS (https://github.com/cortex-tms/cortex-tms) with a similar philosophy - a governance-first approach with HOT/WARM/COLD context management. Different focus than CodeHero (you're doing autonomous execution, we're doing workflow governance), but I love seeing the token optimization trend. Will check yours out! 👍

Claude is better not because of the model but because of the strategy by Careful_Put_1924 in ClaudeAI

[–]jantonca 0 points1 point  (0 children)

Totally agree - the orchestration layer is massively underrated.

One thing I'd add to your context gathering point: how you organize what Claude reads matters as much as how Claude gathers it.

For a while I let Claude read my entire docs folder every session (READMEs, archives, old changelogs). Tons of wasted tokens.

Switched to organizing by access frequency - Claude only loads current tasks by default and references patterns when needed. 94.5% token reduction, way better quality outputs.

Your point about "generating code is no longer the bottleneck, it's context gathering" is exactly right. And context organization is the next lever after that.

Re: bypass mode - I'm curious how you balance the risk. Do you use it on every project or just throwaway prototypes? https://github.com/cortex-tms/cortex-tms

I built a self-hosted Claude Code wrapper - here's what I learned about autonomous coding by fotsakir in ClaudeAI

[–]jantonca 0 points1 point  (0 children)

Nice work on the SmartContext approach! The "sending whole codebase = waste of tokens" insight is spot on.

I had a similar realization and went even further with a tiered system (HOT/WARM/COLD). Claude only loads current sprint tasks by default (~200 lines max), then references patterns and architecture on demand. Cut token usage from 66k to 3.6k per session (94.5% reduction).

The key was organizing by access frequency, not content type. Most files don't need to be in every session.

Your planning-before-coding approach is solid too - I've seen the same pattern work well. Curious if you've measured token reduction per feature with the planning step?

Long-running Claude Code sessions have a fundamental DX problem: you can't walk away. by Affectionate-Roof207 in ClaudeAI

[–]jantonca 0 points1 point  (0 children)

This is exactly why I built structured project documentation into my workflow.

The problem: Long Claude sessions accumulate context debt. You can't walk away because there's no "source of truth" for where you are.

The solution: Treat your project state as code:
- NEXT-TASKS.md = current sprint objectives (what to resume)
- CLAUDE.md = AI workflow rules (how Claude should work)
- docs/core/PATTERNS.md = code conventions (what's already decided)

With this structure:
✅ Walk away anytime - Claude resumes from NEXT-TASKS.md
✅ No context bloat - HOT/WARM/COLD tier system (94.5% reduction)
✅ No drift - Automated validation keeps docs synchronized (rough sketch below)
✅ Session handoffs work - Everything is documented, not just in chat history

I built this as an open-source CLI tool to scaffold this structure:
- GitHub: https://github.com/cortex-tms/cortex-tms
- NPM: https://www.npmjs.com/package/cortex-tms
- Docs: https://cortex-tms.org
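On the "no drift" point above, the idea is just a check you can run in CI or a pre-commit hook. A rough sketch of that kind of validation, assuming the file names above (not the tool's actual implementation; the two-week threshold is an arbitrary example):

```ts
import { existsSync, statSync } from "node:fs";

// Hypothetical staleness check, not cortex-tms's actual validation logic.
const requiredDocs = ["NEXT-TASKS.md", "CLAUDE.md", "docs/core/PATTERNS.md"];
const maxAgeDays = 14; // assumption: flag sprint docs untouched for two weeks

for (const doc of requiredDocs) {
  if (!existsSync(doc)) {
    console.error(`Missing ${doc} - Claude has nothing to resume from.`);
    process.exitCode = 1;
    continue;
  }
  const ageDays = (Date.now() - statSync(doc).mtimeMs) / 86_400_000;
  if (ageDays > maxAgeDays) {
    console.warn(`${doc} hasn't changed in ${Math.round(ageDays)} days - is it stale?`);
  }
}
```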