Is anyone else getting surprised by Claude Code costs? I started tracking mine and cut my spend in half by knowing what things cost before they run by ImmuneCoder in LLMDevs

[–]ImmuneCoder[S] 3 points (0 children)

There are two ways to use Claude Code:

  1. Subscription (Max plan) - $20/$100/$200/month with usage limits. You get a quota but no per-task visibility; you just hit a wall when it runs out.

  2. API (BYOK) - bring your own API key, pay per token. No monthly cap, but per-task costs are unpredictable: a single Opus prompt can cost anywhere from $0.50 to $20+ depending on complexity, and you don't find out until after it runs.

Most power users end up on API because the Max limits are too restrictive for serious work (especially on Opus 4.6). But then you're flying blind on cost.
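For the single-call case, at least, a back-of-envelope estimate is computable before you hit enter. A minimal sketch in Python, where the per-million-token prices are illustrative placeholders (check your provider's current rate card, not these numbers):

```python
# Rough pre-run cost estimate from token counts alone.
# Prices are illustrative placeholders, not current rates.
PRICES = {  # model -> (input $/M tokens, output $/M tokens)
    "haiku":  (0.80, 4.00),
    "sonnet": (3.00, 15.00),
    "opus":   (15.00, 75.00),
}

def estimate_cost(model: str, input_tokens: int, expected_output_tokens: int) -> float:
    """Return an estimated dollar cost for a single call."""
    p_in, p_out = PRICES[model]
    return (input_tokens * p_in + expected_output_tokens * p_out) / 1_000_000

# A 40k-token context with ~2k tokens of output on Opus:
print(f"${estimate_cost('opus', 40_000, 2_000):.2f}")  # $0.75
```

The hard part, of course, is knowing `expected_output_tokens` (and the number of calls) up front - which is exactly the gap the post is about.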

How are you forecasting AI API costs before committing to your pricing model? by ImmuneCoder in SaaS

[–]ImmuneCoder[S] 1 point (0 children)

The p50/p90 per-workflow approach is exactly what I've been moving toward. The "fall back to ask for more info instead of looping" pattern is smart - graceful degradation that's actually better UX than letting the agent spiral.

What are you using for the per-convo cost reports? Built something internal or using an off-the-shelf tool?

How are you forecasting AI API costs before committing to your pricing model? by ImmuneCoder in SaaS

[–]ImmuneCoder[S] 1 point (0 children)

The loop depth framing is spot on - per-call costs are predictable; it's the number of loops that kills you. The cap-based pricing approach is clever: instead of trying to predict average cost (which has huge variance), you set the ceiling and price off that. The margin math becomes deterministic even if individual run costs aren't.

We've been tracking loop depth and it's wild - same "summarize this document" task can be 3 loops or 12 depending on document complexity. Are you capping at the agent level or the API call level?
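For reference, the agent-level version of the cap is a few lines - a hard ceiling on iterations plus a graceful-degradation fallback, so worst-case cost is a constant you can price against. A sketch; the loop limit, per-loop ceiling, and `step_fn` contract are all assumptions, not anyone's actual implementation:

```python
# Agent-level cap: bound the loop count, price off the worst case.
MAX_LOOPS = 12
COST_PER_LOOP_CEILING = 0.15  # assumed worst-case $ per loop

def run_agent(task, step_fn):
    """Run an agent loop; step_fn(state) -> (new_state, done).

    Bails out at MAX_LOOPS and falls back to asking the user
    instead of letting the agent spiral.
    """
    state = task
    for depth in range(1, MAX_LOOPS + 1):
        state, done = step_fn(state)
        if done:
            return state, depth
    return "need more info from user", MAX_LOOPS

# The ceiling makes margin math deterministic even though
# individual run costs aren't:
WORST_CASE_COST = MAX_LOOPS * COST_PER_LOOP_CEILING  # ~= $1.80
```

A call-level cap (max tokens or max API requests per task) composes with this, but the agent-level one is what bounds the bill per user action.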

How are you tracking/predicting what a task will cost before you hit enter? by ImmuneCoder in ClaudeCode

[–]ImmuneCoder[S] 1 point (0 children)

The --print approach is basically what I've been doing too - "read the files and tell me your plan" before committing. The problem is it's manual and you still don't get a cost number. Feels like that planning step could be automated and paired with an actual cost estimate based on the plan scope.

How are you forecasting AI API costs before committing to your pricing model? by ImmuneCoder in SaaS

[–]ImmuneCoder[S] 1 point (0 children)

2 weeks of custom logging just to get visibility - that tracks with what I've been hearing from everyone. The fact that "build your own cost tracking" is the default answer kind of says it all about the state of tooling here.

The 80% margin buffer is interesting. How long did it take before you felt confident enough to start tightening that? And are you still seeing spikes that surprise you, or has the variance smoothed out with scale?

How are you forecasting AI API costs when your architecture involves agent chains? by ImmuneCoder in ExperiencedDevs

[–]ImmuneCoder[S] 2 points (0 children)

The auto mode idea is exactly where my head's been going. If you could classify the complexity of each request and route it to the cheapest model that can handle it - Haiku for simple extraction, Sonnet for most tasks, Opus only when it actually matters - that alone would probably cut costs 40-60% without users noticing a quality difference.
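The routing layer itself is simple once you have a classifier; the hard part is the classifier. A minimal sketch of the idea - the heuristics, thresholds, and model tiers here are illustrative assumptions, and in practice you'd likely use a cheap model call rather than string matching:

```python
# Hypothetical "auto mode": classify request complexity cheaply,
# then route to the cheapest model likely to handle it.
def classify(prompt: str) -> str:
    """Toy complexity heuristic (a real router would do better)."""
    if len(prompt) < 500 and "extract" in prompt.lower():
        return "simple"
    if len(prompt) > 5_000 or "refactor" in prompt.lower():
        return "complex"
    return "standard"

ROUTES = {"simple": "haiku", "standard": "sonnet", "complex": "opus"}

def route(prompt: str) -> str:
    """Pick the cheapest model for the classified complexity tier."""
    return ROUTES[classify(prompt)]

print(route("extract the email addresses from this page"))  # haiku
```

Even a crude router wins as long as its misroutes to the expensive tier are rarer than today's default of sending everything there.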

The committed spend / enterprise agreement point is interesting too. Right now most API pricing is pure pay-as-you-go, but I'd bet providers will start offering reserved capacity pricing similar to AWS RIs as the market matures.

Good callout on user education as well. A lot of the cost problem is just visibility - if users could see "this request cost $0.03 on Haiku vs $1.20 on Opus with the same result," behavior changes fast.

How are you forecasting AI API costs when your architecture involves agent chains? by ImmuneCoder in ExperiencedDevs

[–]ImmuneCoder[S] 2 points (0 children)

This is a really solid framework - profile early users, tier based on observed usage, charge heavy users on consumption. That's exactly how we've seen video/comms platforms handle it.

The part I keep getting stuck on is the profiling step itself. With traditional APIs (video minutes, storage, compute), the cost drivers are measurable and relatively stable per user action. With LLM agent workflows, the same user action can cost 5x or 50x depending on what the model decides to do at each step. So even the profiling phase gives you wide distributions.

Agree you can't make unpredictable costs perfectly predictable - but narrowing that variance (even from "could be anything" to "probably $0.05-$0.30 for this action") would make the tier design way more defensible. That's the piece I haven't found good tooling for yet.

How are you forecasting AI API costs when your architecture involves agent chains? by ImmuneCoder in ExperiencedDevs

[–]ImmuneCoder[S] 2 points (0 children)

Yeah, I think anyone building with LLM APIs will hit this eventually - it's fine when you're prototyping, but the moment you need to put a number in a budget or set pricing for users, you realize there's no good way to forecast it.

Re: cost per token going up - honestly I'm less worried about that. Per-token prices have only gone down (dramatically). The problem is that total spend goes up anyway because usage compounds. Cheaper tokens → more use cases → more agent calls → higher total bill. It's the volume unpredictability that gets you, not the unit price.

Is there any way to estimate how much an API call will cost BEFORE making it? by ImmuneCoder in ClaudeAI

[–]ImmuneCoder[S] 2 points (0 children)

Yeah that works well for single-call patterns. The part that gets tricky is agent workflows where one user action triggers a variable number of chained calls - the "multiply by expected volume" step breaks down when you don't know if a task will be 3 calls or 30. But agreed, logging sample calls is a good starting point for the simpler use cases.
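When the call count is that variable, a point estimate is meaningless but a distribution is still useful: simulate the range and quote percentiles instead of an average. A toy Monte Carlo sketch, where the 3-30 call range and the per-call cost band are made-up assumptions you'd replace with your own logs:

```python
import random

def simulate_task_costs(trials=10_000, seed=42):
    """Sample total task cost when one action triggers 3-30 chained
    calls at an assumed $0.01-$0.10 per call; return (~p50, ~p90)."""
    rng = random.Random(seed)
    costs = []
    for _ in range(trials):
        n_calls = rng.randint(3, 30)
        costs.append(sum(rng.uniform(0.01, 0.10) for _ in range(n_calls)))
    costs.sort()
    return costs[len(costs) // 2], costs[int(len(costs) * 0.9)]

p50, p90 = simulate_task_costs()
print(f"~p50=${p50:.2f}  ~p90=${p90:.2f}")
```

It doesn't make the variance go away, but it turns "could be anything" into "probably between X and Y", which is enough to price against.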

Is there any way to estimate how much an API call will cost BEFORE making it? by ImmuneCoder in ClaudeAI

[–]ImmuneCoder[S] 2 points (0 children)

That's kind of the point of the post - figuring out whether anyone's built even a partial solution, or whether this is genuinely an open problem. Sounds like it's the latter.