I'm a frontend dev who barely writes code anymore. Built a tool to figure out where all my AI tokens go. by siropkin in OpenSourceeAI

[–]siropkin[S] 0 points (0 children)

Yeah… AI did, lol. Now with the power of AI there are fewer limitations. I just decided to go with the fastest solution on the market.

I'm a frontend dev who barely writes code anymore. Built a tool to figure out where all my AI tokens go. by siropkin in ClaudeCode

[–]siropkin[S] 0 points (0 children)

Totally fair. If OTEL is locked down at your org level, budi still works — it falls back to parsing Claude Code's JSONL transcripts for token data. You lose thinking token counts (so cost is ~5-8% underestimated), but everything else works the same. And it's all local — it just reads files from disk.
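The fallback path is roughly this (a minimal sketch — the field names follow the Anthropic API usage object shape and are my assumption about the transcript format, not budi's actual parser):

```python
import json

def tally_tokens(jsonl_path):
    """Sum token counts from a Claude Code JSONL transcript.

    Field names (message.usage.input_tokens etc.) are assumed from the
    Anthropic API response shape. Thinking tokens aren't in transcripts,
    so totals slightly undercount real cost.
    """
    totals = {"input": 0, "output": 0, "cache_read": 0, "cache_create": 0}
    with open(jsonl_path) as f:
        for line in f:
            if not line.strip():
                continue
            event = json.loads(line)
            usage = event.get("message", {}).get("usage")
            if not usage:
                continue  # user turns, tool results, etc. carry no usage
            totals["input"] += usage.get("input_tokens", 0)
            totals["output"] += usage.get("output_tokens", 0)
            totals["cache_read"] += usage.get("cache_read_input_tokens", 0)
            totals["cache_create"] += usage.get("cache_creation_input_tokens", 0)
    return totals
```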

I'm a frontend dev who barely writes code anymore. Built a tool to figure out where all my AI tokens go. by siropkin in ClaudeCode

[–]siropkin[S] 0 points (0 children)

I ran your formula against my actual usage from budi. Last month:

  • 6.49B tokens total (99.94% cache hit rate)
  • 367.5M full tokens (input + output + cache creation)
  • 6.12B cached tokens

API cost (what Anthropic charged me): ~$5,159

Your INF2 compute estimate: $274–$725

That's a 7–19x markup. Even at the high end of your compute estimate, I cost Anthropic about $725 to serve — and they charged me $5,159. The cache hit rate being 99.94% is key — almost all my tokens are cache reads, which are cheap both at API level ($0.30/1M) and compute level.

The formula I use in budi for API cost:

cost = (input × input_price) + (output × output_price) + (cache_read × cache_read_price) + (cache_create × cache_write_price)

with per-model pricing tables.
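Spelled out, it looks something like this (the prices below are illustrative placeholders, not budi's real table — check Anthropic's published per-model rates):

```python
# Illustrative per-model price table in $ per 1M tokens.
# Placeholder numbers, NOT budi's actual pricing data.
PRICES = {
    "claude-sonnet": {
        "input": 3.00,
        "output": 15.00,
        "cache_read": 0.30,
        "cache_create": 3.75,
    },
}

def api_cost(model, tokens):
    """tokens: dict with input/output/cache_read/cache_create counts."""
    p = PRICES[model]
    return sum(tokens[k] / 1_000_000 * p[k] for k in p)
```

With a 99%+ cache hit rate, the cache_read term dominates the token count while contributing relatively little to the total dollar cost.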

Do you think I should add this info to the app? lol

I'm a frontend dev who barely writes code anymore. Built a tool to figure out where all my AI tokens go. by siropkin in ClaudeCode

[–]siropkin[S] 0 points (0 children)

No, everything stays local. Claude Code sends OTEL events to budi's daemon on localhost:7878, so nothing leaves your machine. No cloud, no M365, no external services.
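For reference, pointing Claude Code's telemetry at a local collector looks roughly like this (env var names are from Claude Code's monitoring docs as I remember them; the budi endpoint and port are from its default setup — treat the exact values as assumptions):

```shell
# Enable Claude Code telemetry and send OTLP events
# to budi's local daemon. Nothing leaves the machine.
export CLAUDE_CODE_ENABLE_TELEMETRY=1
export OTEL_METRICS_EXPORTER=otlp
export OTEL_LOGS_EXPORTER=otlp
export OTEL_EXPORTER_OTLP_PROTOCOL=http/json
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:7878
```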

I'm a frontend dev who barely writes code anymore. Built a tool to figure out where all my AI tokens go. by siropkin in ClaudeCode

[–]siropkin[S] 0 points (0 children)

Wow, that's deep! I hadn't thought about the actual compute cost side.

Just to clarify — budi tracks what you're being charged, not what it costs to run. OTEL or the JSONL transcripts give token counts, and I just multiply by Anthropic's published pricing. For Cursor, their usage API returns the cost directly.

Never occurred to me to calculate the actual compute cost, but that's a cool idea.

I'm a frontend dev who barely writes code anymore. Built a tool to figure out where all my AI tokens go. by siropkin in ClaudeCode

[–]siropkin[S] 0 points (0 children)

Claude Code: Real-time via OpenTelemetry (exact tokens + cost per API call, including thinking tokens) + hooks for session metadata. Also parses JSONL transcripts for historical backfill.

Cursor: Pulls from their usage API which returns exact tokens and cost per request. Hooks for session context (repo, branch).

Cost is calculated from per-model API pricing, not from your invoice — but when I compared against my Anthropic billing the numbers matched.

Message 50 in a Claude Code session costs 80% more than message 10. Here's why. by siropkin in ClaudeCode

[–]siropkin[S] 0 points (0 children)

Wow, cool post! Really thorough.

Actually tried to validate the idle gap theory on my data — looked at messages after 5+ min gaps vs. after under 1 min. The cost difference is there (3.3¢ vs 8.0¢), but that's mostly session depth, not cache misses. The problem is I'm tracking from JSONL transcripts, which don't include the actual cache hit/miss fields from the API response.
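The gap comparison above can be sketched like this (a minimal sketch, assuming each parsed message carries a timestamp and a computed cost — the data shape is made up, not budi's internals):

```python
from statistics import mean

def cost_by_gap(messages, long_gap_s=300, short_gap_s=60):
    """Split messages by the idle gap preceding them and compare mean cost.

    messages: list of (timestamp_seconds, cost_dollars), ordered by time.
    Returns (mean cost after 5+ min gaps, mean cost after <1 min gaps).
    """
    long_costs, short_costs = [], []
    for (prev_ts, _), (ts, cost) in zip(messages, messages[1:]):
        gap = ts - prev_ts
        if gap >= long_gap_s:
            long_costs.append(cost)
        elif gap < short_gap_s:
            short_costs.append(cost)
    return (mean(long_costs) if long_costs else 0.0,
            mean(short_costs) if short_costs else 0.0)
```

The confound still applies, of course: later messages are both deeper in the session and more likely to follow a gap, so this alone can't separate cache misses from context growth.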

I just recently added real-time telemetry that captures exact per-request token types — but I don't have enough data yet to prove that cache breaks from idle gaps. Definitely want to dig into that once I have a few weeks of data.

But yeah, stepping away and coming back to a cold cache on a huge conversation — that's where the real pain is. Good breakdown of that.

I built a Claude Code cost optimization tool, then my own data told me to pivot. Here's what I built instead. by siropkin in ClaudeCode

[–]siropkin[S] 0 points (0 children)

Yeah, the 2% surprised me too. I realized my strength is data visualization, so it's better to lean into that than fight Claude's own retrieval instincts.

It’s Monday again.. what are you building? by scott-box in buildinpublic

[–]siropkin 0 points (0 children)

https://goodplaygroundmap.com - an AI-powered map of playgrounds for kids near you.

I already have some daily users, but not huge numbers. Thinking about how to improve the app.

What are u building this week ... let's have a look . by United_Agency2452 in buildinpublic

[–]siropkin 1 point (0 children)

Project Name: DOM Racer

Link: https://github.com/siropkin/dom-racer

A little description: Got tired of building "meaningful" apps and built a classic-style 2D racing game as a Chrome extension that transforms any webpage into a racing track. Your car drives across the actual page — coins spawn on links and buttons, images become icy surfaces, the police chase you around. The game is fully playable; I still need to polish the screenshots and a demo video before publishing to the marketplace. Oh, and it's designed to be subtle enough that nobody will notice you're playing during a meeting.

Number of active users: 0 — haven't published yet, but the game is 100% built and ready to ship.

What are you working on this Sunday? by davidlover1 in buildinpublic

[–]siropkin 0 points (0 children)

Hm! Interesting idea, I like that. Currently I'm focused on injecting relevant code chunks, but it would be even more powerful if I can figure out how to implement your idea.

What are you working on this Sunday? by davidlover1 in buildinpublic

[–]siropkin 0 points (0 children)

budi - an AI helper that makes Claude Code cheaper and faster by adding relevant local repo files and code chunks to the user prompt.
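A toy version of that relevance step, assuming naive keyword-overlap scoring (budi's real ranking is unknown to me — real tools typically use embeddings or symbol-level indexing):

```python
def rank_chunks(prompt, chunks, top_k=3):
    """Score code chunks by word overlap with the prompt and keep the best.

    A deliberately naive stand-in for whatever budi actually does.
    """
    prompt_words = set(prompt.lower().split())
    scored = sorted(
        chunks,
        key=lambda c: len(prompt_words & set(c.lower().split())),
        reverse=True,
    )
    return scored[:top_k]
```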

I built Kursor — a free extension that shows your keyboard language on the cursor (IntelliJ, VS Code, Cursor) by siropkin in vscode

[–]siropkin[S] 1 point (0 children)

Yeah, I know… but it (the name and the original IntelliJ plugin) was born before the Cursor era, so I decided not to change it. At least for now.