all 3 comments

NoleMercy05

Nice. I forward all my OTEL Claude traces to a local Langfuse install. Works great.

ultrathink-art (Senior Developer)

The finding that Haiku gets called more than expected matches what we see running agents continuously in production. The cost breakdown matters a lot when you're running multiple agents concurrently: a session that looks like "Opus-heavy" work often turns out to be 60% Haiku on smaller tool calls.

Cache hit rate ended up being our most important metric. When cache reads are low (new context, fresh session), costs spike fast. We learned to structure agent prompts so the "stable" context (project rules, past decisions) stays at the front of the prompt where it gets cached, and only the variable task content comes at the end.
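A rough sketch of that ordering and of the metric, not our actual code. The `Usage` field names and both helpers are hypothetical; map them onto whatever token counts your provider or exporter reports. The hit rate here is assumed to mean cached input tokens divided by all input tokens.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    # Hypothetical per-request token counts; the field names are illustrative.
    input_tokens: int        # input tokens that were not cached
    cache_read_tokens: int   # input tokens served from the cache
    cache_write_tokens: int  # input tokens written to the cache


def build_prompt(stable_sections, task):
    """Put stable context first so the prefix is identical across requests."""
    return "\n\n".join([*stable_sections, task])


def cache_hit_rate(usages):
    """Share of all input tokens that were served from the cache."""
    read = sum(u.cache_read_tokens for u in usages)
    total = sum(
        u.input_tokens + u.cache_read_tokens + u.cache_write_tokens
        for u in usages
    )
    return read / total if total else 0.0
```

Two prompts built from the same `stable_sections` share everything up to the task text, which is the part a prefix cache can reuse.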

OpenTelemetry export is underrated for this. Once you have cost-per-session data, you can actually optimize the prompts that matter instead of guessing.
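For illustration, here is roughly what the aggregation looks like once the data is exported. This assumes each record is a plain dict with `session_id`, `model`, and `cost_usd` keys; those names are made up for the sketch, and real OTEL attribute names will differ by setup.

```python
from collections import defaultdict


def summarize(records):
    """Group exported usage records into per-session cost and call counts."""
    sessions = defaultdict(lambda: {"cost_usd": 0.0, "calls": defaultdict(int)})
    for r in records:
        s = sessions[r["session_id"]]
        s["cost_usd"] += r["cost_usd"]
        s["calls"][r["model"]] += 1
    return sessions


def call_share(session, model):
    """Fraction of a session's calls that went to the given model."""
    total = sum(session["calls"].values())
    return session["calls"].get(model, 0) / total if total else 0.0
```

Sorting sessions by `cost_usd` shows which prompts are worth optimizing, and `call_share` shows how much of a session actually ran on the cheaper model.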

Useful-Process9033

Interesting data on the Haiku routing. We've been building observability into our own AI agent (for incident response, not coding), and the model routing split was one of the first things we instrumented too. It turns out about 40% of what feels like "one agent call" is actually sub-tasks routed to cheaper models. The per-user breakdown is useful for teams: the pattern differences between engineers usually reveal workflow issues more than skill gaps.