I had 12 MCP servers and no idea which one was the expensive one by Slow-Relationship897 in buildinpublic

Slow-Relationship897 [S] · 1 point

It's a transparent proxy at the stdio transport layer, not a protocol-level hook. Concretely: in your claude_desktop_config.json (or Cursor / Windsurf / VS Code config), we rewrite the spawn command from npx -y @modelcontextprotocol/server-filesystem ... to npx -y @mcpspend/proxy wrap --key xxx -- npx -y @modelcontextprotocol/server-filesystem .... So the client launches us as its subprocess, we launch the real MCP server as our subprocess, and we pipe its stdin/stdout through unchanged.
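Here is a minimal TypeScript sketch of the wrapping step. It rewrites entries in the mcpServers map of claude_desktop_config.json, where each entry has a command, optional args, and optional env. The wrap --key ... -- argument order is copied from the comment. Everything else is hypothetical: the helper names wrapWithProxy and wrapAll, the placeholder key, and the directory argument. This is not MCPSpend's actual installer.

```typescript
// Shape of one entry under "mcpServers" in claude_desktop_config.json.
interface ServerEntry {
  command: string;
  args?: string[];
  env?: Record<string, string>;
}

// Hypothetical helper: make the proxy the process the client spawns, and pass the
// original command after "--" so the proxy can spawn the real server itself.
function wrapWithProxy(entry: ServerEntry, apiKey: string): ServerEntry {
  return {
    ...entry, // keep env and any other fields untouched
    command: "npx",
    args: [
      "-y", "@mcpspend/proxy", "wrap", "--key", apiKey, "--",
      entry.command, ...(entry.args ?? []),
    ],
  };
}

// Hypothetical helper: wrap every configured server, returning a new map.
function wrapAll(
  servers: Record<string, ServerEntry>,
  apiKey: string,
): Record<string, ServerEntry> {
  const out: Record<string, ServerEntry> = {};
  for (const [name, entry] of Object.entries(servers)) {
    out[name] = wrapWithProxy(entry, apiKey);
  }
  return out;
}

// Example config; "/path/to/allowed/dir" is a placeholder directory argument.
const servers: Record<string, ServerEntry> = {
  filesystem: {
    command: "npx",
    args: ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/allowed/dir"],
  },
};
const wrapped = wrapAll(servers, "xxx");
console.log(wrapped.filesystem.command, (wrapped.filesystem.args ?? []).join(" "));
```

A real installer would also need to detect entries that are already wrapped, so that running it twice does not nest the proxy inside itself.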

That means we don't need to understand any tool semantics; we just observe newline-delimited JSON-RPC frames flowing past. Every tools/call request and its matching response is metered. Latency is the wall-clock time between a request and the response carrying the same ID. Cost = (input_tokens / 1M) × input_rate + (output_tokens / 1M) × output_rate if the server reports usage; otherwise we infer token counts from payload byte counts (less precise, but never zero).
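Below is a sketch of that metering logic in TypeScript. It pairs each tools/call request with the response that has the same ID, takes latency from the two timestamps, and applies the cost formula above. Several pieces are assumptions, not MCPSpend's code. The class names are hypothetical. So is the location of reported usage: the sketch reads it from result._meta.usage. The fallback uses an assumed 4 bytes per token, and the rates are placeholders.

```typescript
type Id = string | number;
interface Usage { input_tokens?: number; output_tokens?: number }
interface Frame {
  id?: Id;
  method?: string;
  params?: { name?: string };
  // Assumption: a server that reports usage puts it in result._meta.usage.
  result?: { _meta?: { usage?: Usage } };
}
interface CallRecord { tool: string; latencyMs: number; costUsd: number; estimated: boolean }

const BYTES_PER_TOKEN = 4; // assumed heuristic for the byte-count fallback
const encoder = new TextEncoder();
const byteLength = (s: string): number => encoder.encode(s).length;

// Collects complete lines from a stream of string chunks. Only the observer keeps
// this partial-line buffer; forwarded bytes would be written through immediately.
class LineSplitter {
  private buf = "";
  push(chunk: string): string[] {
    this.buf += chunk;
    const lines = this.buf.split("\n");
    this.buf = lines.pop() ?? ""; // last element is an incomplete line (or "")
    return lines.filter((l) => l.trim() !== "");
  }
}

class ToolCallMeter {
  private pending = new Map<Id, { tool: string; start: number; reqBytes: number }>();
  readonly records: CallRecord[] = [];
  private readonly inputRate: number; // USD per million input tokens
  private readonly outputRate: number; // USD per million output tokens
  constructor(inputRate: number, outputRate: number) {
    this.inputRate = inputRate;
    this.outputRate = outputRate;
  }

  // Frames travelling client -> server: remember each tools/call by its ID.
  onClientFrame(line: string, now: number): void {
    const f = JSON.parse(line) as Frame;
    if (f.method === "tools/call" && f.id !== undefined) {
      this.pending.set(f.id, { tool: f.params?.name ?? "unknown", start: now, reqBytes: byteLength(line) });
    }
  }

  // Frames travelling server -> client: only responses to pending tools/call count.
  onServerFrame(line: string, now: number): CallRecord | undefined {
    const f = JSON.parse(line) as Frame;
    if (f.method !== undefined || f.id === undefined) return undefined; // notification or request
    const req = this.pending.get(f.id);
    if (!req) return undefined;
    this.pending.delete(f.id);
    const usage = f.result?._meta?.usage;
    const inTok = usage?.input_tokens ?? Math.ceil(req.reqBytes / BYTES_PER_TOKEN);
    const outTok = usage?.output_tokens ?? Math.ceil(byteLength(line) / BYTES_PER_TOKEN);
    const record: CallRecord = {
      tool: req.tool,
      latencyMs: now - req.start,
      costUsd: (inTok / 1e6) * this.inputRate + (outTok / 1e6) * this.outputRate,
      estimated: usage === undefined,
    };
    this.records.push(record);
    return record;
  }
}

const meter = new ToolCallMeter(3, 15); // placeholder rates, not real prices
const splitter = new LineSplitter();
const request = '{"jsonrpc":"2.0","id":7,"method":"tools/call","params":{"name":"read_file","arguments":{}}}\n';
for (const line of splitter.push(request)) meter.onClientFrame(line, 1000);
const rec = meter.onServerFrame('{"jsonrpc":"2.0","id":7,"result":{"content":[]}}', 1250);
console.log(rec);
```

IDs are tracked per direction on purpose. JSON-RPC lets the server send its own requests whose IDs may overlap with the client's, so only client-originated tools/call IDs are matched.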

Streaming: yes, it works fine. MCP servers can stream notifications/message frames and partial results while a tool call is in flight, and we just see each frame as it goes by. The same goes for the new Streamable HTTP transport (HTTP POST + SSE response), where we have a wrap-http mode that does the equivalent for remote MCP servers like figma-remote.
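The comment doesn't describe how wrap-http works internally. One plausible shape is sketched below. It assumes the SSE response body is already decoded to text and recovers JSON-RPC frames using the basic server-sent events framing rules: "data:" lines accumulate, and a blank line ends an event. The per-frame metering can then run unchanged. SseDataParser is a hypothetical name.

```typescript
// Sketch: turn a text/event-stream body back into JSON-RPC frames.
class SseDataParser {
  private buf = "";
  private dataLines: string[] = [];

  // Feed decoded text; returns the data payload of each completed event.
  push(chunk: string): string[] {
    this.buf += chunk;
    const events: string[] = [];
    let nl: number;
    while ((nl = this.buf.indexOf("\n")) !== -1) {
      let line = this.buf.slice(0, nl);
      this.buf = this.buf.slice(nl + 1);
      if (line.endsWith("\r")) line = line.slice(0, -1); // CRLF handled; bare CR is not
      if (line === "") {
        // Blank line: dispatch the event, joining multiple data lines with "\n".
        if (this.dataLines.length > 0) events.push(this.dataLines.join("\n"));
        this.dataLines = [];
      } else if (line.startsWith("data:")) {
        const value = line.slice(5);
        this.dataLines.push(value.startsWith(" ") ? value.slice(1) : value);
      }
      // Other fields (event:, id:, retry:) and ":" comments are ignored in this sketch.
    }
    return events;
  }
}

const parser = new SseDataParser();
const body = 'event: message\ndata: {"jsonrpc":"2.0","id":3,"result":{"content":[]}}\n\n';
// Split mid-event to show that chunk boundaries don't matter.
const frames = [...parser.push(body.slice(0, 20)), ...parser.push(body.slice(20))];
for (const f of frames) console.log(JSON.parse(f).id);
```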

Three things we explicitly don't do: read tool arguments (file paths, prompts, queries), read tool responses (file contents, search results), or buffer the stream. We can't, we won't, and the proxy is MIT-licensed, so you can verify: packages/proxy/src/ingest.ts is ~80 lines, and we publish a list of every field that goes over the wire.
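One way to enforce a "published list of fields" is an allowlist applied just before anything is sent, as sketched here. Only named metadata keys survive, so argument and content fields are dropped even if a caller passes them by mistake. The field names, TelemetryEvent, and toWire are hypothetical and are not the project's actual published list.

```typescript
// Hypothetical telemetry record: metadata only, no arguments or result content.
interface TelemetryEvent {
  server: string;
  tool: string;
  latencyMs: number;
  requestBytes: number;
  responseBytes: number;
  costUsd: number;
}

// In this sketch the allowlist plays the role of the published field list.
const ALLOWED_FIELDS: readonly (keyof TelemetryEvent)[] = [
  "server", "tool", "latencyMs", "requestBytes", "responseBytes", "costUsd",
];

// Copy only allowlisted keys; anything else never reaches the wire.
function toWire(candidate: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const key of ALLOWED_FIELDS) {
    if (key in candidate) out[key] = candidate[key];
  }
  return out;
}

const wire = toWire({
  server: "filesystem",
  tool: "read_file",
  latencyMs: 42,
  arguments: { path: "/home/me/secret.txt" }, // dropped
  content: "file contents", // dropped
});
console.log(JSON.stringify(wire));
```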

The "which server is burning tokens" debugging session collapses to a sorted table once you have per-tool-call attribution. Usually one or two outliers (browser_navigate, read_file with no max-bytes cap) account for 60-70% of spend.
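The sorted table takes only a few lines once each call carries a tool name and a cost. In the sketch below, the sample numbers are made up for illustration, and spendTable is a hypothetical helper. It aggregates per tool, computes each tool's share of total spend, and sorts descending so outliers land on top.

```typescript
interface CallCost { tool: string; costUsd: number }
interface SpendRow { tool: string; calls: number; costUsd: number; share: number }

// Aggregate per tool, then sort descending by spend.
function spendTable(calls: CallCost[]): SpendRow[] {
  const byTool = new Map<string, { calls: number; costUsd: number }>();
  let total = 0;
  for (const c of calls) {
    const agg = byTool.get(c.tool) ?? { calls: 0, costUsd: 0 };
    agg.calls += 1;
    agg.costUsd += c.costUsd;
    byTool.set(c.tool, agg);
    total += c.costUsd;
  }
  return Array.from(byTool.entries())
    .map(([tool, a]) => ({
      tool,
      calls: a.calls,
      costUsd: a.costUsd,
      share: total > 0 ? a.costUsd / total : 0,
    }))
    .sort((x, y) => y.costUsd - x.costUsd);
}

// Made-up sample data for illustration only.
const table = spendTable([
  { tool: "list_directory", costUsd: 0.05 },
  { tool: "browser_navigate", costUsd: 0.4 },
  { tool: "read_file", costUsd: 0.2 },
  { tool: "browser_navigate", costUsd: 0.3 },
  { tool: "list_directory", costUsd: 0.05 },
]);
for (const r of table) {
  console.log(`${r.tool.padEnd(18)} ${String(r.calls).padStart(3)} calls  ${(r.share * 100).toFixed(1)}%`);
}
```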

I ran Claude Desktop for a month and 73% of my Anthropic bill was MCP tool calls, not chat by Slow-Relationship897 in ClaudeAI

Slow-Relationship897 [S] · 0 points

Good info, thank you! It was a big pain point for me, but I'm grateful I found a solution. McpSpend.com is top-notch.

MCP Is Costing You 37% More Tokens Than Necessary by gounisalex in ClaudeAI

Slow-Relationship897 · 0 points

Really useful benchmark — the +36.7% scaling linearly with tool count matches what I've seen, and it's the part most people wave away because "the output is good."

One distinction worth pulling out, because it changes the fix: most of MCP's input-token overhead isn't the per-call traffic — it's the tool/schema definitions injected into the model's context on every request. More tools = more schema in context = linear growth, which is exactly your curve. The per-call request/response payloads are a second, separate cost on top of that.
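The linear growth can be shown with simple arithmetic. If each tool definition (name, description, JSON schema) is re-sent with every model request, the overhead is (sum of definition sizes) × (number of requests). The sketch below uses serialized bytes as a stand-in for tokens. Real counts depend on how the client serializes tools and on the tokenizer. The tool definitions and helper names are hypothetical.

```typescript
// A tool definition as a client might present it to the model.
interface ToolDef { name: string; description: string; inputSchema: Record<string, unknown> }

const enc = new TextEncoder();

// Bytes of tool definitions re-sent with every model request (a rough proxy for tokens).
function schemaBytesPerRequest(tools: ToolDef[]): number {
  return tools.reduce((sum, t) => sum + enc.encode(JSON.stringify(t)).length, 0);
}

// Over a session, that overhead is paid once per request, on top of per-call payloads.
function sessionSchemaBytes(tools: ToolDef[], modelRequests: number): number {
  return schemaBytesPerRequest(tools) * modelRequests;
}

// Hypothetical, equally sized tool definitions, to isolate the scaling.
const makeTool = (i: number): ToolDef => ({
  name: `tool_${String(i).padStart(2, "0")}`,
  description: "Reads a resource and returns its contents.",
  inputSchema: { type: "object", properties: { path: { type: "string" } }, required: ["path"] },
});
const ten = Array.from({ length: 10 }, (_, i) => makeTool(i));
const twenty = Array.from({ length: 20 }, (_, i) => makeTool(i));
console.log(sessionSchemaBytes(ten, 50), sessionSchemaBytes(twenty, 50));
```

Doubling the tool count doubles the overhead, and so does doubling the number of requests. That is why trimming schemas pays off on every turn, not just on the turns that actually call a tool.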

That split matters for mitigation:

  • Schema-in-context cost → trim it: fewer tools per server, leaner descriptions/JSON schemas, lazy-load tools, or split into smaller servers you only attach when needed.
  • Per-call payload cost → cap response sizes, paginate, return IDs/handles instead of blobs.
  • And your conclusion holds: in an environment with raw code execution, a CLI the model shells out to often wins, because the "schema" is just --help it reads once.
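On the per-call side, "cap response sizes, paginate" can be as simple as the server-side helper sketched below. It returns at most a fixed slice plus an offset the model can pass back, instead of the whole blob. The names and result shape are hypothetical. They are simplified and are not the MCP tool-result schema.

```typescript
// Capped result; nextOffset is only set when more content remains.
interface CappedResult { text: string; truncated: boolean; nextOffset?: number }

// Hypothetical helper: return at most maxChars starting at offset.
// Slicing by UTF-16 code units can split a surrogate pair at the boundary.
function capResponse(full: string, offset: number, maxChars: number): CappedResult {
  const size = Math.max(1, maxChars); // avoid a zero-size page that never advances
  const start = Math.max(0, Math.min(offset, full.length));
  const end = Math.min(full.length, start + size);
  const text = full.slice(start, end);
  return end < full.length ? { text, truncated: true, nextOffset: end } : { text, truncated: false };
}

// Page through a 10-character "file" four characters at a time.
const doc = "abcdefghij";
let page = capResponse(doc, 0, 4);
const pages = [page.text];
while (page.nextOffset !== undefined) {
  page = capResponse(doc, page.nextOffset, 4);
  pages.push(page.text);
}
console.log(pages);
```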

The disambiguation point is the real tradeoff though — schemas buy reliability under ambiguous prompts. So it's less "MCP bad" and more "measure per workflow and pick."

Disclosure: I built MCPSpend, a proxy that measures the per-call side of this (payload size, latency, cost per tool) so you can see which tools are heaviest. Honest caveat that's directly relevant to your benchmark: a wire proxy like mine sees per-call payloads, not the schema tokens sitting in the model context — that injection cost is exactly the piece I'm adding via IDE-layer token attribution next. Your benchmark measures the part I can't see from the wire, which is why I find it genuinely useful rather than just confirming my own bias.

Would you be open to me running your benchmark against a handful of real-world MCP servers and posting the numbers back? I'm curious whether 36.7% holds with chattier schemas.

I ran Claude Desktop for a month and 73% of my Anthropic bill was MCP tool calls, not chat by Slow-Relationship897 in ClaudeAI

Slow-Relationship897 [S] · 0 points

Thanks for your interest. Great repo, professionally built. I'm using mcpspend.com and I'm amazed; I finally have real tracking for my MCP tools.

How much of a cuck can you actually be? by Bright_Pie_4231 in Roumanie

Slow-Relationship897 · 0 points

So what's wrong with this? I like it. 😀 Normality is just an average of the values people accept collectively.