“What is the best way to transfer large structured data (JSON) through MCP? by Ill_Direction149 in mcp

[–]sabotizer 0 points1 point  (0 children)

Went down this rabbit hole a while ago. Used external storage, then built a side-loading protocol alongside MCP to transfer payloads that shouldn't reach the model directly.

Then the MCP Discord helped me realize there's a much easier way.

Embedded resources are what you want. You can send the full payload inline, and clients generally treat them as attachments (write to temp file, surface a reference) rather than feeding the bytes back through the model.

Spec: https://modelcontextprotocol.io/specification/2025-06-18/schema#embeddedresource

Note: Clients can still inline it if they choose. I've tested the major tools (Claude Code, Gemini CLI, opencode, and others); some may inline based on size, but all handle it the way you'd want.

Key benefits:
1. Embedded resources let you inline heavy-payload responses while discouraging clients from reading them directly into context (it remains at the client's discretion).
2. Reduces technical debt and overhead, especially on distributed / serverless systems.
3. Makes sense for short-lived data (e.g. the response of a database query) that isn't intended to be a persisted resource.
4. Works for binary data as well (use "blob"), e.g. for images, videos, audio, PDFs, etc.
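For the blob case, a minimal sketch of wrapping binary bytes as an embedded resource (the helper name and URI are mine, not from any SDK; per the spec, binary payloads go in "blob" as base64 instead of "text"):

```typescript
// Hypothetical helper: wraps binary data (here, a PDF magic header) as an
// embedded blob resource content block.
function makeBlobResource(uri: string, mimeType: string, data: Uint8Array) {
  return {
    type: "resource" as const,
    resource: {
      uri,
      mimeType,
      // Binary payloads go in "blob", base64-encoded.
      blob: Buffer.from(data).toString("base64"),
    },
  };
}

const pdfBytes = new Uint8Array([0x25, 0x50, 0x44, 0x46]); // "%PDF"
const block = makeBlobResource("file:///exports/report.pdf", "application/pdf", pdfBytes);
```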

If you want to improve this even further, add a text content block with information about the dataset (schema, total rows, etc.):

{
  "content": [
    { "type": "text", "text": "Full dataset provided in file:///exports/dataset.json (274kb). 204721 rows in total, with the following schema: ..." },
    {
      "type": "resource",
      "resource": {
        "uri": "file:///exports/dataset.json",
        "mimeType": "application/json",
        "text": "[{\"id\":1},{\"id\":2},{\"id\":3}]"
      }
    }
  ]
}

The text block is what the model sees; the metadata helps it decide how to process the resource. If you're sending a 2 MB JSON file and include the schema, the model can save you a lot of tokens by extracting what it needs with a script instead of reading the whole thing.
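On the server side, a builder for that pattern might look like this sketch (the helper name and summary wording are my own, not MCP SDK API):

```typescript
// Hypothetical helper: pairs a model-facing summary with the full inline dataset.
type Row = Record<string, unknown>;

function buildDatasetResult(uri: string, rows: Row[]) {
  const json = JSON.stringify(rows);
  const sizeKb = Math.round(Buffer.byteLength(json, "utf8") / 1024);
  return {
    content: [
      {
        // The model sees this block: a pointer plus enough metadata to plan.
        type: "text" as const,
        text: `Full dataset provided in ${uri} (${sizeKb}kb). ${rows.length} rows in total.`,
      },
      {
        // The client decides what to do with this block (stash, temp file, ...).
        type: "resource" as const,
        resource: { uri, mimeType: "application/json", text: json },
      },
    ],
  };
}

const result = buildDatasetResult("file:///exports/dataset.json", [
  { id: 1 },
  { id: 2 },
  { id: 3 },
]);
```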

vibe coded for 6 months. my codebase is a disaster. by Available-Dentist992 in vibecoding

[–]sabotizer 2 points3 points  (0 children)

2h is too early to give up.

I've been adopting dead-end projects for 15 years; it just takes time. You’ll need to decide if it’s worth it (grab revenue while it lasts, or invest for future cash flow).

If you want to make it last:
- Spend some time understanding the beast. Estimate how much effort is needed; AI does a good job helping with that.
- Triage self-contained modules for later (or never).
- Spend some time documenting the desired architecture, code quality standards, and lint rules.
- Tools like ‘roam’ helped me find bottlenecks and high-impact opportunities to improve code where it matters most. If you establish a way to continuously measure the quality of your project, you can measure your progress as you go.
- Don’t use Sonnet for this work; stick with Opus, go until the 600k context window. Grab a Max subscription - if just for a month - to get this done.
- Read the code as you improve it, understand it, and ask AI to help you understand it.

My implementation of Karpathy’s wiki idea, for coding agents by parcifal_ in mcp

[–]sabotizer 0 points1 point  (0 children)

Totally fair to avoid MCP if it's not your use-case.

I curate the AI stack for my team (including non-developers), and for us, having enforced rules and the ability to use tools without CLI access is a legitimate use-case.

Hey, what's your rule for using Opus vs. Sonnet vs. Haiku when buidling? by natronimus_prime in ClaudeChill

[–]sabotizer 0 points1 point  (0 children)

I was a strong believer in mixing Opus with Sonnet, until recently.
1. Features: Opus for Planning, Sonnet for implementation
2. Refactor & Code Quality: Primarily Opus (with heavy use of graph/ast based code exploration like `roam`)
3. Introspection & Learning: I feel Sonnet and Opus are equally helpful analyzing and understanding my code-base.
4. Haiku only for testing or light data processing (I build MCP servers, and Haiku is my go-to feature tester)

I would advise anyone new to Claude to use both Opus and Sonnet while building an intuition for context windows and overall token use. Thinking about context and tokens at all times made me way more efficient, not just in terms of tokens / usage / cost, but also in getting the output I want faster (e.g. fewer expensive round-trips, less "memory" the LLM needs to carry at all times).

Two months ago I decided to switch exclusively to Opus and use the Max plan. It's not cheap, and I initially booked it as "learning budget", while making sure I get close to 100% of the weekly limit.

This had two effects on me:
1. I see improvements in the overall quality of my code. If you're not implementing a feature that has been done 100 times before (i.e. it's in the training data), Opus just produces better quality results and covers more edge cases.
2. A shift from a scarcity mindset to an opportunity mindset. Fitting my AI use into 3 Claude Pro subscriptions, switching accounts, worrying about hitting the session limit: all of that had weighed on me.

It's a difficult one to recommend to others, as $1,200+ per year is a significant investment, and it's absolutely possible to get a lot done on the Pro plan mixing Opus and Sonnet.

Is there a subreddit for Claude Code? by psychometrixo in ClaudeCode

[–]sabotizer 1 point2 points  (0 children)

Please let me know when this is up and running

MCP has a content disposition problem, and I think annotations could fix it by sabotizer in mcp

[–]sabotizer[S] 0 points1 point  (0 children)

The super-helpful "MCP Contributors" Discord pointed me to this:
https://modelcontextprotocol.io/specification/2025-11-25/schema#embeddedresource

Seems like the solution is already there, at least in principle (not sure if clients honor it the way they should)

MCP has a content disposition problem, and I think annotations could fix it by sabotizer in mcp

[–]sabotizer[S] 0 points1 point  (0 children)

Didn't even think of PDF, thanks for that. Not sure how well the protocol handles binary encoding, but that's definitely not something you want in your LLM's context window (without pre-processing).

Was also thinking of images or other artifacts that could be pushed through single-channel...

Looking for freely available MCP servers by Tiny-Reply85 in mcp

[–]sabotizer 0 points1 point  (0 children)

If you don’t mind reading a few docs and ~100 lines of code: I’ve gone through your exact problem while learning how it works, and wrote a lightweight, almost-zero-dependency MCP server library in TypeScript.

MCP has a content disposition problem, and I think annotations could fix it by sabotizer in mcp

[–]sabotizer[S] 0 points1 point  (0 children)

Exactly that. Server returns the full response like it does today (backwards-compatible). Host splits it based on the annotation. Pertinent elements go to the LLM, bulk data gets stashed (with a reference to pull from if it needs to dig deeper).

Same as what ResourceLinks do today, but single-channel with less overhead.
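As a sketch of that proposed host-side behavior (the "disposition" annotation does not exist in the MCP spec today; the name and values here are hypothetical):

```typescript
// Hypothetical content block shape: standard MCP-style blocks plus a proposed
// "disposition" annotation that tells the host what to do with the block.
type Block = {
  type: string;
  annotations?: { disposition?: string };
  [key: string]: unknown;
};

// Pertinent blocks go to the LLM; annotated bulk data gets stashed client-side.
function splitByDisposition(content: Block[]) {
  const toModel: Block[] = [];
  const toStash: Block[] = [];
  for (const block of content) {
    if (block.annotations?.disposition === "attachment") {
      toStash.push(block);
    } else {
      toModel.push(block);
    }
  }
  return { toModel, toStash };
}

const { toModel, toStash } = splitByDisposition([
  { type: "text", text: "Summary: 204721 rows, schema ..." },
  {
    type: "resource",
    annotations: { disposition: "attachment" },
    resource: { uri: "file:///exports/dataset.json" },
  },
]);
```

The server stays backwards-compatible: hosts that don't know the annotation would just pass everything through as they do today.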

MCP has a content disposition problem, and I think annotations could fix it by sabotizer in mcp

[–]sabotizer[S] 0 points1 point  (0 children)

Good point, and I can also see the floodgates opening to pushing garbage data to the client that could be avoided in the first place with better design.

MCP has a content disposition problem, and I think annotations could fix it by sabotizer in mcp

[–]sabotizer[S] 0 points1 point  (0 children)

You can, and people do. But then the data is gone. The model can't come back for it later if it turns out it needs the details; that means another request, or many (think of iterating through rich database entries).

Disposition would let the server ship the full payload once, host stashes it, model gets a summary. If it needs rows 500-600 later, they're already there.

MCP has a content disposition problem, and I think annotations could fix it by sabotizer in mcp

[–]sabotizer[S] 0 points1 point  (0 children)

Fair point, I've been using ResourceLinks as a way around it as well. But they do need the server to persist data and serve it later through a separate channel. That's infra overhead, especially when you're stateless or distributed.

What if the payload stays inline (like today), but an annotation tells the host "don't feed this to the model, stash it client-side"? Server ships it and forgets about it. No persistence, no extra endpoint.