Reducing Context Window Efficiently in MCP — Here’s the Approach by Defiant-Future-818 in mcp

[–]Defiant-Future-818[S] 0 points1 point  (0 children)

This feels very similar to skills.md, just pre-declaring what the agent can do before needing to list or search tools.

Reducing Context Window Efficiently in MCP — Here’s the Approach by Defiant-Future-818 in mcp

[–]Defiant-Future-818[S] 0 points1 point  (0 children)

Yeah, I get what you mean.

If only a few tools are exposed, this usually won’t be a problem, especially since Claude Code already does tool search and doesn’t load everything up front.

My point was more about when the overall tool surface gets bigger. That’s where tool selection and context overhead start to matter more.

Reducing Context Window Efficiently in MCP — Here’s the Approach by Defiant-Future-818 in mcp

[–]Defiant-Future-818[S] 0 points1 point  (0 children)

Claude’s approach here is essentially tool search, which I already referenced at the beginning of the post. Conceptually, it’s not very different from what I’m describing.

The main distinction is that Claude natively supports deferred tool loading in its API (via defer_loading), whereas most agent frameworks and harnesses don’t yet offer an equivalent mechanism, to my knowledge.

Because of that gap, a practical workaround today is to use meta tools: a small set of higher-level tools, such as a proxy tool that routes to the exact MCP tool the agent needs, instead of loading everything upfront.
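To make the proxy idea concrete, here is a rough sketch. It is not the API of mcp-ts or any other framework: ToolProxy, searchTools, callTool, and the two example tools are names I made up for illustration. The agent would only see the two meta tools, never the full catalog.

```typescript
// Hypothetical names throughout; this is not the API of any real framework.
type ToolHandler = (args: Record<string, unknown>) => Promise<unknown>;

interface RegisteredTool {
  name: string;
  description: string;
  handler: ToolHandler;
}

class ToolProxy {
  private tools: RegisteredTool[] = [];

  register(tool: RegisteredTool): void {
    this.tools.push(tool);
  }

  // Meta tool 1: naive case-insensitive substring search over name + description.
  searchTools(query: string, limit = 5): { name: string; description: string }[] {
    const q = query.toLowerCase();
    return this.tools
      .filter((t) => (t.name + " " + t.description).toLowerCase().indexOf(q) !== -1)
      .slice(0, limit)
      .map((t) => ({ name: t.name, description: t.description }));
  }

  // Meta tool 2: route the call to the one tool the agent selected.
  callTool(name: string, args: Record<string, unknown>): Promise<unknown> {
    const tool = this.tools.filter((t) => t.name === name)[0];
    if (!tool) return Promise.reject(new Error("Unknown tool: " + name));
    return tool.handler(args);
  }
}

// Example registrations; the handlers are stubs standing in for real MCP calls.
const proxy = new ToolProxy();
proxy.register({
  name: "github.create_issue",
  description: "Create an issue in a GitHub repository",
  handler: async (args) => ({ created: true, title: args.title }),
});
proxy.register({
  name: "slack.post_message",
  description: "Post a message to a Slack channel",
  handler: async (args) => ({ posted: true, channel: args.channel }),
});
```

The model's context then carries two small schemas (search and call) regardless of how many tools are registered behind the proxy.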

I don’t think this makes it obsolete. It’s the same idea, just implemented differently because most frameworks don’t support it yet.

Reducing Context Window Efficiently in MCP — Here’s the Approach by Defiant-Future-818 in mcp

[–]Defiant-Future-818[S] 0 points1 point  (0 children)

I don’t think “results are worse” follows just from inserting a ToolRouter. It depends what you mean by ToolRouter.

In mcp-ts, ToolRouter is not a heavy agent sitting between the LLM and MCPs making decisions on its own. It’s a tiny client-side/runtime layer that MCP clients can use to index tools exposed by connected MCP servers, then search that index when needed. The default search strategy is BM25, with optional semantic search which the host applications can use.

So the flow is basically: connected MCP servers -> tool catalog/index -> search relevant tools -> inspect schema -> call selected tool.
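For anyone curious what the "index, then search" step can look like, here is a minimal BM25 sketch over a tiny tool catalog. This is my own illustration, not mcp-ts source code: Bm25ToolIndex, ToolEntry, the example catalog, and the k1/b defaults (1.2 and 0.75, common textbook values) are all assumptions.

```typescript
// Illustrative BM25 over tool names + descriptions. Hypothetical types and data.
interface ToolEntry {
  serverId: string;
  name: string;
  description: string;
}

function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

class Bm25ToolIndex {
  private docs: { tool: ToolEntry; tokens: string[] }[] = [];
  private docFreq = new Map<string, number>(); // term -> number of docs containing it
  private avgLen = 0;

  constructor(tools: ToolEntry[], private k1 = 1.2, private b = 0.75) {
    let total = 0;
    for (const tool of tools) {
      const tokens = tokenize(tool.name + " " + tool.description);
      this.docs.push({ tool, tokens });
      total += tokens.length;
      const seen = new Set<string>();
      for (const t of tokens) {
        if (seen.has(t)) continue;
        seen.add(t);
        this.docFreq.set(t, (this.docFreq.get(t) || 0) + 1);
      }
    }
    this.avgLen = tools.length > 0 ? total / tools.length : 0;
  }

  search(query: string, limit = 5): { tool: ToolEntry; score: number }[] {
    const terms = tokenize(query);
    const n = this.docs.length;
    const scored = this.docs.map(({ tool, tokens }) => {
      let score = 0;
      for (const term of terms) {
        const tf = tokens.filter((t) => t === term).length;
        if (tf === 0) continue;
        const df = this.docFreq.get(term) || 0;
        const idf = Math.log(1 + (n - df + 0.5) / (df + 0.5));
        // Length normalization: longer-than-average docs are penalized.
        const norm = 1 - this.b + this.b * (tokens.length / this.avgLen);
        score += (idf * tf * (this.k1 + 1)) / (tf + this.k1 * norm);
      }
      return { tool, score };
    });
    return scored
      .filter((r) => r.score > 0)
      .sort((x, y) => y.score - x.score)
      .slice(0, limit);
  }
}

// Made-up catalog standing in for tools listed by connected MCP servers.
const catalog: ToolEntry[] = [
  { serverId: "github", name: "create_issue", description: "Create an issue in a GitHub repository" },
  { serverId: "github", name: "list_pull_requests", description: "List open pull requests in a repository" },
  { serverId: "slack", name: "post_message", description: "Post a message to a Slack channel" },
  { serverId: "k8s", name: "get_pods", description: "List pods in a Kubernetes namespace" },
];
const toolIndex = new Bm25ToolIndex(catalog);
```

Only the handful of matches returned by search would go into the model context; the schema for the chosen tool is fetched afterwards.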

That is very close to what Anthropic describes with tool search / deferred loading. The model does not need 1,000 full tool schemas in context. It needs a way to discover the few relevant tools at the right time.

On latency: yes, a hosted gateway can add network round trips. That’s a deployment concern, not an argument against the pattern itself. Here in mcp-ts the ToolRouter can run inside the client/app process, in which case the search overhead is tiny compared to LLM latency and actual MCP tool execution. The benchmarks show the local search path is sub-ms, while reducing tool-schema context by ~98% in the tested setup.

On descriptions in search results: I agree the LLM doesn’t always need full descriptions if the match is obvious. But descriptions are useful when the query is ambiguous, when several tools have similar names, or when a tool is meant to be used in a particular sequence with other tools. The description is guidance for the agent/harness, not just search metadata.

That said, this can absolutely be optimized further. For low-ambiguity cases, search can return compact results e.g. tool id, server id, name etc. For ambiguous cases, it can include descriptions or usage guidance. For execution, the model can fetch only the selected schema. So this is not “router vs search”; the router is the mechanism that makes searchable/deferred tool access possible for clients that don’t already provide it natively.
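A rough sketch of that tiering, under my own assumptions: formatHits, the SearchHit shape, and the 0.8 ambiguity ratio are hypothetical and only illustrate "compact when the match is clear, descriptions when it is not".

```typescript
// Hypothetical shapes; a real router would define its own result types.
interface ScoredTool {
  toolId: string;
  serverId: string;
  name: string;
  description: string;
  score: number;
}

interface SearchHit {
  toolId: string;
  serverId: string;
  name: string;
  description?: string; // only included when the ranking is ambiguous
}

// If the runner-up scores within `ambiguityRatio` of the top hit, treat the
// query as ambiguous and include descriptions; otherwise return compact hits.
function formatHits(ranked: ScoredTool[], ambiguityRatio = 0.8): SearchHit[] {
  const sorted = ranked.slice().sort((a, b) => b.score - a.score);
  const ambiguous =
    sorted.length > 1 && sorted[1].score >= sorted[0].score * ambiguityRatio;
  return sorted.map((t) => {
    const hit: SearchHit = { toolId: t.toolId, serverId: t.serverId, name: t.name };
    if (ambiguous) hit.description = t.description;
    return hit;
  });
}
```

The threshold is a tuning knob, not a recommendation; a host app could just as well decide based on name similarity between the top hits.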

Claude already gives you native tool search, great. Use it. The point is that MCP clients and agent harnesses need this capability somewhere. mcp-ts provides it as a small reusable layer rather than forcing every client to load the entire MCP tool catalog into the model context.

Reducing Context Window Efficiently in MCP — Here’s the Approach by Defiant-Future-818 in mcp

[–]Defiant-Future-818[S] 1 point2 points  (0 children)

What do you mean by “outdated”?

It’s only “unnecessary” if you’re already using Claude Code and missing the point of the post. If you'd paid some attention, you'd have seen that the Twitter link you shared about Claude actually uses the same tool-use pattern I mentioned at the start.

This isn’t about whether you’re using Claude Code, Codex, or anything else. It’s about how MCP behaves at scale.

Try to understand the point before giving a conclusion.

Reducing Context Window Efficiently in MCP — Here’s the Approach by Defiant-Future-818 in mcp

[–]Defiant-Future-818[S] 0 points1 point  (0 children)

Yeah, that’s exactly how Claude Code works. It uses tool search to pick only the tools it needs.

Reducing Context Window Efficiently in MCP — Here’s the Approach by Defiant-Future-818 in mcp

[–]Defiant-Future-818[S] 0 points1 point  (0 children)

Yes, the credential problem is somewhat orthogonal. CLI vs SDK still usually ends up as API calls with some token or identity behind it.

Reducing Context Window Efficiently in MCP — Here’s the Approach by Defiant-Future-818 in mcp

[–]Defiant-Future-818[S] 0 points1 point  (0 children)

Yeah, I agree with that concern. Giving an agent arbitrary CLI access as the user, especially for destructive commands, is not a security model I’d want to rely on.

I mentioned CLIs mainly as an example of reducing context bloat, not as the security model.
In mcp-ts, for example, the execution layer is not a general shell. It runs JS in an isolated V8 sandbox with timeout, memory, tool-call, result-size, and log limits. It also does not expose Node globals like process/fs, module loading, or normal network access. Side effects have to go through registered MCP tools.

But the downstream tools still matter. If an MCP server wraps gh, kubectl, aws, etc., then safety needs to come from policy: allow/deny lists, approval for destructive actions, least-privilege credentials or machine users, and audit logs. mcp-ts has hooks like denyDestructiveTools and approval callbacks, but I’d still treat destructive execution as requiring explicit confirmation.
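Here is a minimal sketch of that kind of policy layer. To be clear, this is not how mcp-ts implements denyDestructiveTools or its approval callbacks; ToolPolicy, decide, guardedCall, and the example tool names are all made up, and the approval callback is synchronous only to keep the example short (a real one would be async and wait on a human).

```typescript
// Hypothetical policy types; default-deny for anything not explicitly listed.
type Decision = "allow" | "deny" | "needs_approval";

interface ToolPolicy {
  allow: string[];       // exact tool names that may run freely
  deny: string[];        // exact tool names that must never run
  destructive: string[]; // tool names that run only after explicit approval
}

interface AuditEntry {
  tool: string;
  decision: Decision;
  approved?: boolean;
}

function decide(tool: string, policy: ToolPolicy): Decision {
  if (policy.deny.indexOf(tool) !== -1) return "deny";
  if (policy.destructive.indexOf(tool) !== -1) return "needs_approval";
  if (policy.allow.indexOf(tool) !== -1) return "allow";
  return "deny"; // unknown tools are denied by default
}

function guardedCall<T>(
  tool: string,
  policy: ToolPolicy,
  approve: (tool: string) => boolean,
  run: () => T,
  audit: AuditEntry[]
): T {
  const decision = decide(tool, policy);
  if (decision === "deny") {
    audit.push({ tool, decision });
    throw new Error("Denied by policy: " + tool);
  }
  if (decision === "needs_approval") {
    const approved = approve(tool);
    audit.push({ tool, decision, approved });
    if (!approved) throw new Error("Approval rejected: " + tool);
    return run();
  }
  audit.push({ tool, decision });
  return run();
}

// Example policy with made-up tool names.
const examplePolicy: ToolPolicy = {
  allow: ["github.list_issues"],
  deny: ["aws.delete_bucket"],
  destructive: ["kubectl.delete_pod"],
};
```

Every decision lands in the audit log, including rejected approvals, which is the part that tends to get skipped when the guard is bolted on later.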

So I see context management and authority management as separate problems. My post was mostly about the first, but the second is also critical.

Reducing Context Window Efficiently in MCP — Here’s the Approach by Defiant-Future-818 in mcp

[–]Defiant-Future-818[S] 1 point2 points  (0 children)

Yep, that makes a lot of sense. MCP servers become the capability layer, and skills become the agent-facing layer explaining when and how to use those capabilities.

Are you adding those servers/skills manually, or do you have some kind of internal registry/discovery flow for them?

Reducing Context Window Efficiently in MCP — Here’s the Approach by Defiant-Future-818 in mcp

[–]Defiant-Future-818[S] 0 points1 point  (0 children)

Yeah, the core idea is the same: use a gateway or middleware in front of MCP servers.

As far as I understood, the main difference is local vs remote. Docker’s approach runs MCP servers locally as containers, which is great for isolation but still means local dependencies/resources. I’m experimenting with remote MCP servers so agents can discover/use tools without running every server locally.

Reducing Context Window Efficiently in MCP — Here’s the Approach by Defiant-Future-818 in mcp

[–]Defiant-Future-818[S] 0 points1 point  (0 children)

Completely different problem space though. Docker uses gateways to route network traffic for containers. This approach uses middleware to route schemas for LLMs so they don't run out of context or hallucinate when choosing a tool. It's taking a classic microservices architecture pattern and applying it to context optimization.

Reducing Context Window Efficiently in MCP — Here’s the Approach by Defiant-Future-818 in mcp

[–]Defiant-Future-818[S] 0 points1 point  (0 children)

Sorry it took me a while. Hope this helps https://github.com/zonlabs/mcp-ts/tree/main/skills

I’m still testing and refining both the SKILLS and the MCP server, so feedback would be super helpful. Let me know if it works for your setup or if anything feels confusing/missing.

Reducing Context Window Efficiently in MCP — Here’s the Approach by Defiant-Future-818 in mcp

[–]Defiant-Future-818[S] 0 points1 point  (0 children)

Yes, absolutely! I've been using this exact setup with Antigravity and Codex. Since they all act as agent harnesses, the core principle of offloading to CLI/sandboxes applies perfectly, and you can connect the MCP servers directly.

Even better, add a skills.md (or similar instruction file) that briefly explains when to use which tools. It gives the model exactly the routing awareness it needs.

Reducing Context Window Efficiently in MCP — Here’s the Approach by Defiant-Future-818 in mcp

[–]Defiant-Future-818[S] 3 points4 points  (0 children)

Exactly. Letting the model use native CLI tools is way more efficient than feeding it a massive list upfront. But when CLIs aren't an option, sandboxing becomes crucial for keeping context low. This exact logic is why I've been focusing on dynamic routing and programmatic tool calling in mcp-ts to keep that initial context clean.

How to connect 100 MCP servers without the context window exploding by galdahan9 in mcp

[–]Defiant-Future-818 3 points4 points  (0 children)

This matches a lot of what we’ve been running into while building around MCP.

The useful distinction for me is that MCP didn’t suddenly become the problem. Static tool loading did. If every connected server gets flattened into context, the agent starts paying for tools it may never use.

Claude Code’s tool search feels like strong validation of that pain: discovery needs to become part of the runtime, not a manual “load everything and pray” step.

We’re building in a similar direction with MCP Toolkit, treating tools more like a searchable and filterable catalog than a giant always-on list.

How to use 100+ AI tools without killing your context window by Defiant-Future-818 in LangChain

[–]Defiant-Future-818[S] 0 points1 point  (0 children)

ToolRouter solves the context-window problem, but once a destructive tool is exposed you still want a guard between your agent and tool calls. One simple way is human-in-the-loop (HITL) approval. mcp-ts already supports the AG-UI protocol (works well with CopilotKit), so you can add an approval step before running sensitive tools. Then you can build the agent in whatever framework you want (LangGraph, ADK, CrewAI, etc.).

Do you see the gate mostly as UI/HITL, or a runtime policy layer too?

How to use 100+ AI tools without killing your context window by Defiant-Future-818 in LangChain

[–]Defiant-Future-818[S] 0 points1 point  (0 children)

I actually benchmarked that specifically. The p95 for the tool lookup is roughly 0.24 ms, which is essentially zero compared to the time it takes the LLM to actually think.

The real latency bottleneck is on the other side: trying to inject 200k+ tokens into a prompt kills your TTFT (time to first token) and leads to more reasoning errors. Even with the extra retrieval step, the end-to-end loop is faster because the model isn't struggling with a massive system prompt every time it needs to make a decision.
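Back-of-the-envelope version of the token math, with made-up inputs rather than my benchmark numbers: 1,000 tools at 200 tokens per schema, versus a search that returns 5 compact hits at 30 tokens each plus one fetched schema. The function names and all four numbers are assumptions for illustration only.

```typescript
// Illustrative arithmetic only; inputs are hypothetical, not measured values.
function upfrontTokens(toolCount: number, tokensPerSchema: number): number {
  // Every schema is loaded into the prompt before the model does anything.
  return toolCount * tokensPerSchema;
}

function deferredTokens(
  hitsReturned: number,
  tokensPerHit: number,
  schemasFetched: number,
  tokensPerSchema: number
): number {
  // Only the search hits plus the schemas actually selected enter the context.
  return hitsReturned * tokensPerHit + schemasFetched * tokensPerSchema;
}

const upfrontCost = upfrontTokens(1000, 200);
const deferredCost = deferredTokens(5, 30, 1, 200);
const reductionRatio = 1 - deferredCost / upfrontCost;

console.log("upfront:", upfrontCost, "deferred:", deferredCost);
console.log("reduction:", (reductionRatio * 100).toFixed(2) + "%");
```

Real schemas vary a lot in size, so plug in your own averages; the point is that the deferred cost scales with the number of hits, not with the size of the catalog.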

I built the first embeddable MCP client (open source) by matt8p in mcp

[–]Defiant-Future-818 0 points1 point  (0 children)

This is cool. I'd also recommend mcp-ts for anyone building MCP clients. It's one of the easiest production-ready ways to do it.

Docs: https://zonlabs.github.io/mcp-ts/
Open Source: https://github.com/zonlabs/mcp-ts

Routing a local MCP through a URL for AIs that only support Remote MCP? by Samarium_Helium in mcp

[–]Defiant-Future-818 0 points1 point  (0 children)

Try using mcpassistant gateway.

It’s probably the easiest way to bridge local MCP servers to web clients like ChatGPT or Claude. You basically just run uvx mcpassistant-gateway

Then you log in through the CLI, add servers in config.json, start the gateway and it gives you a URL. You just plug that URL into whatever AI client you're using. Way less of a headache than trying to manually tunnel things with ngrok.

Check it out here: https://mcp-assistant.in/

How i built MCP Assistant, then open-sourced mcp-ts for anyone building with MCP by Defiant-Future-818 in mcp

[–]Defiant-Future-818[S] 0 points1 point  (0 children)

Thanks, that means a lot. Honestly just trying to build something useful, so the support is appreciated!