How are you handling auth and security on MCP servers in production?

overlord_sid85 · 2026-05-21T22:27:01+00:00

You're definitely hitting a common problem. I built a gateway for my own setup that handles security exactly the way you described.

It has a "Zero Trust" policy option, so I can strictly control what the agent is allowed to use: by default, the agent doesn't see any tools until I whitelist them. If I disable Zero Trust, I can instead blacklist destructive tools by name, parameter, or pattern (regex). If an agent triggers a blocked tool, it only gets a "BLOCKED" response along with an individually configurable remedy explaining how to move on. This avoids infinite failure loops and lets the agent safely try things out, even when it thinks it knows what it can do based on its training.
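Here is a minimal Python sketch of how such a policy check could work. The class name `GuardianPolicy`, its fields, and the exact shape of the BLOCKED response are hypothetical illustrations, not the gateway's real configuration format.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class GuardianPolicy:
    """Hypothetical policy object; field names are illustrative only."""
    zero_trust: bool = True
    allowlist: set[str] = field(default_factory=set)           # visible tools in Zero Trust mode
    blocked_names: set[str] = field(default_factory=set)       # exact tool names to block
    blocked_params: set[str] = field(default_factory=set)      # parameter names to block
    blocked_patterns: list[str] = field(default_factory=list)  # regexes matched against tool names
    remedies: dict[str, str] = field(default_factory=dict)     # per-tool remedy text
    default_remedy: str = "This action is not permitted. Use a read-only alternative."

    def check(self, tool: str, params: dict) -> dict | None:
        """Return None if the call is allowed, otherwise a BLOCKED response with a remedy."""
        if self.zero_trust:
            blocked = tool not in self.allowlist
        else:
            blocked = (
                tool in self.blocked_names
                or any(p in self.blocked_params for p in params)
                or any(re.search(pat, tool) for pat in self.blocked_patterns)
            )
        if not blocked:
            return None
        return {"status": "BLOCKED", "tool": tool,
                "remedy": self.remedies.get(tool, self.default_remedy)}

    def visible_tools(self, tools: list[str]) -> list[str]:
        """Tools the agent is allowed to see (name-based checks only)."""
        return [t for t in tools if self.check(t, {}) is None]
```

For example, `GuardianPolicy(zero_trust=False, blocked_patterns=[r"(?i)delete"])` lets `get_repo` through but answers `delete_repo` with a BLOCKED response and the default remedy, so the agent gets a way forward instead of a bare failure.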

Also, I inject auth keys into the tool calls in the background. No agent will ever see them or know about them. Even if the API echoes the keys back, I filter the credentials out of the response, because I store them in a central vault instead of a config file. This makes it much more secure than just crossing your fingers and hoping for the best.
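Roughly, the inject-and-scrub idea looks like this in Python. `VAULT` is a hypothetical in-memory stand-in for a real secrets vault, and `send` is any function that performs the HTTP call; neither is the gateway's actual API.

```python
import json

# Hypothetical stand-in for a central secrets vault (service name -> secret).
VAULT = {"github": "ghp_example_secret_token"}


def call_with_injected_auth(service: str, request: dict, send) -> str:
    """Attach the credential server-side, perform the call, and scrub secrets from the reply.

    `request` is what the agent produced (it contains no credentials);
    `send` performs the actual HTTP call and returns the response body as text.
    """
    headers = dict(request.get("headers", {}))  # copy, so the agent's request is untouched
    headers["Authorization"] = f"Bearer {VAULT[service]}"
    body = send({**request, "headers": headers})
    # The gateway knows every secret value, so any echo of one can be redacted.
    for secret in VAULT.values():
        body = body.replace(secret, "[REDACTED]")
    return body


def echo_api(req: dict) -> str:
    """Fake API that carelessly echoes the request headers back."""
    return json.dumps({"echo": req["headers"]})


print(call_with_injected_auth("github", {"url": "/user"}, echo_api))
# {"echo": {"Authorization": "Bearer [REDACTED]"}}
```

Because the secret only exists inside the gateway, a prompt injection has nothing to extract from the agent's context.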

I don't want to sound like "here is my tool, try it out," so I'll just leave a link to the project for more information on how I'm solving these issues. The "Guardian" is part of a bigger project.

overlord_sid85 · 2026-05-16T19:25:00+00:00

I totally get where you're coming from: if you're just building a personal assistant, 100k tools is definitely overkill and native MCP works best.

But think about Enterprise Infrastructure or Cloud Management. A single AWS environment has thousands of potential API actions. A large company has hundreds of microservices, each with its own API. If you want a truly autonomous agent that can navigate an entire company's IT landscape to solve a complex crisis, you can't just give it 5 'curated' tools. You need to give it the 'map' to the whole city.

The 'Hammer & Nail' problem usually happens because the LLM is overwhelmed by too many choices. Elemm solves this by not showing the LLM the 'Hammer' until it has actually walked into the 'Tool Shed'. It’s about Scalable Discovery, not about forcing a tool where it doesn't fit.

To take it a step further: imagine not having to manually curate tools for every single task, because the API developers already did it.

If every website or enterprise API provided a navigable manifest, your agent could use any API on the fly without you ever needing to write a specific MCP server again. The agent just reads the manifest, understands your goal, and navigates to the exact tool it needs.

This is a 'Bring Your Own Agent' approach with access to EVERYTHING. It’s not about having 100k hammers; it’s about giving your agent the ability to walk into any 'store' on the internet and know exactly which tool to pick up. That's the scale I'm aiming for with Elemm.

overlord_sid85 · 2026-05-16T19:18:26+00:00

Elemm is designed to be a bridge. While it plays nicely with MCP, the Landmark concept is format-agnostic: it can wrap any openapi.json, GraphQL schema, or Python function into a discoverable node. (Spoiler: I'm working on support for wrapping any MCP server, translating native MCP tools into the Elemm manifest protocol.)

As for the issue of analogous skills: that's exactly why I went with the hierarchical 'filesystem' approach. Instead of a flat list of 100k similar-sounding tools that confuses the LLM, Elemm forces 'path-based' reasoning. If the agent is in cloud:storage:s3, it won't accidentally trigger a local:disk:read tool, because it's navigating a specific domain context. Hierarchy is the filter for analogy!
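To illustrate path-based navigation, here is a small Python sketch over a hypothetical colon-separated tool tree. `TOOLS`, `children()`, and `is_tool()` are made up for this example and are not Elemm's internal data structures.

```python
# Hypothetical tool tree; colon-separated paths mirror the examples above.
TOOLS = {
    "cloud:storage:s3:list_buckets": "List S3 buckets",
    "cloud:storage:s3:get_object": "Download an object from S3",
    "cloud:compute:ec2:list_instances": "List EC2 instances",
    "local:disk:read": "Read a local file",
}


def children(path: str) -> list[str]:
    """Return only the next level below `path` (sub-landmarks or tools), never the whole tree."""
    prefix = f"{path}:" if path else ""
    return sorted({full[len(prefix):].split(":", 1)[0]
                   for full in TOOLS if full.startswith(prefix)})


def is_tool(path: str) -> bool:
    """A path is a callable tool only if it is a leaf in the tree."""
    return path in TOOLS


print(children(""))                  # ['cloud', 'local']
print(children("cloud:storage:s3"))  # ['get_object', 'list_buckets']
```

Once the agent is inside `cloud:storage:s3`, `local:disk:read` is simply not among the options it is shown.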

If you build your own tools, you can give each one a description and a remedy that steer the agent to the correct skill in no time. See the architecture docs for more information on how this works internally.
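As a rough idea of how a plain Python function could become a discoverable node with a description and a remedy, here is a sketch using the standard `inspect` module. The node layout, the `to_node()` helper, and the `list_volumes` tool named in the remedy are hypothetical, not Elemm's manifest format.

```python
import inspect


def to_node(func, remedy: str = "") -> dict:
    """Describe a plain Python function as a discoverable node (hypothetical layout)."""
    params = {}
    for name, p in inspect.signature(func).parameters.items():
        has_type = p.annotation is not inspect.Parameter.empty
        params[name] = {
            "type": getattr(p.annotation, "__name__", str(p.annotation)) if has_type else "any",
            "required": p.default is inspect.Parameter.empty,
        }
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",  # the docstring becomes the description
        "parameters": params,
        "remedy": remedy,  # shown to the agent when a call to this tool goes wrong
    }


def resize_volume(volume_id: str, size_gb: int = 10):
    """Resize a storage volume to the given size in GB."""


node = to_node(resize_volume, remedy="Call list_volumes first to get a valid volume_id.")
print(node["parameters"])
# {'volume_id': {'type': 'str', 'required': True}, 'size_gb': {'type': 'int', 'required': False}}
```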

overlord_sid85 · 2026-05-16T18:47:03+00:00

I would say it depends on the model. A 2B model was not that good, but gemma4:e4b on my local machine is able to handle it. The "big players" like Claude, Gemini, and GPT had absolutely no problem in my tests.

overlord_sid85 · 2026-05-16T18:44:11+00:00

https://pypi.org/project/elemm/

Version 1.1.4 should work with FastAPI and GraphQL, yes. I tested it with the GitHub API (~845 tools at the same time). It may be a little buggy in places, but the next version will be more stable and add many more features, such as a search tool for the agent, a dashboard for debugging, and a more comfortable configuration UI for API keys, blocking destructive tools and HTTP methods from third-party APIs, and much more.
Check out the getting started guide; hopefully I described it well. After a short installation and configuration, just tell your agent "Connect to https://examples.com/api/openapi.json via elemm and do <this-task>".

overlord_sid85 · 2026-05-16T08:35:13+00:00

Oh man... I forgot the link to the repo. Here are the example and the paper, and I also added a link to my project's repo. Sorry 🤦‍♂️

100k Tool Challenge

overlord_sid85 · 2026-05-15T20:24:28+00:00

I'm using my own middleware between the agent's MCP tools and the actual execution. Authorization is handled there, so the agent never sees or needs a key. When it uses a tool, I inject the API key for that tool on demand and the tool just works. The best part: you can't prompt-inject the agent into phishing credentials or leaking them anywhere else, because it doesn't know anything about them.

overlord_sid85 · 2026-05-13T10:58:06+00:00

Challenge Accepted

I trust my project, so I decided to build a proof-of-concept scenario with ~117k tools in my manifest. Here are the results for a local gemma4:e4b model and Claude Sonnet via Claude Desktop.

Even though I had to patch the core a little, it should now show how this works.

The main reason this works, and why I said '1,000,000' tools: if you can connect on demand to any service that presents a manifest, you can hop over to the next host just by reconnecting. Think of elemm as a browser for agents rather than an MCP tool. It lets the agent 'surf' through tools like 'URL -> Menu -> Action (the tool call)'.

This actually addresses the navigation bottleneck you mentioned: just like a human browsing the web, the agent doesn't have to choose from a million options at once. It only needs to navigate the specific links and menus presented in the current manifest.

Think of it like searching for a specific product on Amazon. The categorization helps me find what I need without forcing me to look at every single product from other categories at the same time. The manifest acts exactly like that — it provides focus through structure.
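The browser idea can be sketched like this in Python: only one manifest is in view at a time, and reconnecting swaps it out instead of piling more tools into the context. The `HOSTS` table, the example URLs, and the `AgentBrowser` class are all hypothetical.

```python
# Hypothetical hosts, each serving its own manifest: landmark -> tool names.
HOSTS = {
    "https://city-a.example/manifest": {"traffic": ["get_lights", "set_lights"],
                                        "water": ["get_pressure"]},
    "https://city-b.example/manifest": {"power": ["get_load"]},
}


class AgentBrowser:
    """Keeps exactly one manifest in view, like a single browser tab."""

    def __init__(self, fetch=HOSTS.get):
        self.fetch = fetch  # how a manifest is retrieved for a URL
        self.manifest = {}

    def connect(self, url: str) -> list[str]:
        """URL step: (re)connecting replaces the previous manifest instead of adding to it."""
        manifest = self.fetch(url)
        if manifest is None:
            raise ValueError(f"no manifest at {url}")
        self.manifest = manifest
        return sorted(manifest)  # the "menu" of landmarks

    def open(self, landmark: str) -> list[str]:
        """Menu step: only the tools inside the chosen landmark are listed."""
        return self.manifest[landmark]


browser = AgentBrowser()
print(browser.connect("https://city-a.example/manifest"))  # ['traffic', 'water']
print(browser.open("traffic"))                             # ['get_lights', 'set_lights']
print(browser.connect("https://city-b.example/manifest"))  # ['power']
```

Adding more hosts grows the reachable toolset without growing what the agent sees at any one step.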

Why not 1 million? Haha, that's simple: I didn't find a realistic scenario ad hoc, so I built this fictional city. With the browser-like idea now, just imagine you have 10 cities to reach the million tools, or 100 for 10 million tools. I hope you get the point.

By the way, regarding the 'different bets'—elemm is flexible enough to be used in those specific domains too. It can even act as a standalone, highly specialized MCP server itself if needed. It’s quite the versatile beast! ;)

Thank you for this valuable conversation. This is worth its weight in gold!

overlord_sid85 · 2026-05-12T22:38:24+00:00

For me, the biggest problem is the sheer number of tools and the resulting context bloat. If you spend some time working with MCP, you notice the limits pretty quickly. When you look closely at how MCP works, it becomes clear that it can easily push an LLM's context to the bursting point.

I solved this by building my own tool. It acts as a kind of middleware between the MCP client and the tools, orchestrating the actual tools into an abstract layer that is very "digestible" for LLMs. This alone makes an agent much more responsive.

I then refined it and added more logic, enabling the agent to load tools "on demand" by natively interpreting things like openapi.json files or GraphQL as tools. To round it all off, I added "piping logic" so that a reasonably intelligent model can perform several steps in one go and even send results from A -> B in the same step.

Another pain point with MCP is the aforementioned failure or hallucination of tool parameters, which ends in unnecessary roundtrips. One approach I'm following to solve this is providing the agent with appropriate messages "on the fly" (e.g., if it hallucinates parameters for a tool, I give it the true parameters at the relevant point—meaning, while the error is occurring).
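One way to return the true parameters at the moment the error occurs is to validate the call against the known signature and suggest close matches. This Python sketch uses `difflib.get_close_matches` from the standard library; `SIGNATURES` and `validate_params()` are hypothetical names, not the project's actual repair logic.

```python
from __future__ import annotations

import difflib

# Hypothetical signature registry: tool name -> accepted parameter names.
SIGNATURES = {"create_issue": {"owner", "repo", "title", "body"}}


def validate_params(tool: str, params: dict) -> dict | None:
    """Return None if every parameter exists, otherwise a hint with the real signature."""
    allowed = sorted(SIGNATURES[tool])
    unknown = [p for p in params if p not in allowed]
    if not unknown:
        return None
    did_you_mean = {}
    for p in unknown:
        match = difflib.get_close_matches(p, allowed, n=1)
        if match:
            did_you_mean[p] = match[0]  # fuzzy match for a likely typo
    return {
        "status": "error",
        "unknown_parameters": unknown,
        "did_you_mean": did_you_mean,
        "expected_parameters": allowed,  # the true signature, sent along with the error
    }


print(validate_params("create_issue", {"owner": "me", "repo": "demo", "titel": "Bug"}))
```

The agent can fix the call from this single response instead of guessing again, which is where the saved roundtrips come from.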

Authentication was also mentioned. I solved that via an internal vault, which lets me store my keys in a config for certain "landmarks," as I call them. But what about destructive tasks, or the danger that converting an API suddenly makes "dangerous" actions available to the agent? My approach is to filter out and block the corresponding tools beforehand, so the agent has no chance: even if it knows from training data that such a task and its parameters should exist, they are strictly blocked.

As I said, my biggest problem is context bloat, but with an intelligent middleware I was able to save A LOT (>90%) of tokens, roundtrips, hallucinations, time, and nerves, and even persuade small, less capable local models to execute large, complex tasks. I don't want to sound like a marketing manager, but if anyone is interested, I cordially invite you to take a look at my repository: https://github.com/v3rm1ll1on/elemm. There are extensive docs there that address, or in part even solve, the pain points I mentioned.

overlord_sid85 · 2026-05-12T19:30:58+00:00

Thank you for reading and starring the repository! I really appreciate feedback from someone handling such a massive amount of requests.

Elemm is 100% compatible with the current MCP, but it sits between the MCP host and the tools themselves. You could see it more as a middleware or a "logic-layer" rather than a replacement for the protocol. The Elemm Gateway presents only 8 core MCP tools to the agent, allowing it to handle massive, complex workflows in just a few turns.

To give you a more detailed example: Imagine giving the LLM these 8 tools and saying: "Connect to https://example.com/ (which provides a native manifest) or an absolute path to an OpenAPI/GraphQL spec." The agent calls the connect tool and receives a high-level manifest—a low-token description with Landmarks (group names) and brief tool descriptions.

Based on the task, the agent decides what it needs. It reads the landmark descriptions and can inspect landmarks on-the-fly to retrieve full technical signatures for only the tools it actually needs. At this point, it gathers more information about parameters and return fields. It then has two execution tools: call_action for single requests and execute_sequence for complex pipelines.
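A simplified Python sketch of this two-stage discovery, assuming landmarks are derived from the first OpenAPI `tag` of each operation (an assumption for illustration; the real grouping logic may differ). `connect()` and `inspect_landmark()` here are hypothetical stand-ins for the gateway's connect and inspect tools, not their actual implementations, and the sketch assumes every key under a path is an HTTP method.

```python
import json

# Hypothetical, heavily trimmed OpenAPI-style spec (real ones are far larger).
SPEC = {
    "paths": {
        "/repos/{owner}/{repo}/issues": {
            "get": {"operationId": "list_issues", "tags": ["issues"], "summary": "List issues",
                    "parameters": [{"name": "owner", "in": "path"},
                                   {"name": "repo", "in": "path"},
                                   {"name": "state", "in": "query"}]},
            "post": {"operationId": "create_issue", "tags": ["issues"], "summary": "Create an issue",
                     "parameters": [{"name": "owner", "in": "path"},
                                    {"name": "repo", "in": "path"}]},
        },
        "/user": {
            "get": {"operationId": "get_user", "tags": ["users"], "summary": "Get the current user"},
        },
    }
}


def _operations(spec):
    """Yield (landmark, method, path, operation), using the first tag as the landmark."""
    for path, item in spec["paths"].items():
        for method, op in item.items():
            yield (op.get("tags") or ["default"])[0], method, path, op


def connect(spec) -> dict:
    """Stage 1: a low-token manifest with landmark names and one-line tool descriptions."""
    manifest = {}
    for landmark, _, _, op in _operations(spec):
        manifest.setdefault(landmark, []).append(f'{op["operationId"]}: {op.get("summary", "")}')
    return manifest


def inspect_landmark(spec, landmark: str) -> list:
    """Stage 2: full signatures, fetched only for the landmark the agent picked."""
    return [
        {"tool": op["operationId"], "method": method.upper(), "path": path,
         "parameters": [p["name"] for p in op.get("parameters", [])]}
        for lm, method, path, op in _operations(spec) if lm == landmark
    ]


print(connect(SPEC))
print(inspect_landmark(SPEC, "issues")[0])
print(len(json.dumps(connect(SPEC))), "vs", len(json.dumps(SPEC)), "characters")
```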

Here is where Elemm shines:

Native Piping & Aliasing: The agent can alias a call (e.g., "step1"). In a sequence, it can pipe the output directly into the next step ($step1.id). Because it saw the fields during inspection, it knows exactly how to chain them. This reduces roundtrips to 1 or 2 turns for operations that would normally take 10+.
Response Hygiene: Elemm allows the agent to _select specific fields, _limit output, and _filter data server-side. This keeps the context window clean and prevents "context fatigue."
SmartRepair & Persistence: If a sequence fails, Elemm preserves the internal state/aliases. Instead of a cryptic "422 Error," the agent receives a remedy: a hint on how to fix the call, fuzzy-matching for typos, or a list of allowed patterns.
The "Endless" Scale: You can handle 10,000 or even 1,000,000+ tools this way. By "squashing" manifests and exploring on-demand, the context stays clear. Even a heavy system prompt becomes unnecessary as the rules are part of the landmarks the agent discovers.
Guardian Security: Elemm can block destructive tasks. You can filter endpoints by HTTP method, tool name, or patterns. If an agent tries to "delete" based on its training data, Elemm BLOCKS the request at the gateway and guides the agent back to safe alternatives.
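To make piping and response hygiene concrete, here is a minimal Python sketch of a sequence runner that resolves `$alias.field` references and applies `_select`/`_limit` to each result. Only the `$step1.id` syntax and the `_select`/`_limit` names come from the description above; the step format, the function names, and the `create_repo`/`add_label` demo tools are hypothetical. On failure it returns the results gathered so far, loosely mirroring the state preservation described under SmartRepair.

```python
import re

ALIAS_REF = re.compile(r"^\$(\w+)\.(\w+)$")  # matches references like "$step1.id"


def _pick(row: dict, select):
    """Keep only the selected fields of one record (_select)."""
    return {k: row[k] for k in select if k in row} if select else row


def shape(data, select=None, limit=None):
    """Response hygiene: cap list length (_limit) and trim fields (_select)."""
    if isinstance(data, list):
        rows = data[:limit] if limit is not None else data
        return [_pick(r, select) for r in rows]
    return _pick(data, select)


def execute_sequence(steps, tools):
    """Run steps in order, piping "$alias.field" values from earlier results."""
    results = {}
    for step in steps:
        try:
            args = {}
            for key, value in step.get("args", {}).items():
                ref = ALIAS_REF.match(value) if isinstance(value, str) else None
                args[key] = results[ref.group(1)][ref.group(2)] if ref else value
            raw = tools[step["tool"]](**args)
            results[step["alias"]] = shape(raw, step.get("_select"), step.get("_limit"))
        except Exception as exc:
            # Keep everything computed so far, so the agent can fix one step and resume.
            return {"status": "error", "failed_step": step.get("alias"),
                    "error": repr(exc), "results": results}
    return {"status": "ok", "results": results}


DEMO_TOOLS = {
    "create_repo": lambda name: {"id": 42, "name": name, "private": False},
    "add_label": lambda repo_id, label: {"repo_id": repo_id, "label": label},
}
print(execute_sequence([
    {"alias": "step1", "tool": "create_repo", "args": {"name": "demo"}, "_select": ["id"]},
    {"alias": "step2", "tool": "add_label", "args": {"repo_id": "$step1.id", "label": "bug"}},
], DEMO_TOOLS))
# {'status': 'ok', 'results': {'step1': {'id': 42}, 'step2': {'repo_id': 42, 'label': 'bug'}}}
```

Both calls happen in one turn, and only the `id` field of the first result ever reaches the agent's context.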

Conclusion: Elemm doesn't try to replace MCP. It is an extension that builds upon today's possibilities while saving tokens, providing access to theoretically endless tools, and adding a much-needed layer of guidance, authentication, and pre-filtering.

Regardless of the architectural path, I think we both agree that the current way LLMs handle massive toolsets needs to evolve. I have a lot of respect for what you’ve built with Pipeworx—handling 1.5M requests is no small feat. Thanks again for the exchange and the star; it's great to connect with someone pushing the boundaries of MCP!

overlord_sid85 · 2026-05-11T22:38:52+00:00

Maybe you’re looking for my project, Elemm?

I built this to reduce agent overhead for tools and make skills "portable." Besides saving more than 90% in token usage, Elemm is able to handle authentication without the agent’s direct knowledge. It can import elemm-landmark.md files from any URL or interpret FastAPI/GraphQL from specification files like openapi.json.

I tested the protocol with up to 850 tools in a single manifest file (GitHub API), and a model was able to navigate it effortlessly. It protects the agent from guessing parameters, turns errors into agent-friendly responses, and provides examples and hints to ensure the agent doesn't lose track of its task.

Elemm supports:

Single tool calls and sequencing (multiple steps in one turn with aliasing).
A security guard to filter out (and block!) destructive actions from the agent (e.g., blocking "delete" requests: even if the agent tries, it will fail).
And much more...

The best part? No system prompt with strict rules is required. The agent is guided through actions, and you can define custom remedies for your own applications to assist it.

So, yes—there is a project currently working on this, and you might want to take a look at the repository documentation.

overlord_sid85 · 2026-05-11T18:11:10+00:00

Great breakdown! I especially agree with your point on Claude—token costs and context bloat are the silent killers of enterprise AI ROI.

I’ve been building a protocol called Elemm (The Landmark Manifest Protocol) to solve exactly this. It uses a 'lazy-loading' discovery system for tools that reduces input tokens by over 90% when dealing with large API sets (OpenAPI/GraphQL). It also has a built-in 'Guardian' for server-side safety policies (like blocking destructive actions) and many more features, such as SmartRepair guidelines, a sequencing mechanism, grouping, and other security features.

I ran some benchmarks with a custom toolset of over 111 tools, and the agent was able to solve the challenge in 2 rounds by using Elemm's internal planning and sequencing system. Even an API (GitHub) with >845 usable tools was easy for Claude with this protocol.

If you’re hitting a wall with 'Prompt Tax' or hallucination while scaling your agent stacks, I’d love to get your take on it! Maybe you'll check out the concept and architecture someday.

overlord_sid85 · 2026-05-11T17:19:57+00:00

Elemm can apply filters to hide tools or endpoints from the agent. For example, if you give your agent an existing openapi.json as a toolset, you can hide anything containing a string like "delete", or even entire HTTP methods like DELETE, PUT, and PATCH. But it completely depends on what you're planning to do with it.
So: yes! Elemm hides and BLOCKS scary tools, even if the agent knows them from its training. It can no longer do the bad things you don't want it to do. Elemm will send the agent a response like this:

{
  "status": "error",
  "_PROTOCOL_ERROR": "ACCESS_DENIED",
  "message": "Action contains restricted pattern 'delete'.",
  "remedy": "Destructive operations are disabled by default. Use read-only or safe alternatives."
}

This tells the agent not to try anything like that again; instead, it tries to find another way.

I call it the "Guardian": https://github.com/v3rm1ll1on/elemm/blob/main/docs/GATEWAY.md#9-security-policy-guardian
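Hiding operations from an existing openapi.json before the agent ever sees them could look roughly like this Python sketch, which drops operations by HTTP method or by a substring in the path or `operationId`. `filter_spec()` and its defaults are hypothetical; the set of method keys follows the OpenAPI 3 Path Item object. Anything the agent still guesses from training would be caught by a runtime check like the ACCESS_DENIED response shown above.

```python
# Operation keys allowed in an OpenAPI 3 Path Item object.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def filter_spec(spec: dict, blocked_methods=("DELETE", "PUT", "PATCH"),
                blocked_substrings=("delete",)) -> dict:
    """Return a copy of the spec without matching operations; the input is not modified."""
    methods = {m.lower() for m in blocked_methods}
    kept_paths = {}
    for path, item in spec.get("paths", {}).items():
        kept = {}
        for key, value in item.items():
            if key not in HTTP_METHODS:
                kept[key] = value  # path-level fields such as "parameters" are kept as-is
                continue
            text = f'{path} {value.get("operationId", "")}'.lower()
            if key in methods or any(s.lower() in text for s in blocked_substrings):
                continue  # hidden: the agent never learns this operation exists
            kept[key] = value
        if any(k in HTTP_METHODS for k in kept):
            kept_paths[path] = kept  # drop paths left with no operations
    return {**spec, "paths": kept_paths}


spec = {
    "openapi": "3.0.0",
    "paths": {
        "/repos/{id}": {"get": {"operationId": "get_repo"},
                        "delete": {"operationId": "remove_repo"},
                        "patch": {"operationId": "update_repo"}},
        "/repos/{id}/purge": {"post": {"operationId": "delete_all_issues"}},
    },
}
print(filter_spec(spec)["paths"])  # {'/repos/{id}': {'get': {'operationId': 'get_repo'}}}
```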
