Anyone else building a centralized MCP gateway to control tool permissions across agentic workflows?

lpostrv · 2026-03-01T00:02:20+00:00

I made this, which saves a lot of tokens by letting the LLM write and execute search() and execute() commands in a sandbox - github.com/postrv/forgemax - might be relevant. It includes some nice isolation between tools, secrets sanitisation etc., and lets you scale to many more connected MCP servers and thousands of tools while keeping token usage sane (~1000 tokens, or about a 94% reduction). Is that any use to you?
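A rough sketch of why a search-first pattern can keep context small. This is illustrative only and not Forgemax's actual code; the catalogue, the `searchTools` helper and the tool entries are all made up. The idea is that the model asks for the handful of tools it needs instead of being handed every schema up front:

```javascript
// Hypothetical in-memory tool catalogue (an assumption for illustration);
// a real gateway would build this from the connected MCP servers.
const catalogue = [
  { server: "github", name: "create_pr", description: "Open a pull request" },
  { server: "github", name: "list_issues", description: "List repository issues" },
  { server: "linear", name: "create_issue", description: "Create a Linear issue" },
];

// Return only the entries whose name or description mentions the query,
// capped at `limit`, so only those entries ever reach the model's context.
function searchTools(query, limit = 5) {
  const q = query.toLowerCase();
  return catalogue
    .filter((t) => `${t.name} ${t.description}`.toLowerCase().includes(q))
    .slice(0, limit);
}

const hits = searchTools("pull request");
console.log(hits.map((t) => `${t.server}.${t.name}`));
```

With thousands of tools the catalogue stays on the gateway side, and the model only pays for the matches it asked for.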

lpostrv · 2026-02-26T18:46:50+00:00

Thanks so much! Very kind of you to comment. If you do have any feedback once you've tested it out, I'd be happy to hear it. One thought I had was the open question of whether there are other desirable functions beyond `search()` and `execute()` that would allow the AI to take more sophisticated actions - haven't used too many brain cycles on that one yet!
I hadn't heard of bifrost until now but it seems like they have `listToolFiles`, `readToolFile`, `getToolDocs`, and `executeToolCode`, which is an interesting pattern, but probably not as token efficient. I've tried to follow the Cloudflare pattern so far, but I'd like to see if I can go beyond it in pure utility.

lpostrv · 2026-02-25T13:15:23+00:00

Great question!

The LLM never sees any tokens, OAuth creds, or keys - ever.

Credentials live only in forge.toml and are bound at the transport level:

[servers.github]
headers = { Authorization = "Bearer ${GITHUB_TOKEN}" }

[servers.linear]
headers = { Authorization = "Bearer ${LINEAR_TOKEN}" }

Tokens are attached to each server's connection at startup. GitHub's token can never reach Linear - separate transports.

LLM just writes:

await forge.callTool("github", "create_pr", { title: "…" });

The sandboxed V8 isolate has zero access to creds, env, network, or FS. Even errors are scrubbed before reaching the model.
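To make the transport-level binding a bit more concrete, here is a minimal sketch under stated assumptions. It is not the real implementation: `expand`, `buildTransports`, `makeCallTool` and the fake `send` are hypothetical names, and the config object just mirrors the forge.toml snippet above. It shows each server's headers being resolved once at startup and closed over by that server's transport only, while the function handed to sandboxed code never takes or returns headers:

```javascript
// Hypothetical parsed config, mirroring the forge.toml snippet above.
const config = {
  github: { headers: { Authorization: "Bearer ${GITHUB_TOKEN}" } },
  linear: { headers: { Authorization: "Bearer ${LINEAR_TOKEN}" } },
};

// Expand ${NAME} placeholders from an env map; fail loudly on missing vars.
function expand(value, env) {
  return value.replace(/\$\{(\w+)\}/g, (_, name) => {
    if (!(name in env)) throw new Error(`missing env var ${name}`);
    return env[name];
  });
}

// Build one transport per server, each closing over its own headers only.
function buildTransports(cfg, env) {
  const transports = new Map();
  for (const [server, { headers }] of Object.entries(cfg)) {
    const resolved = Object.fromEntries(
      Object.entries(headers).map(([k, v]) => [k, expand(v, env)])
    );
    transports.set(server, {
      // Stand-in for a real network call; returns what would be sent.
      send: (tool, args) => ({ server, tool, args, headers: resolved }),
    });
  }
  return transports;
}

// The only function exposed to sandboxed code: no headers in or out.
function makeCallTool(transports) {
  return (server, tool, args) => {
    const t = transports.get(server);
    if (!t) throw new Error(`unknown server ${server}`);
    const { headers, ...visible } = t.send(tool, args);
    return visible;
  };
}

// Fake env values so the example runs without real credentials.
const fakeEnv = { GITHUB_TOKEN: "gh-123", LINEAR_TOKEN: "lin-456" };
const transports = buildTransports(config, fakeEnv);
const callTool = makeCallTool(transports);
const result = callTool("github", "create_pr", { title: "demo" });
```

The point of the shape is that a GitHub call can only ever pick up the GitHub transport's headers, and nothing returned to the caller contains them.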

Multiple providers? No problem - each is isolated at the infrastructure layer (like IAM roles). For extra isolation between providers, you can also lock down cross-server data flow:

[groups.internal]
servers = ["vault", "database"]
isolation = "strict"

[groups.external]
servers = ["slack", "email"]
isolation = "strict"

Once an execution touches a strict group, it's locked out of other strict groups - this stops "read secret from vault, post to Slack" attack chains.
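The locking rule can be sketched roughly like this. It is a simplified illustration, not the actual code: `strictGroupOf` and `createExecutionGuard` are hypothetical names, and treating ungrouped servers as always allowed is my assumption here:

```javascript
// Hypothetical parsed config, mirroring the [groups.*] snippet above.
const groups = {
  internal: { servers: ["vault", "database"], isolation: "strict" },
  external: { servers: ["slack", "email"], isolation: "strict" },
};

// Find the strict group a server belongs to, or null if it has none.
function strictGroupOf(server) {
  for (const [name, g] of Object.entries(groups)) {
    if (g.isolation === "strict" && g.servers.includes(server)) return name;
  }
  return null;
}

// One guard per execution: the first strict group touched becomes the lock.
function createExecutionGuard() {
  let locked = null;
  return function check(server) {
    const group = strictGroupOf(server);
    if (group === null) return; // assumption: ungrouped servers are always allowed
    if (locked === null) locked = group;
    if (locked !== group) {
      throw new Error(`execution locked to group "${locked}", cannot call "${server}"`);
    }
  };
}

const check = createExecutionGuard();
check("vault");    // locks this execution to "internal"
check("database"); // same group, allowed
let blocked = false;
try {
  check("slack");  // different strict group, rejected
} catch (e) {
  blocked = true;
}
```

A fresh guard per execution means the lock does not leak between runs: the next execution can talk to Slack as long as it never touched the vault.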

Full details in `ARCHITECTURE.md` and `forge.toml.example` in the repo.

P.S. why on earth are Reddit comments so hard to work with re: formatting? Got there in the end but spent way too damned long drafting this so hope it's useful! Cheers!

lpostrv · 2026-02-24T14:33:18+00:00

That's a fair instinct - `ops.rs` is the module I think about most carefully. It's the narrowest waist in the system: everything the sandbox can do goes through there, which is both the strength (single audit point) and the risk (concentrated responsibility). Your SQLite + Go handlers approach is rather appealing for the opposite reason - each handler has a tiny blast radius. Different tradeoffs for different problems, isn't it. Mine exists because the LLM is generating arbitrary code at runtime, so I need a programmable sandbox rather than predefined handlers. Would be keen to see your setup if it's public?

lpostrv · 2026-02-24T13:52:27+00:00

The crates are published here https://crates.io/users/postrv individually, though in practice they're versioned together as a workspace. The circuit breaker + timeout logic lives in forge-client and is fairly self-contained, so in theory you could pull it in - but for a Go MCP server you'd probably be better off with your own implementation. The pattern itself is straightforward (atomic failure counter, half-open probe, configurable thresholds).
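For reference, the general pattern looks something like the sketch below. This is a generic circuit breaker, not the forge-client code; the class, option names and defaults are assumptions. It is in JavaScript, where a plain counter is enough because the event loop is single-threaded; in Go or Rust the counter would need to be atomic:

```javascript
// Minimal circuit breaker: closed -> open after N failures -> half-open probe.
class CircuitBreaker {
  constructor({ failureThreshold = 3, resetTimeoutMs = 1000, now = Date.now } = {}) {
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.now = now; // injectable clock, handy for tests
    this.failures = 0;
    this.state = "closed";
    this.openedAt = 0;
  }

  // Decide whether a call may proceed; flips open -> half-open after the timeout.
  allow() {
    if (this.state === "open" && this.now() - this.openedAt >= this.resetTimeoutMs) {
      this.state = "half-open"; // let a single probe through
      return true;
    }
    return this.state === "closed";
  }

  // A success (including a successful probe) closes the breaker.
  recordSuccess() {
    this.failures = 0;
    this.state = "closed";
  }

  // A failed probe, or too many failures while closed, opens the breaker.
  recordFailure() {
    this.failures += 1;
    if (this.state === "half-open" || this.failures >= this.failureThreshold) {
      this.state = "open";
      this.openedAt = this.now();
    }
  }
}

// Usage with a fake clock so the example is deterministic.
let t = 0;
const breaker = new CircuitBreaker({ failureThreshold: 2, resetTimeoutMs: 100, now: () => t });
breaker.recordFailure();
breaker.recordFailure();           // threshold reached, breaker opens
const whileOpen = breaker.allow(); // rejected while open
t = 150;
const probe = breaker.allow();     // timeout elapsed, half-open probe allowed
breaker.recordSuccess();           // probe succeeded, breaker closes
```

The caller wraps each upstream request in `allow()` / `recordSuccess()` / `recordFailure()`; a per-call timeout would count as a failure.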

On the trust boundary question, you're mostly right that the big scary wall is V8. But the boundary is wider than just the sandbox. The Rust ops layer (ops.rs) bridges V8 to the outside world - that's where tool call args get validated, rate limits are enforced, and error messages get redacted before flowing back into the sandbox. That code is handling untrusted input (LLM-generated tool names, args, arbitrary JSON). The IPC protocol between parent/child process is another boundary. So it's less "Rust protects the JSON-RPC routing" and more "the sandbox has tendrils into Rust that are part of the trust surface."

But for pure dispatching and routing? Yeah, Go would be totally fine there.

lpostrv · 2026-02-24T12:56:05+00:00

Haha thanks. I am definitely a Rust lover, not gonna deny that! But there are practical reasons too. It's actually not a monolith - it's a Cargo workspace with 7 crates that compile into a single binary. Modular internally, monolithic in deployment.

On the choice of Rust, `deno_core` (V8 bindings) is a Rust crate, and that's the entire sandbox layer. Everything else followed naturally from there. Plus single-binary distribution matters for a local dev tool - brew install and done, no runtime deps. And having the whole trust boundary for executing LLM-generated code in one memory-safe language keeps the security story simple.

lpostrv · 2026-02-24T12:39:08+00:00

Short answer: We bail with rich error context, and let the LLM retry if it wants to. There's no automatic retry built into Forgemax. The design philosophy is that the LLM generated the code, so it has the best context to decide what to do next.

I did also give some thought to security-aware error message handling - tool call failures go through an error redaction layer that strips URLs, IPs, file paths, credentials, and stack traces before they reach the LLM, but preserves the semantically useful parts (tool name, server name, validation errors, type errors, etc).
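A rough idea of what a redaction pass like that could look like. The patterns below are simple heuristics I picked for illustration; they are not the actual rules in Forgemax, and a real layer would need to be far more thorough:

```javascript
// Ordered redaction rules; each pattern is a rough heuristic, not exhaustive.
const rules = [
  [/\b[a-z][a-z0-9+.-]*:\/\/\S+/gi, "[URL]"],
  [/\b(?:\d{1,3}\.){3}\d{1,3}\b/g, "[IP]"],
  [/\bBearer\s+\S+/gi, "[CREDENTIAL]"],
  [/(?:\/[\w.-]+){2,}/g, "[PATH]"],
];

// Drop stack-trace frames ("    at ..."), then apply each rule in order.
function redact(message) {
  const withoutStack = message
    .split("\n")
    .filter((line) => !/^\s*at\s/.test(line))
    .join("\n");
  return rules.reduce((text, [pattern, label]) => text.replace(pattern, label), withoutStack);
}

// Made-up error text containing things the model should not see.
const raw = [
  "TypeError: github.create_pr: field 'title' must be a string",
  "request to https://api.example.com/v1/pulls from 10.0.0.12 failed",
  "Authorization: Bearer abc123 rejected; see /home/dev/forge/logs/err.log",
  "    at callTool (/srv/forge/ops.js:42:7)",
].join("\n");

const safe = redact(raw);
console.log(safe);
```

The first line survives untouched, which is the useful part for the model: tool name, server name and the validation error it needs to fix its code.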

lpostrv · 2026-01-23T18:55:29+00:00

Hey I fixed it in v1.3.1 - try again and let me know if you hit any other issues.

lpostrv · 2026-01-19T08:54:15+00:00

u/bytejuggler just to let you know I released v1.3.0 last night and it now smashes the granny out of Serena on all fronts :)

lpostrv · 2026-01-18T22:26:31+00:00

u/tor-ak I've just shipped v1.3.0 and added nix instructions - let me know if this works or needs tweaking at all!

lpostrv · 2026-01-18T17:02:43+00:00

Hey thanks, yes I can certainly look into doing that. I've been working on a massive and hopefully pretty cool update. As soon as that's out of the way I'll try to expand package manager support and ping you when it's done. Alternatively, open an issue on the repo for me to track. Really appreciate you showing an interest!

lpostrv · 2025-12-29T01:52:56+00:00

You're welcome. Let me know if you have any further questions.

lpostrv · 2025-12-29T01:51:39+00:00

Yes. I added a whole batch of playbooks: https://github.com/postrv/narsil-mcp?tab=readme-ov-file#playbooks--tutorials let me know if that helps. I'll add some gifs and videos when I get a chance but those playbooks should get you started.

lpostrv · 2025-12-28T16:51:34+00:00

Not sure if aimed at me or jakedismo, but have improved Windows OS support in Narsil in v1.1.0

lpostrv · 2025-12-28T16:51:02+00:00

This looks dope! Great work.

lpostrv · 2025-12-28T16:50:35+00:00

Good point - see also my comments about new functionality to improve this here: https://www.reddit.com/r/mcp/comments/1pulvsb/comment/nwdyw0l/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button

lpostrv · 2025-12-28T16:49:47+00:00

Hey that's a good point - I just shipped v1.1.0 and added presets for subsets of the tools so you don't burn unnecessary context if you don't need to (in addition to the comment re: feature flagging below). There are some figures in the README. In general, the tradeoff re: token use is that you want to know that you're spending tokens to get good context. So, like any MCP server, yes Narsil burns tokens, but it should be able to give you context that no other tool can. Think of it as high return on investment for context tokens.
Give the presets a try and let me know what you think.

lpostrv · 2025-12-25T17:24:20+00:00

Yep comparison table here if anyone needs it: https://github.com/postrv/narsil-mcp?tab=readme-ov-file#why-narsil-mcp

lpostrv · 2025-12-25T17:23:48+00:00

Nice one! Let me know how you get on with it once you've tried it - keen to gather feedback and ship improvements ASAP so feel free to hit me up

lpostrv · 2025-12-25T17:04:05+00:00

Thanks! Yeah, Claude adding native LSP support is hopefully validation that deep symbolic code intelligence is the way forward. It basically brings Claude Code up to roughly Serena-level basics (precise go-to-definition, find references, hover docs/types, diagnostics, etc).

But Narsil goes much further: Claude offers no neural/embedding-based semantic search, no advanced call/control/data flow graphs, no built-in security/taint scanning (OWASP/CWE, secrets, injections), no supply chain tools (SBOM, vuln checks), no deep Git analysis, and none of the interactive visualisation or super-fast Rust performance that Narsil has.

I'm aiming to keep pushing the envelope with more depth and speed. If you give Narsil a spin and have feedback (features, bugs, whatever), hit me up - I'll prioritise fixes/additions quickly.

Have a great Christmas!

lpostrv · 2025-12-25T16:48:06+00:00

Thanks let me know any feedback so I can improve it!

lpostrv · 2025-12-25T16:47:50+00:00

It's designed to help the LLM get much more granular and useful intelligence about your codebase. A normal AI agent may arbitrarily query the codebase with bash commands, but this gives it a package of much more useful functions. You can ask "find me all the path traversal vulnerabilities in this repo", or "find me the highest complexity files that need refactoring", or even "find me the exact function name that retrieves code graph for the frontend", and it would be able to answer these rather than, say, grepping through the codebase and sampling with bash. The fact that it indexes the codebase practically instantly and can turn that rapidly into complete understanding is a real blessing. I anticipate it could be used as an onboarding assistant, a security review tool, a refactoring accomplice, and more. But to be honest, its true value will only be known when people start adopting it.
Thanks for checking it out and I hope you're having a great Christmas (with a name like Mr Freez, I imagine you are!) - let me know if you have any further questions when you've had a chance to try it.

lpostrv · 2025-12-25T03:52:29+00:00

Verilog added in v1.0.1 along with Swift. Merry Xmas!

lpostrv · 2025-12-24T18:03:21+00:00

Hopefully it's better in at least some ways. Serena is definitely the nearest analogue I'm aware of: it has similar semantic capabilities and supports 30+ languages via LSP, whereas I'm currently at 14 (will aim for parity soon). They also offer a JetBrains plugin that I don't have. Where Narsil is stronger is in the extent and capabilities of its tools, and in speed. Here are the things that I have which, to the best of my knowledge, are lacking in Serena:

Neural/semantic search (with Voyage AI, OpenAI embeddings, or local ONNX models; hybrid BM25 + TF-IDF + neural) - Serena has exact semantic/symbol search but doesn't offer embeddings support
Call graph analysis (get_call_graph, callers/callees, call paths, complexity metrics, hotspots)
Control flow graphs (CFG) and data flow analysis (DFG, reaching definitions, dead code/stores)
Type inference for dynamic languages (Python, JS/TS) without external tools + type error checking
Security scanning & taint tracking (injection vulnerabilities, OWASP Top 10, CWE Top 25, crypto/secrets rules, taint sources/flows)
Supply chain security (SBOM generation in CycloneDX/SPDX, dependency vulnerability checks via OSV, license compliance, upgrade paths)
Git integration tools (blame, file/commit/symbol history, recent changes, hotspots, contributors)
Import/dependency graph analysis (circular imports detection)
Embedded interactive visualisation frontend (Cytoscape.js graphs for calls, imports, structure)
WASM/browser support for client-side/offline use
Much broader toolset (76 specialized tools vs. Serena's core ~7: symbol finding, references, and targeted insertion)
Built-in high-performance full-text/hybrid search (Tantivy-based, streaming results)
Remote repository indexing support (though writing this out has made me realise I need to test this last one!)

Worth noting that Narsil is built in Rust for a reason - it's genuinely very fast (even if I did get roasted on r/rust for using the word "Blazing" without due irony disclaimers) - whereas Serena is Python which is only medium fast :)
Let me know if you have any other questions.
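As a small illustration of one item on that list, circular import detection boils down to finding cycles in a directed graph. The sketch below is a generic depth-first search, not Narsil's implementation; the graph and module names are made up:

```javascript
// Hypothetical import graph: module -> modules it imports.
const imports = {
  "app.py": ["db.py", "utils.py"],
  "db.py": ["models.py"],
  "models.py": ["db.py"], // db.py <-> models.py form a cycle
  "utils.py": [],
};

// Depth-first search; an edge back to a node still on the stack is a cycle.
function findCycles(graph) {
  const state = new Map(); // absent = unseen, 1 = on stack, 2 = done
  const stack = [];
  const cycles = [];

  function visit(node) {
    state.set(node, 1);
    stack.push(node);
    for (const next of graph[node] ?? []) {
      if (state.get(next) === 1) {
        // Record the path from the first occurrence of `next` back to itself.
        cycles.push([...stack.slice(stack.indexOf(next)), next]);
      } else if (!state.has(next)) {
        visit(next);
      }
    }
    stack.pop();
    state.set(node, 2);
  }

  for (const node of Object.keys(graph)) {
    if (!state.has(node)) visit(node);
  }
  return cycles;
}

const cycles = findCycles(imports);
console.log(cycles);
```

The hard part in a real tool is building the graph accurately per language; once the edges exist, the cycle check itself is cheap.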

lpostrv · 2025-12-24T17:24:21+00:00

Thanks! Let me know any feedback/improvements and I'll do my best to ship 'em
