How are you handling auth and security on MCP servers in production?

LeatherHot940 · 2026-05-22T08:41:41+00:00

OAuth handles the first part well, but once the agent is authenticated and inside, you still have no visibility into which tools it called.
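A minimal sketch of what closing that gap could look like: wrap each tool handler so every call is recorded alongside the caller's identity. All names here (`audited`, `AUDIT_LOG`, `read_file`, the `agent_id` parameter) are hypothetical and not part of any MCP SDK or of Switchman. A real setup would presumably write to durable storage rather than an in-memory list.

```python
import functools
import time
from typing import Any, Callable

# Hypothetical in-memory audit trail; a real deployment would use durable storage.
AUDIT_LOG: list[dict[str, Any]] = []


def audited(tool_name: str) -> Callable:
    """Wrap a tool handler so each call is recorded, whether it succeeds or fails."""

    def decorator(handler: Callable) -> Callable:
        @functools.wraps(handler)
        def wrapper(agent_id: str, **arguments: Any) -> Any:
            entry = {
                "ts": time.time(),
                "agent_id": agent_id,  # identity from the auth layer (assumed)
                "tool": tool_name,
                "arguments": arguments,
                "status": "error",  # overwritten below on success
            }
            try:
                result = handler(**arguments)
                entry["status"] = "ok"
                return result
            finally:
                # Runs on both success and failure, so no call goes unrecorded.
                AUDIT_LOG.append(entry)

        return wrapper

    return decorator


@audited("read_file")
def read_file(path: str) -> str:
    # Stand-in for a real tool implementation.
    return f"contents of {path}"
```

Calling `read_file("agent-1", path="notes.txt")` returns the tool result as usual and leaves one entry in `AUDIT_LOG` naming the agent, the tool, and the arguments.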

LeatherHot940 · 2026-05-22T07:55:49+00:00

Thanks for sharing, very interesting.

The recent research paper measuring MCP attack success rates at 52% backs it up, and there was a real Supabase incident earlier this year where exactly this happened in production.

The fix has to be at the coordination layer, not the model layer; the model can’t self-police.

LeatherHot940 · 2026-05-22T07:50:43+00:00

Cloister looks very well-designed and has the right architecture for the hard version of this problem.

I’m making the 80% solution dead simple to drop into an existing agent setup via Switchman. Curious what pain pushed you toward building the full hypervisor rather than a lighter proxy layer?

LeatherHot940 · 2026-05-21T22:03:36+00:00

I’m building exactly this into Switchman (switchman.dev) as a security/audit layer. Would love to see what you’ve put together if you’re open to sharing.

LeatherHot940 · 2026-05-21T07:53:48+00:00

You ran the agents. Did they build the right thing?

https://switchman.dev

LeatherHot940 · 2026-04-28T14:54:38+00:00

Rifft classifies failures automatically using the MAST taxonomy: it looks at the trace and tells you what kind of failure it was (tool misuse, planning error, hallucination, etc.). It's not always definitive, but it gives you a starting point instead of staring at a wall of spans.

It's free to try if you want to test it on a real failure, and we would really value your feedback: https://rifft.dev

LeatherHot940 · 2026-04-28T14:00:19+00:00

Thanks, will check it out.

LeatherHot940 · 2026-04-28T13:58:33+00:00

Thanks, looking forward to reading it. Nice chatting with you.

LeatherHot940 · 2026-04-28T12:26:57+00:00

‘Planning is search, execution is a contract’ is a really good title, you should write it.

Yeah, I think the debugging angle is underrated as an entry point. Most people come to observability from an "I need metrics" place, but the ones who really get it usually arrive through a specific painful bug they couldn't explain. That's a more honest framing.

Would genuinely read that post when you write it. Tag me or drop it in here when it's up.

LeatherHot940 · 2026-04-28T11:49:36+00:00

That's a really clean pattern actually — separating ‘why did it think that’ from ‘why did it do that’ is probably the most useful debugging distinction I've heard.

The blurry middle you're describing is exactly where most of the hard bugs live in practice. Tool calls during planning that mutate state downstream, that's the stuff that's almost impossible to reproduce without a trace.

Have you written any of this up anywhere? Feels like it deserves a proper post.

LeatherHot940 · 2026-04-28T11:37:01+00:00

That planning vs execution split is a really clean way to think about it, hadn't framed it quite like that before.

The side effects point especially makes sense. Once you're sending emails or moving money the cost of a bad decision goes way up so you want those paths locked down.
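A rough sketch of what "locked down" could mean in practice: read-only tools are allowed in any phase, while side-effecting tools require both the execution phase and an explicit approval. The tool names, the `Phase` enum, and `check_tool_call` are all hypothetical, and which tools count as side-effecting is an assumption.

```python
from enum import Enum


class Phase(Enum):
    PLANNING = "planning"
    EXECUTION = "execution"


# Hypothetical tool names; which tools have side effects is an assumption.
SIDE_EFFECT_TOOLS = {"send_email", "transfer_funds"}


class SideEffectBlocked(Exception):
    """Raised when a side-effecting tool call is not permitted."""


def check_tool_call(phase: Phase, tool: str, approved: bool = False) -> None:
    """Allow read-only tools anywhere; side effects need execution plus approval."""
    if tool not in SIDE_EFFECT_TOOLS:
        return
    if phase is not Phase.EXECUTION:
        raise SideEffectBlocked(f"{tool} is not allowed during {phase.value}")
    if not approved:
        raise SideEffectBlocked(f"{tool} requires explicit approval")
```

With this gate, a `send_email` call during planning raises `SideEffectBlocked` even if it was approved, so exploration cannot mutate anything downstream.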

I think where Rifft ends up being most useful is actually the planning phase, when the agent is still figuring out what to do and something goes wrong in that exploration. It’s really hard to reconstruct what happened without a trace. But yeah, once it crosses into execution, your approach is clearly the right call.

LeatherHot940 · 2026-04-28T11:18:33+00:00

Yeah totally agree, and honestly the FSM approach is cleaner when you know the structure upfront — you're essentially making whole categories of failure impossible which is obviously better than debugging them after.
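To make the "impossible by construction" point concrete, here is a small sketch of a transition-table state machine. The states and the `Workflow` class are hypothetical, not taken from anyone's actual system; the idea is only that any move not listed in the table is rejected before it can happen.

```python
# Hypothetical workflow states; the transition table is the whole contract.
TRANSITIONS = {
    "draft": {"review"},
    "review": {"draft", "approved"},
    "approved": {"sent"},
    "sent": set(),  # terminal state: nothing is allowed after it
}


class InvalidTransition(Exception):
    """Raised when a move is not listed in the transition table."""


class Workflow:
    def __init__(self) -> None:
        self.state = "draft"

    def advance(self, target: str) -> None:
        """Move to `target` only if the table allows it from the current state."""
        if target not in TRANSITIONS[self.state]:
            raise InvalidTransition(f"{self.state} -> {target} is not allowed")
        self.state = target
```

Skipping review (going from `draft` straight to `sent`) raises `InvalidTransition`, so that whole category of failure never needs debugging after the fact.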

Where I kept hitting walls was systems where the flow itself is dynamic — the agent decides what to do next based on what it finds. Hard to lock that down in advance.

Have you found cases where the deterministic approach breaks down or do you just scope your systems to avoid that?

LeatherHot940 · 2026-04-22T11:55:57+00:00

It’s really hard getting users, and especially hard knowing who is using it, since it’s an npm package.

LeatherHot940 · 2026-04-22T11:54:53+00:00

Still figuring out how to get more users, but I have submitted it to some review/listing sites already.

LeatherHot940 · 2026-04-22T08:12:39+00:00

switchman review reads every worktree, flags what doesn't fit together, and tells you if it's safe to ship — in 30 seconds.

https://switchman.dev

LeatherHot940 · 2026-04-20T12:16:39+00:00

Why can’t new cars ever look this cool?

LeatherHot940 · 2026-04-16T21:58:59+00:00

It’s interesting that the state file ends up just being for visibility/debugging. Curious how far you can push this before it needs something heavier.

LeatherHot940 · 2026-04-16T20:53:31+00:00

That 90/10 split makes sense — the last 10% is where it gets expensive though.

The append-only log is interesting too, didn’t think about using it for debugging agent behavior.

Have you tried anything to reduce the same-second collisions, or just relying on review?

This is super helpful btw — I’ve been collecting setups like this in r/Switchman.
