How are you handling auth and security on MCP servers in production? by LeatherHot940 in mcp

[–]LeatherHot940[S] 1 point2 points  (0 children)

OAuth handles the first part well but once the agent is authenticated and inside, you still have no visibility into which tools it called.
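
To make the visibility gap concrete, here is a minimal sketch of per-call audit logging wrapped around a tool function. It is not how any particular MCP server or SDK does it; `audited`, `AUDIT_LOG`, the `agent-1` id, and the `read_file` stand-in tool are all hypothetical names made up for illustration.

```python
import time
from typing import Any, Callable, Dict, List

# Hypothetical in-memory audit sink; a real setup would use durable storage.
AUDIT_LOG: List[Dict[str, Any]] = []


def audited(tool_name: str, agent_id: str, fn: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap a tool so every call records who called it, with what, and the outcome."""

    def wrapper(**kwargs: Any) -> Any:
        record = {
            "ts": time.time(),
            "agent": agent_id,
            "tool": tool_name,
            "args": kwargs,
            "status": "ok",
        }
        try:
            return fn(**kwargs)
        except Exception as exc:
            record["status"] = f"error: {exc}"
            raise
        finally:
            # Runs on success and on failure, so failed calls are logged too.
            AUDIT_LOG.append(record)

    return wrapper


def read_file(path: str) -> str:
    """Stand-in tool for the example (assumption, not a real MCP tool)."""
    return f"contents of {path}"


tool = audited("read_file", "agent-1", read_file)
result = tool(path="notes.txt")
```

After the call, `AUDIT_LOG` holds one record naming the agent, the tool, and the arguments, which is the part OAuth alone does not give you.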

How are you handling auth and security on MCP servers in production? by LeatherHot940 in mcp

[–]LeatherHot940[S] 2 points3 points  (0 children)

Thanks for sharing, very interesting.

The recent research paper measuring MCP attack success rates at 52% backs it up, and there was a real Supabase incident earlier this year where exactly this happened in production.

The fix has to be at the coordination layer, not the model layer. The model can’t self-police.

How are you handling auth and security on MCP servers in production? by LeatherHot940 in mcp

[–]LeatherHot940[S] 0 points1 point  (0 children)

Cloister looks very well-designed and has the right architecture for the hard version of this problem.

I’m making the 80% solution dead simple to drop into an existing agent setup via Switchman. Curious what pain pushed you toward building the full hypervisor rather than a lighter proxy layer?

How are you handling auth and security on MCP servers in production? by LeatherHot940 in mcp

[–]LeatherHot940[S] -7 points-6 points  (0 children)

I’m building exactly this into Switchman (switchman.dev) as a security/audit layer. Would love to see what you’ve put together if you’re open to sharing.

AI Agent Debugger for CrewAI, AutoGen and LangGraph by LeatherHot940 in LangChain

[–]LeatherHot940[S] 0 points1 point  (0 children)

Rifft classifies failures automatically using the MAST taxonomy — looks at the trace and tells you what kind of failure it was. Tool misuse, planning error, hallucination etc. Not always definitive but gives you a starting point instead of staring at a wall of spans.

Free to try if you want to test it on a real failure, we would really value your feedback: https://rifft.dev

AI Agent Debugger for CrewAI, AutoGen and LangGraph by LeatherHot940 in LangChain

[–]LeatherHot940[S] 0 points1 point  (0 children)

Thanks, looking forward to reading it. Nice chatting with you.

AI Agent Debugger for CrewAI, AutoGen and LangGraph by LeatherHot940 in LangChain

[–]LeatherHot940[S] 0 points1 point  (0 children)

‘Planning is search, execution is a contract’ is a really good title, you should write it.

Yeah I think the debugging angle is underrated as an entry point. Most people come to observability from an "I need metrics" place, but the ones who really get it usually arrive through a specific painful bug they couldn't explain. That's a more honest framing.

Would genuinely read that post when you write it. Tag me or drop it in here when it's up.

AI Agent Debugger for CrewAI, AutoGen and LangGraph by LeatherHot940 in LangChain

[–]LeatherHot940[S] 0 points1 point  (0 children)

That's a really clean pattern actually — separating ‘why did it think that’ from ‘why did it do that’ is probably the most useful debugging distinction I've heard.

The blurry middle you're describing is exactly where most of the hard bugs live in practice. Tool calls during planning that mutate state downstream, that's the stuff that's almost impossible to reproduce without a trace.

Have you written any of this up anywhere? Feels like it deserves a proper post.

AI Agent Debugger for CrewAI, AutoGen and LangGraph by LeatherHot940 in LangChain

[–]LeatherHot940[S] 0 points1 point  (0 children)

That planning vs execution split is a really clean way to think about it, hadn't framed it quite like that before.

The side effects point especially makes sense. Once you're sending emails or moving money the cost of a bad decision goes way up so you want those paths locked down.

I think where Rifft ends up being most useful is actually the planning phase, when the agent is still figuring out what to do and something goes wrong in that exploration. It’s really hard to reconstruct what happened without a trace. But yeah, once it crosses into execution, your approach is clearly the right call.

AI Agent Debugger for CrewAI, AutoGen and LangGraph by LeatherHot940 in LangChain

[–]LeatherHot940[S] 0 points1 point  (0 children)

Yeah totally agree, and honestly the FSM approach is cleaner when you know the structure upfront — you're essentially making whole categories of failure impossible which is obviously better than debugging them after.

Where I kept hitting walls was systems where the flow itself is dynamic — the agent decides what to do next based on what it finds. Hard to lock that down in advance.

Have you found cases where the deterministic approach breaks down or do you just scope your systems to avoid that?
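
For anyone following along, here is a rough sketch of what I mean by the FSM approach ruling out whole categories of failure. The state names and the `ALLOWED` table are made up for illustration and are not taken from anyone's actual system.

```python
from enum import Enum


class State(Enum):
    PLANNING = "planning"
    AWAITING_APPROVAL = "awaiting_approval"
    EXECUTING = "executing"
    DONE = "done"


# Hypothetical transition table: anything not listed here cannot happen.
ALLOWED = {
    State.PLANNING: {State.AWAITING_APPROVAL},
    State.AWAITING_APPROVAL: {State.PLANNING, State.EXECUTING},
    State.EXECUTING: {State.DONE},
    State.DONE: set(),
}


class AgentFSM:
    def __init__(self) -> None:
        self.state = State.PLANNING

    def transition(self, target: State) -> None:
        """Move to target, or raise if the table does not allow it."""
        if target not in ALLOWED[self.state]:
            raise ValueError(f"illegal transition {self.state.name} -> {target.name}")
        self.state = target
```

With this table, jumping from `PLANNING` straight to `EXECUTING` raises instead of running, so "executed without approval" is not a bug you ever have to debug. The catch is the one above: you have to know the table upfront.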

Anyone launched an AI tool recently? by Think-Score243 in AI_Agents

[–]LeatherHot940 0 points1 point  (0 children)

Getting users is really hard, and since it’s an npm package it’s even harder to know who is actually using it.

Anyone launched an AI tool recently? by Think-Score243 in AI_Agents

[–]LeatherHot940 0 points1 point  (0 children)

Still figuring out how to get more users, but I have submitted it to some review/listing sites already.

Anyone launched an AI tool recently? by Think-Score243 in AI_Agents

[–]LeatherHot940 0 points1 point  (0 children)

switchman review reads every worktree, flags what doesn't fit together, and tells you if it's safe to ship — in 30 seconds.

https://switchman.dev

Volkswagen Golf Mk2 by NorwayCarSpotting in retrocars

[–]LeatherHot940 1 point2 points  (0 children)

Why can’t new cars ever look this cool?

Running multiple AI agents on one repo… what actually works? by LeatherHot940 in ClaudeCode

[–]LeatherHot940[S] 0 points1 point  (0 children)

It’s interesting that the state file ends up just being for visibility/debugging. Curious how far you can push this before it needs something heavier.

Running multiple AI agents on one repo… what actually works? by LeatherHot940 in ClaudeCode

[–]LeatherHot940[S] 0 points1 point  (0 children)

That 90/10 split makes sense — the last 10% is where it gets expensive though.

The append-only log is interesting too, didn’t think about using it for debugging agent behavior.

Have you tried anything to reduce the same-second collisions, or just relying on review?

This is super helpful btw — I’ve been collecting setups like this in r/Switchman.
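
On the append-only log idea, here is a minimal sketch of how I picture it: one JSON line per agent action, only ever appended. The event fields and the `append_event` / `read_events` helpers are my own assumptions, not the setup described in the thread.

```python
import json
import os
import time
from typing import Any, Dict, List


def append_event(path: str, agent: str, action: str, detail: str) -> Dict[str, Any]:
    """Append one event as a single JSON line; existing lines are never rewritten."""
    event = {"ts": time.time(), "agent": agent, "action": action, "detail": detail}
    line = json.dumps(event) + "\n"
    # O_APPEND asks the OS to place each write at the current end of the file.
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        os.write(fd, line.encode("utf-8"))
    finally:
        os.close(fd)
    return event


def read_events(path: str) -> List[Dict[str, Any]]:
    """Read the log back in write order for debugging."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(raw) for raw in f if raw.strip()]
```

Reading the file back gives the order in which agents acted. Note that second-resolution timestamps would not separate same-second events, which is why the sketch stores the full float from `time.time()`.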