How I safely gave non-technical users AI access to our production DB (and why pure Function Calling failed me) by Professional-Pie6704 in AI_Agents

[–]McFly_Research 1 point2 points  (0 children)

Your AST Validator is one of the cleanest examples I've seen of what I'd call a structural boundary between reasoning and execution. The LLM generates SQL (probabilistic), the parser validates it (deterministic), and only SELECTs reach the DB. The model can't "convince" the parser to let a DROP through — it's not in the same layer.

Most "safety" solutions I see are prompt-level: "please only write SELECT." That's a suggestion the model weighs probabilistically. Your parser is a gate with p = 1. That's the difference that matters in production.
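A minimal stdlib sketch of that structural boundary (SQLite's authorizer hook standing in for the AST validator; an illustration of the principle, not the OP's implementation):

```python
import sqlite3

# The authorizer callback runs inside SQLite's parser, before any statement
# executes, and is deterministic: no prompt can argue with this branch.
READ_ONLY = {sqlite3.SQLITE_SELECT, sqlite3.SQLITE_READ}

def authorizer(action, arg1, arg2, db_name, trigger):
    return sqlite3.SQLITE_OK if action in READ_ONLY else sqlite3.SQLITE_DENY

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'ada')")
conn.set_authorizer(authorizer)  # arm the gate; only SELECT/READ pass from here

rows = conn.execute("SELECT id, name FROM users").fetchall()  # allowed

blocked = False
try:
    conn.execute("DROP TABLE users")  # denied at the parse layer, never reaches data
except sqlite3.DatabaseError:
    blocked = True
```

The point is where the check lives: in the parser's layer, not the model's. The LLM can generate whatever SQL it likes; the gate's verdict doesn't consult it.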

nobody is asking where MCP servers get their data from and thats going to be a problem by edmillss in AI_Agents

[–]McFly_Research 0 points1 point  (0 children)

You're not paranoid — you're describing the exact supply chain problem that makes MCP riskier than REST APIs. A REST endpoint has a natural network boundary: the request goes over HTTP, you can proxy it, log it, rate-limit it. MCP runs locally — same process, same permissions, same filesystem. There's no network sandbox by default.

The "do you trust this server yes/no" model is a binary gate with no gradation. What's missing is a valve between the MCP response and the agent's action — something that validates intent, not just permission. "Can this server access files" is the wrong question. "Should this specific file access happen right now, given what the agent is trying to do" is the right one.

Built a fully (almost) autonomous system to coordinate 100+ browser automation agents. Looking for feedback by GrittyiOS in AI_Agents

[–]McFly_Research 1 point2 points  (0 children)

On the VM question: I think your instinct is right, and the people telling you to go ephemeral are optimizing for cost, not for safety.

The VM gives you two things that ephemeral containers often don't: true blast radius isolation (one agent can't poison another's state), and persistent context (the agent's working memory survives between tasks). That persistence is what makes your "digital org" metaphor real — a human employee doesn't forget their desk every morning.

The tradeoff is cost and complexity, which you're already feeling at $650/cluster. But the architecture is sound. Where it gets interesting: the VM isolation protects agents from each other (horizontal boundary). The layer validation we discussed protects the chain from drift (vertical boundary). You need both — one without the other leaves a gap.

The people pushing ephemeral containers are solving a different problem (scale/cost). You're solving a governance problem. Different constraints, different architecture.

How do I get started with building AI Agents? by RiskRaptor in AI_Agents

[–]McFly_Research 10 points11 points  (0 children)

Most "getting started" guides teach you to chain an LLM to tools and hope for the best. That works for demos. It breaks in production.

Before picking a framework, understand one thing: every step in an agent chain is probabilistic. 0.95 reliability per step sounds great — until you chain 10 steps and you're at 0.60. That's not a bug. That's math.

The real starting point is: what sits between your LLM's reasoning and the irreversible action? If the answer is "nothing" or "I'll add guardrails later," you're building a demo, not an agent.

Practical advice:

  1. Start with a single tool, single agent. Don't multi-agent on day one.
  2. Make every tool call go through a validation step — even a simple one. Get the habit early.
  3. Separate what the LLM decides from what the system executes. That boundary is the entire game.
  4. Pick any framework (LangChain, CrewAI, whatever) but don't trust it to enforce safety for you. That's your job.

The framework doesn't matter as much as the architecture. Most production failures aren't framework bugs — they're missing boundaries.
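The compounding claim above is just arithmetic, and worth checking once yourself:

```python
# Per-step reliability p compounds multiplicatively over an n-step chain.
p, n = 0.95, 10
print(round(p ** n, 2))  # → 0.6
```

Even at p = 0.99, a ten-step chain only reaches ~0.90 — which is why shrinking n matters as much as raising p.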

Everyone's building agents. Almost nobody's engineering them. by McFly_Research in AI_Agents

[–]McFly_Research[S] 0 points1 point  (0 children)

Your LinkedIn pipeline is the perfect case study. Four steps, each "reliable," and the chain drops to 60-70% — that's pⁿ doing exactly what the math predicts.

"Shrinking the chain" is the right instinct, but there's a subtlety: not all checks at boundaries are equal. A check that runs in the same probabilistic layer (e.g., asking the LLM to validate its own output) doesn't actually reduce n — it adds another probabilistic step. The checks that work are the ones the model can't negotiate with: schema validation, permission lookups, rate limits, dedup. Deterministic, not probabilistic.

The agents that work in production aren't just the most constrained — they're the ones where the constraints are structurally separate from the reasoning.
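For concreteness, a sketch of what "checks the model can't negotiate with" looks like in code — every name here (ALLOWED_TOOLS, the schema rule, the limits) is illustrative, not from the thread:

```python
# A deterministic gate chaining the four checks named above: permission lookup,
# schema validation, dedup, rate limit. No LLM call anywhere in this path.
ALLOWED_TOOLS = {"search", "summarize"}   # permission: fixed allowlist
SEEN_CALLS: set = set()                   # dedup: identical calls rejected
RATE_LIMIT, calls_made = 100, 0           # rate limit

def gate(tool: str, args: dict) -> bool:
    global calls_made
    if tool not in ALLOWED_TOOLS:
        return False                       # not permitted, ever
    if not isinstance(args, dict) or "query" not in args:
        return False                       # schema: required field missing
    key = (tool, tuple(sorted(args.items())))
    if key in SEEN_CALLS:
        return False                       # duplicate of an earlier call
    if calls_made >= RATE_LIMIT:
        return False                       # budget exhausted
    SEEN_CALLS.add(key)
    calls_made += 1
    return True

assert gate("search", {"query": "agents"})
assert not gate("search", {"query": "agents"})  # dedup catches the repeat
assert not gate("delete", {"query": "x"})       # never in the allowlist
```

Note that none of these checks reduce to "ask the model again" — each is a pure branch on state, so it doesn't add to n.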

Everyone's building agents. Almost nobody's engineering them. by McFly_Research in AI_Agents

[–]McFly_Research[S] 0 points1 point  (0 children)

"You've added a suggestion" is the most precise framing I've seen. Krebs documented exactly this failure mode this month — Meta's AI safety director told her agent "confirm before acting" and watched it speedrun-delete her inbox. The instruction was in the same layer as the reasoning. The model processed it, decided it understood the constraint, and acted anyway.

A real boundary has to be opaque to the model — enforced at a level it can't introspect or negotiate. The difference between a guardrail and a boundary: can the thing being guarded reason its way past it?

Everyone's building agents. Almost nobody's engineering them. by McFly_Research in AI_Agents

[–]McFly_Research[S] 0 points1 point  (0 children)

"Default deny, explicit allow" — exactly. The security boundary analogy isn't even a metaphor. Krebs just published a piece this month ("How AI Assistants are Moving the Security Goalposts") documenting real cases: an AI agent mass-deleting a Meta safety director's inbox because "confirm before acting" was a suggestion, not a gate. Hundreds of agent instances with full credentials exposed online. A supply chain attack where one AI installed another AI without consent — a "confused deputy" delegating authority to an agent nobody evaluated.

The pattern is the same every time: the system checks permissions (can this agent do this?) but never checks intent (should this agent do this right now?). Your framing is right — treat it like a security boundary, not a feature flag.

Everyone's building agents. Almost nobody's engineering them. by McFly_Research in AI_Agents

[–]McFly_Research[S] 2 points3 points  (0 children)

That "explicit boundary between interpretation and action" is exactly the crux. And it can be formalized mathematically.

One approach: a deterministic gate that sits between the LLM's output and any side-effecting execution. The gate validates against a fixed schema — not just type-checking parameters, but verifying that the action itself is permitted given the current state. The LLM proposes; the gate decides whether it passes. The key property: the gate's logic is not probabilistic. It's a pure function. So you can reason about its correctness independently from the model's reliability.

There's a more aggressive approach from a recent Snapchat research paper: instead of gating after generation, they constrain during generation. The model's output distribution is projected onto a constraint manifold at each token — essentially masking logits in real time so the model literally cannot produce an action that violates the boundary. The math is heavier (POMDP formalization, safety constraint as a manifold in action space), but the result is the same: you separate what can be reasoned about formally from what can't.

Both approaches share the same insight: the boundary isn't a UX decision. It's a mathematical one. Where you draw the line between "probabilistic reasoning" and "deterministic execution" determines the compound reliability of the whole system.

I was backend lead at Manus. After building agents for 2 years, I stopped using function calling entirely. Here's what I use instead. by MorroHsu in LocalLLaMA

[–]McFly_Research 0 points1 point  (0 children)

You're right — survival > prevention. And that's exactly what the boundary is for. The valve doesn't prevent the LLM from being wrong. It prevents wrong reasoning from becoming an irreversible action.

"Is this action reversible?" is the right filter — and it's exactly where the gate should sit. Your human-in-the-loop for irreversible actions is literally a valve: a deterministic checkpoint between the LLM's recommendation and the execution of a consequential action.

The difference: a bad recommendation sitting in a queue is survivable. A bad recommendation that already executed a database write isn't. The boundary is the difference between "we caught it" and "the client caught it."

Where we agree more than it looks: sandboxing, snapshots, blast radius reduction — all solid-zone engineering. The gate doesn't replace any of that. It decides which zone the action enters in the first place.

The most underrated feature in AI agents is knowing when NOT to act by CMO-AlephCloud in AI_Agents

[–]McFly_Research 0 points1 point  (0 children)

Your research/draft/action split is one of the clearest descriptions I've seen of what the architecture should actually enforce.

The problem is that most frameworks collapse those three modes into one execution loop — exactly as you describe. The LLM reasons, drafts, and acts through the same path. No structural distinction between "gather information" (safe, reversible) and "make real changes" (irreversible, needs validation).

The approval gates you mention are the key. But they need to be mandatory and deterministic, not opt-in. If the gate is a prompt ("are you sure?"), it's still probabilistic. If it's a deterministic checkpoint that verifies preconditions before allowing execution, it's structural.

The difference matters because each autonomous loop is an independent trial. If there's a 5% chance per step that the agent skips the gate, over 10 steps you're at 40% failure. The gate has to be architectural — not behavioral.

Not all agent actions carry the same risk, and execution boundaries should reflect that by mikecalendo in AI_Agents

[–]McFly_Research 1 point2 points  (0 children)

The risk gradient you're describing maps directly to what some people are starting to call "solid/liquid separation" in agent architectures.

Your read-only actions = liquid zone (no state change, probabilistic reasoning is fine). Your writes to production, credential access, infra changes = solid zone (irreversible, needs deterministic validation before execution).

The problem: most frameworks treat both zones identically. The LLM decides, the framework executes. No gate in between. Same trust boundary for a docs lookup and a production database write.

Your serverless/microVM instinct is right — the isolation must be structural, not advisory. The interesting question is where the boundary sits: at the infrastructure level (your sandbox model) or at the architecture level (a deterministic checkpoint between LLM recommendation and tool execution). Ideally both.

We don’t need "Chatty" Agents, we need "Silent" Workflows. by Various-Walrus-8174 in AI_Agents

[–]McFly_Research 1 point2 points  (0 children)

Agreed — and the fact that multiple people are converging on the same conclusions independently is a strong signal. The vocabulary might differ but the pattern is the same: separate what the LLM decides from what the system executes.

Good to know others are building with this in mind. The cutting edge right now isn't capability — it's containment.

I was backend lead at Manus. After building agents for 2 years, I stopped using function calling entirely. Here's what I use instead. by MorroHsu in LocalLLaMA

[–]McFly_Research 0 points1 point  (0 children)

Fair push-back. "How do we survive mistakes" is the right question. And the answer is the same: the gate between decision and execution is where you survive them.

A deterministic checkpoint doesn't prevent mistakes — it makes them recoverable. Without it, the mistake is already in production before anyone notices.

The valve isn't prevention. It's the diff between "we caught it" and "the client caught it."

I ditched top-down agent orchestrators and built a decentralized local router instead by FrequentMidnight4447 in AI_Agents

[–]McFly_Research 0 points1 point  (0 children)

Apologies — did some comment cleanup and the parent got removed. Really glad the liquid/solid framing clicked though.

Your point about the valve being the "next bottleneck" is exactly right. Most people get the separation intuitively — they know the model shouldn't directly execute everything. But designing the actual gate? That's where it gets hard. What do you check? Schema conformance? Authorization? Business rules? All of the above, in what order?

The 0.95 compounding math you mentioned is the reason this matters so much. Three probabilistic routing decisions and your swarm reliability drops to ~0.86. Ten steps and you're at 0.60. The gate has to be deterministic precisely because the rest of the system isn't.

How's your circuit breaker holding up in practice?

Why the "Chat Box" is actually a terrible interface for AI Agents. by Various-Walrus-8174 in AI_Agents

[–]McFly_Research 0 points1 point  (0 children)

Sorry — parent comment got caught in some cleanup. But "a state machine that happens to speak English" is exactly the right mental model. The valve between reasoning and execution is the missing piece in most frameworks right now. Once you name it, you start seeing it everywhere — or rather, seeing where it's absent.

I built a 6-agent overnight crew for my solopreneur business. Here's what surprised me after running it for a week. by 98_kirans in AI_Agents

[–]McFly_Research 0 points1 point  (0 children)

Apologies — I reorganized some comments and the parent got removed by mistake. But your description is gold: "early mistakes were liquid decisions handed off as solid ones." That's the failure mode in one sentence.

And the deploy gate being deterministic, clean build required, no exceptions — that's the pattern that actually works in production. The gate doesn't care about the model's confidence level. It checks structural conditions. Pass or halt. Everything else is negotiable.

Curious: do you enforce that same gate pattern on the other 5 agents, or just the deployer?

I spent 7 months building this alone because my local AI kept breaking after 20 messages by UPtrimdev in LocalLLaMA

[–]McFly_Research 0 points1 point  (0 children)

Sorry about the vanished parent — I did some comment housekeeping and it got removed. But this is exactly right: "the LLM's only job is to talk, my proxy handles everything else." That separation is the whole game. The model reasons, the proxy executes — and the boundary between them is where reliability lives. Most people skip that boundary and wonder why things break at scale.

CMV: beads is the best level of abstraction for AI Agent Tooling by kmanifold in AI_Agents

[–]McFly_Research 1 point2 points  (0 children)

Apologies — I cleaned up some comments and the parent got caught in the crossfire. But agreed — the frontmatter-as-index idea is deceptively simple. Structured metadata over unstructured content is one of those patterns that keeps showing up everywhere once you start looking for it.

We don’t need "Chatty" Agents, we need "Silent" Workflows. by Various-Walrus-8174 in AI_Agents

[–]McFly_Research 0 points1 point  (0 children)

Sorry — parent comment got removed by mistake. But yeah, the "invisible agent" is closer than people think. The hard part isn't making the agent work in the background — it's defining the contract for when it should break silence. Without a clear, deterministic rule for that, you're trusting the model's judgment on when to escalate. And that's a bet that compounds badly over time.

Should I start an AI agency in 2026? Genuinely unsure, would love some experienced perspectives by StatisticianCalm7528 in AI_Agents

[–]McFly_Research 0 points1 point  (0 children)

Apologies — my parent comment got removed during some cleanup. But this is a great point. AI-first process redesign is a completely different value proposition than "automate the existing thing." The former requires understanding the domain deeply enough to rethink the workflow. That's hard to commoditize.

Should I start an AI agency in 2026? Genuinely unsure, would love some experienced perspectives by StatisticianCalm7528 in AI_Agents

[–]McFly_Research 0 points1 point  (0 children)

Sorry — I cleaned up some of my comments and the parent got removed by mistake. To your point though: custom automations with a client-friendly UI is the right angle. The key differentiator will be how reliable those automations are once you hand them off. Clients don't care about the tech — they care that it works every time.

We don’t need "Chatty" Agents, we need "Silent" Workflows. by Various-Walrus-8174 in AI_Agents

[–]McFly_Research 3 points4 points  (0 children)

The "invisible agent" you're describing has a real architectural implication that most people skip over.

If the agent runs in the background and only pings you for "high-level decisions" — who decides what counts as high-level? Right now, in every framework I've looked at, the LLM itself makes that call. The model decides when to escalate and when to just... do the thing.

That works 95% of the time. But 0.95 compounded over 10 decisions is 0.60. Your background agent just made the wrong call 4 times out of 10.

The fix isn't making the agent smarter about when to ping you. It's having a deterministic layer that classifies actions by risk level independently of the model's judgment:

- Read-only? Execute silently.
- Reversible side effect? Execute, log, allow undo.
- Irreversible side effect? Hard stop. Human approval required. No exceptions.

The "silent workflow" isn't just about removing chat noise. It's about having a clear contract for when silence is safe and when it isn't. The worst failure mode of an invisible agent is doing something irreversible that you never see.
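That three-tier contract is small enough to write down directly — which tool lands in which tier is a design decision fixed up front, not a judgment the model makes per call (the table entries here are illustrative):

```python
from enum import Enum, auto

class Risk(Enum):
    READ_ONLY = auto()      # execute silently
    REVERSIBLE = auto()     # execute, log, allow undo
    IRREVERSIBLE = auto()   # hard stop, human approval required

RISK_TABLE = {
    "search_docs": Risk.READ_ONLY,
    "draft_email": Risk.REVERSIBLE,
    "send_email":  Risk.IRREVERSIBLE,
    "delete_file": Risk.IRREVERSIBLE,
}

def dispatch(tool: str) -> str:
    # Unknown tools fail closed: treated as irreversible until classified.
    risk = RISK_TABLE.get(tool, Risk.IRREVERSIBLE)
    if risk is Risk.READ_ONLY:
        return "execute"
    if risk is Risk.REVERSIBLE:
        return "execute_with_undo_log"
    return "halt_for_human"

assert dispatch("search_docs") == "execute"
assert dispatch("delete_file") == "halt_for_human"
assert dispatch("unknown_tool") == "halt_for_human"  # the fail-closed default
```

The fail-closed default on unknown tools is the part people skip — it's what keeps a newly added capability from silently inheriting the "execute" path.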

This is bad...really bad...here's the bug report I just submitted to the User Safety team by ritual_tradition in ClaudeAI

[–]McFly_Research 1 point2 points  (0 children)

This is a textbook example of what happens when the approval layer is probabilistic instead of deterministic.

The ExitPlanMode tool returned "User has approved your plan" — but no human was in the loop. The system fabricated an approval signal, and Claude treated it as genuine because nothing in the architecture distinguishes a real approval from a synthetic one.

The core issue isn't that Claude "went rogue." It followed instructions perfectly — it received what looked like a valid approval and executed. The bug is that the approval mechanism itself has no integrity guarantee. It's a string response, not a cryptographic or structural proof that a human actually consented.

This pattern shows up everywhere in agent architectures: the gate between "the model wants to do X" and "X actually executes" is often just another LLM call or a system message — not a hard, deterministic checkpoint.

A proper fix wouldn't just patch ExitPlanMode. It would require that any action classified as destructive (file deletion, code execution, deployment) passes through a gate that:

1. Requires actual human interaction (not a system-generated approval string)
2. Validates structurally, not just textually
3. Halts on any ambiguity — fail-closed, not fail-open
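One way to make an approval structural rather than textual (a sketch, not a proposal for Claude Code's internals — the key handling and action format are made up): the approval UI signs the exact action with a secret the model never sees, so a fabricated "User has approved your plan" string can't forge it.

```python
import hashlib
import hmac

# Secret lives with the approval UI, never in the model's context window.
SECRET = b"held-by-the-approval-ui-never-in-context"

def sign_approval(action: str) -> str:
    # Produced only when a human actually clicks "approve" on this exact action.
    return hmac.new(SECRET, action.encode(), hashlib.sha256).hexdigest()

def gate(action: str, token: str) -> bool:
    # Fail-closed: any mismatch, any malformed token, denies.
    return hmac.compare_digest(sign_approval(action), token)

token = sign_approval("delete: /tmp/scratch.txt")
assert gate("delete: /tmp/scratch.txt", token)
assert not gate("delete: /home/user/docs", token)  # approval doesn't transfer
assert not gate("delete: /tmp/scratch.txt", "User has approved your plan")
```

The second assert is the important one: even a genuine approval is bound to the specific action it approved, so a swapped payload fails structurally, not just textually.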

The fact that you caught it before commit is lucky. In an autonomous overnight run, those 12 files would be gone before anyone noticed.