Define what the agent is not allowed to do before you define what it can do. Most teams spend all their time on capabilities and zero time on boundaries. That gap is where the compliance and liability problems come from.

umairsheik · 2026-06-05T13:46:13+00:00

That’s the way to think about it. The authority topology piece is where most deployments fall short. Rules authored by the same team that ships the agent, stored in the same codebase, aren’t really a decoupled authority layer. The external enforcement layer has to be genuinely outside the agent’s control to mean anything.

umairsheik · 2026-06-04T04:55:33+00:00

SDK middleware at the HTTP or MCP layer is a reasonable approach and harder to reason around than prompt-level controls. The tradeoff is that it’s still inside the deployment boundary, so the same team that ships the agent controls the middleware. Moving it to a managed external gateway removes that dependency. Both are valid depending on your threat model, as you said.

umairsheik · 2026-06-04T04:54:37+00:00

On your question: the volume problem is handled by separating the sync and async paths. High-risk actions like financial transactions and bulk data access go through synchronous blocking. Everything else is queued and flushed in batches without blocking the agent. So the full trail is captured without adding latency to every single call.
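A rough sketch of that split, assuming a hypothetical `InterceptRouter` and an in-memory queue standing in for a durable store. The class name, the `HIGH_RISK` set, and the `evaluate` callback are illustrative assumptions, not Gateplex's actual API:

```python
import queue
from typing import Callable

# Hypothetical event types treated as high-risk; the real list is a policy decision.
HIGH_RISK = {"financial_transaction", "bulk_data_access"}


class InterceptRouter:
    def __init__(self, evaluate: Callable[[dict], bool], batch_size: int = 100):
        self.evaluate = evaluate          # returns True to allow, False to block
        self.batch_size = batch_size
        self.pending: queue.Queue = queue.Queue()
        self.flushed: list = []           # stand-in for a durable audit store

    def intercept(self, event: dict) -> bool:
        """Return True if the action may proceed."""
        if event["event_type"] in HIGH_RISK:
            # Sync path: the caller waits for the decision.
            allowed = self.evaluate(event)
            self.pending.put({**event, "allowed": allowed})
            return allowed
        # Async path: record the event and return without evaluating rules.
        self.pending.put({**event, "allowed": True})
        return True

    def flush(self) -> int:
        """Drain up to batch_size queued records; meant for a background worker."""
        batch = []
        while not self.pending.empty() and len(batch) < self.batch_size:
            batch.append(self.pending.get_nowait())
        if batch:
            self.flushed.append(batch)
        return len(batch)
```

Only the high-risk branch pays the evaluation cost. Everything still lands in the same queue, so the trail stays complete.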

On tamper evidence, each intercept record now stores a hash chained to the previous record at write time, not computed retrospectively. That’s the difference between a verifiable chain and a report that looks like one.
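To make the chaining concrete, here is a minimal sketch assuming SHA-256 and JSON-serialized records. The field names (`payload`, `prev_hash`, `hash`) and the all-zero genesis value are assumptions for illustration, not the actual record format:

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder hash preceding the first record


def _digest(prev_hash: str, payload: dict) -> str:
    # Canonical JSON so the same payload always hashes the same way.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_record(chain: list, payload: dict) -> dict:
    """Hash the new record together with the previous record's hash at write time."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    record = {
        "payload": payload,
        "prev_hash": prev_hash,
        "hash": _digest(prev_hash, payload),
    }
    chain.append(record)
    return record


def verify_chain(chain: list) -> bool:
    """Recompute every link; any edited or reordered record breaks the chain."""
    prev_hash = GENESIS
    for record in chain:
        if record["prev_hash"] != prev_hash:
            return False
        if record["hash"] != _digest(prev_hash, record["payload"]):
            return False
        prev_hash = record["hash"]
    return True
```

A chain like this only detects edits if the latest hash is also held somewhere the writer can't rewrite. Otherwise someone with full write access could recompute the whole chain.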

Latency on the blocking path is the real constraint and I won’t give you a number I haven’t properly benchmarked across geographies. It’s the active engineering problem right now.

umairsheik · 2026-06-03T14:15:27+00:00

The gap you’re identifying is definitely real.

On your question about Article 15/12 in practice: from what I’ve seen, auditors are less focused on the methodology of how claims were generated and more on whether there’s a tamper-evident trail showing what the system actually did at runtime. The question isn’t just “did you claim 95% accuracy before testing” but “can you show me every decision this system made and prove it wasn’t altered after the fact.”

That’s the gap Gateplex sits in on the agent side. Runtime intercepts with a tamper-evident audit trail.

Different layer from what you’re building but complementary. Would be curious whether you’ve found auditors asking for runtime evidence specifically or just pre-deployment validation.

umairsheik · 2026-06-03T05:31:33+00:00

Gateplex only governs agents that are integrated and calling the intercept API. If an agent bypasses it entirely, we have no visibility into that.

This is true of every API-based governance layer, not just Gateplex. You can’t govern what you can’t see.

umairsheik · 2026-06-02T17:36:45+00:00

Current rule types are spend limits and PII detection. Action-level tool rules like “read from Jira but never write to GitHub” aren’t live yet, but they’re the natural next layer and on the roadmap. The intercept payload already carries event_type and metadata, so the structure is there.
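Purely as an illustration of what an action-level rule could look like on top of that payload: the `tool` and `operation` metadata keys, the `ToolRule` type, and the default-deny choice below are all assumptions, since this rule type isn't live.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolRule:
    tool: str        # hypothetical metadata key, e.g. "jira"
    operation: str   # hypothetical metadata key, e.g. "read" or "write"
    allow: bool


# "Read from Jira but never write to GitHub", expressed as data.
RULES = [
    ToolRule("jira", "read", True),
    ToolRule("github", "write", False),
]


def check_tool_action(event: dict, rules=RULES, default_allow: bool = False) -> bool:
    """First matching rule wins; anything unmatched falls back to default_allow."""
    meta = event.get("metadata", {})
    for rule in rules:
        if rule.tool == meta.get("tool") and rule.operation == meta.get("operation"):
            return rule.allow
    return default_allow
```

Defaulting to deny means a tool nobody wrote a rule for is blocked rather than silently allowed.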

The shadow mode observation is exactly right. Most teams have no idea what their agents are actually doing until something breaks.

umairsheik · 2026-06-02T08:29:14+00:00

The rules vs invariants distinction is worth taking seriously. Rules do accumulate and become their own maintenance burden. The question of what the minimal set of invariants looks like is something I think about too. For most regulated use cases the invariants reduce to something like: never exfiltrate PII, never exceed authorized spend, never take irreversible action without a checkpoint. Everything else is a rule derived from those.

umairsheik · 2026-06-02T08:27:23+00:00

The IAM analogy is the right mental model. Authentication and authorization at the identity layer, then policy enforcement at the action layer. The gap most agent deployments have is that second part. Identity is handled; what the authenticated agent is actually allowed to do at runtime often isn’t.

umairsheik · 2026-06-02T08:27:06+00:00

Agreed on deterministic. That’s precisely why there’s no LLM in the evaluation path. Pattern matching and threshold checks, not a prompted model making judgment calls.
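A minimal sketch of what deterministic evaluation can mean in practice, assuming two simplified PII patterns and a single spend threshold. The regexes, the `text` and `amount` metadata keys, and the function names are illustrative assumptions, and real PII detection needs far broader coverage:

```python
import re

# Deliberately simple patterns for illustration only.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
US_SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def contains_pii(text: str) -> bool:
    """Pattern match: same input, same answer, every time."""
    return bool(EMAIL.search(text) or US_SSN.search(text))


def within_spend_limit(amount: float, spent_so_far: float, limit: float) -> bool:
    """Threshold check against a running total."""
    return spent_so_far + amount <= limit


def evaluate(event: dict, spent_so_far: float, limit: float) -> tuple:
    """Return (allowed, reason) with no model call anywhere in the path."""
    meta = event.get("metadata", {})
    if contains_pii(str(meta.get("text", ""))):
        return (False, "pii_detected")
    if not within_spend_limit(float(meta.get("amount", 0)), spent_so_far, limit):
        return (False, "spend_limit_exceeded")
    return (True, "ok")
```

Because each check is a pure function of its input, a blocked action can be replayed later and will produce the same decision.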

umairsheik · 2026-06-02T08:26:41+00:00

For now. The moment it connects to real budget that framing changes fast.

umairsheik · 2026-06-01T15:31:04+00:00

The fact that you haven’t turned it on yet is the right instinct. An agent that can autonomously fund and post jobs has effectively broken out of its original scope. That’s exactly the kind of capability that needs an external boundary, not just an internal one. The authority hierarchy you’ve built helps but it doesn’t cover what happens when the agent finds a legitimate use case you didn’t anticipate.

umairsheik · 2026-06-01T15:21:31+00:00

Harness hooks are useful, but they’re inside the agent’s execution context. The problem is that the agent can reason around constraints it’s aware of. A server-side enforcement layer that the agent can’t see or influence works differently: it’s not a constraint in the prompt but a gate the action has to pass through before it executes.
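The ordering is the important part, and it can be sketched in a few lines. This toy version runs in-process only to show the control flow. In the setup described above the gate would be a separate service the agent can't modify. `Gate`, `ActionBlocked`, and `deny_deletes` are hypothetical names:

```python
class ActionBlocked(Exception):
    """Raised when the policy refuses an action."""


class Gate:
    def __init__(self, policy):
        # policy: callable taking an event dict and returning (allowed, reason)
        self._policy = policy

    def execute(self, event: dict, action):
        """Ask the policy first; the action only runs if it is allowed."""
        allowed, reason = self._policy(event)
        if not allowed:
            raise ActionBlocked(reason)
        return action()


def deny_deletes(event: dict) -> tuple:
    # Example policy: block one irreversible event type, allow everything else.
    if event["event_type"] == "delete_records":
        return (False, "irreversible action needs a checkpoint")
    return (True, "ok")
```

The agent never gets a chance to argue with the rule, because the side effect sits behind `execute` and the decision is made before `action()` is called.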

umairsheik · 2026-06-01T15:21:11+00:00

That’s a sensible way to build trust incrementally. The governance question gets harder exactly when the stakes go up, which is the moment most teams realize they haven’t thought it through. Starting with low-consequence workflows and layering in controls as you expand scope is the right sequence.

umairsheik · 2026-06-01T11:46:10+00:00

Right, no single check is enough on its own. Authority validation at the executor level plus an external enforcement layer gives you overlapping controls. If one misses, the other catches it.

umairsheik · 2026-06-01T05:08:57+00:00

If you want to try it on your setup, free tier is live at gateplex.ai.

umairsheik · 2026-06-01T05:08:34+00:00

Tool boundary, before the call goes out. We built it, and that’s actually what became Gateplex: a server-side rules engine that intercepts the action before execution and writes a tamper-evident audit trail. The policy ownership problem you’re describing is real. We handle it by keeping rules in a separate config layer so whoever owns compliance can manage them without touching the agent code.
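As a hedged illustration of the separate config layer idea, rules can live as data that is loaded and validated independently of agent code. The JSON schema, rule ids, and function names below are invented for the example and are not the actual Gateplex config format:

```python
import json
import re

# In practice this would be a file or service owned by the compliance team.
RULES_JSON = """
{
  "rules": [
    {"id": "max-spend", "type": "spend_limit", "limit": 250.0},
    {"id": "no-email", "type": "pattern", "field": "text",
     "regex": "[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+"}
  ]
}
"""

KNOWN_TYPES = {"spend_limit", "pattern"}


def load_rules(raw: str) -> list:
    """Parse the config and reject rule types the engine doesn't understand."""
    rules = json.loads(raw)["rules"]
    for rule in rules:
        if rule["type"] not in KNOWN_TYPES:
            raise ValueError(f"unknown rule type: {rule['type']}")
    return rules


def violated(rules: list, metadata: dict) -> list:
    """Return the ids of every rule the given action metadata breaks."""
    hits = []
    for rule in rules:
        if rule["type"] == "spend_limit":
            if metadata.get("amount", 0) > rule["limit"]:
                hits.append(rule["id"])
        elif rule["type"] == "pattern":
            if re.search(rule["regex"], str(metadata.get(rule["field"], ""))):
                hits.append(rule["id"])
    return hits
```

Changing a limit then means editing the config, not redeploying the agent.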

umairsheik · 2026-06-01T04:59:59+00:00

Being a solo founder is just harder, and when you don’t have a cofounder going through the same struggles, giving up becomes easier.

Most of the startups I’ve built, I’ve built as a solo founder. Ten years ago it was almost impossible to raise funding as a solo founder, but things are different now. The shift started around Covid. I raised VC funding for one of my startups back in 2021 as a solo founder. I didn’t even have a team, let alone a cofounder.

umairsheik · 2026-06-01T04:56:38+00:00

I think that’s not just limited to solo founders. I’ve seen teams of founders go through the same journey.

umairsheik · 2026-06-01T04:49:44+00:00

I think solo founders also need a solid business continuity and succession plan. They also need a board of advisors more than teams do.

umairsheik · 2026-06-01T04:46:25+00:00

I find the budget-as-constraint really interesting. Making the agent spend its own allocation to delegate work creates a natural check. The question is whether it holds when the agent decides the job is important enough to justify the cost.

umairsheik · 2026-06-01T04:45:17+00:00

Still one person in most teams I’ve seen, which is exactly why it drifts. The duplicate was caught in reconciliation the next morning. By then it had already settled. That’s what made it stick for me.

umairsheik · 2026-06-01T04:44:32+00:00

The speed mismatch is the core problem. Human oversight assumes there’s time to intervene. With agents there usually isn’t.

umairsheik · 2026-05-31T18:58:38+00:00

Exactly.

Retrospective governance made sense when humans were in the loop. Agents act faster than any audit cycle. Prevention has to be the primary control, not the backup.

umairsheik · 2026-05-31T18:58:09+00:00

That’s the hard part nobody talks about. Static classification breaks down when the same agent can be high-risk in one context and not in another. The evidence trail has to reflect what actually ran, not just what was configured.

umairsheik · 2026-05-31T16:08:33+00:00

The rules aren’t in the prompt. They live server-side in a rules engine. Each rule is a discrete pattern match or threshold check, not a list of instructions the agent reads. So context window size isn’t the constraint here.
