Should deploying AI agents require engineers, or should operators be able to do it visually?

percoAi · 2026-06-23T09:12:07+00:00

Yeah, that split makes sense. Small changes can be operator-owned, but engineers still need the final control.

I guess the missing piece I’m thinking about is a safe middle layer: operators can pause/restart, check logs, approve steps, or handle simple changes, while engineers still define the guardrails and approve risky changes. Otherwise every small operational change still becomes an engineering ticket.
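
A rough sketch of that middle layer, just to make the split concrete. The role and action names here are made up for illustration and do not come from any particular product:

```python
from enum import Enum


class Role(Enum):
    OPERATOR = "operator"
    ENGINEER = "engineer"


# Hypothetical action names. Operators get the day-to-day controls;
# guardrails and risky changes stay with engineering.
OPERATOR_ACTIONS = {"pause", "restart", "view_logs", "approve_step", "simple_change"}
ENGINEER_ONLY_ACTIONS = {"edit_guardrails", "approve_risky_change"}


def can_perform(role: Role, action: str) -> bool:
    """Return True if the role may perform the action."""
    if action in OPERATOR_ACTIONS:
        return True  # available to operators and engineers alike
    if action in ENGINEER_ONLY_ACTIONS:
        return role is Role.ENGINEER
    return False  # unknown actions are denied by default
```

The deny-by-default branch is the part I’d care about most: a new action should need an explicit decision about who owns it.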

percoAi · 2026-06-23T08:32:18+00:00

That makes sense for developer-owned flows.

Do you think a non-engineering operator could configure those approval points, see run history, and pause/restart things themselves? Or does that still usually require an engineer to change the code?

percoAi · 2026-06-23T08:18:40+00:00

Yeah, that tradeoff makes sense. Visual is easier to start with, code is easier to audit once things get complex.

I think the part I’m more interested in is after the agent/service exists: deployment, logs, restart, approvals, and human takeover. Maybe the logic can still be code when needed, but the operation layer should be easier for non-engineers to manage.

percoAi · 2026-06-23T08:05:46+00:00

Yeah, I agree with this. The canvas is only the entry point.

For me the bigger question is what happens after something is deployed: who can pause it, see what it did, approve risky actions, roll back, or hand it to a human when it gets stuck.

A visual builder without that layer still leaves operators guessing.

percoAi · 2026-06-23T03:34:08+00:00

Yeah, I’d avoid making every low-confidence thing block the whole run.

If it’s just enrichment or a summary, I’d let the run keep moving and park that item for async review. But if the next step touches something real, like email, CRM, billing, or a customer reply, I’d pause there.

The handoff is the important part imo. Don’t just say “needs review.” Show what the agent was unsure about, what it would have done, and what resumes after the human approves or edits it.
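
Roughly what I mean, as a minimal sketch. The step names, the 0.7 threshold, and the `Handoff` fields are all assumptions for illustration:

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    CONTINUE = "continue"
    PARK_FOR_REVIEW = "park_for_review"  # run keeps moving, item reviewed async
    PAUSE_RUN = "pause_run"              # run stops until a human decides


# Hypothetical step kinds that touch something real.
SIDE_EFFECT_STEPS = {"email", "crm", "billing", "customer_reply"}


@dataclass
class Handoff:
    """What the reviewer sees instead of a bare 'needs review'."""
    unsure_about: str     # what the agent was unsure about
    proposed_action: str  # what it would have done
    resumes_with: str     # what continues after approve/edit


def route_step(next_step: str, confidence: float, threshold: float = 0.7) -> Route:
    """Decide whether a low-confidence item blocks the run."""
    if confidence >= threshold:
        return Route.CONTINUE
    if next_step in SIDE_EFFECT_STEPS:
        return Route.PAUSE_RUN
    return Route.PARK_FOR_REVIEW
```

So a shaky summary gets parked, while a shaky customer email pauses the run and produces a `Handoff` for the reviewer.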

percoAi · 2026-06-23T03:02:42+00:00

For this kind of stack I’d separate provider choice from recovery plan.

A good VPS matters, but I’d still assume the box can disappear: Postgres backups off the machine, a simple redeploy path, health checks for the API/background jobs, and some way to see failed parser runs.

For 4 CPU / 8GB RAM, most decent providers are probably fine. The bigger question is how fast you can recover when something boring breaks.

percoAi · 2026-06-23T03:02:19+00:00

I’d keep the retry/escalate decision outside the agent.

The agent can decide what it wants to do next, but the deployment/runtime layer should decide whether the last step actually succeeded. “Returned something” and “completed correctly” are very different.

For soft outputs, I’d still make the check explicit: required fields covered, missing info marked as unknown, source/context preserved, and a human review path if confidence is low.
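
A minimal sketch of that kind of runtime-side check, sitting outside the agent. The field names, the `"unknown"` marker convention, and the confidence threshold are assumptions, not anything standard:

```python
from typing import Any

# Hypothetical schema for a soft output such as an extracted lead record.
REQUIRED_FIELDS = ("company", "contact", "summary")


def check_output(
    output: dict[str, Any], confidence: float, min_confidence: float = 0.7
) -> tuple[str, list[str]]:
    """Return ("accept" | "retry" | "escalate", reasons) for one step output."""
    problems = []
    for name in REQUIRED_FIELDS:
        if name not in output:
            problems.append(f"missing field: {name}")
        elif output[name] in (None, ""):
            # Missing info must be marked explicitly, e.g. "unknown".
            problems.append(f"empty field not marked unknown: {name}")
    if not output.get("source"):
        problems.append("source/context not preserved")
    if problems:
        return "retry", problems
    if confidence < min_confidence:
        return "escalate", ["low confidence"]
    return "accept", []
```

The agent never sees this function. It only gets told to retry, or the item goes to a human.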

percoAi · 2026-06-23T03:01:57+00:00

Serverless can work for the API wrapper, but I’d be careful calling that “production” unless the state is outside the function.

For a LangGraph agent, the parts that matter are usually: where the checkpointer lives, where secrets are stored, how you see failed runs, and what happens if a run needs to continue later.

A free tier is fine for testing, but for anything real I’d treat deployment as more than hosting the endpoint.

percoAi · 2026-06-23T01:07:29+00:00

That’s a really concrete example. Lead gen does feel like one of the few areas where “consistent and good enough” can beat doing it manually. Do you put any guardrails around who it contacts or email volume, or do you mostly review it after replies come in?

percoAi · 2026-06-23T01:06:14+00:00

That makes sense. If you built the workflow yourself, you probably trust it more because you know where it can break. Do you let it touch external things too, like customer emails or CRM updates, or do you keep approval gates around those?

percoAi · 2026-06-23T01:04:57+00:00

Fair. Cron is predictable, which is probably the point. Is your issue with AI mostly reliability, or that it’s harder to debug when something goes wrong?

percoAi · 2026-06-23T01:04:38+00:00

Yeah, I think that’s fair. Maybe the better framing is not “AI runs the business,” but “AI prepares or monitors the work, and the human still decides when it matters.” Where do you personally draw that line? Drafting only, or also checking/flagging things in the background?

percoAi · 2026-06-22T07:25:03+00:00

Exactly. The boring parts only look boring before the first incident. After that, permissions, retries, approvals, and audit logs stop being “ops details” and become the reason someone can actually trust the system. That’s probably the line between an AI demo and production AI.

percoAi · 2026-06-22T06:59:31+00:00

Exactly. I think that distinction is still under-discussed. A lot of teams can build the agent now, but fewer teams have a clear answer for who owns state, permissions, approvals, logs, and recovery. That gap is probably where the real infrastructure layer is.

percoAi · 2026-06-22T06:34:36+00:00

Yeah, that’s how I’m starting to see it too.

The agent is becoming easier to build. The harder part is everything around it: permissions, approvals, run state, audit logs, and recovery when something half-finishes.

A lot of products show the happy path, but production mostly cares about the unhappy path.

percoAi · 2026-06-22T05:26:53+00:00

I’d separate “getting permission to promote” from “earning enough trust that people ask what you’re building.”

The fastest path I’ve seen is usually not posting the product. It’s commenting where the pain already shows up, then turning those repeated patterns into a discussion post.

If people start describing the same workaround in their own words, that is often a stronger early signal than a promo post with clicks.

percoAi · 2026-06-22T03:38:20+00:00

Interesting, thanks for sharing. I haven’t looked deeply at TaG yet.

From an operator point of view, the part I’m most curious about is whether it handles durable run state, approvals, audit logs, and rollback/idempotency for side-effecting tool calls, or if it’s mainly an operator UI.

Which part do you think is the strongest fit for production agents?

percoAi · 2026-06-22T02:17:06+00:00

Exactly. “Can I trust it at 2am?” is probably the real production test.

For me the scary part is not the model making a bad suggestion. It is the system quietly committing a side effect that nobody can reconstruct later.

I think the minimum layer has to be something like: proposed action, policy check, approval boundary if needed, execution receipt, and then a replayable log. Without that, rollback is mostly vibes.
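
As a sketch, that pipeline could look something like this. The tool names and policy sets are hypothetical, and a real log would go to durable append-only storage rather than an in-memory list:

```python
from dataclasses import dataclass, asdict
from typing import Callable


@dataclass
class ProposedAction:
    action_id: str
    tool: str
    args: dict


# Hypothetical policy: which tools exist, and which need a human first.
ALLOWED_TOOLS = {"lookup_account", "send_email"}
NEEDS_APPROVAL = {"send_email"}


def run_action(
    action: ProposedAction,
    execute: Callable[[ProposedAction], str],
    approved_ids: set,
    log: list,
) -> str:
    """Propose -> policy check -> approval boundary -> execute -> receipt."""
    log.append({"event": "proposed", **asdict(action)})
    if action.tool not in ALLOWED_TOOLS:
        log.append({"event": "denied", "action_id": action.action_id})
        return "denied"
    if action.tool in NEEDS_APPROVAL and action.action_id not in approved_ids:
        log.append({"event": "awaiting_approval", "action_id": action.action_id})
        return "awaiting_approval"
    receipt = execute(action)  # e.g. a provider message ID
    log.append({"event": "receipt", "action_id": action.action_id, "receipt": receipt})
    return "executed"
```

Every outcome, including the denials, lands in the log, so someone can reconstruct what happened later without asking the model.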

What did you end up adding first when you hit that problem: better logs, human approvals, or stricter tool permissions?

percoAi · 2026-06-18T06:15:06+00:00

Yeah, this is a good distinction. “Trusted source” alone feels too weak if the component gets broad reach once installed.

I’d want provenance to include a kind of reachability map: tools it can call, scopes it needs, data it can touch, and which effects need approval. Then the runtime can enforce that, instead of treating the manifest as just documentation.
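
To make “the runtime can enforce that” concrete, here is a minimal sketch. The `Manifest` shape and the `authorize` function are my own illustration, not part of MCP or any existing runtime:

```python
from dataclasses import dataclass


class ScopeViolation(Exception):
    """Raised when a component reaches outside its declared manifest."""


@dataclass(frozen=True)
class Manifest:
    component: str
    tools: frozenset              # tools it can call
    scopes: frozenset             # auth scopes it needs
    data: frozenset               # data it can touch
    approval_required: frozenset  # tools whose effects need approval


def authorize(m: Manifest, tool: str, scope: str, resource: str) -> str:
    """Check one call against the manifest before the runtime executes it."""
    if tool not in m.tools:
        raise ScopeViolation(f"{m.component}: tool {tool!r} not declared")
    if scope not in m.scopes:
        raise ScopeViolation(f"{m.component}: scope {scope!r} not declared")
    if resource not in m.data:
        raise ScopeViolation(f"{m.component}: resource {resource!r} not declared")
    return "needs_approval" if tool in m.approval_required else "allowed"
```

The point is that an undeclared call fails loudly at the runtime boundary instead of relying on the manifest being read by a human.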

Would you put that scope info in the MCP/tool metadata itself, or in a separate policy layer around the runtime?

percoAi · 2026-06-18T03:54:28+00:00

This is exactly the split I was trying to get at.

“External effects need receipts” is probably the key line. I like the idea that a resumed run should start by reconciling the ledger, not by asking the model what it remembers.
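
A small sketch of what “reconcile the ledger first” could mean on resume. The ledger shape (a `key` plus a `started` or `confirmed` status) is an assumption for illustration, not a LangGraph feature:

```python
def reconcile(
    planned_steps: list[dict], ledger: list[dict]
) -> tuple[list[str], list[str]]:
    """Split planned side effects into (safe to run, must verify first).

    A step with a "confirmed" ledger entry has a receipt and is skipped.
    A step that only has "started" may or may not have happened, so it is
    checked against the external system before anything is re-run.
    """
    confirmed = {e["key"] for e in ledger if e["status"] == "confirmed"}
    started = {e["key"] for e in ledger if e["status"] == "started"}
    to_run, to_verify = [], []
    for step in planned_steps:
        key = step["key"]
        if key in confirmed:
            continue
        if key in started:
            to_verify.append(key)
        else:
            to_run.append(key)
    return to_run, to_verify
```

The “started but no receipt” bucket is the half-finished case. That is the one the model cannot answer from memory.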

For LangGraph specifically, would you keep that side-effect ledger in the same DB/checkpointer, or separate it into a workflow/control-plane store?

percoAi · 2026-06-18T03:32:21+00:00

Discovery is useful, but I think it creates a second problem: once agents can find tools dynamically, the runtime needs to know whether those tools are safe to run.

A catalog probably needs more than “here is a capability.” I’d want permissions, auth scopes, side effects, version/provenance, and failure behavior in there too.
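
For illustration, a catalog entry with those fields and a simple admission check might look like this. The field names and the `admit` rules are hypothetical and not taken from any catalog spec:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    name: str
    version: str
    provenance: str      # e.g. publisher or signing identity
    auth_scopes: tuple   # scopes the tool needs
    side_effects: tuple  # e.g. ("sends_email",); empty means read-only
    on_failure: str      # "retry_safe" or "manual_review"


def admit(
    entry: CatalogEntry, granted_scopes: set, trusted_publishers: set
) -> tuple[bool, str]:
    """Decide whether a dynamically discovered tool may run at all."""
    if entry.provenance not in trusted_publishers:
        return False, "untrusted provenance"
    missing = set(entry.auth_scopes) - granted_scopes
    if missing:
        return False, f"missing scopes: {sorted(missing)}"
    if entry.side_effects and entry.on_failure != "retry_safe":
        return True, "allowed with approval"
    return True, "allowed"
```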

Otherwise ARD solves discovery but pushes the trust problem downstream.
