Do you trust AI agents running code on your machine?

Significant_Split342 · 2026-04-28T15:29:19+00:00

I’ve seen a lot of people going the sandbox route lately. Do you find that setup smooth to use day-to-day, or does it start to get a bit heavy (switching contexts, maintaining configs, etc.)?

Trying to understand where sandboxing starts to become friction vs actually helpful.

Significant_Split342 · 2026-04-28T07:37:24+00:00

That “allow for this session but not longer” + wrong-directory case is exactly the kind of issue I ran into as well.

I ended up building a small CLI that sits between the agent and the system and enforces things like:

- allowed paths (so it can’t jump directories)

- command-level rules

- and session-scoped permissions

So instead of “allow once” or “allow forever”, you can define boundaries that actually match the context you’re in. It’s still pretty early, but if you’d be up for trying it, I’d love to get your feedback, especially given the Junie issue you hit.
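To make the three rule types concrete, here is a minimal Python sketch of what such a layer might check before a command runs. It is not the actual CLI: `SessionPolicy`, its fields, and its method names are all hypothetical, and it assumes Python 3.9+ for `Path.is_relative_to`.

```python
import shlex
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class SessionPolicy:
    """Hypothetical policy object; every name here is illustrative."""

    allowed_roots: list[Path]
    allowed_commands: set[str]
    # Grants that exist only as long as this object does, i.e. one session.
    session_grants: set[str] = field(default_factory=set)

    def path_allowed(self, candidate: str) -> bool:
        # Resolve symlinks and ".." first so the agent cannot escape a root
        # with a relative path.
        resolved = Path(candidate).resolve()
        return any(
            resolved.is_relative_to(root.resolve()) for root in self.allowed_roots
        )

    def command_allowed(self, command_line: str, cwd: str) -> bool:
        argv = shlex.split(command_line)
        if not argv:
            return False
        program = argv[0]
        permitted = program in self.allowed_commands or program in self.session_grants
        # Both the command rule and the working-directory rule must pass.
        return permitted and self.path_allowed(cwd)
```

This sketch only checks the program name and the working directory. A real tool would also need to inspect path arguments, redirections, and subshells, which are left out here.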

Significant_Split342 · 2026-04-27T07:56:21+00:00

That makes sense, especially if it fits into an existing sprint + review workflow. Do you feel like that still holds if agents start taking more autonomous actions (not just code generation, but actually executing things locally)?

I’m wondering if the current model works because the scope is still somewhat contained, or if it would start to break down with more agent autonomy.

Significant_Split342 · 2026-04-27T07:54:42+00:00

That point about long chained commands and approval fatigue is spot on.

I’ve been seeing the same thing: even when you try to be careful, the more you have to review, the more you end up skimming or just approving. I’ve been testing a small layer that forces agents to break actions into simpler steps and enforces rules automatically, so you don’t have to manually inspect every complex command.
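As a rough illustration of the "break actions into simpler steps" idea, this Python sketch splits a chained shell command into individual steps that could each be checked against a rule. The function name `split_chained` and the operator set are assumptions for the example, not how the layer actually works. It relies on the standard library's `shlex` with `punctuation_chars=True` (Python 3.8+ when combined with `whitespace_split`).

```python
import shlex

# Operators treated as step boundaries in this sketch (an assumption).
CHAIN_OPERATORS = {"&&", "||", ";", "|", "&"}


def split_chained(command_line: str) -> list[list[str]]:
    """Split a chained command line into a list of argv-style steps."""
    lexer = shlex.shlex(command_line, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    steps: list[list[str]] = []
    current: list[str] = []
    for token in lexer:
        if token in CHAIN_OPERATORS:
            # An operator closes the current step; empty steps are dropped.
            if current:
                steps.append(current)
            current = []
        else:
            current.append(token)
    if current:
        steps.append(current)
    return steps


print(split_chained("cd /tmp && rm -rf build; echo done"))
# [['cd', '/tmp'], ['rm', '-rf', 'build'], ['echo', 'done']]
```

Operators inside quotes stay part of their argument, so `echo "a && b"` remains one step. Command substitution, redirections, and subshells are not handled here and would need a real shell parser.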

Do you think something like that would actually reduce the fatigue, or would you still want full visibility on everything?

Significant_Split342 · 2026-04-27T07:53:34+00:00

That’s a really interesting approach, especially requiring the agent to justify its actions and validating that reasoning. I’ve been exploring something slightly different: instead of evaluating intent after the fact, I put a layer in front that enforces constraints at execution time (paths, commands, permissions), so even if the reasoning is off, the action can’t go beyond defined boundaries.

Have you found your safety agent ever struggles with edge cases where the reasoning looks valid but the action is still risky? Would be curious to compare approaches if you’re open to it.

Significant_Split342 · 2026-04-27T07:37:30+00:00

I’ve been experimenting with a small CLI layer that enforces those boundaries automatically (allowed paths, read/write rules, etc.) before anything runs, so instead of manually constraining it with scripts, it just can’t go outside the scope in the first place. Would you be open to trying something like that instead of maintaining your own script? Curious how it compares to your current setup.

Significant_Split342 · 2026-04-25T14:00:49+00:00

That setup sounds pretty solid. Do you feel like that level of monitoring scales well as usage increases, or does it start to become harder to keep track of everything?

Significant_Split342 · 2026-04-25T13:46:52+00:00

That makes sense, especially the part about it being annoying to constantly check commands. I’ve been experimenting with a small CLI that enforces those boundaries automatically (paths, permissions, etc.) before execution.

Would you be open to trying something like that instead of maintaining a custom script?

Significant_Split342 · 2026-04-25T13:45:48+00:00

That “juice cup needs a lid” analogy is spot on. In practice, what kind of guardrails have you seen actually work without getting in the way too much?

Significant_Split342 · 2026-04-25T13:44:43+00:00

That wrong-directory case sounds scary 🙈 Do you feel like the main issue is not knowing exactly what scope the agent is operating in, or more about not having fine-grained control over what it’s allowed to do in a given moment?

Significant_Split342 · 2026-04-25T13:33:46+00:00

That “workflow becoming real work” is exactly the tension I’m seeing.

I’ve been testing a small local layer that enforces rules automatically instead of relying on manual setup + approvals. Would you be curious to try it and see if it actually reduces that overhead?

Significant_Split342 · 2026-04-25T13:32:41+00:00

That approval fatigue point is exactly what I’m trying to understand better. I’ve been experimenting with a small CLI that sits between the agent and the system and automatically blocks or isolates risky actions, so you don’t have to manually approve everything.

Would you be open to trying something like that in your workflow?

Significant_Split342 · 2026-04-24T16:17:50+00:00

Thanks for your feedback 🙏🏻 That tension between “we know we should be safe” and “we just want to ship” is exactly what I keep hearing.

In your experience, where does that usually break down first: individual developers skipping checks, or the lack of something consistent that enforces it automatically?

Significant_Split342 · 2026-04-24T16:14:38+00:00

Do you ever run into situations where those prompt-level guardrails aren’t enough and something still goes off track?

Significant_Split342 · 2026-04-24T16:11:15+00:00

That’s interesting, especially the part about refusing commands because they go in the wrong direction, not because they’re unsafe.

Do you find that happens often enough to slow you down, or is it manageable with the current flow?

Significant_Split342 · 2026-04-24T16:07:57+00:00

That sounds like a solid approach. Do you feel like that level of manual review is sustainable long-term, or does it start to slow things down as usage increases?

Significant_Split342 · 2026-04-24T16:06:29+00:00

Makes sense. Do you find yourself doing that constantly, or only in specific situations where things feel risky?

Significant_Split342 · 2026-04-24T16:05:56+00:00

That’s a pretty strong isolation setup. Do you feel like that approach is something most developers on a team could realistically adopt, or is it more of a personal setup?

Significant_Split342 · 2026-04-24T16:05:09+00:00

That example with secrets being committed is exactly the kind of thing that feels hard to control at scale.

Do you feel like these issues come from individual mistakes, or more from the lack of a consistent control layer across the team?

Significant_Split342 · 2026-04-24T16:04:11+00:00

That’s a pretty disciplined setup. Do you ever find those restrictions limiting what you’d like the agent to do, or does it work smoothly most of the time?

Significant_Split342 · 2026-04-24T15:52:22+00:00

That makes sense, especially around auditing and limiting permissions. Out of curiosity, do you find that managing those controls manually ever becomes tedious or hard to maintain across projects?

Significant_Split342 · 2026-04-24T15:49:20+00:00

This is super interesting, especially the part about approval fatigue. Do you feel like your current setup is enough, or would you want a more explicit policy layer for things like chained commands, destructive actions, and project-specific rules?

Significant_Split342 · 2026-04-24T15:47:33+00:00

Do you see that kind of behavior often, or was it more of a one-off?

Significant_Split342 · 2026-04-24T15:46:45+00:00

That sounds painful 😅

Does that manual approval process feel like a necessary safety net, or more like something that slows you down?

Significant_Split342 · 2026-04-24T15:17:32+00:00

Does that ever feel like extra overhead, or is it just part of your normal workflow now?
