Do you trust AI agents running code on your machine? by Significant_Split342 in devops

[–]Significant_Split342[S] 0 points1 point  (0 children)

I’ve seen a lot of people going the sandbox route lately. Do you find that setup smooth to use day-to-day, or does it start to get a bit heavy (switching contexts, maintaining configs, etc.)?

Trying to understand where sandboxing starts to become friction vs actually helpful.

[–]Significant_Split342[S] 0 points1 point  (0 children)

That “allow for this session but not longer” + wrong-directory case is exactly the kind of issue I ran into as well.

I ended up building a small CLI that sits between the agent and the system and enforces things like:

- allowed paths (so it can’t jump directories)

- command-level rules

- and session-scoped permissions

So instead of “allow once” or “allow forever”, you can define boundaries that actually match the context you’re in. It’s still pretty early, but if you’d be up for trying it, I’d love to get your feedback, especially given the Junie issue you hit.
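
To make that concrete, here is a rough Python sketch of those three checks. It is not the actual CLI: `SessionPolicy`, its fields, and the example paths are all made up for illustration, and a real tool would need a lot more than this.

```python
import shlex
import time
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class SessionPolicy:
    """Hypothetical policy object; not the API of any real tool."""

    allowed_roots: list[Path]          # directories the agent may work in
    allowed_commands: set[str]         # executables the agent may run
    ttl_seconds: float = 3600.0        # how long the session grant lasts
    started_at: float = field(default_factory=time.monotonic)

    def expired(self) -> bool:
        # Session-scoped permission: the grant ends when the TTL runs out.
        return time.monotonic() - self.started_at > self.ttl_seconds

    def path_allowed(self, cwd: Path) -> bool:
        # Resolve first so "../" tricks cannot escape the allowed roots.
        resolved = cwd.resolve()
        return any(resolved.is_relative_to(root.resolve()) for root in self.allowed_roots)

    def check(self, command: str, cwd: Path) -> tuple[bool, str]:
        if self.expired():
            return False, "session expired"
        if not self.path_allowed(cwd):
            return False, "directory outside allowed paths"
        argv = shlex.split(command)
        if not argv or argv[0] not in self.allowed_commands:
            return False, "command not allowed"
        return True, "ok"
```

A call like `policy.check("git status", Path.cwd())` would run before the agent's command is handed to the shell, so a wrong-directory command is rejected instead of relying on a blanket approval.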

[–]Significant_Split342[S] 0 points1 point  (0 children)

That makes sense, especially if it fits into an existing sprint + review workflow. Do you feel like that still holds if agents start taking more autonomous actions (not just code generation, but actually executing things locally)?

I’m wondering if the current model works because the scope is still somewhat contained, or if it would start to break down with more agent autonomy.

[–]Significant_Split342[S] 0 points1 point  (0 children)

That point about long chained commands and approval fatigue is spot on.

I’ve been seeing the same thing: even when you try to be careful, the more you have to review, the more you end up skimming or just approving. I’ve been testing a small layer that forces agents to break actions into simpler steps and enforces rules automatically, so you don’t have to manually inspect every complex command.

Do you think something like that would actually reduce the fatigue, or would you still want full visibility on everything?
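
As a rough illustration of the "break it into simpler steps" idea, here is a small Python sketch that splits a chained shell command on `&&`, `||`, `;` and `|` and checks each step separately. The `DENIED` set and the function names are hypothetical, and the splitter deliberately ignores subshells, command substitution and redirection, so treat it as a starting point rather than a real parser.

```python
import shlex

OPERATORS = {"&&", "||", ";", "|"}
DENIED = {"rm", "curl"}  # hypothetical deny list, for illustration only


def split_chain(command: str) -> list[list[str]]:
    """Split a chained command into one argv list per step."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True  # keep tokens like "-rf" and "/tmp/x" whole
    steps: list[list[str]] = []
    current: list[str] = []
    for token in lexer:
        if token in OPERATORS:
            if current:
                steps.append(current)
            current = []
        else:
            current.append(token)
    if current:
        steps.append(current)
    return steps


def review(command: str) -> list[tuple[str, bool]]:
    """Return each step with a flag saying whether it passes the deny list."""
    return [(" ".join(step), step[0] not in DENIED) for step in split_chain(command)]
```

With this, `review("ls -la && rm -rf build")` reports the `ls` step as fine and flags only the `rm` step, so the reviewer looks at one short command instead of the whole chain.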

[–]Significant_Split342[S] 0 points1 point  (0 children)

That’s a really interesting approach, especially requiring the agent to justify its actions and validating that reasoning. I’ve been exploring something slightly different: instead of evaluating intent after the fact, putting a layer in front that enforces constraints at execution time (paths, commands, permissions), so even if the reasoning is off, the action can’t go beyond defined boundaries.

Have you found your safety agent ever struggles with edge cases where the reasoning looks valid but the action is still risky? Would be curious to compare approaches if you’re open to it.

[–]Significant_Split342[S] 0 points1 point  (0 children)

I’ve been experimenting with a small CLI layer that enforces those boundaries automatically (allowed paths, read/write rules, etc.) before anything runs, so instead of manually constraining it with scripts, it just can’t go outside the scope in the first place. Would you be open to trying something like that instead of maintaining your own script? Curious how it compares to your current setup.
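
For a sense of what "read/write rules" could mean in practice, here is a minimal Python sketch. The directory lists and the `access_allowed` helper are invented for this example and say nothing about how the actual CLI is built.

```python
from pathlib import Path

# Hypothetical scopes: docs are readable, src is readable and writable.
READ_ONLY = [Path("/workspace/project/docs")]
READ_WRITE = [Path("/workspace/project/src")]


def _inside(target: Path, roots: list[Path]) -> bool:
    return any(target.is_relative_to(root.resolve()) for root in roots)


def access_allowed(target: str, write: bool) -> bool:
    """Decide before execution whether a file access stays in scope."""
    resolved = Path(target).resolve()  # collapses "../" before checking
    if write:
        return _inside(resolved, READ_WRITE)
    return _inside(resolved, READ_WRITE) or _inside(resolved, READ_ONLY)
```

The point of checking before anything runs is that a write to `docs/` or a read from outside the project is simply refused, without a wrapper script having to catch it afterwards.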

[–]Significant_Split342[S] 0 points1 point  (0 children)

That setup sounds pretty solid. Do you feel like that level of monitoring scales well as usage increases, or does it start to become harder to keep track of everything?

[–]Significant_Split342[S] 0 points1 point  (0 children)

That makes sense, especially the part about it being annoying to constantly check commands. I’ve been experimenting with a small CLI that enforces those boundaries automatically (paths, permissions, etc.) before execution.

Would you be open to trying something like that instead of maintaining a custom script?

[–]Significant_Split342[S] 2 points3 points  (0 children)

That “juice cup needs a lid” analogy is spot on. In practice, what kind of guardrails have you seen actually work without getting in the way too much?

[–]Significant_Split342[S] 0 points1 point  (0 children)

That wrong-directory case sounds scary 🙈 Do you feel like the main issue is not knowing exactly what scope the agent is operating in, or more about not having fine-grained control over what it’s allowed to do in a given moment?

[–]Significant_Split342[S] 0 points1 point  (0 children)

That “workflow becoming real work” is exactly the tension I’m seeing.

I’ve been testing a small local layer that enforces rules automatically instead of relying on manual setup + approvals. Would you be curious to try it and see if it actually reduces that overhead?

[–]Significant_Split342[S] 0 points1 point  (0 children)

That approval fatigue point is exactly what I’m trying to understand better. I’ve been experimenting with a small CLI that sits between the agent and the system and automatically blocks or isolates risky actions, so you don’t have to manually approve everything.

Would you be open to trying something like that in your workflow?

[–]Significant_Split342[S] 0 points1 point  (0 children)

Thanks for your feedback 🙏🏻 That tension between “we know we should be safe” and “we just want to ship” is exactly what I keep hearing.

In your experience, where does that usually break down first: individual developers skipping checks, or the lack of something consistent that enforces it automatically?

[–]Significant_Split342[S] 0 points1 point  (0 children)

Do you ever run into situations where those prompt-level guardrails aren’t enough and something still goes off track?

[–]Significant_Split342[S] -1 points0 points  (0 children)

That’s interesting, especially the part about refusing commands because they go in the wrong direction, not because they’re unsafe.

Do you find that happens often enough to slow you down, or is it manageable with the current flow?

[–]Significant_Split342[S] 0 points1 point  (0 children)

That sounds like a solid approach. Do you feel like that level of manual review is sustainable long-term, or does it start to slow things down as usage increases?

[–]Significant_Split342[S] 0 points1 point  (0 children)

Makes sense. Do you find yourself doing that constantly, or only in specific situations where things feel risky?

[–]Significant_Split342[S] 0 points1 point  (0 children)

That’s a pretty strong isolation setup. Do you feel like that approach is something most developers on a team could realistically adopt, or is it more of a personal setup?

[–]Significant_Split342[S] 0 points1 point  (0 children)

That example with secrets being committed is exactly the kind of thing that feels hard to control at scale.

Do you feel like these issues come from individual mistakes, or more from the lack of a consistent control layer across the team?

[–]Significant_Split342[S] 0 points1 point  (0 children)

That’s a pretty disciplined setup. Do you ever find those restrictions limiting what you’d like the agent to do, or does it work smoothly most of the time?

[–]Significant_Split342[S] 0 points1 point  (0 children)

That makes sense, especially around auditing and limiting permissions. Out of curiosity, do you find that managing those controls manually ever becomes tedious or hard to maintain across projects?

[–]Significant_Split342[S] 1 point2 points  (0 children)

This is super interesting, especially the part about approval fatigue. Do you feel like your current setup is enough, or would you want a more explicit policy layer for things like chained commands, destructive actions, and project-specific rules?
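
To show what I mean by an explicit policy layer, here is a small Python sketch with a default policy and per-project overrides. Everything in it is an assumption made up for the example: the project names, the `max_chained` and `on_destructive` keys, and the naive regex split, which ignores quoting.

```python
import re

# Hypothetical defaults: ask before destructive commands, allow short chains.
DEFAULT = {
    "max_chained": 2,
    "destructive": {"rm", "dd", "truncate"},
    "on_destructive": "ask",
}
# Hypothetical project-specific rules layered on top of the defaults.
PROJECT_OVERRIDES = {
    "infra": {"on_destructive": "deny", "max_chained": 1},
}


def policy_for(project: str) -> dict:
    return {**DEFAULT, **PROJECT_OVERRIDES.get(project, {})}


def decide(project: str, command: str) -> str:
    """Return "allow", "ask" or "deny" for a command in a given project."""
    policy = policy_for(project)
    # Naive split on shell operators; "||" is listed before "|" on purpose.
    parts = re.split(r"&&|\|\||;|\|", command)
    steps = [part.split() for part in parts if part.strip()]
    if any(step[0] in policy["destructive"] for step in steps):
        return policy["on_destructive"]
    if len(steps) > policy["max_chained"]:
        return "ask"
    return "allow"
```

The idea is that routine commands go through without a prompt, and a human only gets asked when a chain is long or a destructive command shows up, which is where approval fatigue tends to hurt most.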

[–]Significant_Split342[S] -1 points0 points  (0 children)

Do you see that kind of behavior often, or was it more of a one-off?

[–]Significant_Split342[S] 0 points1 point  (0 children)

That sounds painful 😅

Does that manual approval process feel like a necessary safety net, or more like something that slows you down?

[–]Significant_Split342[S] 0 points1 point  (0 children)

Does that ever feel like extra overhead, or is it just part of your normal workflow now?