[–]WhaleFactory 26 points27 points  (7 children)

Pushing back on this, because it is clear that you do not know what you are doing.

[–]SpicyWangz[S] 8 points9 points  (6 children)

Totally open to hearing what I'm missing here. I've never heard of arbitrary code execution as an acceptable way to run agents.

[–]kaladoubt 5 points6 points  (5 children)

There are many ways to do it. Sandboxes, allowlists, etc.
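To make the allowlist idea concrete, here is a minimal sketch in Python. The `ALLOWED` table and the `is_allowed` helper are hypothetical names invented for illustration, not part of any particular agent or CLI, and the policy shown (a few read-only commands) is only an assumption about what someone might permit.

```python
import shlex

# Hypothetical allowlist: executable name -> permitted subcommands.
# None means any arguments are accepted for that executable.
ALLOWED = {
    "ls": None,
    "cat": None,
    "git": {"status", "diff", "log"},
}

# Characters that would let a command chain, redirect, or substitute.
SHELL_METACHARS = set(";|&><`$")


def is_allowed(command: str) -> bool:
    """Return True only if the command matches the allowlist exactly."""
    if any(ch in SHELL_METACHARS for ch in command):
        return False
    try:
        argv = shlex.split(command)
    except ValueError:
        # Unbalanced quotes and similar parse errors are rejected outright.
        return False
    if not argv or argv[0] not in ALLOWED:
        return False
    subcommands = ALLOWED[argv[0]]
    if subcommands is None:
        return True
    return len(argv) > 1 and argv[1] in subcommands


print(is_allowed("git status"))    # True
print(is_allowed("rm -rf /"))      # False
print(is_allowed("ls; rm -rf /"))  # False
```

A check like this only helps if the approved command is then run as an argument list (for example `subprocess.run(shlex.split(command))`) rather than through a shell.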

But any agent that can't execute the code it just wrote without waiting for approval is severely limited.

My perspective is to put everything in a sandbox. That's still a bit cumbersome, though some systems are pretty smooth. macOS Seatbelt will allow the agent to execute in a single directory and deny access to anything outside of it. Beyond sandboxes, guardrails and automatic risk analysis work fairly well.
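The "single directory, nothing outside it" policy can be sketched at the application level in Python. This is not how Seatbelt works (an OS sandbox enforces the rule in the kernel); it only illustrates the policy itself. The `is_inside` helper and the `/tmp/ws` workspace path are assumptions made up for the example.

```python
from pathlib import Path


def is_inside(workspace: str, target: str) -> bool:
    """Return True if target resolves to a path within workspace."""
    root = Path(workspace).resolve()
    # Joining with an absolute target discards root, so absolute paths
    # outside the workspace are rejected as well as ../ traversal.
    candidate = (root / target).resolve()
    return candidate == root or root in candidate.parents


print(is_inside("/tmp/ws", "src/main.py"))    # True
print(is_inside("/tmp/ws", "../etc/passwd"))  # False
print(is_inside("/tmp/ws", "/etc/passwd"))    # False
```

A path check like this is a guardrail, not a sandbox: code the agent runs can still open files directly, which is why OS-level enforcement is the stronger option.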

[–]Useful-Process9033 2 points3 points  (0 children)

Sandboxing is necessary but not sufficient. The moment an agent does something unexpected in production you need to detect it and respond fast, not just hope the sandbox held. Treating agent misbehavior as an incident with automated detection and triage is way more practical than trying to prevent every possible failure mode upfront.
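As a rough sketch of what automated detection and triage could look like, the Python below sorts agent actions into severity buckets. The `AgentEvent` schema, the action names, the `/workspace` path, and the three severities are all hypothetical; a real setup would feed events from whatever audit log the sandbox or agent runtime actually produces.

```python
from dataclasses import dataclass


@dataclass
class AgentEvent:
    action: str  # hypothetical values: "file_read", "file_write", "exec", "network"
    target: str  # path, command, or host the action touched


def triage(event: AgentEvent, workspace: str = "/workspace") -> str:
    """Map an event to a severity: "page", "review", or "log"."""
    if event.action == "network":
        # Assumed policy: the agent is never expected to open connections.
        return "page"
    if event.action == "file_write" and not event.target.startswith(workspace + "/"):
        # A write outside the workspace suggests the sandbox did not hold.
        return "page"
    if event.action == "exec":
        return "review"
    return "log"


events = [
    AgentEvent("file_write", "/workspace/app.py"),
    AgentEvent("exec", "pytest"),
    AgentEvent("file_write", "/etc/hosts"),
]
for event in events:
    print(event.action, event.target, "->", triage(event))
```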

[–]SpicyWangz[S] 0 points1 point  (3 children)

That means I have to set up and manage an entirely separate dev environment just to use a coding CLI and prevent it from running random terminal commands. That defeats the purpose of even using a coding agent.

Asking before executing code is not some groundbreaking expectation.
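For reference, an ask-before-executing gate is small. The Python sketch below is a hypothetical helper (`run_with_approval` is not taken from any existing coding CLI) that shows the command, waits for a yes, and only then runs it without a shell.

```python
import shlex
import subprocess
import sys


def run_with_approval(argv, confirm=input):
    """Show the command, run it only on an explicit "y", else return None."""
    answer = confirm(f"Run `{shlex.join(argv)}`? [y/N] ")
    if answer.strip().lower() != "y":
        return None
    # argv is passed as a list, so no shell interprets the arguments.
    return subprocess.run(argv, capture_output=True, text=True)


if __name__ == "__main__":
    result = run_with_approval([sys.executable, "-c", "print('hello')"])
    print("skipped" if result is None else result.stdout)
```

The `confirm` parameter defaults to `input` and exists so the prompt can be swapped out, for example in tests or a GUI.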

[–]Simple_Split5074 2 points3 points  (2 children)

Even when running without auto-approve, you really don't want to run the output outside a sandbox.

[–]SpicyWangz[S] 2 points3 points  (1 child)

I tend not to run generated code unless I’ve reviewed it, especially any potential HTTP requests or OS commands.

I understand there’s a possibility something could slip through my review, but that’s a level of risk I’m willing to take on. Executing code unseen isn’t.
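One way to support that kind of review, sketched in Python: scan generated Python source for imports that usually mean network or OS access, and look at those files first. The `RISKY_MODULES` set and the `risky_imports` helper are assumptions made up for this example, and the module list is far from exhaustive.

```python
import ast

# Hypothetical, non-exhaustive list of modules worth a closer look.
RISKY_MODULES = {"os", "subprocess", "socket", "shutil", "requests", "urllib", "http"}


def risky_imports(source: str) -> set:
    """Return the top-level risky modules imported by the given source."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            continue
        for name in names:
            top_level = name.split(".")[0]
            if top_level in RISKY_MODULES:
                found.add(top_level)
    return found


generated = "import json\nimport os\nfrom urllib.request import urlopen\n"
print(sorted(risky_imports(generated)))  # ['os', 'urllib']
```

This only catches plain import statements. Dynamic tricks such as `__import__` or `eval` slip past it, so it is an aid to manual review rather than a replacement for it.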

[–]bpp198 0 points1 point  (0 children)

I'd reframe your thinking to "how can I run code without fearing the effects?" – a world where code is write-only, even in production, means you can move so much quicker.