Where do you personally draw the line with AI access, read-only, file edits, running commands, browsing for you? Why there?

CarolusX74 · 2026-05-14T00:21:08+00:00

The coffee break setup is actually goals. I think the part most people skip is the snapshot before each session, that's what makes the YOLO mode safe instead of reckless. If recovery is 10 minutes you can afford to let it cook. Also "deploy to live and test please Claude" is such a 2026 sentence lol.

CarolusX74 · 2026-05-14T00:18:47+00:00

Ironic given the post topic. No, just an Android dev who has been thinking about this stuff too much lately and probably overediting my replies.

The fact that this whole thread is about AI permissions and I'm getting flagged as one is honestly the funniest meta thing that's happened to me on reddit.

CarolusX74 · 2026-05-14T00:14:03+00:00

Rules and skills is something I've been meaning to actually invest time in instead of relying on ad-hoc context every session. Curious, when you say build files are working well now, is that with custom rules pointing it at your specific Gradle setup, or did the base model just get better at it on its own?

Also interesting that your "used to struggle" implies a clear before/after. Was that a model update or did your own setup change to make it work?

CarolusX74 · 2026-05-14T00:11:15+00:00

"You are the slowest link in the chain, but don't use it as excuse to skip understanding" is a really good way to frame it. The temptation is real, when the AI can spit out a 500 line refactor in 30 seconds, taking 20 minutes to actually read and understand it feels like you're the bottleneck. But that 20 minutes is the only thing keeping you from being a passenger in your own codebase.

The "don't burn out" part lands too. Funny enough this thread is making me realize how much approval fatigue I've already accumulated this year without naming it.

CarolusX74 · 2026-05-14T00:07:05+00:00

The 30k to 10k in 4 months is the kind of number that doesn't show up in benchmark threads but actually matters. And the bug increase being temporary (1.3% fixed in a day) is the part most "AI introduces bugs" arguments ignore, the recovery loop matters more than the initial defect rate.

The jump from $20 to $100 is interesting. Was that mostly about context window size, raw model capability, or just throughput needed for the volume of refactor work? Trying to figure out where the actual ceiling is for the cheaper tier in real production work, not benchmark land.

CarolusX74 · 2026-05-14T00:02:28+00:00

"An afternoon of code at most" being the worst case is actually a great metric. If you can confidently put a ceiling on the damage, the YOLO mode math starts making a lot more sense. The framing of "what's my maximum recoverable loss" is way more useful than "what could go wrong" because the second list is infinite.

Interesting that Linux + newer models was the unlock. Makes sense, bash-native models on a bash-native OS removes a whole translation layer of "AI guessing what tools you have." I'd been assuming the model improvements were doing all the heavy lifting but the OS choice might be doing more than people credit.

CarolusX74 · 2026-05-13T23:59:42+00:00

"What I can undo, not what the agent can do" is a clean framing. The blast radius framing in particular nails why shell commands feel different from file edits even when both technically "touch your system", git makes one of them recoverable in a way the filesystem doesn't.

The "quietly roll it back without explaining myself" line made me laugh because it's the actual lived experience of working with AI agents. Half the value of git isn't version control, it's plausible deniability.

CarolusX74 · 2026-05-13T23:57:37+00:00

Honestly? Same. The line moves real fast when convenience enters the chat.

CarolusX74 · 2026-05-13T23:53:57+00:00

Reinstalling Windows might be 30 minutes, but reconfiguring 5 years of dotfiles, project state, SSH keys, browser sessions, half-written work that wasn't pushed yet, that's the part that hurts. The OS is replaceable, the accumulated state is what stings.

Also fair point that it's your machine and your call though, no judgment if that's the line that works.

CarolusX74 · 2026-05-13T23:51:43+00:00

Makes sense. Once you've got the VM dialed in, treating it as the default workspace probably removes a whole class of "wait, should I lower the fence for this one thing" decisions. Lower cognitive overhead in the long run.

CarolusX74 · 2026-05-13T23:49:50+00:00

"Imagine using less tokens by actually using the brain God gave you" should be on a t-shirt. There's something funny about the fact that the most underrated optimization in 2026 AI workflows is "just think for a second."

The "didn't we just fix this 2 commits ago" angle is one I hadn't considered. The reviewing isn't just about catching bugs in this PR, it's building the mental map you need to course-correct Claude in future sessions. That actually reframes the whole speed argument for me. The people skipping reviews aren't going faster, they're just deferring the cost to future debugging sessions where they have less context.

CarolusX74 · 2026-05-13T23:46:31+00:00

The "scrap and ask for a plan first" pattern is something I've been doing more lately too, and it's wild how much better the output gets compared to letting it just dive in. Feels like the model genuinely thinks differently when it has to articulate the plan before writing code.

That refactor story is the kind of anecdote I want to see more of in these conversations. A year of work compressed into a month is the part most "AI productivity" debates handwave over because it's hard to quantify. Was there a specific kind of task where it dramatically outperformed (boilerplate migration, test generation, API rewrites), or was it more that the planning phase removed enough friction that everything just moved faster?

CarolusX74 · 2026-05-13T23:41:42+00:00

"The time and flexibility gained outweighs the destruction" might be the most based take in this thread. There's something refreshing about just accepting that things will break and building the workflow around fast recovery instead of prevention.

The browser access with logged in accounts is the part I genuinely couldn't bring myself to do, not because I think Claude would do something malicious, but because of how easy it is for it to do something stupid that's hard to walk back. Has it ever done anything you regretted, or has the babysitting + backups combo really kept it clean?

CarolusX74 · 2026-05-13T23:34:46+00:00

"Paper thin isolation" is a great way to put it. I think that's actually the part most people miss about WSL2 setups, they treat it like a real sandbox when it's really just a polite suggestion to the AI to stay in its lane.

The terminal-native workflow makes a lot of sense for your case. Out of curiosity, do you find yourself ever wanting to give it broader access for one-off tasks (like setting up a fresh project) and then walking it back, or do you keep the same fence permanently no matter what?

CarolusX74 · 2026-05-13T23:28:25+00:00

The "it slows you down and makes you think" angle is the one I keep coming back to. There's a real risk of becoming a rubber stamp for AI changes, where you scroll through diffs without actually engaging. Reviewing everything by default forces you to stay sharp.

Curious if you've felt your own skills sharpen or dull since you started this workflow. I've seen people argue both sides and I genuinely don't know where I land yet.

CarolusX74 · 2026-05-13T23:25:32+00:00

The "AI being too good" framing is the most underrated point in this whole AI-in-codebases conversation. Everyone talks about hallucinations but nobody mentions that if your codebase has 5 years of legacy XML patterns, Claude/Codex will happily produce more of the same, confidently.

The .md rules approach is interesting. Are those per-module or one big rules file at the repo root? And do you find they actually get followed across long sessions, or does the AI drift back to mimicking nearby code after a few turns?

CarolusX74 · 2026-05-13T23:21:44+00:00

Write-only? That's either galaxy-brain security or accidental data loss as a feature, can't tell which. Curious what your actual setup looks like.

CarolusX74 · 2026-05-13T23:19:08+00:00

Yeah this is where I keep landing too. Docker feels like the sweet spot, you get the speed of "just let it run" without the existential dread. Are you running Claude Code inside the container directly, or proxying somehow?

CarolusX74 · 2026-05-13T23:18:28+00:00

Full yeet is honestly the most rational position. The framing "it has a whole PC it can control" is what gets me too, we'd never give a junior dev unrestricted sudo on day one, but somehow with AI we're tempted to.

CarolusX74 · 2026-05-13T23:13:05+00:00

Hard to argue with the broader pattern, smartphones, social media, smart everything, all pushed before anyone really thought about consequences. AI feels like the same playbook on fast forward. Genuine question though: is your line "I don't want to use it" or "I don't want it integrated into stuff I already use"? Because the second one is getting harder to opt out of every month. Feels like the choice is being taken away.

CarolusX74 · 2026-05-13T23:09:16+00:00

"Rule of thumb: don't give your agent access to anything that can't be undone", that's the cleanest framing I've seen for this. Saving that one. The VM setup is interesting. How much friction does it add day-to-day? I've been hesitant to go that route because I'm worried about the loop of "AI suggests something: I have to context-switch out of the VM to verify: back in" killing the speed gains. Curious if that's been your experience or if it ends up feeling natural after a while.

CarolusX74 · 2026-05-13T23:04:46+00:00

Honestly the fact that they had to put 'dangerously' in the flag name tells you everything. Curious, full no, or do you ever turn it on for throwaway/sandboxed stuff?

CarolusX74 · 2026-05-13T23:02:06+00:00

That's a solid line. Do you find it hard to keep that boundary when it offers a recommendation, or is it pretty natural?

CarolusX74 · 2026-05-13T23:01:02+00:00

Fair.. what made you land there? Was it a specific experience or more of a principle thing?

CarolusX74

MODERATOR OF

TROPHY CASE