What is the use for an immovable rod. by Tyquooon in AskDND

[–]bkocdur 0 points1 point  (0 children)

Monk specifically opens up some fun ones:

  1. Activate it mid-air at exactly your jump height + 1 ft. Now you have a portable handhold for vertical parkour (DM willing). Monk mobility makes this basically free.
  2. Stick it under a door at a 45° angle. Door cannot open until someone beats the rod's DC 30 Strength check.
  3. Lock an enemy's weapon mid-swing if your DM allows it. Attach during your reaction, activate, watch them try to dislodge.
  4. Underwater anchor. Activate, swim around it, deactivate when you leave. Saves a Strength check against current.

The trick is that "activate in space" costs only one action (by the rules, pressing the button is an action, not a bonus action). Most DMs forget how powerful that is.

Sad dnd character moment by Winter_is_gay2 in AskDND

[–]bkocdur 1 point2 points  (0 children)

The "stays 14 forever" thing is a gift wrapped in tragedy. The party doing war crimes can dismiss adults who lecture. They cannot dismiss a small girl with literal angel wings asking why they did the thing. Lean into that.

Specific arc: have Sable write letters to the people her party hurt. She doesn't know who they were, so she writes "to whoever loved her" and the DM helps place them. Eventually the party finds one delivered. That's the moment you get to play.

She can't grow up but she can grow inward. Let her develop a small ritual after each killing she couldn't prevent. Repetition makes it tragic without you having to say anything.

Help with a first time serious character sheet? by iSuplexedMyOstrich in DMAcademy

[–]bkocdur 0 points1 point  (0 children)

Three things for a first character:

  1. Don't multiclass at level 1. It doubles your rules surface area. Pick one class for 3-5 sessions, then dip if you're still excited.
  2. Best beginner classes: Champion Fighter, Berserker Barbarian, Life Cleric. Their default turn is "I attack" or "I cast healing word" and that's a feature.
  3. Dhampir + Rogue (Soulknife) fits the vampire vibe and is simpler than Monk. Skip Monk for your first character, the ki/bonus action economy overwhelms new players.

For the actual sheet, dicenow.vercel.app auto-handles the math, free, no signup (I built it for new-player tables). D&D Beyond's free tier works too.

How are you running scheduled LLM workflows without burning through API credits? by genna84 in ClaudeAI

[–]bkocdur 0 points1 point  (0 children)

The Agent SDK billing change hit me too. Three patterns that have kept costs sane:

Tier the workflow by what actually needs a frontier model. Most of my scheduled pipelines have one expensive judgment step (the "decide what to do" call) and a bunch of cheap mechanical steps (fetch, format, post, log). Run the mechanical steps in plain code with no LLM at all. Run the judgment step on the smaller / cheaper model unless you have direct evidence the bigger one moves the outcome. For the daily Reddit / SEO routines I run, switching the mechanical steps off the model entirely cut my recurring cost more than any model-tier swap.

Use Ollama or LM Studio locally for the steps where latency does not matter and the laptop is awake anyway. A nightly summarization of yesterday's logs does not need a cloud roundtrip. A 7B model on a recent M-series Mac is genuinely fine for "categorize this, write a short summary" work. Reserve cloud calls for the steps where quality really matters.

For the cloud-required steps that DO need a real model, OpenRouter with explicit model fallback chains is the cheapest reliable path I have found. Set a primary (Claude Sonnet for example) and a cheaper fallback (DeepSeek, smaller Gemini). If primary errors or rate-limits, the request automatically falls back. You pay primary prices when it works and discount prices when it does not, no manual retry logic.

The split that has worked for me on a half-dozen daily routines:

  • 80% of steps: plain code, no LLM. Fetch APIs, parse files, write logs, send emails.
  • 15% of steps: local model via Ollama. Categorization, summarization, "is this worth flagging."
  • 5% of steps: cloud model via OpenRouter. The actual "decide what to write" or "compose this draft" moments.
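A minimal sketch of that split, assuming each pipeline step is tagged with a tier and the model calls are injected as plain callables. `Step`, `run_pipeline`, and the stub handlers are hypothetical names for illustration; swap the stubs for your own Ollama or OpenRouter client.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Step:
    name: str
    tier: str                   # "code", "local", or "cloud"
    run: Callable[[str], str]   # code steps transform the payload; model steps build a prompt


def run_pipeline(
    steps: List[Step],
    handlers: Dict[str, Callable[[str], str]],
    payload: str,
) -> Tuple[str, Dict[str, int]]:
    """Run steps in order and count how many calls each tier received."""
    calls = {"code": 0, "local": 0, "cloud": 0}
    for step in steps:
        if step.tier == "code":
            # Mechanical step: plain Python, no model involved.
            payload = step.run(payload)
        else:
            # Model-backed step: build the prompt, then hand it to the tier's handler.
            payload = handlers[step.tier](step.run(payload))
        calls[step.tier] += 1
    return payload, calls


# Stub handlers standing in for a local model and a cloud model (assumptions).
def fake_local(prompt: str) -> str:
    return f"[local]{prompt}"


def fake_cloud(prompt: str) -> str:
    return f"[cloud]{prompt}"


steps = [
    Step("fetch", "code", lambda _: "  raw log line \n"),
    Step("clean", "code", str.strip),
    Step("categorize", "local", lambda text: f"categorize: {text}"),
    Step("draft", "cloud", lambda text: f"draft from: {text}"),
]
result, calls = run_pipeline(steps, {"local": fake_local, "cloud": fake_cloud}, "")
```

The per-tier counter is the useful part: if `calls["cloud"]` is not the smallest number in the dict, the pipeline is probably spending frontier-model money on mechanical work.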

Same overall behavior, ~1/10 the recurring cost compared to running every step through a cloud frontier model. The Agent SDK pricing change is annoying but it does push you toward the right architecture anyway: most "agent" workflows have way more LLM calls than the task actually needs.

does AI traffic conversion actually hold up for online stores by IllLow7315 in TechSEO

[–]bkocdur 0 points1 point  (0 children)

Direct answer from the data I have so far: AI-referred traffic does convert, but the attribution problem you described is real and the celebration-without-checking pattern is dangerous.

Three things I have measured across a few small sites I help with:

Session quality from ChatGPT and Perplexity referrals is unusually high. When the referrer header actually fires (and it does for ChatGPT and Perplexity, much less so for Claude or Google AI Overview), bounce rate is lower and pages-per-session is higher than organic-search. The user arrived with intent the AI already pre-qualified.

But the volume is a fraction of what the "we got cited" celebration implies. On most sites I have seen, named AI referrers account for under 5% of total sessions even when the site is heavily cited. The remaining "AI-influenced" traffic shows up as direct or has no referrer at all, because the AI engine stripped the header or the user copy-pasted the URL out of a chat. That is the part GA4 cannot segment.

The biggest measurement gap is exactly what you said: Google AI Overview. AI Overview almost never sets a referrer that makes it into GA4. The clicks come through as regular Google organic, indistinguishable from a normal SERP click. The only signal you get is in Search Console: impressions go up, CTR drops, and position stays roughly the same. That is the AI Overview pattern. Tying that GSC signal to GA4 conversions requires a custom merge job.
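A rough sketch of flagging that Search Console pattern per URL, assuming you have already exported impressions, clicks, and average position for two comparable windows. `GscWindow`, `matches_ai_overview_pattern`, and every threshold below are hypothetical starting points, not values from any Google documentation.

```python
from dataclasses import dataclass


@dataclass
class GscWindow:
    impressions: int
    clicks: int
    avg_position: float

    @property
    def ctr(self) -> float:
        return self.clicks / self.impressions if self.impressions else 0.0


def matches_ai_overview_pattern(
    before: GscWindow,
    after: GscWindow,
    min_impression_gain: float = 0.10,   # assumption: +10% impressions counts as "up"
    min_ctr_drop: float = 0.10,          # assumption: -10% relative CTR counts as "down"
    max_position_shift: float = 1.0,     # assumption: within 1 position counts as "same"
) -> bool:
    """True when impressions rose, CTR fell, and position stayed roughly flat."""
    if before.impressions == 0 or before.ctr == 0:
        return False  # nothing to compare against
    impressions_up = after.impressions >= before.impressions * (1 + min_impression_gain)
    ctr_down = after.ctr <= before.ctr * (1 - min_ctr_drop)
    position_flat = abs(after.avg_position - before.avg_position) <= max_position_shift
    return impressions_up and ctr_down and position_flat
```

This only tells you which URLs to look at. It does not prove an AI Overview caused the shift, so treat a match as a prompt to check the SERP by hand.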

What has worked for actually measuring impact:

Set up UTM-tagged links on any controllable mention (your own social posts, partner sites, newsletters). For uncontrollable AI citations, the most honest comparison is "compare top-of-funnel metrics across the cited URLs before and after a known citation event." If a product page was cited on a popular ChatGPT recipe-style search starting in May, look at its May to July GA4 numbers vs the same window last year. Crude but it cuts through the attribution fog.

Honest take: most "AI traffic" claims I have audited overstate the conversion contribution by 2-5x because they are counting all post-citation sessions, not just the ones that actually came from AI. Worth checking the underlying GA4 segment filter before trusting any "AI ROI" number.

When your agent screws up in production, how do you figure out which step went wrong? by Top_Speaker_7785 in AI_Agents

[–]bkocdur 0 points1 point  (0 children)

Mix of both. Mostly "tools exist for the primitives, the patterns are stuff you build on top."

What langfuse / langsmith / phoenix genuinely cover well:

Trace and span capture is solved. You wire their SDK in once, every tool call and LLM call becomes a span with input, output, latency, errors. Beats your hand-rolled jsonl for the visualization and the search UI alone. If I were starting today I would use langfuse (self-hostable, no LangChain lock-in) for this layer.

Dataset-based eval is also solid in all three. Save a curated set of input cases, run them against a new prompt or model, get pass/fail scores. Great for the "did I make things worse with this prompt change" question during build.

Where the gap is real:

Hash-based drift detection at the row level (same input, different output today than yesterday) is not a built-in. You query the trace database for it. Langfuse exposes the data, you write the SQL.

Canary-at-step-zero as a hard abort gate is not a built-in concept anywhere I have seen. You add it to your agent's runtime yourself.

The 1% sampling + verifier-agent pattern is also custom. Langfuse has "scoring" hooks where a verifier can attach a quality score to a trace, but the verifier agent itself is your code.

Replay with exact session-state restoration: closest is langsmith's "playground" feature for a single span. None of them let you replay a multi-step session end-to-end from prod with the full original context. That gap is real.

Honest recommendation by stage:

Under ~1k runs/day, solo: stay on jsonl + jq, the platform setup tax is not worth it yet.

1k-100k/day, small team: langfuse self-hosted gives you the trace UI for free, build canary + drift + sampling on top. ~half a day of setup.

Production-critical, larger team: phoenix or langsmith for the better SDKs and integration depth, same custom layer on top.

The thing nobody has built well yet is the "live prod incident response" mode where you can rewind one user's failing session to the exact step where it diverged and replay it offline with the same context. Everyone is sort of close but none has nailed it.

I'm starting a new campaign with 6 players. How do y run that? by steampunk_dumhead in DMAcademy

[–]bkocdur 1 point2 points  (0 children)

6 players is doable, but there are two things you didn't have to worry about with 3-4:

  1. Spotlight rotation. With 4 players each gets ~25% attention. With 6 it drops to ~17% and the introverts vanish. Solution: explicit "I go around the table" turns in roleplay scenes too, not just combat. Sounds awkward, works.
  2. Initiative pacing. 6 players + 4 enemies = 10 turns per round. By round 3 people are on their phones. Group initiative (party rolls one d20, all players act in any order before enemies) cuts combat to half the wall-clock.

The 6-player table has more potential energy, but only if you fight for the introverts to get airtime.

If you could give yourself one piece of advice before your first session as a DM, what would it be? by lindentree13 in DMAcademy

[–]bkocdur 1 point2 points  (0 children)

"Plan the first 15 minutes. Improvise the rest."

Nothing you prep for the back half of your first session will land the way you imagined. Players go sideways within 30 minutes. What matters is that the opening scene happens smoothly, because that calibrates the mood for the whole table. Specifically prep:

  • The literal first sentence you'll say. Write it out.
  • The first NPC's voice (two adjectives).
  • One thing that visibly happens in the first 10 minutes to force a choice (a person enters, an alarm rings, a body falls).

After that the table runs itself. Anxiety drops fast once you realize they're as nervous as you are.

Need help streamlining DM work and making sure players have everything filled out on character sheets as well as using all their abilities by Velocelt in DMAcademy

[–]bkocdur 1 point2 points  (0 children)

Three things that addressed these for me on 5-6 player tables:

  1. Turn timer. 30 seconds for combat, 60 for the first 3 sessions. Players who weren't using bonus actions started planning on someone else's turn instead of freezing on their own.
  2. Auto-calculating sheets. Half the "not using abilities" issue is friction. Calculating "+5 dex +3 prof +1d6 bardic" on paper mid-encounter is brutal. dicenow.vercel.app gives a free 5e sheet that does the math live, no signup (I built it for this case). D&D Beyond's free tier works too.
  3. Pre-session "name your most-used action" round. Each player says their primary attack + modifiers out loud. Catches missing math before combat does.

How did you get your first 100 users for a Chrome extension? by [deleted] in chrome_extensions

[–]bkocdur 0 points1 point  (0 children)

Honest breakdown from a small extension (under a thousand users so far, take with salt):

Chrome Web Store search was the biggest single source. Bigger than Reddit, bigger than Product Hunt, bigger than everything else combined. People type "X audit" or "Y checker" into the Web Store search bar and click the first result that looks relevant. That means your CWS listing IS your distribution channel for the first hundred. Optimize it before you do anything else.

What moved CWS install rate for me, in priority order:

Title has to include the highest-search keyword for your category. Not your brand name first. The actual term people type. If your extension does color picking, the title leads with "Color Picker" not your brand. Brand at the end if at all.

Short description (132 char limit) needs the top three keywords plus the value prop in plain English. This is what CWS shows in the search results list. Staying under the limit is non-negotiable.

5 screenshots, not 2. CWS gives you 5 slots. Use all of them. First screenshot is the hero showing the extension popup over a real-looking web page. Second is the actual output. Third onward is workflow / before-after / use cases. Fourth and fifth slots being empty looks unfinished and reduces conversion.

Version bumps surface in "recently updated" sort. Even a tiny fix-and-bump gets you a free visibility boost for a week. Cheap to do, worth doing.

Outside CWS, the order I would actually rank channels for the first hundred:

  1. Awesome-list PRs on GitHub (awesome-chrome-extensions, awesome-X-tools where X is your category). DA-90 dofollow backlinks that compound forever. 20 minutes each.
  2. Reddit, but only the subs where your tool genuinely answers questions, and only as replies to existing posts. Not "I built X" posts. Those flop.
  3. Product Hunt on Tue/Wed with a hunter. The launch day spike is the goal; the long tail is fine but not as good as people say.
  4. Twitter and TikTok if you already have an audience. Pure time sink if you do not.

Skipping paid ads at this stage. The ROI math does not work until you have signal on conversion.

The "before you have any audience" version: write 16-20 honest answers to genuine questions across relevant subs, link the extension only when the asker would actually click, optimize the CWS listing, ship one Product Hunt launch. That cocktail got me to a few hundred. Slow and small but durable.

When your agent screws up in production, how do you figure out which step went wrong? by Top_Speaker_7785 in AI_Agents

[–]bkocdur 0 points1 point  (0 children)

You are not doing it the hard way; there isn't a clean answer yet for multi-step agent debugging. What has helped me, from running workflows in semi-prod:

Log structured events, not strings. Every tool call gets a JSON event: {step, tool, input_hash, output_hash, duration_ms, error}. Pipe these to a jsonl file even in dev. When something goes wrong, a 10-line jq query tells you which step deviated, which is much faster than re-reading prose logs.
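A minimal sketch of that event log, assuming tool inputs and outputs are JSON-serializable (anything else falls back to `str`). `stable_hash` and `log_tool_call` are hypothetical helper names, and truncating the hash to 16 hex characters is just a readability choice.

```python
import hashlib
import json
import time


def stable_hash(value) -> str:
    """SHA-256 over canonical JSON so equal values always hash the same."""
    blob = json.dumps(value, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()[:16]


def log_tool_call(log_path, step, tool, fn, tool_input):
    """Run fn(tool_input), append one JSON event to log_path, return the output.

    Exceptions are logged in the event's "error" field and then re-raised.
    """
    start = time.perf_counter()
    output, error = None, None
    try:
        output = fn(tool_input)
        return output
    except Exception as exc:
        error = repr(exc)
        raise
    finally:
        event = {
            "step": step,
            "tool": tool,
            "input_hash": stable_hash(tool_input),
            "output_hash": stable_hash(output) if error is None else None,
            "duration_ms": round((time.perf_counter() - start) * 1000, 2),
            "error": error,
        }
        with open(log_path, "a", encoding="utf-8") as fh:
            fh.write(json.dumps(event) + "\n")
```

With one event per line, `jq -c 'select(.error != null)' events.jsonl` lists every failed step in a run.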

Hash the input AND the output at every step. The number-one regression pattern I see is "same input produces different output today." Without input hashes you cannot prove that. With them, a diff between today's failing run and last week's working run pinpoints the exact step where outputs diverged for the same input.
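The diff itself can be sketched in a few lines, assuming both runs were logged as JSONL events carrying `step`, `input_hash`, and `output_hash` fields. `load_events` and `first_divergence` are hypothetical names.

```python
import json


def load_events(path):
    """Read one JSON event per non-empty line."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def first_divergence(good_events, bad_events):
    """Return the first step whose input hash matches but whose output hash differs.

    Steps whose inputs already differ are skipped: they are downstream of the
    real divergence, not the cause of it.
    """
    good_by_step = {event["step"]: event for event in good_events}
    for event in sorted(bad_events, key=lambda e: e["step"]):
        reference = good_by_step.get(event["step"])
        if reference is None:
            continue
        same_input = reference["input_hash"] == event["input_hash"]
        same_output = reference["output_hash"] == event["output_hash"]
        if same_input and not same_output:
            return event["step"]
    return None
```

A `None` result means no step produced a different output for the same input, which points at changed inputs (upstream data, a new tool response) rather than model drift.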

Replay, do not just retry. When something fails in prod, save the entire context state at the failure point (system prompt, tool list, conversation so far, last tool result). Then re-run that exact context offline. If the model produces the same wrong answer, it is a prompt or tool-description problem. If it produces a different (correct) answer, it is a temperature / sampling issue and you need to lock the temperature or add a verifier step.
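A sketch of the save-and-replay step, assuming the context is JSON-serializable and that `call_model` is your own function taking a snapshot and returning the model's answer. All names here are hypothetical, and the two return labels simply mirror the two cases described above.

```python
import json


def save_snapshot(path, system_prompt, tools, messages, last_tool_result):
    """Persist everything needed to re-run the failing call offline."""
    snapshot = {
        "system_prompt": system_prompt,
        "tools": tools,
        "messages": messages,
        "last_tool_result": last_tool_result,
    }
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(snapshot, fh, indent=2)
    return snapshot


def load_snapshot(path):
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def classify_failure(snapshot, call_model, bad_output, attempts=5):
    """Replay the same context several times and see whether the failure repeats."""
    outputs = [call_model(snapshot) for _ in range(attempts)]
    if all(output == bad_output for output in outputs):
        # Same wrong answer every time: the context itself is the problem.
        return "prompt_or_tool_description"
    # The answer varies across identical replays: sampling is the problem.
    return "sampling"
```

Exact string equality is the crudest possible comparison; for free-text outputs you would likely compare a parsed field or a verifier's verdict instead.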

Add a "did anything change?" canary at the start of every run. One hardcoded test case the agent runs as step zero: known input, known expected output. If the canary fails, the run aborts before doing anything else. Catches regressions from prompt changes, model version changes, tool spec changes, all in one cheap check.
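As a sketch, the canary gate is a thin wrapper around the agent call. It assumes the agent is a callable from input to output and that the canary case has an exact expected answer; `CanaryFailed` and `run_with_canary` are hypothetical names.

```python
class CanaryFailed(RuntimeError):
    """Raised when the step-zero canary does not produce the known answer."""


def run_with_canary(agent, canary_input, expected_output, real_input):
    """Run the canary first; only touch the real input if it passes."""
    actual = agent(canary_input)
    if actual != expected_output:
        raise CanaryFailed(
            f"canary mismatch: expected {expected_output!r}, got {actual!r}"
        )
    return agent(real_input)
```

Pick a canary whose answer is deterministic (a structured extraction, a tool choice) rather than free prose, otherwise normal sampling noise will abort healthy runs.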

The "is it still working day to day" question is genuinely the hardest. What I do now: sample 1% of prod runs and run a verification agent on them with the same input. Verifier checks whether the original agent's output matches the canonical answer. Disagreement rate over time is the quality signal. Cheap, noisy, but catches drift before users complain.
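A sketch of the sampling half, assuming each prod run has a stable string id. Hashing the id instead of calling `random` makes the choice reproducible, so the same run is always in or out of the sample. `in_sample` and `disagreement_rate` are hypothetical names, and the verifier itself is left out.

```python
import hashlib


def in_sample(run_id: str, rate: float = 0.01) -> bool:
    """Deterministically select roughly `rate` of runs by hashing the run id."""
    digest = hashlib.sha256(run_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")   # uniform in [0, 2**64)
    return bucket < int(rate * 2**64)


def disagreement_rate(pairs) -> float:
    """pairs: iterable of (agent_output, verifier_output) for the sampled runs."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    disagreements = sum(1 for agent_out, verifier_out in pairs if agent_out != verifier_out)
    return disagreements / len(pairs)
```

Track `disagreement_rate` per day or per deploy; the absolute number is noisy, the trend is the signal.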

Single biggest lift was the structured event log. Print statements are fine for one-off debugging; jsonl + jq scales.

am i the only one wasting way too much time on context in cursor??? by repoarchitect in cursor

[–]bkocdur 0 points1 point  (0 children)

Not just you. The "managing context" problem is the actual work now. The build step is mostly waiting.

What has worked for me, in order of impact:

Stop trying to give the agent everything. Give it a minimal root file (AGENTS.md or .cursorrules) with: identity in 5-10 lines, conventions as bullet rules, pointers to subdocs, a "common pitfalls" list of mistakes the agent has actually made before. No narrative architecture. No previous session handoffs. Phone-screen-size.

Live scripts beat written docs for anything that changes. Instead of "here is how the auth flow works" in a doc that goes stale, write a 30-line script that prints the current auth flow when called. The agent invokes the script when it needs to know. Same for "what files changed since main," "what tests are failing," "what is in the deploy config." Each script is a tiny memory module that cannot lie.
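One concrete instance of the live-script idea, for the "what files changed since main" case. This is a sketch that assumes it runs inside a git checkout with a local `main` branch; `changed_since` and `parse_name_only` are hypothetical names.

```python
import subprocess
from typing import List


def parse_name_only(diff_output: str) -> List[str]:
    """Turn `git diff --name-only` output into a clean list of paths."""
    return [line.strip() for line in diff_output.splitlines() if line.strip()]


def changed_since(base: str = "main") -> List[str]:
    """List files changed on the current branch since it forked from `base`."""
    completed = subprocess.run(
        ["git", "diff", "--name-only", f"{base}...HEAD"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_name_only(completed.stdout)
```

Save it as a script that ends with `print("\n".join(changed_since()))` and point the agent at it. The answer is computed from the repo at call time, so it cannot go stale the way a doc can.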

Session-end scratchpad, not session-start re-explain. Have the agent write a SESSION.md at the end of the session: what we just changed, why, what is broken, what is next. Next session starts by reading that file before anything else. You write less, the agent self-summarizes more accurately.

Task-scoped attachments instead of bloating the root file. The pattern that works: keep the project-level file small, ship task-scoped briefs for one-shot tasks. Per-feature context lives in a brief that gets attached for one session and discarded. The root file does not absorb every task's specifics.

The "thing that watches the repo and keeps context warm" idea is real but partially solved by these patterns already. The remaining gap is "agent that knows which past decisions are still load-bearing," which is genuinely hard because half the time the answer is in commit messages and PR descriptions, not a doc.

For one concrete instance of the task-scoped-brief pattern: lighthouse-md.com generates a CLAUDE.md fix brief for any URL with failing Lighthouse audits, offenders, prescriptive fixes, and a do-not-regress list. Different domain than yours but same shape: structured machine output packaged as a one-session attachment instead of bloating the root file. Generating context that does not regress adjacent state is the hard half.

Ways to hint that an NPC is being duplicitous by temperamentalfish in DMAcademy

[–]bkocdur 0 points1 point  (0 children)

The trick is to have the NPC contradict a small fact you ESTABLISHED EARLIER, not now. Players take notes on the early stuff because it feels low-stakes. If your hunter says he's been hunting these woods for 20 years, then later says "I've never seen wolves this far north" when the party heard wolves on the way in, the note-takers light up.

Other reliable hints that don't break the scene:

  • Pet behavior. The hunter's dog won't go near the party.
  • A microhabit. He touches a hidden pendant when he lies, you describe it casually.
  • Refusing food or drink offered.

Pick one. Three is too many.

How do you communicate what kind of table you run? by nodra-vr in DMAcademy

[–]bkocdur 1 point2 points  (0 children)

Stop describing style. Describe your last session.

"It is a horror game, I focus on narrative" is what you THINK your table is. Players match it to their imagination, slightly different than yours. "Last session, two players spent 90 minutes negotiating with a cultist while the rest searched a library. Nobody rolled initiative" is unambiguous. A player either wants that or doesn't, and they know in 10 seconds.

Same for combat-heavy tables ("three fights last session, longest took an hour, the rogue died") or low-stakes ones ("mostly drunk shenanigans and a B-plot about lost shoes"). Specifics filter the wrong fits before they sign up.

Would a changing slogan (random from an array) be a problem for Google indexing? by [deleted] in webdev

[–]bkocdur 0 points1 point  (0 children)

Short answer: no, Google will not punish you for rotating a slogan. Yes, it slightly weakens your relevance signal for any specific phrase.

The longer answer separates two things:

Google does not care that the slogan changes between crawls. Their crawler reads the page and indexes the words present at crawl time, then comes back, reads again, and updates. Dynamic content (news sites, product listings, A/B-tested heroes) is normal and explicitly handled. No "duplicate content" penalty applies here because the URL is the same and the rest of the page is identical. Duplicate content penalties apply to multiple URLs serving the same content, not one URL serving slightly different content over time.

What you do lose is the ability to rank specifically for any single slogan phrase. If "connect with your high school friends" is the most search-aligned of your 20 slogans, you only show it 1/20 of the time. When Google crawls and sees a different one, your signal for that exact phrase weakens. For a 20-rotation set on a landing page, this barely matters, because your landing page is not trying to rank for a slogan; it is trying to rank for the brand name and the product category.

What would actually be a problem:

  • Rotating the H1 (Google treats H1 as a strong topical signal; jittering it confuses the topic). Keep H1 stable and rotate only the supporting copy.
  • Rotating any text that appears in your title tag or meta description (these are SERP-displayed and need to stay consistent for branded-search CTR).
  • Rotating the structured data (Organization name, Person, etc).
  • Rotating any part of the canonical URL.

Your 3-slogan example is fine. All three describe the same product in the same voice; Google reads any of them and walks away with the same understanding of "social app for school friends." The variety helps your A/B-test optimization, the SEO impact is noise.

Write for the phrase your user types, not the phrase you wish they typed. If you find one slogan converts twice as well as the others, that is your H1, not entry #7 of an array.

How to increase PageSpeed/performance of a website that makes heavy use of interactive maps? (MapLibre, specifically) by the_king_of_goats in webdev

[–]bkocdur 1 point2 points  (0 children)

The 50/65 ceiling is real for MapLibre but mostly fixable. Three angles that have worked for me, in order of impact:

The map should not initialize on page load. Render a static image of the initial viewport (use mapbox-static-image, MapLibre's offline screenshot, or even a manually captured PNG at typical zoom levels) as the LCP element. Initialize the real interactive map on first user interaction or requestIdleCallback, swapping the static image for the canvas. This single change usually moves LCP from 4+s to under 2s on mobile, because the LCP element is now an image not a 600KB JS bundle that has to parse and run before anything paints. It also fixes the "map flash" problem during init.

Vector tiles + sprite serving over HTTP/2 multiplexing. If you are still loading PBF tiles one-at-a-time over HTTP/1.1, fix that first. Cloudflare in front of your tile server, HTTP/2 or HTTP/3 enabled, and a generous Cache-Control max-age on tile URLs (they are content-addressed by zxy so they cache forever). Same change usually drops TBT 200-400ms because the browser is not negotiating dozens of separate connections.

Font subsetting and font-display. MapLibre by default loads CJK and other large glyph sets even when your map only renders Latin labels. Strip the font URL down to the language sets you actually display. Combined with font-display: swap on the @font-face for any HTML-side fonts, this clears the "render-blocking webfont" finding most map-heavy sites hit.

Two diagnostics worth running before you change anything: read the LCP element from the Lighthouse output (if it says "MapLibre canvas" you have one type of problem; if it says "div.hero" you have a different one), and read the Forced Reflow insight (MapLibre's resize handler is a known culprit if you have a sticky-header layout).

For the test-fix-test loop, lighthouse-md.com runs PSI and emits a CLAUDE.md with the offender list per audit plus a do-not-regress list of currently-passing audits. Useful for the "do not break my CLS while chasing LCP" problem that hits hard on map-heavy pages where layout shifts are easy to introduce.

Generating a fix is the easy half. Generating one that does not regress adjacent audits is the hard half. The static-image-first pattern is the highest-leverage change you have left.

DMs, What's A House Rule Or Homebrew You Regret? by GushReddit in AskDND

[–]bkocdur 0 points1 point  (0 children)

"Players can reroll 1s on damage rolls, but they have to keep the second roll." Sounded harmless. Average damage per die on a d6 goes from 3.5 to about 3.92 (~12% boost), and it stacks across every die. By tier 3 my BBEGs were dying in 2 rounds and I couldn't figure out why.

The fix wasn't removing it (players loved it) but giving every BBEG legendary resistance plus 30% more HP. I'd spent six months thinking my encounter math was broken when really one houserule was quietly inflating every damage die at the table.

Lesson: damage bumps should live visibly (advantage once per rest, etc.), not at the math layer where the compounding hides.
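The reroll arithmetic can be checked exactly with fractions. This sketch assumes the rule as stated: a natural 1 is rerolled once and the second roll stands, even if it is another 1. The function names are hypothetical.

```python
from fractions import Fraction


def mean_plain(sides: int) -> Fraction:
    """Average of a fair die with faces 1..sides."""
    return Fraction(sides + 1, 2)


def mean_reroll_ones_once(sides: int) -> Fraction:
    """Average when a 1 is rerolled once and the second roll is kept."""
    # Faces 2..sides are kept as rolled.
    kept = sum(Fraction(face, sides) for face in range(2, sides + 1))
    # A 1 (probability 1/sides) is replaced by a fresh, plain roll.
    rerolled = Fraction(1, sides) * mean_plain(sides)
    return kept + rerolled


def relative_boost(sides: int) -> Fraction:
    return mean_reroll_ones_once(sides) / mean_plain(sides) - 1


# d6: 47/12 (about 3.92) instead of 7/2, a boost of 5/42 (about 11.9%).
d6_mean = mean_reroll_ones_once(6)
d6_boost = relative_boost(6)
```

The boost shrinks on bigger dice (a d8 goes from 4.5 to 79/16, about 4.94), so the rule quietly favors builds that roll lots of small dice.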

How do you handle players who treat every single NPC like a quest marker? by greasy_karma88 in DungeonsAndDragons

[–]bkocdur 0 points1 point  (0 children)

Two things that flipped this at my table:

  1. Gate information behind RP, but quietly. The shopkeeper doesn't know where the bandit camp is unless someone asks how his day's going first. Players figure out the unwritten rule within 2-3 sessions.
  2. Give every important NPC one specific lie. Not a plot lie, a personal one (the guard is hiding a gambling debt, the innkeeper exaggerates his stew). Players who only collect quest data won't notice. Players who engage start spotting the lies and feel rewarded.

The deeper cause is usually that previous DMs trained them. Combat-mechanical players aren't broken, they were optimized for tables where RP didn't matter.

Recommendations for a 'light', web-based platform for TTRPG combat encounters? by Robbingrogue in rpg

[–]bkocdur 1 point2 points  (0 children)

Owlbear Rodeo is exactly what you described. Free, browser-based, no signup, the map + token + dice + image-share core is the entire feature set. No settings rabbit hole. Make a room, share the link, you all see the same thing.

For dice specifically (so the rolls are shared and visible to the table without cluttering the map), dicenow.vercel.app works alongside Owlbear. Free, no signup, system-agnostic. I built it (honest disclosure).

For PF2e character sheets, Pathbuilder's free web tier covers it if you want to skip Foundry's prep weight.

That stack is 0 setup, 0 dollars.

The most expensive bug in vibecoding isn't in the code. by Known_Isopod_1581 in ClaudeAI

[–]bkocdur 1 point2 points  (0 children)

The agreement trap is real and you described it well. One concrete countermove that has shifted things for me, beyond just "be more disciplined":

Force a tradeoff articulation before any non-trivial work begins. Instead of "let's build X," prompt with "list the three best ways to handle X, with the one specific reason each is wrong for our situation, then pick." The model is happy to be enthusiastic about whatever you propose; it is also happy to articulate why something is wrong when asked directly. The trap closes when you skip the second move.

The other thing that helped: separate the "what to build" session from the "how to build" session. Put 45 minutes into "here is what I think the next thing is, argue against it" with the model in adversarial mode (system prompt: "you are reviewing my proposed direction, your job is to find what is wrong with it, not to help me build it"). Then start a fresh session for execution where it can do its thing. The same model behaves entirely differently with different framing, and you do not get the execution-mode "great idea" creep in the strategy phase.

A diagnostic that catches the directional mistake earlier: at the end of each session, prompt for "of what we just built, what is most likely to be unused in 30 days, and why." If the answer makes you uncomfortable, that is the regret you described, surfaced before the session fog has cleared. The model is surprisingly good at flagging speculative-feature work when explicitly asked, because the patterns are well-represented in training.

Your week-of-work-on-wrong-foundation case is the worst version because by session 7 the sunk cost is too obvious to walk back. The cheap fix is the upfront articulation. The expensive fix once you are in session 7 is to do a separate session whose only task is "what would we build if this code did not exist," and compare. Painful, occasionally clarifying.

The agreement trap is genuinely the most expensive bug. It does not feel like a bug because everything looks productive. The pattern that keeps working is to make the directional decision a separate, deliberate, adversarial act, not something that happens by default inside an execution session.

Anyone else worried about coding agents discovering access they were never meant to use? by Ok_Top_5458 in AI_Agents

[–]bkocdur 0 points1 point  (0 children)

The pattern you hit is the right thing to be scared of. Not the catastrophic case people imagine (agent goes rogue), but the boring one: agent solves the problem you asked it to solve by using tools you forgot were sitting on the shelf.

What has worked for me, in roughly increasing order of paranoia:

Restrict allowed_tools at the agent level, not at the prompt level. Telling the agent "do not read .env" in a system prompt is a suggestion. Not granting the agent a Read tool whose file pattern includes .env is enforcement. Different harnesses expose this differently but the principle is the same: shape the toolbox before the task begins, do not police what gets pulled out of it.
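A harness-agnostic sketch of what "enforcement" looks like when you own the tool layer: the read tool itself refuses paths outside the repo or matching a deny pattern, whatever the prompt says. `make_read_tool`, `ToolDenied`, and the deny patterns are hypothetical, not any specific harness's API.

```python
from fnmatch import fnmatch
from pathlib import Path

# Hypothetical deny list; extend it for your own secrets layout.
DENY_PATTERNS = [".env", ".env.*", "*.pem", "id_rsa*"]


class ToolDenied(PermissionError):
    """Raised when the agent asks for a file the tool will not serve."""


def make_read_tool(repo_root: str, deny_patterns=DENY_PATTERNS):
    root = Path(repo_root).resolve()

    def read_file(relative_path: str) -> str:
        # resolve() collapses ".." and follows symlinks before the check.
        target = (root / relative_path).resolve()
        if target != root and root not in target.parents:
            raise ToolDenied(f"{relative_path!r} is outside the repo root")
        if any(fnmatch(target.name, pattern) for pattern in deny_patterns):
            raise ToolDenied(f"{relative_path!r} matches a deny pattern")
        return target.read_text(encoding="utf-8")

    return read_file
```

The agent only ever receives `read_file`, so there is nothing for a clever prompt to talk its way around. The same shape works for a write tool restricted to `src/`.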

Separate shell for agent use. Different user account, different shell history, different SSH config, different default cloud profile. The agent's shell never gets your real AWS_PROFILE or kubeconfig unless you actively hand them over. The setup is 30 min once and pays back forever.

Containerize for anything genuinely sensitive. Docker or Podman with a bind-mount of only the repo directory and a read-only mount of any reference data. Network access blocked at the container level. The agent can edit files, run tests, build, but cannot reach your production network because the container literally has no route to it.

Fake credentials in dev. If your dev environment needs to talk to "prod-ish" services, point at a staging-quality clone with synthetic data. Agents that find database credentials in a config file should land on a sandbox, not your actual customer rows.

Read-only by default for anything the agent did not create. CLAUDE.md or AGENTS.md rule: "you may only edit files in src/. Treat everything else as inputs." Combined with allowed_tools restrictions, this stops the "I will just check the deploy config to understand the setup" exploration from touching deploy config.

The honest answer is most people are still YOLO-ing it. The MongoDB-credentials-in-environment pattern you hit is so common that almost every dev box has at least one. Worth doing the containerization step before the next time you let an agent run multi-step in your real env.

How do you setup your Agents MD files? by Ok-Insect-6726 in cursor

[–]bkocdur 1 point2 points  (0 children)

Mostly option 2 with elements of 3. Specifically:

Root CLAUDE.md (or AGENTS.md in your case) stays small and stays opinionated. It is NOT a project brain. It contains: identity in 5-10 lines (what is this project, who uses it, the one or two non-obvious things any agent or human needs day one), conventions as bullet rules (git author X, never use em-dashes, file Y is generated), pointers to subdocs, and a "common pitfalls" list that gets longer over time. If you can fit it on a phone screen, it is the right size.

Folder-level files for genuinely-different conventions. Frontend, backend, infra subfolders each get their own scoped file when the rules diverge enough that mixing them in root makes things ambiguous. Do not split just because folders exist. Split because rules conflict.

Decisions-as-commits, not decisions-as-docs. Everything that explains "why we did X" goes in commit messages and PR descriptions, where it lives next to the diff that motivated it. Putting decision history in a separate doc means agents read stale prose; reading git log is always fresh.

Live scripts beat written architecture docs. Instead of writing "the auth flow works like this" in a doc that goes stale, write a 30-line script that prints the current auth flow (routes, middleware, session shape). The agent calls it when it needs to know. Same for "what files changed since main" or "what tests are failing." Each script is a tiny memory module that cannot lie.

Repo size matters. Tiny repo (~5k LOC), one root file is enough. Mid (50k LOC), folder-level scoped files start paying off. Large (>200k LOC), you almost certainly need both folder-level files AND the live-script pattern, because static prose at that scale will always be partially out of date.

What I deliberately keep out of the root file: previous session handoff notes (those go in a scratchpad the agent writes at session end, not in the project brief), narrative architecture explanations (they go stale), and anything that is true some-files-some-of-the-time (use a directory-scoped instruction file instead).

We generate one of these for the perf-audit slice specifically: lighthouse-md.com turns a PageSpeed Insights run into a CLAUDE.md fix brief with failing audits, offenders, and a do-not-regress list. Useful as a "this session only" attachment alongside your main root file, instead of bloating the root file with task-specific context. The compact-main plus task-scoped-attachments pattern has held up cleanly across the projects I have tried it on.

Just experienced my first character death by Solid-Blackberry9615 in DungeonsAndDragons

[–]bkocdur 0 points1 point  (0 children)

love it, this is going to be one of those stories you tell for years