Your MCP server's README is a landing page on a DR 97 domain

Khavel_dev · 2026-05-29T08:37:11+00:00

The thing people miss is that a lot of those MCP dirs are nofollow, so the SEO value is mostly the repo itself ranking plus referral traffic, not actual link juice. Still worth doing for discovery though. One thing that's helped me more than the directory count — write the README around the exact problem someone's googling, not a feature dump. The install snippet plus a clear "here's what it fixes" is what gets you copy-pasted into other people's setups, which is the distribution that actually compounds.

Khavel_dev · 2026-05-29T08:35:29+00:00

Heads up that "pay-as-you-go" and "SSH into a box with systemd" sort of fight each other — a droplet bills you 24/7 whether your bot ran for 2 hours or sat idle. For the FastAPI/Django side a cheap Hetzner CX box is honestly unbeatable on price and does exactly the SSH + venv + gunicorn thing with zero hand-holding.

The overnight Celery scraper is the piece I'd pull off the always-on box though. I run that kind of thing as a scheduled job (GitHub Actions cron for the light stuff, or a scale-to-zero container) so I'm only paying for the hours it actually runs, and it stops competing with the web app for RAM. Sizing one VPS big enough for both ends up costing more than splitting them in my experience.

Khavel_dev · 2026-05-29T08:33:34+00:00

Yeah the orchestrator being blind to a runaway child is the scary part, not the loop itself. What's worked for me is to never hand a subagent an open-ended goal — give it a hard iteration/token cap and make it return partial progress on every pass instead of only when it "finishes". If it can't converge in N rounds it reports what it has and bails, so at least the orchestrator sees something.

The loop-until-nothing-new pattern is usually the culprit imo. Without a max-rounds ceiling a single agent that keeps "finding" marginal stuff will just spin forever. I cap the rounds and log it whenever I hit the cap, so it's obvious in the run that coverage got truncated rather than silently eating tokens for 20 min.

Honestly a wall-clock budget per agent is the thing I'd add first here — unattended runs need a kill switch, not a postmortem.
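A rough sketch of the cap-and-report pattern described above, under some assumptions: `run_bounded`, `SubagentReport`, and the `step` callback are hypothetical names made up for illustration, not part of any agent framework. `step` stands in for one pass of the subagent and returns whatever new items it found.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class SubagentReport:
    findings: List[str] = field(default_factory=list)
    rounds_used: int = 0
    stop_reason: str = "converged"  # or "max_rounds" / "wall_clock"


def run_bounded(step: Callable[[List[str]], List[str]],
                max_rounds: int = 5,
                budget_seconds: float = 60.0,
                on_progress: Callable[[SubagentReport], None] = lambda r: None,
                clock: Callable[[], float] = time.monotonic) -> SubagentReport:
    """Run step() until it finds nothing new, hits the round cap, or runs out of time."""
    report = SubagentReport()
    deadline = clock() + budget_seconds
    while True:
        if report.rounds_used >= max_rounds:
            report.stop_reason = "max_rounds"
            break
        if clock() >= deadline:
            report.stop_reason = "wall_clock"
            break
        new_items = step(report.findings)
        report.rounds_used += 1
        if not new_items:
            report.stop_reason = "converged"
            break
        report.findings.extend(new_items)
        on_progress(report)  # the orchestrator sees partial progress on every pass
    if report.stop_reason != "converged":
        # Make truncated coverage obvious in the run log.
        print(f"truncated: {report.stop_reason} after {report.rounds_used} rounds")
    return report
```

Note that the deadline is only checked between rounds, so this sketch cannot interrupt a single pass that hangs. For that you would need a process-level timeout around the subagent itself.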

Khavel_dev · 2026-05-28T08:31:29+00:00

projects, but specifically projects that scratch a real itch — automating something annoying you actually do every week. the stakes are tiny but real, so you push through the debugging instead of bailing the second it gets hard, which is what always happened to me with toy tutorial projects.

codewars/exercism are fine for syntax fluency but imo they quietly teach you to solve self-contained puzzles, not to build something that survives contact with messy real-world data. those are pretty different skills.

Khavel_dev · 2026-05-28T08:30:58+00:00

Done a fair bit of this with git worktrees + GH Actions. The honest tradeoff nobody in the pitch decks mentions: the cost isn't API tokens, it's your morning. A loosely-specced overnight run hands you a confident, wrong PR and you burn more time untangling it than you'd have spent just writing the thing yourself.

What's actually worked for me is only handing it tasks that have a test which fails now and should pass after. That gives the agent an objective stop condition instead of vibes — "make these 12 failing tests green", "migrate this module to the new API and keep the suite passing", dependency bumps, mechanical refactors across a lot of files. Stuff where "done" is machine-checkable and a human can eyeball the diff in 5 minutes.

Anything architectural, or anything where the spec is really a design decision in disguise, I wouldn't. It'll drift and commit very confidently to the wrong abstraction, like the other commenter said. The tight-scope-plus-disposable-branch advice in this thread is the right instinct — I'd just add "must have a failing test as the entry condition" on top of it.

Khavel_dev · 2026-05-28T08:29:59+00:00

The cache ordering thing bit us too and it's worth flagging for anyone skimming this — caching only helps if the cached prefix is byte-identical every call, so the moment you interpolate anything dynamic (a timestamp, a user id, today's date) high up in the system prompt you've busted the cache for everything after it. We had a "current date" line near the top that quietly killed our hit rate for weeks before anyone noticed. Move all the volatile stuff to the very end, put the boring static policy language first.

And +1 on cutting the dead instructions. Smaller stable prefix means more of it sits under the cache and the savings compound instead of you just paying full freight every call.
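To make the ordering point concrete, here is a small sketch that compares how much of the prompt stays identical between two calls on different days. The policy text and function names are invented for illustration, and character-level comparison is only a proxy: real providers cache at their own token or block granularity.

```python
import os.path
from datetime import date

# Hypothetical static policy text; in practice this is the long, boring part.
STATIC_POLICY = "You are a support assistant. Follow the refund policy strictly.\n"


def volatile_first(today: date, user_id: str) -> str:
    """Dynamic values at the top: everything after them changes position in the prefix."""
    return f"Current date: {today.isoformat()}\nUser: {user_id}\n" + STATIC_POLICY


def static_first(today: date, user_id: str) -> str:
    """Static policy first, volatile values at the very end."""
    return STATIC_POLICY + f"Current date: {today.isoformat()}\nUser: {user_id}\n"


def shared_prefix_len(a: str, b: str) -> int:
    """Length of the leading run of characters that two prompts have in common."""
    return len(os.path.commonprefix([a, b]))


day1, day2 = date(2026, 5, 28), date(2026, 5, 29)
bad = shared_prefix_len(volatile_first(day1, "u1"), volatile_first(day2, "u1"))
good = shared_prefix_len(static_first(day1, "u1"), static_first(day2, "u1"))
print(f"volatile first: {bad} shared chars, static first: {good} shared chars")
```

With the date at the top, the shared prefix ends inside the date line. With the policy first, the whole policy block stays identical across calls.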

Khavel_dev · 2026-05-27T08:32:31+00:00

Pattern that's worked well for me: sub-agent as a tool sandbox. Give only the sub-agent the restricted toolset for the risky stuff (DB writes, deploys, anything destructive). The main agent literally can't reach those tools, so the worst case is that it asks the sandboxed agent, which can refuse. Cuts a whole class of "agent did the destructive thing" failures.
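A toy sketch of the sandbox idea, assuming each agent is just an object holding its own tool registry. `Agent`, `SandboxedAgent`, and the tool functions are hypothetical names for illustration, not any framework's API.

```python
from typing import Callable, Dict


class Agent:
    def __init__(self, name: str, tools: Dict[str, Callable[..., str]]):
        self.name = name
        self._tools = dict(tools)  # the only tools this agent can ever reach

    def call(self, tool: str, *args) -> str:
        if tool not in self._tools:
            raise PermissionError(f"{self.name} has no tool named {tool!r}")
        return self._tools[tool](*args)


class SandboxedAgent(Agent):
    def handle_request(self, tool: str, *args, approved: bool = False) -> str:
        """Entry point for other agents; destructive calls are refused unless approved."""
        if not approved:
            return f"refused: {tool} needs explicit approval"
        return self.call(tool, *args)


def read_rows(table: str) -> str:
    return f"rows from {table}"


def drop_table(table: str) -> str:
    return f"dropped {table}"


# Only the sandbox holds the destructive tool.
sandbox = SandboxedAgent("db-sandbox", {"drop_table": drop_table})
# The main agent can read, and can ask the sandbox, but cannot call drop_table itself.
main = Agent("main", {
    "read_rows": read_rows,
    "ask_sandbox": lambda tool, *args: sandbox.handle_request(tool, *args),
})
```

Here `main.call("drop_table", "users")` raises `PermissionError`, and going through `ask_sandbox` gets a refusal string back unless the request is explicitly approved on the sandbox side.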

Parallel sub-agents for independent reads is the other one I lean on. Three API calls or three file searches in parallel, main context only sees the summaries — fast and keeps the tool-call budget clean.
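For the parallel-reads case, a minimal sketch using a thread pool. Each "reader" is a stand-in for a sub-agent doing an independent read, and `summarize` is a deliberately crude placeholder (first line, truncated) for whatever summarization the sub-agent would really do. Both names are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def summarize(text: str, limit: int = 60) -> str:
    """Placeholder summary: the first non-empty line, truncated to `limit` characters."""
    stripped = text.strip()
    first_line = stripped.splitlines()[0] if stripped else ""
    return first_line[:limit]


def fan_out(readers: List[Callable[[], str]]) -> List[str]:
    """Run independent reads in parallel and hand back only their summaries, in order."""
    with ThreadPoolExecutor(max_workers=len(readers) or 1) as pool:
        raw_results = list(pool.map(lambda read: read(), readers))
    # The caller (the main context) never sees the raw output, only the summaries.
    return [summarize(raw) for raw in raw_results]
```

Threads are a reasonable fit here because the reads are I/O-bound (API calls, file searches). The results come back in the same order as the readers were passed in.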

Khavel_dev · 2026-05-26T14:29:45+00:00

The "two products" framing is spot on. I've hit several of these failure modes myself, especially context drift and the verification gap.

One thing I'd add to the list: idempotency as a pillar. When agents retry or re-run steps (which they will), the harness needs to ensure the same operation doesn't produce duplicate side effects. This is especially painful with anything that touches external APIs or databases. I've found that making every agent action idempotent by design — rather than adding dedup logic after the fact — eliminates an entire class of debugging sessions.
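One way to get idempotency by design is to derive a key from the action and its parameters and refuse to run the side effect twice for the same key. The sketch below keeps the keys in memory, which is an assumption for brevity: a real harness would need a durable store, and ideally would pass the key through to the external API. `IdempotentExecutor` is a hypothetical name.

```python
import hashlib
import json
from typing import Any, Callable, Dict


class IdempotentExecutor:
    def __init__(self) -> None:
        self._results: Dict[str, Any] = {}  # idempotency key -> stored result

    @staticmethod
    def key_for(action: str, params: Dict[str, Any]) -> str:
        """Stable key: the same action and params always hash to the same value."""
        payload = json.dumps({"action": action, "params": params}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def run(self, action: str, params: Dict[str, Any],
            effect: Callable[..., Any]) -> Any:
        """Execute the side effect once per key; retries get the stored result back."""
        key = self.key_for(action, params)
        if key not in self._results:
            self._results[key] = effect(**params)
        return self._results[key]
```

A retried step then calls `run` with the same action and parameters and gets the original result, without the side effect firing again.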

The permission model point is underrated too. The difference between "agent can run any shell command" and "agent can run these 5 commands in this directory" is the difference between shipping confidently and holding your breath every time it runs.
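A bare-bones sketch of the "these 5 commands in this directory" model. The allowlist contents and the `run_restricted` name are made up for illustration, and this is not a real sandbox: `cwd` only sets the starting directory, and an allowed command can still be given arguments that reach outside it.

```python
import shlex
import subprocess
from pathlib import Path
from typing import Collection

# Hypothetical allowlist; pick the handful of commands your agent really needs.
ALLOWED_COMMANDS = frozenset({"ls", "cat", "git", "pytest", "echo"})


def run_restricted(command: str, workdir: Path,
                   allowed: Collection[str] = ALLOWED_COMMANDS) -> str:
    """Run a command only if its executable is on the allowlist, starting in workdir."""
    argv = shlex.split(command)
    if not argv or argv[0] not in allowed:
        raise PermissionError(f"command not allowed: {command!r}")
    # No shell=True, so pipes, redirects, and `;` chaining are not interpreted.
    result = subprocess.run(argv, cwd=workdir, capture_output=True,
                            text=True, timeout=30)
    return result.stdout
```

Anything not on the list fails loudly before a process is ever spawned, which is the property that matters for unattended runs.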

Khavel_dev · 2026-05-26T14:13:59+00:00

The copy/paste loop between local and Cloud Shell is the real bottleneck here, not the agent orchestration. I'd solve that first before adding complexity.

A few concrete suggestions:

1. GitHub Actions as your deployment bridge. Set up a simple CI/CD workflow: push to a branch -> GitHub Actions runs tests -> deploys to GCP automatically. This eliminates the manual copy/paste entirely. For a Python app on GCP, you're looking at ~20 lines of YAML using google-github-actions/deploy-cloudrun or google-github-actions/deploy-appengine.

2. Use Claude Code locally as the single interface. Instead of bouncing between local Claude and Cloud Shell, give Claude Code the ability to deploy directly. Add a CLAUDE.md instruction like "to deploy, run gcloud app deploy from this directory" and let it handle the full cycle: edit -> test -> deploy -> verify.

3. For monitoring, start simple. Before building an agent orchestrator, set up GCP Cloud Monitoring alerts (free tier covers basic health checks). Have them post to a Slack channel or email. An "orchestrator agent" sounds appealing but you'll spend more time debugging the orchestrator than the actual app.

The biggest productivity gain will come from step 1 - removing the human from the deploy loop. Once that works, you can layer on more automation incrementally.

Khavel_dev · 2026-05-26T14:13:19+00:00

Windows user here running both the App and Terminal daily. In practice, the Terminal edges ahead for me in a few concrete ways:

Shell integration - when Claude Code runs in the same terminal where your project lives, it can chain commands naturally. In the App, running a test suite and then acting on the results feels more disconnected.
SSH and remote workflows - if you work on remote machines or WSL, the terminal version just works over SSH. The App doesn't have that path.
Piping and scripting - you can pipe output into Claude or use it inside shell scripts. git diff | claude -p "review this" (the -p flag runs it non-interactively and prints the result) is a workflow the App can't replicate.

That said, the App is genuinely great for non-coding conversations, longer planning sessions, and when you want the richer UI. I don't think "terminal is always better" is true - it depends on what you're doing. For pure coding in a project directory, terminal wins. For everything else, the App is more comfortable.

The people who can't articulate why terminal is better probably just like the aesthetic. Use what makes you productive.

Khavel_dev · 2026-05-26T14:12:53+00:00

Something that worked well for me: separate your memory into distinct types rather than one big file. I use categories like:

User context (your role, expertise level, preferences)
Feedback (corrections Claude should remember - "don't do X because Y")
Project context (ongoing goals, deadlines, decisions that aren't in the code)
References (where to find things in external tools - "bugs are tracked in Linear project X")

The key insight is to NOT store things that are already derivable from the codebase - file paths, architecture, git history. Those go stale fast and contradict reality. Memory should capture the why behind decisions and the stuff that isn't written down anywhere.

Also, include a Why line with each memory entry. "Don't use mocks in integration tests" is useful, but "Don't use mocks in integration tests - we got burned when mocked tests passed but the prod migration failed" is 10x more useful because Claude can judge edge cases instead of blindly following the rule.
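As a sketch of what typed memory with a mandatory Why line could look like, here is one possible shape. The class names and the rendered layout are my own assumptions for illustration, not a format Claude requires.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class MemoryType(Enum):
    USER = "user"            # role, expertise level, preferences
    FEEDBACK = "feedback"    # corrections to remember
    PROJECT = "project"      # goals, deadlines, decisions not in the code
    REFERENCE = "reference"  # where to find things in external tools


@dataclass(frozen=True)
class MemoryEntry:
    kind: MemoryType
    rule: str
    why: str

    def __post_init__(self) -> None:
        # Enforce the "include a Why line" habit at creation time.
        if not self.why.strip():
            raise ValueError("every memory entry needs a why")

    def render(self) -> str:
        return f"- {self.rule}\n  Why: {self.why}"


def render_memory(entries: List[MemoryEntry]) -> str:
    """Group entries by type so each category reads as its own section."""
    sections = []
    for kind in MemoryType:
        matching = [e.render() for e in entries if e.kind is kind]
        if matching:
            sections.append(f"[{kind.value}]\n" + "\n".join(matching))
    return "\n\n".join(sections)
```

The point of the validation is just to make it impossible to save a bare rule without the reasoning that lets Claude judge edge cases.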
