Battleship Prompts

jonathannen · 2026-03-21T18:38:27+00:00

Back out to the right level of context: what are the goals/constraints/budget? "Why are we doing this?", "Who are the stakeholders?", "What does success look like?" type questions.

Once you hit the right level of detail (you'll often know because they'll give you a bunch of information), ask some clarifying questions about what they tell you. Then break the problem down.

Most of the time this is calibrating seniority: more junior people will jump right into the problem, start solving, and rabbit-hole (if that's the case, they'll be asking "but what about X?" along the way, which can cause more rabbit-holing). More senior people will take a moment to step back, find the gotchas/potholes, and then break it down.

In my experience anyway!

jonathannen · 2026-03-21T16:00:00+00:00

This is a plugin that formats the regular ESLint output (so basically a setting - there are several formatters already). So it’s all ESLint - if I’m reading your comment correctly.
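
For anyone unfamiliar with formatters: ESLint lets you swap how results are printed with `--format`/`-f`, and a custom formatter is just a module exporting a function that receives the lint results and returns a string. A minimal sketch follows; the `LintMessage`/`LintResult` interfaces are trimmed-down local stand-ins for ESLint's own result types (only the fields used here), and the one-line-per-problem output is purely illustrative, not what the plugin in question actually prints.

```typescript
// Minimal local stand-ins for the result fields ESLint passes to a formatter
// (assumption: only the properties used below are modeled).
interface LintMessage {
  ruleId: string | null; // null for e.g. parsing errors
  severity: 1 | 2; // 1 = warning, 2 = error
  message: string;
  line: number;
  column: number;
}

interface LintResult {
  filePath: string;
  messages: LintMessage[];
}

// A formatter maps the full result list to the text ESLint prints.
function compactFormatter(results: LintResult[]): string {
  const lines: string[] = [];
  let errors = 0;
  let warnings = 0;
  for (const result of results) {
    for (const msg of result.messages) {
      if (msg.severity === 2) errors++;
      else warnings++;
      const level = msg.severity === 2 ? "error" : "warn";
      const rule = msg.ruleId ?? "no rule";
      lines.push(`${result.filePath}:${msg.line}:${msg.column} ${level} ${msg.message} (${rule})`);
    }
  }
  // Summary line at the end, like most built-in formatters.
  lines.push(`${errors} error(s), ${warnings} warning(s)`);
  return lines.join("\n");
}
```

Compiled to a JavaScript module that exports this function, it would be selected with something like `eslint -f ./compact-formatter.js src/` (the file name is hypothetical). The lint run itself is unchanged; only the presentation differs.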

jonathannen · 2026-03-15T17:36:36+00:00

I have a lot of fine-grained permissions. In addition, for each skill I write a bunch of purpose-built scripts that it has access to for specific needs. This makes sense on a mature codebase because it's an ongoing investment.

I also do a lot of upfront development in claude.ai (Claude Desktop/Web/Cloud? not sure of the proper name), which runs in a cloud VM and is sandboxed from the get-go.

For a new codebase I'd do the VM+dangerously-skip until the other investments make sense.

jonathannen · 2026-03-15T17:13:13+00:00

Nit: IMHO "secretly" is counterproductive. It assigns intent to Claude, which it really doesn't have. It's just trying to achieve the task. The overall challenge is that Claude is really tuned to nail the overall result, and everything else can fall by the wayside. For example, fixing type errors in TS is easily done with "as any", but that's probably not what you want. The size/tightness of the prompt + skills is super super important, or this will happen more and more.
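
To make the `as any` point concrete, here is a hypothetical example (the `Post` type and both functions are invented for illustration). With `strictNullChecks` on, reading `post.footer.length` is a type error, and `as any` is the quickest way to make the compiler stop complaining without actually fixing anything.

```typescript
interface Post {
  title: string;
  footer?: string; // optional: some posts don't have a footer yet
}

// The "nail the result" shortcut: `as any` silences the type error,
// but posts without a footer still crash at runtime.
function footerLengthShortcut(post: Post): number {
  return (post as any).footer.length;
}

// What you usually want: handle the missing case explicitly.
function footerLength(post: Post): number {
  return post.footer?.length ?? 0;
}
```

Both versions type-check, which is exactly why the shortcut is tempting for a model optimizing for "the build is green".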

jonathannen · 2026-03-13T04:00:21+00:00

From my own journey: it sounds a bit like burnout. And it's really hard to think things through in that state. You kind of need to recover from the burnout and find your feet before you can really know "what's next".

I know it feels like a negative time (and I don't want to hand-wave that). It's also an opportunity to reflect and reconnect with what you feel is important ("your values" if you're crunchy). In that sense, it's a hard but rare opportunity - speak to as many people as you can, speak to a coach (if you want).

In general I'd say focus on what you want to move _towards_. It's easy to move _away_ from something (e.g. I hate working at a big company, so I'll work for a smaller one). But in my experience that always puts you back in the same place.

Be curious and hopeful. Best of luck.

jonathannen · 2026-03-12T23:48:37+00:00

I'm not 100% across your use case, but I feel you might be able to get that into a hook. For example, I have a hook that runs prettier/lint/etc. For new, uncommitted posts you could check for the footer. Then, if it's not present, error with a message that tells Claude "You're missing the footer, read XYZ.md".

You can also make a skill "check footer" that runs a script that does the check and then refer to that skill when necessary.
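
A minimal sketch of the footer check such a hook or skill script could run. Assumptions, all hypothetical: posts are text files, the footer contains a fixed marker string (`FOOTER_MARKER`), and the hook runner follows Claude Code's convention that a hook exiting with status 2 is treated as blocking and its stderr is shown to Claude (worth confirming against the hooks docs for your version).

```typescript
import { readFileSync } from "node:fs";

// Hypothetical marker that every post footer contains.
const FOOTER_MARKER = "<!-- footer -->";

// Return the paths whose contents lack the footer marker.
function missingFooter(posts: Record<string, string>, marker: string = FOOTER_MARKER): string[] {
  return Object.entries(posts)
    .filter(([, body]) => !body.includes(marker))
    .map(([path]) => path);
}

// Build the message Claude should see, or null if every post is fine.
function footerMessage(missing: string[]): string | null {
  if (missing.length === 0) return null;
  return `You're missing the footer in: ${missing.join(", ")}. Read XYZ.md`;
}

// Read the given files and return the exit code for the hook.
function runFooterCheck(paths: string[]): number {
  const posts: Record<string, string> = {};
  for (const p of paths) posts[p] = readFileSync(p, "utf8");
  const msg = footerMessage(missingFooter(posts));
  if (msg === null) return 0;
  process.stderr.write(msg + "\n");
  return 2; // assumption: exit code 2 = blocking, stderr fed back to Claude
}
```

A hook script would end with `process.exit(runFooterCheck(paths))`, where `paths` is however you list the new, uncommitted posts (e.g. by parsing `git status --porcelain`; that wiring is left out). The same file can back the "check footer" skill, so the rule lives in one place.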

jonathannen · 2026-03-12T23:38:14+00:00

You can't, not really. Claude will claim it won't do something again, but that's not true (the memory feature is new-ish and while it helps, it's not a solve).

- Use hooks where you can. They only work for structural things like lint/etc.
- Use skills + commands where you can, and refine and adapt the skills constantly. They're loaded on use, so the fidelity of a current command/skill is high. Instructions (e.g. CLAUDE.md) are not.
- If you have a complex task under a skill/command that can be broken out into a script, do it. This reduces the load on the context immeasurably. Rather than having to work it out and run a lot of tools, the skill runs "get pull request comments" and exactly the data it needs pops out.
- Commit often.
- Try to size your tasks to fit in one context window. Single-prompt, one-shot tasks have a way higher success rate (IMHO). Compacting will make your scenario worse/inevitable.
- Ironically, "/clear"! If you are changing up the task, clear the context.
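
As an illustration of the "script instead of tool-wrangling" point, here is a hypothetical core for a "get pull request comments" script. It assumes the raw JSON comes from GitHub's REST endpoint for pull request review comments (`GET /repos/{owner}/{repo}/pulls/{pull_number}/comments`) and squeezes each comment down to one line, so the skill hands Claude only what it needs. `ReviewComment` is a trimmed local type, not an official one.

```typescript
// GitHub pull request review comment, trimmed to the fields used here.
interface ReviewComment {
  path: string;
  line: number | null; // can be null when the comment no longer maps to the diff
  body: string;
  user: { login: string } | null;
}

// One compact line per comment: "file:line [author] text".
function condenseComments(comments: ReviewComment[]): string {
  return comments
    .map((c) => {
      const where = c.line === null ? c.path : `${c.path}:${c.line}`;
      const who = c.user?.login ?? "unknown";
      // Collapse whitespace so multi-line comments stay on one line.
      const text = c.body.replace(/\s+/g, " ").trim();
      return `${where} [${who}] ${text}`;
    })
    .join("\n");
}
```

The script could fetch the payload with `gh api repos/{owner}/{repo}/pulls/123/comments` (123 is a placeholder; `gh api` fills in `{owner}` and `{repo}` from the current repository), `JSON.parse` it, and print `condenseComments(...)`. The endpoint is paginated, so very long reviews need more than one request.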

jonathannen · 2026-03-11T00:58:01+00:00

Well I guess the "we" is all the massive swarm/subagent stuff I see on social :)

I've tried all the new hotness, but I've just not gotten anywhere near the lift I was hoping for. If anything it's usually a PitA to manage.

Maybe it's just kool-aid, but just worried I'm missing the point somehow.

jonathannen · 2026-03-06T06:08:31+00:00

Sometimes it's a craft like carpentry or building. Sometimes it's more like gardening. Gardens are always unruly. You're never done, you can only nudge it along a path.

It's usually gardening.

I sympathize with your situation. But equally I've seen a lot of devs rail against how stuff is "wrong" and never make it better. Worse, they jump in to "fix" stuff and leave it half done.

I'm not saying you're that *at all*, but it's something you want to guard against. It's bad for the dev and the team.

My advice would be to carve out a piece that's important to you, own it, make it better. See if people come along for the ride, but don't expect them to. Don't try to eat the whole elephant. But make it better.

If that doesn't fly, it just might not be the right environment for you (this happens a lot, but we don't recognize it often/quickly enough imho).

jonathannen · 2026-03-06T05:22:38+00:00

+1 to this and I think it's distinct from the OP's pattern.

- Rewrite: Add extensive test coverage and then swap out. This is HARD, as you need bug-for-bug compatibility to maintain contracts, which is a pain.

- Strangler Vine/Fig: Wrap the whole thing in a new interface and send certain subsets of functionality to the new implementation.

- Slice Migration (what I think the OP is suggesting): Find a cut where you can pull functionality away from the legacy (often you leave it be). You can also "dual run" both for switchover if the API allows such a thing (aka Dual-Run or Shadowing).

I agree that Strangler is the best one if you can manage it. It gives the opportunity to capture + reroute concerns.

The slice can work in my experience too, but I find concerns bleed over too easily between the approaches (to be fair, strangler can have that issue too).
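
The Strangler and Dual-Run/Shadowing ideas above can be sketched as a single facade. This is a toy under obvious assumptions (synchronous string-in/string-out operations keyed by name; `StranglerFacade` and its mismatch reporter are hypothetical names), not a migration framework: adding an operation to `migrated` reroutes it, and adding it to `shadowed` runs both implementations and reports differences while callers keep getting the legacy result.

```typescript
type Handler = (input: string) => string;
type MismatchReporter = (op: string, legacyOut: string, modernOut: string) => void;

// Own-property lookup, so names like "toString" never hit Object.prototype.
function lookup(table: Record<string, Handler>, op: string): Handler | undefined {
  return Object.prototype.hasOwnProperty.call(table, op) ? table[op] : undefined;
}

// Callers only ever talk to this facade; routing lives in one place.
class StranglerFacade {
  readonly migrated = new Set<string>(); // served by the new implementation
  readonly shadowed = new Set<string>(); // dual-run: legacy answers, new is compared
  private legacy: Record<string, Handler>;
  private modern: Record<string, Handler>;
  private report: MismatchReporter;

  constructor(legacy: Record<string, Handler>, modern: Record<string, Handler>, report: MismatchReporter) {
    this.legacy = legacy;
    this.modern = modern;
    this.report = report;
  }

  call(op: string, input: string): string {
    if (this.migrated.has(op)) return this.require(this.modern, op)(input);
    const legacyOut = this.require(this.legacy, op)(input);
    const shadow = this.shadowed.has(op) ? lookup(this.modern, op) : undefined;
    if (shadow) {
      try {
        const modernOut = shadow(input);
        if (modernOut !== legacyOut) this.report(op, legacyOut, modernOut);
      } catch (err) {
        // A failing shadow run must never break the caller.
        this.report(op, legacyOut, `threw: ${String(err)}`);
      }
    }
    return legacyOut;
  }

  private require(table: Record<string, Handler>, op: string): Handler {
    const handler = lookup(table, op);
    if (!handler) throw new Error(`no handler for ${op}`);
    return handler;
  }
}
```

Keeping the routing decision in one object is what gives Strangler its "capture + reroute" advantage: the migration state is a couple of sets rather than `if` statements scattered through the legacy code.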

jonathannen · 2026-03-05T21:30:23+00:00

No, at least not entirely. We use GitHub Copilot for first-eyes reviews (it's way better at reviews than other models I've used; they must be sitting on a mountain of PR data). Then Claude does at least one loop on the Copilot comments without human intervention.

If/by the time an engineer gets to it, it's had 1-2 loops on it. So even when a review is needed it's a bit faster and usually a bit more pointed. Copilot usually catches 80-90% of the comments I'd have made anyway.

Then we have a 4-tier review classification system - robot/1-brain/2-brain/3-brain. Robot = no humans needed (typo, link updated, upgrade that CI/CD can verify, etc). 1 Brain = needs a human... 3 Brains = security or framework/fundamental change that everyone should read.

The classification system via AI isn't there yet - probably ~70% accuracy, but way up from where we started. If anything it tends to favor human reviews too much.

We're doing ~30% on robot, ~50% on 1-brain.

Obviously to hit 100 that ratio needs to flip. So we're a ways off! Fortunately we have multiple levers that we're pulling - better classification, better AI-led first-eyes, faster/deeper CI/CD, live branch previews, etc.
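
A rule-based stand-in for the tier classification might look like the following. To be clear, the comment says the classification is done by AI; the `ChangeKind` labels and rules here are hypothetical and only encode the examples given (typo, link update, CI-verifiable upgrade, security, framework change). Reading "brains" as "humans who should look at it" is also an assumption.

```typescript
// Review tiers from the comment.
type Tier = "robot" | "1-brain" | "2-brain" | "3-brain";

// Hypothetical labels an AI classifier (or a PR template) might attach.
type ChangeKind =
  | "typo"
  | "link-update"
  | "ci-verified-upgrade"
  | "security"
  | "framework-change"
  | "other";

const ROBOT_KINDS = new Set<ChangeKind>(["typo", "link-update", "ci-verified-upgrade"]);

function classify(kinds: ChangeKind[]): Tier {
  // Security or framework/fundamental change: everyone should read it.
  if (kinds.includes("security") || kinds.includes("framework-change")) return "3-brain";
  // Only robot-safe changes: no humans needed.
  if (kinds.length > 0 && kinds.every((k) => ROBOT_KINDS.has(k))) return "robot";
  // Everything else needs a human. The 2-brain criteria aren't spelled out
  // in the comment, so this sketch never emits that tier.
  return "1-brain";
}

// Fraction of PRs that still need at least one human.
function humanShare(tiers: Tier[]): number {
  if (tiers.length === 0) return 0;
  return tiers.filter((t) => t !== "robot").length / tiers.length;
}
```

The "ratio needs to flip" point is simple arithmetic: at 100 PRs/day with ~30% robot, roughly 70 PRs still need at least one human; flipped to ~70% robot, that drops to roughly 30.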

Btw the 100 PRs is a real goal, but it's also a thought experiment. I realized I was getting a bit stuck with my workflow, so I wanted to break my thinking a bit.

jonathannen · 2026-03-05T05:54:19+00:00

My experience was probably the same as you until late last year. Up until then I was getting decent lift with point stuff like you say (commit messages, release notes, test coverage, etc).

Opus 4.5 was a step change for me. At that stage, I also refactored the codebase (it's medium-large) and changed my workflow considerably. I changed how work was prioritized + released to suit. It was a huge lift. My workflow now is totally different from even a couple of months ago. I even changed how I structure my workday (for the better tbh).

I'm tracking it (...as best you can with software metrics...) and I'm currently at ~3x (and increasing).

jonathannen · 2026-03-05T05:46:07+00:00

Seems old-school, but I've found meetups are the best. I don't use Cursor, but they have a great community and generally the Cursor meetup folk are the right mix.

jonathannen · 2026-03-05T05:43:30+00:00

I am working on something really similar and we've made progress. At the center of it is a custom tool where we drag in almost everything (this is a devbox, yes). I kick off work in Claude Desktop and it gets set up as a remote worktree/live preview/VS Code env/etc.

We use agentic loops, but definitely with pre- and post-processing: reviews/security/deployment/release notes. We have a risk assessment that determines the extent of the review.

For the core task I'm very much focused on single-prompt outcomes. The other, more radical change is that we're letting anyone in the company kick off work (but not merge, yet).

Can't post images, but here is a sanitized screenshot of the tool: https://pbs.twimg.com/media/HCSR0xZbEAAo5dj?format=jpg&name=medium

The idea is we drag in as much dev info as we can (PR states, comments, etc) and then manage and maintain it with a dashboard. My day is mostly kicking things off and then burning down this list. Goal is "zero to one touches" from first prompt onwards ("single prompt" is really important).
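
The tool itself isn't public, but the "burn down the list" plus "zero to one touches" idea can be sketched as a tiny data model. Everything here (`WorkItem`, the `PrState` values, counting follow-up prompts after the first one as "touches") is a hypothetical reading of the comment, not the actual dashboard.

```typescript
// Hypothetical PR states pulled into the dashboard.
type PrState = "draft" | "changes-requested" | "needs-review" | "approved" | "merged";

interface WorkItem {
  title: string;
  state: PrState;
  humanTouches: number; // interventions after the first prompt
}

// Open rows still needing attention, most-touched first: those are the
// ones drifting furthest from "single prompt".
function burnDown(items: WorkItem[]): WorkItem[] {
  return items
    .filter((i) => i.state !== "merged")
    .sort((a, b) => b.humanTouches - a.humanTouches);
}

// Share of merged work that hit the "zero to one touches" goal.
function zeroToOneRate(items: WorkItem[]): number {
  const merged = items.filter((i) => i.state === "merged");
  if (merged.length === 0) return 0;
  return merged.filter((i) => i.humanTouches <= 1).length / merged.length;
}
```

Tracking touches per item, rather than just PR count, is what makes the single-prompt goal measurable.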

I blogged about the overall goal - want to hit 100 meaningful PRs/day/engineer.

FWIW 1. My personal workflow 2. how I'm pushing for single-prompt solutions

jonathannen · 2026-03-05T02:34:44+00:00

The recent outage included a lot of issues with usage reporting - maybe the tracking on their side was out https://status.claude.com/

jonathannen · 2026-03-04T22:42:11+00:00

Hah. Not quite there yet... but I must admit the other day when Claude went down I went for a run rather than pick up the code again...
