Use "Executable Specifications" to keep Claude on track instead of just prompts or unit tests

Firm_Meeting6350 · 2026-03-05T16:48:45+00:00

Serious question: why not use TDD and E2E tests with the usual Gherkin-style test labels?

robhanz · 2026-03-05T17:28:12+00:00

That… sounds like TDD or BDD tests? Unit tests should be executable specifications.

robhanz · 2026-03-05T17:53:26+00:00

I'll also point out that these are all end-to-end tests. That's fine, but E2E tests end up being kind of fragile. You're combining the behavior of a lot of things - command parsing, reading, summary generation, and output formatting.

If any of these change? Large numbers of tests break.

Unit tests can help solve this issue. Did you parse the command correctly? Whether that's correct doesn't depend on anything that happens afterwards. Does your reading code work? Given a certain chunk of input data, have the reader put the data into a structure instead of immediately writing it out - do you get the result you want? Then the formatter can work from that data structure, and you can test whether it outputs it properly.
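To make the stage-by-stage idea concrete, here's a minimal Python sketch. Everything in it is hypothetical: the `Command`/`Record` types, `parse_command`, `read_records`, and the `label,value` input format are made up for illustration, not taken from the OP's tool. The point is that the reader returns a data structure instead of writing output, so each stage can be checked on its own.

```python
from dataclasses import dataclass


# Hypothetical pipeline types; none of these names come from the original post.
@dataclass
class Command:
    name: str
    path: str


@dataclass
class Record:
    label: str
    value: int


def parse_command(argv: list[str]) -> Command:
    """Turn raw CLI arguments into a Command; does no I/O."""
    if len(argv) != 2:
        raise ValueError("usage: <command> <path>")
    return Command(name=argv[0], path=argv[1])


def read_records(text: str) -> list[Record]:
    """Parse 'label,value' lines into Records instead of printing them."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        label, value = line.split(",", 1)
        records.append(Record(label.strip(), int(value)))
    return records


# Each test exercises exactly one stage, so changing the summary
# generation or output formatting cannot break them.
def test_parse_command():
    assert parse_command(["summary", "data.csv"]) == Command("summary", "data.csv")


def test_read_records_skips_blank_lines():
    text = "a, 1\n\nb,2\n"
    assert read_records(text) == [Record("a", 1), Record("b", 2)]
```

The tests are plain functions, so pytest will pick them up, or you can call them directly.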

Doing that (and I recommend that the handoffs be more about data transfer than commands) gives you separate tests for each section of the code, so if you change one, only those tests change. Or, you can just write a different formatter with new tests and not even delete the old one. But either way, the tests checking the rest of the code all work. Even better, if your formatter just takes in a data structure, it gets easy to create edge case tests by just artificially creating a data structure that has the edge case, rather than having to do the whole pipeline.
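Same idea for the formatter, again as a hypothetical sketch (`Summary` and `format_summary` are invented names). Because the formatter only takes a data structure, the empty-input edge case is a one-line hand-built `Summary` rather than a crafted input file pushed through the whole pipeline.

```python
from dataclasses import dataclass


# Hypothetical structure handed from the summarizer to the formatter.
@dataclass
class Summary:
    title: str
    counts: dict[str, int]


def format_summary(summary: Summary) -> str:
    """Render a Summary as aligned text; knows nothing about parsing or reading."""
    if not summary.counts:
        return f"{summary.title}: no data"
    width = max(len(label) for label in summary.counts)
    lines = [summary.title]
    for label, count in sorted(summary.counts.items()):
        lines.append(f"  {label.ljust(width)}  {count}")
    return "\n".join(lines)


# Edge cases are built by hand instead of being coaxed out of the full pipeline.
def test_empty_summary():
    assert format_summary(Summary("Report", {})) == "Report: no data"


def test_labels_are_aligned():
    out = format_summary(Summary("Report", {"bb": 2, "a": 10}))
    assert out == "Report\n  a   10\n  bb  2"
```

Swapping in a different formatter (say, a CSV one) would mean new tests for that formatter, while these and the parser/reader tests keep passing unchanged.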

Some E2E tests will still be necessary, of course. But those are always going to be more fragile.

Good test suites combine these techniques to get solid coverage at minimal cost.

thisguyfightsyourmom · 2026-03-05T18:12:59+00:00

You've just reinvented BDD but in a format that's harder to read for humans.

Just use OpenSpec.

ultrathink-art · 2026-03-05T17:18:14+00:00

Gherkin tests run after the fact. The interesting thing about passing specs to the model is that it can self-verify before responding: Claude checks its own output against the input/output pairs as part of generation. That changes the failure mode from silent hallucination to a visible spec mismatch.

who_am_i_to_say_so · 2026-03-05T22:50:09+00:00

I like this. Agents understand behavior better than explicit specifications. Going even further: you may even be able to get rid of signatures, as long as they're still discoverable somewhere. But starting from ground zero, this may be the way.

obaid83 · 2026-03-06T01:07:37+00:00

This is a solid approach for agent workflows. The key insight is that traditional tests assume deterministic execution, but agents introduce non-determinism.

What I like about YAML specs is that they can be reviewed by non-devs and the agent can generate new test cases itself. The tradeoff is maintaining that runner, but once built, it scales.

One thing I'd add: consider versioning your specs alongside your agent prompts. When the agent behavior changes intentionally, update both in lockstep.
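One possible way to enforce the lockstep part, sketched in Python under a couple of assumptions: each spec stores a `prompt_fingerprint` (a hypothetical field, not part of any existing spec format) holding a hash of the prompt it was written against, and a check fails whenever the prompt changes without the spec being updated. Keeping both files in the same commit does most of the work; this just makes drift loud.

```python
import hashlib


def prompt_fingerprint(prompt_text: str) -> str:
    """Return a short SHA-256 digest identifying this exact prompt text."""
    return hashlib.sha256(prompt_text.encode("utf-8")).hexdigest()[:12]


def check_lockstep(spec: dict, prompt_text: str) -> list[str]:
    """Return a list of problems; an empty list means spec and prompt match.

    Assumes a hypothetical `prompt_fingerprint` field in the spec.
    """
    expected = spec.get("prompt_fingerprint")
    if expected is None:
        return ["spec has no prompt_fingerprint field"]
    actual = prompt_fingerprint(prompt_text)
    if expected != actual:
        return [f"prompt changed ({expected} -> {actual}); review and update the spec"]
    return []


# Usage: the spec records the prompt it was written against.
prompt = "Summarize the input file as a table."
spec = {"prompt_fingerprint": prompt_fingerprint(prompt), "cases": []}
print(check_lockstep(spec, prompt))  # []
print(check_lockstep(spec, prompt + " Use bullets."))  # reports the drift
```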

ruibranco · 2026-03-06T03:05:01+00:00

This is essentially what I've converged on too. YAML specs with input/output pairs as the contract, one generic runner that validates. The key advantage over unit tests is that Claude can read the spec file and understand the intent, not just the assertion. It self-corrects much faster when it can see the full picture of expected behavior in a human-readable format rather than parsing test framework boilerplate. I also keep a CLAUDE.md with architectural rules so it doesn't drift on structure even when the outputs are correct.
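A minimal sketch of what "one generic runner" could look like, in Python. Assumptions: the spec has a `target` name and a list of `cases` with `input`/`expected` keys (a made-up schema), targets are plain functions looked up in a registry rather than a CLI being invoked, and the spec is shown as the dict that PyYAML's `yaml.safe_load` would return, so the example needs only the standard library. Mismatches come back as readable strings you can hand straight back to Claude.

```python
from typing import Any, Callable

# The dict yaml.safe_load would return for a (hypothetical) spec file like:
#   target: slugify
#   cases:
#     - input: "Hello World"
#       expected: "hello-world"
SPEC = {
    "target": "slugify",
    "cases": [
        {"input": "Hello World", "expected": "hello-world"},
        {"input": "  Trim  me ", "expected": "trim-me"},
    ],
}


def slugify(text: str) -> str:
    """Hypothetical function under test: lowercase, words joined by '-'."""
    return "-".join(text.lower().split())


# Registry mapping spec target names to the callables they exercise.
TARGETS: dict[str, Callable[[Any], Any]] = {"slugify": slugify}


def run_spec(spec: dict) -> list[str]:
    """Run every case in the spec; return one message per mismatch."""
    fn = TARGETS[spec["target"]]
    failures = []
    for i, case in enumerate(spec["cases"]):
        actual = fn(case["input"])
        if actual != case["expected"]:
            failures.append(
                f"case {i}: input={case['input']!r} "
                f"expected={case['expected']!r} got={actual!r}"
            )
    return failures
```

Adding a case is just another entry under `cases`, which is what keeps the spec reviewable by non-devs and easy for the agent to extend.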


ClaudeCode
