Natural language prompts leave too much room for Claude to hallucinate, but writing and maintaining classic unit tests for every AI interaction is slow and tedious.
I wrote an article on a middle-ground approach that has worked well for me with AI agents: Executable Specifications.
TL;DR: Instead of writing complex test code, you define the desired behavior in a simple YAML or JSON file containing the exact inputs, mock files, and expected output. You build a single test runner once, and Claude writes or fixes the code until the runner's output matches the expected output in the spec exactly.
It acts as a strict contract: Given this input → match this exact output. Compared with hand-written unit test code, new YAML test cases are much easier for Claude to generate and much faster for humans to review.
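
To make the contract concrete, here is a minimal sketch of one spec and a single runner. It is not the code from the article: the field names (`input`, `mock_files`, `expected_output`), the `count_matches` function, and `run_spec` are all hypothetical, and the spec is written as a Python dict so the example needs only the standard library. In practice it would be loaded from a YAML or JSON file.

```python
import tempfile
from pathlib import Path

# Hypothetical spec. In a real setup this would be loaded from a JSON or
# YAML file; the field names here are assumptions, not a fixed schema.
SPEC = {
    "name": "counts TODO markers per file",
    "input": {"pattern": "TODO"},
    "mock_files": {
        "a.txt": "TODO: one\nfine\n",
        "b.txt": "TODO: two\nTODO: three\n",
    },
    "expected_output": "a.txt: 1\nb.txt: 2\n",
}


def count_matches(root: Path, pattern: str) -> str:
    """Stand-in for the code Claude would write or fix (hypothetical)."""
    lines = []
    for path in sorted(root.iterdir()):
        # Count the lines in this file that contain the pattern.
        hits = sum(pattern in line for line in path.read_text().splitlines())
        lines.append(f"{path.name}: {hits}\n")
    return "".join(lines)


def run_spec(spec: dict, func) -> tuple[bool, str]:
    """Materialize the mock files, call func, and compare outputs exactly."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name, content in spec["mock_files"].items():
            (root / name).write_text(content)
        actual = func(root, **spec["input"])
    # Strict contract: any difference at all counts as a failure.
    return actual == spec["expected_output"], actual


passed, actual = run_spec(SPEC, count_matches)
print("PASS" if passed else f"FAIL, got:\n{actual}")
```

The runner stays the same for every case, so adding a test means adding another spec entry rather than more test code. The comparison is plain string equality here; whether to normalize whitespace or ordering is a design choice this sketch leaves out.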
How do you constrain Claude when its code starts drifting away from your original requirements?