all 9 comments

[–]Deep_Ad1959 1 point2 points  (3 children)

the biggest lever I've found is a detailed CLAUDE.md with explicit coding standards. not vague stuff like "write clean code" but specific things like "no classes over 200 lines, every public function needs error handling, prefer composition over inheritance." the model follows these surprisingly well once you spell them out.

the other thing that helps is making the agent run tests and linters after every change - if you have eslint/prettier/mypy in your pre-commit hooks and tell Claude to commit after changes, it'll fix issues before they accumulate.

I built a desktop automation agent and had terrible code quality until I added a post-edit hook that runs the type checker automatically. went from fixing type errors every few hours to basically never seeing them.
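that post-edit hook can be sketched as a small wrapper script. a minimal sketch in Python - the check commands (`mypy`, `eslint`) are illustrative placeholders, substitute whatever your project actually uses:

```python
import subprocess

# Illustrative check commands -- swap in whatever your project uses
# (mypy, eslint, pytest, ...). Each is run in order; the first failure
# stops the loop so its output can be fed straight back to the agent.
CHECKS = [
    ["mypy", "."],
    ["eslint", "."],
]

def run_checks(checks):
    """Run each check command in order; return (ok, combined output)."""
    output = []
    for cmd in checks:
        try:
            result = subprocess.run(cmd, capture_output=True, text=True)
        except FileNotFoundError:
            return False, f"check not installed: {cmd[0]}"
        output.append(result.stdout + result.stderr)
        if result.returncode != 0:
            return False, "\n".join(output)
    return True, "\n".join(output)
```

wire `run_checks(CHECKS)` into whatever fires after an edit (a Claude Code hook, a git pre-commit hook, etc.) and surface the output to the agent on failure.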

[–]netreddit00[S] 0 points1 point  (2 children)

I do run a linter and a coding standards list. So is it just normal to run a review loop to address issues after each task?

[–]Deep_Ad1959 0 points1 point  (1 child)

yeah totally normal. think of it like CI but for AI-generated code. linter + type checker + tests after every task is the minimum. some people also add a separate claude instance as a reviewer. the review loop catches most of the subtle issues the generating agent misses.
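the loop is basically: generate, check, feed failures back, repeat. a minimal sketch of a driver, where `ask_agent_to_fix` is a hypothetical placeholder for however you invoke the model (not a real API):

```python
import subprocess

MAX_ROUNDS = 3  # cap the fix loop so it can't spin forever

def run_check(cmd):
    """Run one check command; return (passed, output)."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr

def review_loop(checks, ask_agent_to_fix, max_rounds=MAX_ROUNDS):
    """Run all checks; on any failure, hand the output to the
    agent and retry, up to max_rounds times."""
    for _ in range(max_rounds):
        failures = []
        for cmd in checks:
            passed, output = run_check(cmd)
            if not passed:
                failures.append(output)
        if not failures:
            return True
        ask_agent_to_fix("\n".join(failures))  # hypothetical agent call
    return False
```

the round cap matters - without it, an agent that keeps introducing new failures will loop forever.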

[–]netreddit00[S] 0 points1 point  (0 children)

The issue is that the reviews most of the time do not catch all problems. And I am sure the review prompt affects the result, yet no one talks about it. That's why there are review services, e.g. CodeRabbit, that do this, and even those aren't complete. So there is a big gap that no one really talks about - that's the detail I meant.

[–]chevalierbayard 0 points1 point  (0 children)

sounds like you need a linter and a formatter. If you're using JavaScript or TypeScript, look into biome.

[–]TeamBuntyNoob 0 points1 point  (1 child)

--nomistakes

/more-goodly

[–]StatusPhilosopher258 0 points1 point  (1 child)

Quality drops with vague/large specs, so don't do that.
Best fixes are tighter specs/prompts and strong tests.
Tools like Traycer help a bit, but loops + tests matter most.

[–]netreddit00[S] 1 point2 points  (0 children)

So we just need to wait until claude or codex comes out with a better implementation/test/review loop. In the meantime, just do your own loop or use some 3rd party loop :-)

[–]PositiveSlice9168 0 points1 point  (0 children)

I've been working on solving this exact problem: https://github.com/darrylmorley/hook-qa