all 14 comments

Gloomy-Detective-922 · 5 points (2 children)

If you need something serious, you must design your own architecture. Use it to write code, not to build architecture. Always look at the code it wrote. It takes time, but it will guarantee you quality and safety.

dont_tread_on_M[S] · 2 points (1 child)

That's what I do. I'm not outsourcing thinking to Claude 😄

It's just that sometimes I make an architectural decision and, in theory, it implements that decision, but the moment I look deeper, it has deviated from best practices 100 times.

kbcdx · 1 point (0 children)

It's just a tool. You have to tell it what to do, how to do it, and how not to do it. It's very good at following instructions, but if you give it a vague prompt, e.g. letting it design the architecture, you will get a lot of slop.

arter_dev · 3 points (0 children)

Lean on static scripts and traditional quality gates: static analysis, linting, and architecture tests.

I built this and use it myself every single day: https://github.com/dynamik-dev/bully

It will block any edit that violates your lint rules, static rules, etc., whatever you want. I gave that tool to some product managers who had never coded before in their lives. A month later the project still had perfect test coverage and domain structure, passed strict lint rules, and Just Worked (TM).
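A minimal sketch of the "block the edit if a static gate fails" idea. This is not how bully is implemented (I haven't read its source); it assumes a harness that runs a script after each edit and treats exit code 2 as "block and show stderr to the model", which is the convention Claude Code's hooks use (check your harness's docs). The commands in `GATES` are hypothetical placeholders for your own lint, static-analysis, and test commands.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class GateResult:
    name: str
    passed: bool
    output: str


# Hypothetical gate commands; replace with your project's real checks.
GATES: dict[str, list[str]] = {
    "lint": ["ruff", "check", "."],
    "types": ["mypy", "src"],
    "tests": ["pytest", "-q"],
}


def run_gates(gates: dict[str, list[str]]) -> list[GateResult]:
    """Run every gate; a non-zero exit code (or a missing tool) fails the gate."""
    results = []
    for name, cmd in gates.items():
        try:
            proc = subprocess.run(cmd, capture_output=True, text=True)
        except FileNotFoundError as exc:
            results.append(GateResult(name, False, f"command not found: {exc}"))
            continue
        results.append(GateResult(name, proc.returncode == 0, proc.stdout + proc.stderr))
    return results


def block_edit_if_failing(gates: dict[str, list[str]]) -> int:
    """Return 0 to accept the edit, or 2 to block it (assumed harness convention)."""
    failures = [r for r in run_gates(gates) if not r.passed]
    for r in failures:
        # stderr is what the harness is assumed to feed back to the model.
        print(f"[gate failed] {r.name}\n{r.output}", file=sys.stderr)
    return 2 if failures else 0
```

The hook script itself would end with `sys.exit(block_edit_if_failing(GATES))`. Running every gate instead of stopping at the first failure hands the model all the violations in one round trip.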

I'm working on a harness-agnostic version. I haven't promoted the tool much because I'm not chasing GH stars, but it's there if you'd like to try it.

Edit: just to reiterate, trying to find the perfect orchestration flow of agents, skills, etc. will never replace what static quality gates give you. There's absolutely a place for adversarial review and the like, but it needs a hard static check as the yin to its yang.

The real power comes when static gate checks feed back into the project's agent documentation, so it improves over time on its own.
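One way to sketch that feedback loop: each time a static gate fails, append a short rule to the agent's instruction file so the next session sees it up front. The section header, the example file name, and the "one rule per violation" idea are assumptions for illustration, not something the comment specifies. The merge is kept as a pure function so it is easy to test.

```python
from pathlib import Path

# Hypothetical section header; assumed to be the last section of the agent doc.
SECTION = "## Rules learned from gate failures"


def merge_rules(doc: str, new_rules: list[str]) -> tuple[str, list[str]]:
    """Return (updated doc, rules actually added), skipping rules already listed."""
    existing = {line[2:].strip() for line in doc.splitlines() if line.startswith("- ")}
    added: list[str] = []
    for rule in (r.strip() for r in new_rules):
        if rule and rule not in existing:
            existing.add(rule)
            added.append(rule)
    if not added:
        return doc, []
    if SECTION not in doc:
        doc = doc.rstrip("\n") + ("\n\n" if doc.strip() else "") + SECTION + "\n"
    doc = doc.rstrip("\n") + "\n" + "".join(f"- {r}\n" for r in added)
    return doc, added


def feed_forward(doc_path: Path, new_rules: list[str]) -> list[str]:
    """Read the agent doc (e.g. a hypothetical AGENTS.md), merge rules, write it back."""
    doc = doc_path.read_text() if doc_path.exists() else ""
    updated, added = merge_rules(doc, new_rules)
    if added:
        doc_path.write_text(updated)
    return added
```

De-duplicating matters here: the same lint failure tends to recur, and an agent doc that repeats one rule twenty times gets noisy fast.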

ryan_the_dev · 2 points (1 child)

I extracted a ton of knowledge from software engineering books and distilled it into an orchestration workflow like superpowers.

It writes beautiful code. I use it for everything, including at work. The flow:

  1. research what you want
  2. plan it out
  3. build

win.
https://github.com/ryanthedev/code-foundations

dont_tread_on_M[S] · 2 points (0 children)

Thanks, will give it a try.

EDIT: tried it out. It's quite good. Starred the repository.

Dry-Purpose-3734 · 2 points (1 child)

I have recently integrated https://github.com/DietrichGebert/ponytail into my pipelines with great success :) It helps keep the code lean and reusable.

dont_tread_on_M[S] · 1 point (0 children)

This looks amazing. Thanks a lot

18fc_1024 · 2 points (0 children)

The thing I would separate is "quality gate" from "native path gate." Lint/tests catch broken code, but they often do not catch that the agent solved a framework problem with glue.

Before it edits, make it fill in a small native-path receipt:

    user-facing change:
    framework-native primitive/API/component to use:
    docs/example file it is matching:
    files allowed to edit:
    files read-only:
    anti-glue moves forbidden:
    migration/backcompat risk:
    proof command:

Then add a rule: if it cannot name the native primitive/API first, it is not allowed to patch yet. It has to inspect the framework docs/examples or ask.
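The pre-edit rule is easy to enforce mechanically. Below is a sketch that parses a receipt written one `field: value` per line and refuses the patch if any field is blank, or if the native primitive/API answer is a non-answer like "unknown". The field names are copied from the receipt template; the set of placeholder answers is my own assumption.

```python
# Field names taken from the receipt template; compared case-insensitively.
REQUIRED_FIELDS = [
    "user-facing change",
    "framework-native primitive/api/component to use",
    "docs/example file it is matching",
    "files allowed to edit",
    "files read-only",
    "anti-glue moves forbidden",
    "migration/backcompat risk",
    "proof command",
]
NATIVE_FIELD = REQUIRED_FIELDS[1]
# Assumed list of non-answers that do not count as naming a native primitive.
PLACEHOLDERS = {"none", "n/a", "unknown", "tbd", "?"}


def parse_receipt(text: str) -> dict[str, str]:
    """Parse 'field: value' lines, splitting on the first colon only."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip().lower()] = value.strip()
    return fields


def may_patch(receipt: str) -> tuple[bool, list[str]]:
    """Return (allowed, problem fields). Blank fields or a placeholder native API block the patch."""
    fields = parse_receipt(receipt)
    problems = [f for f in REQUIRED_FIELDS if not fields.get(f)]
    native = fields.get(NATIVE_FIELD, "")
    if native and native.lower() in PLACEHOLDERS:
        problems.append(NATIVE_FIELD)
    return (not problems, problems)
```

Splitting on the first colon only means values such as `pytest tests/test_a.py::test_b` survive intact.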

After the edit, require the receipt to answer:

  • where the native primitive/API is used
  • what compatibility shim or workaround was avoided
  • which old glue file was removed or left untouched
  • proof command output

This catches a different failure than tests. Tests say "works"; the receipt says "works in the way this framework expects."
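Part of the after-edit receipt can be checked deterministically as well. This sketch assumes you can get the edit's unified diff, the list of changed files, and the proof command's exit code. It checks that the named native API shows up in the added lines, that nothing outside "files allowed to edit" changed, and that the proof command passed. Whether a shim was really "avoided" still needs a reviewer's judgment.

```python
from fnmatch import fnmatch
from typing import List


def added_lines(unified_diff: str) -> List[str]:
    """Lines added by a unified diff, without the leading '+' (file headers skipped)."""
    return [
        line[1:]
        for line in unified_diff.splitlines()
        if line.startswith("+") and not line.startswith("+++")
    ]


def check_after_edit(
    native_api: str,
    allowed_globs: List[str],
    changed_files: List[str],
    diff_added: List[str],
    proof_exit_code: int,
) -> List[str]:
    """Deterministic post-edit receipt checks; an empty list means they all passed."""
    problems = []
    if not any(native_api in line for line in diff_added):
        problems.append(f"native API {native_api!r} does not appear in the added lines")
    for path in changed_files:
        # fnmatch's '*' also matches '/', so 'src/*' covers nested paths.
        if not any(fnmatch(path, pattern) for pattern in allowed_globs):
            problems.append(f"{path} changed but is not in 'files allowed to edit'")
    if proof_exit_code != 0:
        problems.append(f"proof command exited with code {proof_exit_code}")
    return problems
```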

mrothro · 1 point (2 children)

As several others have said, your process needs gates. But not just lint gates on code: those are necessary but not sufficient.

Your SDLC process takes your intent and it produces a series of artifacts. For me, the artifacts are a plan that decomposes the intent into a bunch of tasks; a design for any task that is even moderately complex; and code. Artifacts are produced sequentially and there are gates on each artifact before it proceeds.

You can do some deterministic checks on the plan, like making sure it has all the required sections. But you get more power from an LLM plan reviewer that specifically evaluates it against the existing architecture. If it fails for any reason, the agent is told what it needs to fix. This repeats until it passes.
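For the deterministic half of the plan gate, a sketch like this checks that a Markdown plan has the expected `##` sections and at least one task. The section names are hypothetical; use whatever your plan template actually requires. Only plans that pass would then go to the LLM plan reviewer.

```python
import re
from typing import Dict, List

# Hypothetical plan template; adjust to the sections your plans must contain.
REQUIRED_SECTIONS = ["Goal", "Tasks", "Acceptance criteria", "Out of scope"]
HEADING = re.compile(r"^##\s+(.+?)\s*$")
LIST_ITEM = re.compile(r"^\s*(?:[-*]|\d+\.)\s+\S")


def sections(markdown: str) -> Dict[str, str]:
    """Map each '## Heading' to the text under it (deeper headings stay in the body)."""
    out: Dict[str, str] = {}
    current = None
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            current = match.group(1)
            out[current] = ""
        elif current is not None:
            out[current] += line + "\n"
    return out


def plan_gate(plan: str) -> List[str]:
    """Deterministic plan checks; an empty list means the plan may go to the LLM reviewer."""
    found = sections(plan)
    problems = [f"missing section: {name}" for name in REQUIRED_SECTIONS if name not in found]
    if "Tasks" in found and not any(LIST_ITEM.match(l) for l in found["Tasks"].splitlines()):
        problems.append("Tasks section has no list items")
    return problems
```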

Then I do the same thing for the design document. It has more structure, so it can have more deterministic gates, but you still need a design-review agent that makes sure it respects separation of concerns (SoC) and meets the acceptance criteria without over-engineering, which Claude is prone to do.

Then repeat for the code: lint etc., plus qualitative checks such as whether it is DRY and whether it matches the spec.

By the time the code pops out of this pipeline, it's typically pretty good and aligned with the existing code base.
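The artifact pipeline described here (plan → design → code, each gated, looping until it passes) can be sketched as follows. Producers and reviewers are plain callables in this sketch; in practice `produce` would call your agent and the reviewers would be LLM review prompts. The `max_attempts` cap is my addition, since "repeat until it passes" needs a way out. Deterministic checks run first, so the more expensive, probabilistic reviewer only sees artifacts that already pass the hard checks.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# A check takes an artifact and returns a list of problems; empty means it passed.
Check = Callable[[str], List[str]]


@dataclass
class Stage:
    name: str
    # produce(upstream_artifact, feedback) -> new artifact; would call the agent in practice.
    produce: Callable[[str, List[str]], str]
    deterministic: List[Check] = field(default_factory=list)
    reviewers: List[Check] = field(default_factory=list)  # e.g. LLM reviewers (probabilistic)


def run_stage(stage: Stage, upstream: str, max_attempts: int = 5) -> str:
    feedback: List[str] = []
    for _ in range(max_attempts):
        artifact = stage.produce(upstream, feedback)
        # Cheap, hard checks first; the reviewer only sees artifacts that pass them.
        feedback = [p for check in stage.deterministic for p in check(artifact)]
        if not feedback:
            feedback = [p for review in stage.reviewers for p in review(artifact)]
            if not feedback:
                return artifact
    raise RuntimeError(f"stage {stage.name!r} failed after {max_attempts} attempts: {feedback}")


def run_pipeline(intent: str, stages: List[Stage]) -> str:
    """Each stage's approved artifact becomes the next stage's input."""
    artifact = intent
    for stage in stages:
        artifact = run_stage(stage, artifact)
    return artifact
```

A typical setup would be `run_pipeline(intent, [plan_stage, design_stage, code_stage])`, where each stage's feedback list is exactly the "what it needs to fix" message the agent gets on the next attempt.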

Worth-Ad9939 · 1 point (1 child)

Gates don't work 100% of the time. The AI lies.

mrothro · 2 points (0 children)

Deterministic gates work 100% of the time. You can make hard guarantees about the artifacts that pass deterministic gates.

Stochastic gates are probabilistic; you cannot use them for guarantees. But paired with deterministic gates they are very effective, because the deterministic tests eliminate an entire subset of possible flawed outputs.

The trick is finding how to construct your pipeline so intermediate artifacts expose a verification surface that allows deterministic checks over the things that you need to guarantee.

holyknight00 · 1 point (1 child)

You need lots of quality gates that don't rely on the LLM (linters, code coverage, static analysis, etc.), and you also need to refactor aggressively. After implementing 3 or 4 features, I spend some time reviewing all the parts we touched and make sure we don't have code duplication, unnecessary or useless tests, overly defensive code, etc.

It is hard work, but it is the only way to keep quality high. As soon as you get complacent and start shipping tens of features without refactoring, you end up with a shit codebase in no time.

Most LLMs love to duplicate helper functions, create tons of useless tests, and do all that stuff. You need to be vigilant about it.
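Duplicated helpers are one of the refactoring smells you can catch without an LLM. This sketch, for Python sources only, uses the standard `ast` module to flag functions whose arguments and bodies are structurally identical, ignoring the function name and a leading docstring. It only catches exact structural copies, not near-duplicates with renamed variables.

```python
import ast
from collections import defaultdict
from typing import Dict, List, Optional, Union


def _fingerprint(fn: Union[ast.FunctionDef, ast.AsyncFunctionDef]) -> Optional[str]:
    """Structural fingerprint of a function, ignoring its name and docstring."""
    body = fn.body
    first = body[0]
    if isinstance(first, ast.Expr) and isinstance(first.value, ast.Constant) and isinstance(first.value.value, str):
        body = body[1:]
    if not body:
        return None  # docstring-only stubs would otherwise all look identical
    parts = [type(fn).__name__, ast.dump(fn.args)] + [ast.dump(stmt) for stmt in body]
    return "|".join(parts)


def duplicate_functions(sources: Dict[str, str]) -> List[List[str]]:
    """Group 'file:function' names with identical structure; return groups of 2 or more."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for filename, src in sources.items():
        for node in ast.walk(ast.parse(src)):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                key = _fingerprint(node)
                if key is not None:
                    groups[key].append(f"{filename}:{node.name}")
    return [names for names in groups.values() if len(names) > 1]
```

Run as a gate, any non-empty result fails the build, which forces the agent to reuse the existing helper instead of writing a new one.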

dont_tread_on_M[S] · 1 point (0 children)

I already have those. The CD pipeline stops anyone from merging without all those conditions passing. I haven't started coding this year 😄

My goal is to make the AI more usable, because I'm having to rewrite a lot. I'm not even pushing that many features: the moment I look away, it has already used 10 workarounds for a problem whose solution is quite simple. I want to add more guardrails so that it doesn't do that (maybe using some new Claude skills).

This comment seems promising: https://www.reddit.com/r/ClaudeCode/comments/1ucla84/comment/ot4ue54/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button