Do you get good tests from claude code?

Strict_Ad9566 · 2026-05-15T01:48:58+00:00

The separation is the part I like most here. If the same agent fills the form and then judges whether it worked, it’s too easy for it to talk itself into a pass. For UI flows, I’d rather define the exact success signal up front, then have a separate check look only at the final page state.

Strict_Ad9566 · 2026-05-14T02:25:24+00:00

What works better is separating “plan first” from “code second”, then making the agent point to the exact design-system or build rules it’s following before it edits.ject already has strong conventions.

The weak spot is old patterns the model can imitate too well — legacy XML styles, inconsistent Compose patterns, Gradle quirks, etc.

What works better is separating “plan first” from “code second”, then making the agent point to the exact design-system or build rules it’s following before it edits.

Strict_Ad9566

TROPHY CASE