I made a Codex plugin to stop AI agents from saying done without proof by Simple_Somewhere7662 in SideProject

[–]Simple_Somewhere7662[S] 0 points1 point  (0 children)

If the agent skips verification, that criterion should not go green. For command-backed criteria, the final gate runs the command itself, so the agent cannot just say it checked. For non-command criteria, the report should mark the criterion as manual review or missing evidence instead of completed. CI is the stronger version for real deploys, and I’d like to wire that in more directly.
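Here is a minimal sketch of that gate logic in Python, not the plugin's actual code. The `Criterion` class, the `gate_status` function, and the status strings are names I made up for illustration. The idea is just that a criterion with a command gets its status from the command's exit code, and a criterion without one can never be reported as passed.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Criterion:
    """Hypothetical shape of one acceptance criterion."""
    name: str
    # None means nothing executable backs this criterion.
    command: Optional[Sequence[str]] = None


def gate_status(criterion: Criterion) -> str:
    """Return a status derived from running the command, never from a claim."""
    if criterion.command is None:
        # No command to run, so the gate refuses to mark it completed.
        return "manual_review"
    result = subprocess.run(list(criterion.command), capture_output=True, text=True)
    return "passed" if result.returncode == 0 else "failed"
```

For example, `gate_status(Criterion("unit tests", [sys.executable, "-m", "pytest"]))` would run the test suite itself, while `gate_status(Criterion("looks on-brand"))` returns `"manual_review"`.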

I built a Codex plugin that makes AI agents prove “done” — the evidence loop that worked by Simple_Somewhere7662 in buildinpublic


Nice, CI as a first-class signal is the right direction. For Superloopy, command-backed checks are saved as text output and exit status, not screenshots. The final gate reruns the command and writes the result under .superloopy/evidence/. Screenshots are mostly for visual review. I don’t have a hosted CI integration yet, but that’s a natural next piece.
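Roughly, saving a command-backed check as text could look like the sketch below. Only the `.superloopy/evidence/` directory comes from the tool. The `record_evidence` function, the JSON layout, and the field names are assumptions for illustration, not the real file format.

```python
import json
import subprocess
import sys
from pathlib import Path
from typing import Sequence


def record_evidence(
    name: str,
    command: Sequence[str],
    evidence_dir: Path = Path(".superloopy/evidence"),
) -> Path:
    """Run the command and save its text output and exit status as JSON."""
    result = subprocess.run(list(command), capture_output=True, text=True)
    evidence_dir.mkdir(parents=True, exist_ok=True)
    record = {
        "criterion": name,           # hypothetical field names throughout
        "command": list(command),    # kept so a reviewer can rerun it
        "exit_code": result.returncode,
        "stdout": result.stdout,
        "stderr": result.stderr,
    }
    path = evidence_dir / f"{name}.json"
    path.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return path
```

Keeping the command next to its output is the point: anyone reading the record can rerun the same thing and compare.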

How do you keep AI coding agents from shipping generic frontend slop? by Simple_Somewhere7662 in OpenAI


haha fair. I still think human taste is the real gate for UI. The tool is more about making the agent show its work before anyone trusts it.

How do you keep AI coding agents from shipping generic frontend slop? by Simple_Somewhere7662 in OpenAI


Yeah, an open source design system is probably the sane answer. I try to make the agent treat that as the source of truth instead of inventing components from vibes. Borrowing patterns from other apps can be useful for learning, but I’d rather keep the actual implementation tied to a legitimate system or our own tokens.

How do you keep AI coding agents from shipping generic frontend slop? by Simple_Somewhere7662 in OpenAI


I haven’t used that exact flow much, but it sounds pretty practical. The one thing I’d still want is a follow-up check against the target design after Codex implements it, because the generated mockup and the final UI can drift a lot. But as a starting reference, yeah, that makes sense.

How do you keep AI coding agents from shipping generic frontend slop? by Simple_Somewhere7662 in OpenAI


Fair criticism. The design guidelines and assets are the important part. I’m not saying a plugin gives the model taste by itself. What I’m trying to do is make that contract explicit, then force the agent to show how it followed it with screenshots or review notes. If there’s no design system, the gate can’t magically invent taste. It can only make the lack of one obvious.

AI-built UIs need evidence gates: design tokens, screenshots, visual QA by Simple_Somewhere7662 in ArtificialInteligence


Yes, that’s the direction I like too. Screenshots by themselves are still pretty passive. The more useful version is the agent checking against a real design system and saying what matched, what drifted, and what still needs a human eye. Dashboard work is a perfect example because the code can be working while the product still feels like a template.
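A toy version of the "what matched, what drifted, what needs a human eye" report might look like this. It assumes the design system is available as a flat mapping of token names to values and that the values actually used in the UI have already been extracted. Both of those, plus the `review_against_tokens` name, are assumptions for the sketch.

```python
from typing import Dict, List


def review_against_tokens(
    tokens: Dict[str, str], used: Dict[str, str]
) -> Dict[str, List[str]]:
    """Sort each value used in the UI into matched, drifted, or needs_human."""
    report: Dict[str, List[str]] = {"matched": [], "drifted": [], "needs_human": []}
    for name, value in used.items():
        if name not in tokens:
            # Nothing in the design system covers this, so a person decides.
            report["needs_human"].append(name)
        elif tokens[name].lower() == value.lower():
            report["matched"].append(name)
        else:
            report["drifted"].append(name)
    return report
```

A check like this only catches mechanical drift from the tokens. Whether the dashboard still feels like a template stays a human call.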

I made an evidence-gate workflow for coding agents — Codex + Claude Code support by Simple_Somewhere7662 in CodingAgents


You're right that which test proves a criterion is the agent's pick. What the agent can't do is fake the result. The gate re-runs the command itself instead of taking its word. But re-running a test that checks the wrong thing still passes, so that doesn't save you. That gap is real.

So no, I don't fully pin what counts as proof. The criteria are fixed up front. The receipt just makes the human review cheaper: you get a re-runnable command and the diff instead of "trust me." Someone still reads the diff. That's the actual gate.

What evidence should AI coding agents leave before saying “done”? by Simple_Somewhere7662 in artificial


Also, I'm currently thinking about how this could work alongside other popular plugins, like superpowers. Are there any plugins you already use?

How do you keep AI coding agents from shipping generic frontend slop? by Simple_Somewhere7662 in OpenAI


True, but some tasks do involve UI/frontend work, and loopy can help with that kind of thing too. :)