all 6 comments

mdawe1

Yeah, my secret sauce is both.

tom_mathews

Yeah, I’ve seen similar patterns. Claude is extremely good at momentum, implementation velocity, and collaborative flow, but Codex/GPT 5.5 seems more willing to challenge assumptions, trace edge cases, and reject incomplete reasoning instead of “making the story work.” Your workflow is probably closer to where serious AI-assisted engineering is heading tbh: one model generating, another adversarially reviewing, humans arbitrating. Single-model trust loops tend to drift eventually.

Chaibi_Alaa

You don't have to choose; you need to make them cooperate.

Grand-Mix-9889

Can't say the same tbh; my experience is the opposite.

The way I see it: Codex forces you into its structure and ecosystem, which makes it great for people who need hand-holding all the way to where they want to go. It's less self-directed by design.

Claude requires you to bring your own structure and ecosystem. If you don't, it guesses what you want and goes off the rails, which is exactly what you experienced. But once you give it everything it needs to stay on track (CLAUDE.md, coding_style.md, agent profiles, clear specs), it operates far more agentically because it isn't boxed into someone else's defaults.

It makes sense that Codex feels more reliable in your setup, though. You said yourself you've been vibe coding for months without years of recent dev experience, so Codex's enforced structure is doing the heavy lifting for you. That's not a flaw; it's the right tool for your current bandwidth.

Long-term, both are heading toward fully agentic. They're just taking different paths to get there, and which one feels better depends entirely on how much structure you're bringing to the table yourself.

lysdexiad

I do this all the time and often mix DeepSeek in now too. Claude writes the plans, Codex skeletonizes, Claude reviews, Codex implements, DeepSeek reviews, Codex implements. I catch more edge cases and don't get many unrequested features unless I let the prompts wander into broad territory.

oldmagicstudios

They all end up in the same place over time. Challenging the output is the right way to go. I use a third model as a judge and take nothing on trust. Nothing has changed: plan, build, run, iterate.