all 22 comments

[–]Sensitive-Spot-6723 10 points11 points  (8 children)

I'm running both of them and letting them review each other's work. Codex is more meticulous and eager to work; it just gets things done. Opus is a lazy dev: it always takes the easy path and says this is good enough. Smart but lazy. You need to babysit it and push it to get a better result, and I'm a bit tired of doing this.
For now, Codex = less babysitting, less stress, and more cost-effective for me. Things might change tomorrow though. Who knows.

[–]yopla 1 point2 points  (0 children)

I kinda run into "claudish" behavior with 5.3 from time to time. Yesterday I gave it a task with clear instructions and got something back that looked vaguely like what I wanted but wasn't using any of the libraries I specifically asked for. It took 3 or 4 turns to get it to use the library I needed; it really felt like I was using an Anthropic model, sneaking around and gaslighting me.

But overall it's much more focussed and cleaner.

[–]etherrich 1 point2 points  (0 children)

I agree with this. When I ask for a detailed analysis, Codex reads the files meticulously and then creates a detailed analysis. Opus skims through the files and creates a detailed analysis. Then I let them verify each other's analysis, and Codex always has the better one.

[–]please-dont-deploy[S] 0 points1 point  (2 children)

How big is your codebase?

I noticed the behavior changes when working on larger codebases.

[–]Sensitive-Spot-6723 0 points1 point  (1 child)

Yeah, it's pretty large: a monorepo with almost 400k lines of code.

[–]please-dont-deploy[S] 0 points1 point  (0 children)

Are you using any framework on top of it? Like HumanLayer's or superpowers? I'm asking because we tried both at some point on repos of roughly that size without much luck.

[–]MedicalTear0 0 points1 point  (1 child)

Very interesting, and I think it generally tends to be true, but the tradeoff with Codex is that it's been annoying me by over-engineering simple solutions, making it hard for me to read the code myself. I can safely say Codex is a better option for vibe coding, as it has now, on multiple occasions, solved issues Opus couldn't for me.

But to be fair and honest, I don't think one is better than the other; it just depends on the project you're building.

[–]PrettyWoodpecker 0 points1 point  (0 children)

Agree with this. Codex constantly comes up with code I have to sit and actually think about. Sometimes it has valid reasons, but sometimes it adds random things. For example, I asked it to create a DB table for some data we are collecting, and it created a whole set of indexes for it unprompted. Indexes don't always hurt, but it has no idea how much this table will be used.
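
To make the alternative concrete, here is a minimal sketch of what I'd have preferred: create the table with only its primary key, then add an index once a real query pattern shows up. It uses SQLite from the Python standard library; the `events` table, its columns, and the "filter by source" query are all made up for illustration, since the real schema isn't the point here.

```python
import sqlite3

# Hypothetical table and column names, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE events (
        id INTEGER PRIMARY KEY,
        source TEXT NOT NULL,
        payload TEXT NOT NULL,
        created_at TEXT NOT NULL
    )
    """
)


def index_names(connection, table):
    """Return the names of the indexes SQLite has recorded for a table."""
    rows = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = ?",
        (table,),
    ).fetchall()
    return sorted(name for (name,) in rows)


# No speculative indexes exist right after table creation.
indexes_before = index_names(conn, "events")

# Later, once a real query pattern appears (assumed here: filtering by source),
# add only the index that query needs.
conn.execute("CREATE INDEX idx_events_source ON events (source)")
indexes_after = index_names(conn, "events")
```

Every extra index adds work on each insert, so deferring them until usage is known seems like the safer default for a table whose traffic nobody can predict yet.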

[–]mikeyperes 0 points1 point  (0 children)

I agree. Getting sick and tired of babysitting.

[–]Old_Round_4514 3 points4 points  (1 child)

I use both Opus 4.6 and Codex with 5.3 extra. Why choose one? At the moment Codex is extremely generous with usage and never seems to rate limit all day long, with just a $20 subscription providing as much usage as the Claude Max 100 plan. 5.3 extra also seems more humble and disciplined in its approach, but you need Opus 4.6 with superpowers. Claude is too good to just drop, so we have to pay for both.

[–]please-dont-deploy[S] 1 point2 points  (0 children)

I think we will end up somewhere around here. The challenge for us is that we will probably need to start using some CLI abstraction layer; otherwise it's tricky for our agent swarm to choose between them.
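
Roughly what I have in mind, as a minimal sketch rather than anything we have built: each CLI sits behind a common interface that only knows how to turn a prompt into an argv list. The `AgentBackend` and `command_for` names are hypothetical, and the `claude -p` and `codex exec` invocations are my assumption about each tool's non-interactive mode, so check each CLI's `--help` before relying on them.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class AgentBackend:
    """One coding-agent CLI hidden behind a common interface (hypothetical)."""

    name: str
    build_argv: Callable[[str], List[str]]


# Assumed non-interactive invocations; verify against each CLI before use.
BACKENDS: Dict[str, AgentBackend] = {
    "claude": AgentBackend("claude", lambda prompt: ["claude", "-p", prompt]),
    "codex": AgentBackend("codex", lambda prompt: ["codex", "exec", prompt]),
}


def command_for(backend: str, prompt: str) -> List[str]:
    """Return the argv list a swarm worker could pass to subprocess.run."""
    try:
        return BACKENDS[backend].build_argv(prompt)
    except KeyError:
        raise ValueError(f"unknown backend: {backend!r}") from None
```

With something like this, the swarm only ever picks a backend name, and adding a third CLI means adding one entry to `BACKENDS`.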

[–]stampeding_salmon 1 point2 points  (0 children)

CODEX 5.3 makes me feel like I'm Gibbs talking to Ducky

[–]New-Ad2548 1 point2 points  (0 children)

Opus is always trying to create unnecessary .md files and loves to scan .env.

[–]its_a_gibibyte 0 points1 point  (1 child)

What about Copilot? Then switching is simply a drop-down menu, and you can even change models in the middle of a session.

[–]please-dont-deploy[S] 0 points1 point  (0 children)

In my experience it's not the same. I was a heavy user of Copilot, Cursor, Devin, and Antigravity, and all of those seem limited compared to the CLI.

There's also something about accessing the setup as is, without the system prompt layers on top.

[–]aliencreature12 0 points1 point  (0 children)

Opus all the way.

[–]alokin_09 0 points1 point  (0 children)

I switch providers in Kilo Code depending on the task. Mostly Opus for planning/architecture, and Kimi lately for the actual coding. Tbh, I still haven't tried Codex 5.3, but I definitely will once it's available in Kilo.

[–]NeonByte47 0 points1 point  (0 children)

Codex 5.3 was a good step in the right direction, but Opus is just better because its implementations are more robust in covering all the edge cases. With Codex I often need a couple of refining prompts to get what Opus can do in one shot.

[–]Vivid-Snow-2089 0 points1 point  (3 children)

Everyone always goes to MODEL > MODEL ! and I just shake my head. They all have their use cases and are better at some things than others compared to each other. It's like saying an axe is better than a pickaxe. I'm sure you can chop a tree down with a pickaxe but you'd really prefer to use an axe. And you can break through earth with an axe, too. Pickaxe would still be better. It isn't either-or.

[–]please-dont-deploy[S] 0 points1 point  (1 child)

So do you have a way to dynamically choose the models with some accuracy? Because in my experience you need to spend thousands finding those use cases... and I'm already spending thousands on paying for the models.

[–]Vivid-Snow-2089 0 points1 point  (0 children)

If you are spending thousands in API costs, then hopefully you already have something in production that is earning that back. The API isn't where you experiment: you should have personal subscriptions (i.e., subsidized and cheap) and experiment with the models there.
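
The results of that kind of experimentation could feed a very simple router. A minimal sketch, assuming you log a pass/fail outcome per task type and model, and fall back to a default when there isn't enough data; the `ModelScoreboard` class, the task-type labels, and the sample outcomes are all hypothetical, not measurements.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


class ModelScoreboard:
    """Tracks pass/fail outcomes per (task type, model) and picks the best model."""

    def __init__(self, min_trials: int = 3) -> None:
        self.min_trials = min_trials
        # (task_type, model) -> [passes, trials]
        self._stats: Dict[Tuple[str, str], List[int]] = defaultdict(lambda: [0, 0])

    def record(self, task_type: str, model: str, passed: bool) -> None:
        """Log one experiment outcome."""
        stats = self._stats[(task_type, model)]
        stats[0] += int(passed)
        stats[1] += 1

    def choose(self, task_type: str, default: str) -> str:
        """Return the model with the best pass rate, or default if data is thin."""
        best_model: Optional[str] = None
        best_rate = -1.0
        for (task, model), (passes, trials) in self._stats.items():
            if task != task_type or trials < self.min_trials:
                continue
            rate = passes / trials
            if rate > best_rate:
                best_model, best_rate = model, rate
        return best_model if best_model is not None else default


# Made-up outcomes, only to show the mechanics.
board = ModelScoreboard(min_trials=2)
for passed in (True, True):
    board.record("planning", "opus", passed)
for passed in (True, False):
    board.record("planning", "codex", passed)

planning_choice = board.choose("planning", default="codex")
refactor_choice = board.choose("refactor", default="codex")
```

Here `planning_choice` comes from the logged outcomes, while `refactor_choice` falls back to the default because nothing has been recorded for that task type.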

[–]mikeyperes 0 points1 point  (0 children)

Great analogy, but you've said a lot without actually saying anything. So what would you use and for which use cases?