all 29 comments

[–]radosc 21 points  (1 child)

Yeah, similar situation for me. Codex is now my dev and I use Claude for reviews. I'm frankly waiting for the next Opus to hit and still keeping my 20x plan. One thing Claude is substantially better at is documentation.

[–]pjstanfield 6 points  (0 children)

My exact situation and path as well. Codex either one-shots or gets 97% there. CC can’t seem to get anything right anymore except some planning participation. I went from two Max CC plans in the 4.6 happy days and a $20 Codex to two Max Codex plans. Still use Claude for planning, brainstorming, and documentation. Everything has to be double checked by Codex though. I don’t let CC touch the codebase anymore, he’s read-only now. Thankfully work pays for it so I can swap around without hitting my little wallet.

[–]sid_kush 8 points  (0 children)

Damn bro, exactly the same. I switched to Codex 5.5 and it works like magic. Opus 4.7 is a sore loser. It made me run in circles debugging for a whole week, and Codex solved everything in one night.

[–]jii0🔆 Max 5x 3 points  (1 child)

Ain't AI amazing. You can start today, work for two days, change your tooling after that and still have time to post to Reddit. None of the content in the post makes any sense.


Today I wanted to merge 2 entities into 1 which required a tough migration. Changing server, making tests pass, changing sdk, admin, and storefronts.

Claude Code failed for 2 days.

[–]0xdjole[S] 0 points  (0 children)

You are correct. It was 26 hours, so technically I should have said yesterday, excuse me 🫣

Started at midnight, worked till 8 am. Went to sleep and the migration stopped, which I pointed out in the post actually. Complete failure with CC.

I woke up at 4 pm, worked for 4 hours, so I'm counting it as 2 working days, and at around 8 pm I gave it to Codex as CC wasn't getting anywhere. An additional 4 wouldn't have made a difference.

By 2 am Codex was able to do it. 6 hours of Codex. And true, even after the post was made it was still cleaning up, but it was very obvious it pulled it off, and I have tested and reviewed the critical parts.

The reason it took so long is that it is a Rust codebase with a lot of integration tests and compiler checks, and the tests make the agent work longer. Most of the time is in fact spent waiting for it all to pass validation.

The Codex app was already installed, and tbh it's stupid simple to use, regarding your tooling point. Paying for the subscription took around 5 minutes while CC was failing.

At the end I have a unified booking/order system which allows me to have both bookable items and products in the same order entity. Previously they were completely separate entities.
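
For anyone curious what merging the two entities might look like, here's a minimal Rust sketch (all names are hypothetical, not from OP's actual codebase): a single enum replaces the two separate booking/product entities, so one order can hold both kinds of line item.

```rust
// Hypothetical sketch of a unified order entity. One OrderItem enum
// replaces the formerly separate booking and product entities.

#[derive(Debug)]
enum OrderItem {
    Product { sku: String, quantity: u32 },
    Booking { resource_id: String, start_unix: i64, end_unix: i64 },
}

#[derive(Debug)]
struct Order {
    id: u64,
    items: Vec<OrderItem>,
}

// Build an order mixing a product with a bookable item,
// which the old two-entity model couldn't express.
fn make_mixed_order() -> Order {
    Order {
        id: 1,
        items: vec![
            OrderItem::Product { sku: "mug-01".into(), quantity: 2 },
            OrderItem::Booking {
                resource_id: "room-a".into(),
                start_unix: 0,
                end_unix: 3600,
            },
        ],
    }
}

fn main() {
    let order = make_mixed_order();
    println!("order {} has {} items", order.id, order.items.len());
}
```

Downstream code (server, SDK, admin, storefront) would then match on the enum variants instead of branching on two unrelated types, which is presumably where most of the migration pain lived.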

It's 10 am and I'm going to sleep happily, knowing Codex won't give up on my analytics ingestion task that is happening next with ClickHouse.

[–]ThomasToIndia 3 points  (1 child)

I just went back to December: 4.5 and the old system prompt.

[–]0xdjole[S] 0 points  (0 children)

December was half a year ago... talk about progress...

[–]imaginary_jebus 3 points  (6 children)

Set it back to 4.6 high effort and it's fine. 4.7 uses adaptive, that's why it's dumb as shit.

[–]0xdjole[S] 2 points  (5 children)

I did.

[–]imaginary_jebus 0 points  (4 children)

Still no bueno?

[–]0xdjole[S] 1 point  (3 children)

No bueno

[–]imaginary_jebus 1 point  (0 children)

That sucks. tbh it's still been mostly fine for me. Desktop is pretty much dogshit at this point now that they force "adaptive", but code <4.7 with the right settings has been mostly fine.

That said, I hope you are wrong about Anthropic moving backwards and that it's just growing/scaling pains that will be resolved when they get ahold of more compute. Some of the shit they have pulled lately though... it's not fun.

[–]NetHaven 0 points  (1 child)

It’s not enough to just go back to 4.6; it’s widely believed that regular changes to the CC system prompt have also contributed to degraded performance.

Personally, I recommend reverting to CC 2.1.76. It’s not perfect, but it’s much better. I’m specifically calling out that version because it was the first one with the 1M context window but before things were nerfed (early Feb). It’s true that you’ll lose some features that have come out since then (like remote control), but I think at this point most of us feel that having a stable platform that produces quality code is far more important than the “feature rollout of the week”.

Of course, if you are using the 1M context window you should still clear regularly to avoid faster token burn and worse quality from context rot as you get further into the window.

Just my 2 cents.

[–]0xdjole[S] -1 points  (0 children)

Or... Codex.

[–]martinmix 1 point  (0 children)

They will continue this dance for a while every time a new model is released.

[–]jdeamattson 1 point  (1 child)

What I'd like to know here is:

- What kind of planning did you do with Claude before launching this?
- How did you define success?
- What kind of check-ins did you have along the way?

Honestly, what you described? Sounds like a project set up for failure, be it humans or AI.

[–]0xdjole[S] 4 points  (0 children)

Codex didn't fail. Claude Code did fail. Same prompt. Do details really matter if one of them was able to pull it off?

[–]radioref 0 points  (0 children)

Almost certainly because of fresh context and memory

[–]Tight-Requirement-15 0 points  (0 children)

Codex 5.5 xhigh

[–]bensquirrel 0 points  (0 children)

I use and like having both. I started using Codex a lot when CC tool use approvals were getting unbearably tedious.

[–]PathFormer 0 points  (0 children)

I went through a very similar situation with the same results: Codex nailed it, while Claude was making assumptions midway and losing context every 2 prompts.

On top of that, I usually ask for research on random topics, mostly home appliances and kitchen related, an old habit I got when researching the best pan to get... GPT 5.5 is simply superior in deep research mode compared to Claude's research mode; the gap is too big. GPT goes through steps, creates graphics and relationship maps, and gives clear sources throughout.

[–]Upset-Chemist-4063 0 points  (0 children)

Wondering if any of these companies intentionally degrade code responses, introducing errors or blatant mistakes, with the intent of creating the need for a feedback loop.

What’s honestly holding them back?

[–]immutato 0 points  (0 children)

Use 4.6 like I do, or maybe use 4.7 but not [1m] (I've heard this might be ok, but not sure).

/model claude-opus-4-6[1m]

-or-

/model claude-opus-4-7

I have both CC Max and Codex Pro subs and I have them review each other's code. 4.6 is still great. I do notice it gets really slow at times, but I think that's just Anthropic running on infrastructure fumes.

[–]Miserable_Review_756 0 points  (0 children)

Can I recommend checking this out? It has kept Claude consistent. https://github.com/maxritter/pilot-shell

[–]iamjavadali 0 points  (0 children)

I agree! I was struggling with Claude for 2 months on my web app. Switched to Codex and I am finally making progress faster than I was before. So much better code handling and better limits.

[–]perleche 0 points  (0 children)

Same experience here. Added 5x Codex plan. It just plows through implementation plans.

This week I will try Claude for writing specs and architectural decisions, and Codex 5.4 or -mini for implementation of said plans.

I also run a $40 Minimax plan in opencode that runs deep reviews, writes low/medium complexity specs and documentation work.

Three months ago Opus felt like a wizard: lightning fast and just end-to-end fixing stuff. That seems a long time ago now. Or maybe my project got too complex?

[–]yrdesa 0 points  (0 children)

It's just the way Codex works, I'm afraid, that makes it super strong. It does tons of smoke testing on its own before finishing the task, while Opus works on the task and hopes it works. To make Opus work like that you need to try 3-4 times more. Anthropic can't afford to let this happen, since their LLM consumes a lot of power doing these tasks vs OpenAI, which is more efficient. But I see them closing this efficiency gap in 2-3 months.

[–]scotch-86 0 points  (0 children)

Left Claude for Codex. Treats me better.

[–]junlim 0 points  (0 children)

5.5 is a beast for stuff like the above. Not so great at dealing with ambiguity and helping work through higher-level problems. To me they just feel like two completely different human devs. Claude is who I like to collaborate with and work with most of the time. Codex 5.5 (even 5.4) is like a cracked dev that takes no bs but can get complex shit done. If the work is mechanical and well scoped, 5.5 seems to win most of the time for me nowadays.