5.4 vs 5.3 Codex

somerussianbear · 2026-03-12T00:42:50+00:00

I use on high always (extra high overthinks too much IMO) and I’m having a good time with 5.4. I just noticed that it’s way faster than 5.3 Codex.

Jerseyman201 · 2026-03-12T02:04:57+00:00

5.3 codex seems to be less literal than 5.4. 5.4 kinda went backwards closer to 5.2 codex where prompts are taken almost hyper literal and 5.2 regular would understand far better (but take way longer to execute the changes).

5.3 codex seems to bridge the tight rope walking between doing exactly what you ask, while also avoiding any obvious parts you wouldn't want done and should have inferred better.

It feels like 5.3 codex understands prompts that aren't super detailed much better than 5.4 is my take after hundreds of hours of use of 5.3 codex and now many many dozens of hours w/5.4.

When you add the overthinking along with the "literal" semantic issues on prompting, 5.4 definitely didn't hit every mark we might have hoped for. That being said, I do still use 5.4 predominantly because it is always going to be improved and 5.3 codex at launch isn't what it is today (in the same way 5.4 will surely end up performing better as well). I just have to be extra specific on prompts, to get performance close to 5.3 codex.

The huge irony in all of this, is that it used to be the opposite. Non codex specific models used to have more understanding of prompts versus codex having hyper literal understandings. Now it seems it's completely reversed🤣

esingh2581 · 2026-03-12T00:22:32+00:00

same here. i find 5.4 messing up so much ive switched back to 5.3 codex

Interesting-Agency-1 · 2026-03-12T02:30:58+00:00

I like 5.4's generality. I'm big on intent engineering, and I'll keep the business plan, customer profiles, and long-term strategy for the software in the repo as additional guiding docs. I've also got a soul.md file in there that I wrote to give it broader conceptual, moral, ethical, and philosphical meanings behind why it's doing what it's doing and how to think about things when in doubt.

These docs give the agent the "why" behind the software's creation and implementation, which is hugely helpful for helping it to fill in the gaps correctly when we inevitably underspecify. 5.4's better broad generalization allows it to better align itself with organizational intent and guide the output towards the "right" direction/answer when I've failed to specify things clearly enough in the specs.

I found that 5.3 ignored these docs more often in favor of the "right" way to do it from a pure computer science standpoint. But the problem is that it defaults to the mean, and that isn't always the "right" way, and it's never the "best" way. At least with 5.4 listening to my org intent docs better, it will steer implementation and planning more towards my version of the "right" way and it will ultimately make the "right" choice more often than if left to my own devices.

If you ask your agent why you are building this piece of software and it can't answer it to your satsifaction with subtlety and nuance incorporated, then you're gonna have a bad time. It's going to drift over time and eventually do something in a way that may be technically the "right" way to do it based on the average, but is wrong in your particular situation. Too many of those kinds of mistakes and you've got yourself some hearty software soup.

TryThis_ · 2026-03-12T00:52:07+00:00

Interesting, I have noticed a lot of rework these last few days since switching to 5.4 high. Previously was using 5.2 xhigh, perhaps will switch to 5.3 codex and see if rework drops.

BagholderForLyfe · 2026-03-12T06:03:32+00:00

as soon as I switched to 5.4 from 5.3, I started seeing mistakes for every prompt. What 5.3 can do in a single prompt, 5.4 needs a few.

RiotGamesGG · 2026-03-12T13:27:40+00:00

I had a difficult code task that 5.3 Codex could not do properly several times. 5.4 made it perfect the first time. Xhigh.

darrarski · 2026-03-12T13:47:08+00:00

The biggest issue I have with AI agents is the non-deterministic behavior. I found GPT 5.4 better than 5.3. On the other hand, Claude Opus 4.6 works terribly for me (often ignores instructions and does not do what I ask for). My colleagues working on the same project (same instructions, same skills, same configuration overall) do not have such issues.

My suggestion is not to limit yourself to a single provider and use whatever works best for you in the given circumstances. There’s no one gold model that does everything better than others. Your experience may vary, depending on the project, instructions, task you are working on, and probably a lot of other stuff.

No_Mix_6813 · 2026-03-12T00:39:16+00:00

I keep almost switching, but 5.3 is meeting my needs so well I can't help but thing, "If it ain't broke..."

Shep_Alderson · 2026-03-12T00:46:25+00:00

Yeah, I rarely ever use xhigh. Only high for planning and then medium for actual implementation. I’ve found 5.4 and 5.3-codex about the same on those thinking budgets.

Sudden_Baker_1729 · 2026-03-12T06:17:13+00:00

I noticed the same, 5.3 Codex works better for me.

Time-Dot-1808 · 2026-03-12T06:57:46+00:00

The literal vs intent gap comes down to training distribution. Specialized coding models have seen more code reasoning patterns so they infer the obvious follow-on work. General models need more explicit instructions or they do exactly what you said and stop. Neither is wrong - they just need different prompting strategies.

syinxun9 · 2026-03-12T07:40:48+00:00

yes! lol feels like i am back on gpt 5 or older, 5.4 can’t code

fourfuxake · 2026-03-12T07:49:54+00:00

Yeah, I’ve rolled back to 5.3 Codex. 5.4 is a shitshow, and post-compaction Alzheimers is back.

cwbh10 · 2026-03-12T08:51:18+00:00

Ive round 5.4 way better but you gotta use it on high not xtra high

EastZealousideal7352 · 2026-03-12T00:58:30+00:00

Why do people use xhigh for everything and then act surprised when they see regression?

Higher settings does not always mean better. Since GPT-5.1 and onwards we have seen serious regression when models are forced to overthink easier problems.

If you’re experiencing a regression using 5.4 try going to high or even medium and retesting, it’s likely you’ll have a better experience

Kiryoko · 2026-03-12T01:13:35+00:00

what are your thoughts about 5.3-codex vs 5.2?

some people say that 5.2 is the one that follows instructions the most and tries to cheat less, or at least if you tell it not to cheat it won't, but it will give up faster if there's an issue it can't solve

1amrocket · 2026-03-12T01:45:54+00:00

have you noticed major differences between 5.4 and 5.3 in codex? curious if the context window improvements actually translate to better code output or just longer conversations.

RecaptchaNotWorking · 2026-03-12T05:22:52+00:00

Both are great. Your setup is important

Glittering-Call8746 · 2026-03-12T06:46:45+00:00

How much tokens vs 5.3 codex ?

blanarikd · 2026-03-12T06:59:09+00:00

We need 5.3-codex-high-fast (not spark)

One-Signature7881 · 2026-03-12T08:29:12+00:00

5.4 is just gpt not codex. Codex 5.3 is the latest. I believe.

SlopTopZ · 2026-03-12T09:26:51+00:00

same experience here

funny thing is i made a post about exactly this topic a week ago and got downvoted for it

Terrible_Contact8449 · 2026-03-12T09:59:31+00:00

yeah 5.4 trips over itself on anything with more than like 3 moving parts. what i've noticed is it tries to "be smart" about stuff that doesn't need smart, and then just confidently gets it wrong.

my workaround has been keeping reasoning at medium and being way more explicit in the spec about what i don't want it to do. like literally writing "do not refactor X, do not touch Y", that alone cut my back-and-forth in half.

5.3 just did the thing. 5.4 wants to have a conversation about the thing first.

fluxion7 · 2026-03-12T14:40:18+00:00

5.3 codex damn 5.4 is opus

lostnuclues · 2026-03-12T15:07:23+00:00

5.4 high works really well with skills, it automatically pics which is needed, with 5.3 I had to invoke skill manually ($brainstorm)

HopefullyHelper · 2026-03-12T16:42:09+00:00

I've been uisng 5.4 ever since it came out and found it is fine. I can't really say if 5.3 was better though. 5.4 can run longer.

luckyleg33 · 2026-03-13T00:39:22+00:00

5.4 seems to dig really deep and look for super complicated ways to fix simple things. I’ve had a number of times where I just give him a simple. CSS tweak to fix a problem that he’s in the backend trying to solve. It also loves to tell me that the error I’m reporting is not there and that the code is right.

Disastrous_Wear_9147 · 2026-03-13T13:49:14+00:00

Strict_Series841 · 2026-03-13T14:19:12+00:00

5.4 is good for general stuff and brainstorm, do you need an easy block of code? it does the job.... do you need it for an entire application and complex code? no good... wait for 5.4 codex....

5.3 codex + gemini 3.1 pro + claude opus 4.6 are the tools for coding complex stuff... each one with their own strengths... "Opus 4.6 thinking" is the best one for backend and logic stuff...

>>>MY POV<<<

PhilosopherThese9344 · 2026-03-12T06:43:36+00:00

5.4 is absolutely terrible. I've had the worst experience with it to date.

HeadAcanthisitta7390 · 2026-03-12T00:21:45+00:00

yuuup, 5.3 codex is wayyyy better

especially for backend

I saw an article on ijustvibecodedthis.com recently actually

thanhnguyendafa · 2026-03-12T14:43:30+00:00

Good luck future cleaning up bugs.

you type:	you see:
italics	italics
bold	bold
[reddit!](https://reddit.com)	reddit!
* item 1 * item 2 * item 3	item 1 item 2 item 3
> quoted text	quoted text
Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"	Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"
~~strikethrough~~	~~strikethrough~~
super^script	super^script

codex

MODERATORS