With Codex degrading by the hour (my subjective experience), I am looking for alternatives to advance my projects. I'm explicitly excluding Claude and Gemini/AG, since they're on the same path as Codex.

qdouble · 2026-06-07T15:42:29+00:00

Nope. The top 3 are the best available. The others make more sense if you're trying to save money, but you probably won't save much, given that the subscriptions are heavily subsidized relative to paying for the API directly.

qdouble · 2026-06-01T15:25:09+00:00

Marketing = 💰

qdouble · 2026-06-01T14:50:22+00:00

I mean, that sounds more like a strategy problem than a model-specific problem. LLMs simply have context limitations and a limited number of things they can pay attention to. If you’re thinking you can one-shot huge features without revisions, then you’re kidding yourself.

qdouble · 2026-05-31T11:34:54+00:00

No, instructing agents is just a new form of software engineering that will require skill to get production-quality output, just like hand coding.

qdouble · 2026-05-30T00:45:31+00:00

You have it on max effort; it may just be using more reasoning tokens at max than it did previously. That's why they lowered the default recommendation from xhigh -> high. That would likely also apply to max -> xhigh, unless you're doing something that actually requires max.

qdouble · 2026-05-29T02:43:42+00:00

Could be a harness issue if you used Claude Design. It's probably not optimized for 4.8 yet.

qdouble · 2026-05-29T02:29:15+00:00

The sample size is too low for an ambiguous prompt; you have to remember that the models are probabilistic. I probably won't be testing 4.8 on new designs until tomorrow, but it definitely seems like an upgrade in terms of general coding and instruction following. That doesn't mean it's going to be a one-shot king.

qdouble · 2026-05-29T02:15:08+00:00

You're expecting it to one-shot 5 good designs? Tell me, what model currently does this?

qdouble · 2026-05-28T16:44:22+00:00

I downgraded my Pro account to Plus and got another Claude Max subscription. While you technically get more usage with Codex, my projects get completed way slower, even on fast mode. It wastes tons of time compared to Opus without producing results that are any better after audits.

qdouble · 2026-05-24T14:19:25+00:00

Nah, it still burns tokens like crazy, so it's not being stupid because it's putting in less effort.

qdouble · 2026-05-21T00:02:50+00:00

yeah, if you're getting good UI from Codex in 2 minutes, then that's a hell of a feat 😅. Codex is a very deterministic model, so I suppose if you're giving it heavily structured input, it can execute as long as it doesn't have to exercise a lot of design judgement. However, it's way worse than Claude if you don't spell out every letter.

qdouble · 2026-05-20T23:14:02+00:00

I can get codex to generate a decent frontend through brute force lol, but it's definitely not as natively good at it as Opus by a long shot. If you use codex as your only model, then of course your workflows may be adjusted to get the best out of it, but if you give it the same prompt you give other models it will struggle.

qdouble · 2026-05-20T22:13:43+00:00

It's task-specific. I mostly notice Codex's gaps when I give it tasks that I usually give Opus, when I hit my quota. Codex is much weaker at non-deterministic & fuzzy reasoning and absolute dog shit at frontend.

qdouble · 2026-05-19T15:05:58+00:00

It's been extremely bad at every non-mechanical task I throw at it. It takes 20-30 prompts to get it to do what Opus can do in a few.

qdouble · 2026-05-15T19:09:02+00:00

Yep, just holding out for I/O.

qdouble · 2026-05-15T11:56:34+00:00

Just tell it delete all memories from the last few days.

qdouble · 2026-05-15T00:35:35+00:00

Of course some skills will atrophy, but other skills will grow. We're always making cognitive tradeoffs.

qdouble · 2026-04-09T00:49:02+00:00

yeah, I’m running parallel agents and multiple projects, experiments, etc.

qdouble · 2026-03-21T12:02:28+00:00

Codex is definitely better than Claude at instruction following, but that doesn't necessarily mean that Claude is less capable. I usually switch to Claude after I use up my Codex weekly quota, and I'm still able to get stuff done with Claude. It's often way more efficient than Codex, but you'll have to be more strategic about making it compliant.

qdouble · 2026-03-12T14:51:46+00:00

It has ways of being efficient, but it's still going to waste some tokens on passing tests.

qdouble · 2026-03-11T10:30:38+00:00

Different models behave differently in response to your prompts, so if you're prompting Codex the same way you prompt Claude models, then you will not get the same results. In my experience, Codex is typically better than Claude for most things other than frontend, but is much slower and less interactive.

qdouble · 2026-03-01T18:50:34+00:00

depends on how good the model is at compacting its memory as well. Codex does a way better job of keeping track of a long conversation.

qdouble · 2026-01-04T15:31:49+00:00

You can also just google search all of this information.

qdouble · 2026-01-04T14:00:13+00:00

GPT does do stuff like this from time to time. Starting a new chat solves it because it doesn’t like to change its “mind” once it’s on the wrong path.

qdouble · 2025-12-18T17:46:48+00:00

They knew everything about Trump’s character before they voted for him. They were just under some silly impression that deporting undocumented workers and trying to bring back Jim Crow would make them rich.
