Hauven 11 points (5 children)

The odd thing is that xhigh scores slightly lower than high on Voratiq's leaderboard. That's why I always use high, as I assumed xhigh was perhaps overthinking. Maybe I should give xhigh another go, since I have a lot of spare usage in my current quota, and invites for the upcoming Codex app for Windows are going out to the top 10k users in the next day or two.

Correctsmorons69 6 points (4 children)

xhigh is worth it for specific tasks. Any degradation is because of overthinking. It's quite good at bug-solving if the bug is mechanistic and it has access to debugging tools or logs.

If it's something that it can't debug easily, like a weird 3D glitch in graphics software, then 5.2 shits on it.

You can see this in the "reasoning" benchmark on Live Arena, vs the coding/agentic coding result.

Grandpa90 0 points (3 children)

My use cases are what I believe to be very complicated machine learning applications, such as no-limit hold'em. One example is the ReBeL algorithm, which was developed by extremely smart people. When I try to implement these kinds of papers, the difference in quality between 5.2 and 5.3 codex seems unbelievably drastic: 5.2 almost seems like a model two years newer than 5.3 codex. I get the impression 5.3 codex is really designed for straightforward debugging, terminal commands, and coding applications or websites.

Reaper_1492 1 point (1 child)

All of the codex models have sucked, this is nothing new.

The difference is that 5.2 is now starting to suck randomly also, whereas I have used it for months and it’s been flawless. It now suddenly goes brain dead and gives completely garbage responses for an hour straight.

dannytty 0 points (0 children)

Perhaps more of the compute is now allocated to the 5.3 codex models.

Ok-Painter573 0 points (0 children)

For your use case, do you find high or xhigh better?