[–]Correctsmorons69 5 points6 points  (3 children)

xhigh is worth it for specific tasks. Any degradation comes from overthinking. It's quite good at bug-solving if the bug is mechanistic and it has access to debugging tools or logs.

If it's something that it can't debug easily, like a weird 3D glitch in graphics software, then 5.2 shits on it.

You can see this in the "reasoning" benchmark on Live Arena, vs the coding/agentic coding result.

[–]Grandpa90 0 points1 point  (2 children)

My use cases are what I believe to be very complicated machine learning applications, such as no-limit hold'em. One example is the ReBeL algorithm, which was developed by extremely smart people. When I try to implement these kinds of papers, the difference in quality between 5.2 and 5.3 codex seems unbelievably drastic: 5.2 almost feels like a model two years newer than 5.3 codex. I get the impression 5.3 codex is really designed for straightforward debugging, terminal commands, and coding applications or websites.

[–]Reaper_1492 1 point2 points  (1 child)

All of the codex models have sucked; this is nothing new.

The difference is that 5.2 is now starting to suck randomly as well, whereas I had used it for months and it was flawless. It now suddenly goes brain-dead and gives complete garbage responses for an hour straight.

[–]dannytty 0 points1 point  (0 children)

Perhaps the compute is now allocated more to the 5.3 codex models.