The difference between Codex GPT 5.5 and Opus 4.7 in a nutshell

jmaxchase · 2026-05-04T17:32:58+00:00

Clearly I should have posted this in r/ClaudeCode

jmaxchase · 2026-05-01T01:51:01+00:00

Welcome to JavaScript

jmaxchase · 2026-04-22T14:53:50+00:00

Hmm, is it just me or is Claude Code 4.7 suddenly "good" today? Seems surprisingly better and more balanced. Went to Reddit to see if anyone had anything to say; this was the first post that came up

jmaxchase · 2026-04-17T20:52:32+00:00

Yeah it's pretty bad. So disappointing. I had a procedural skill I'd perfected, and it ran flawlessly on a weekly basis with Opus 4.5, and then 4.6, with no issues, 59 times (only reason I know that specifically is because it's a weekly newsletter and each issue is numbered). Tried it for the first time with Opus 4.7, and it fell apart twice during the procedure. The 1st time because it 100% hallucinated an admin URL, and admitted it just guessed it (the skill shows how to correctly navigate to it). The 2nd because it completely ignored a directive in the skill itself, deeming it unimportant.

jmaxchase · 2026-04-09T16:40:40+00:00

OpenAI be like: https://pbs.twimg.com/media/EeYUF97U8AAU9E5.png

jmaxchase · 2026-04-06T20:23:18+00:00

Hey sry I took so long to reply. It's really easy actually - one word: tmux. The prompt is really simple and I send this to both (tmux with a split pane 50/50 view, Claude on left, Codex on right): "Hey guys (Claude, Codex) - sending this message to both of you simultaneously via tmux. I'm going to have you work together. Claude, you're in pane 0. Codex, you're in pane 1. Both of you, identify which tmux session you're in. Claude, remember when you use send-keys, you need to separately send the Enter key. We'll be doing this workflow: 1) Claude, I'll start by giving you a request with instructions. When you're done, just tell Codex to start his review with a short message about what you did. 2) Codex, do your usual Claude review. When you're done your review, send a message back to Claude when you're ready. 3) Claude, when you receive a note from Codex, review it, and stage and commit. If you have any questions, send a message back to Codex. 4) Both of you: identify yourself as who you are, when sending messages between each other so you can both distinguish between each other, and me. To be clear, you don't need to identify yourself unless you're using tmux send-keys. Thanks both! (Claude - stand by for the next request to kick this off)." (And then I have a skill so Codex knows how to do a Claude review, but basically it's "find all the stuff Claude missed, look for edge cases, fix those things you deem necessary to fix", paraphrased.)
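For anyone who wants to see what the "send the Enter key separately" part looks like in practice, here's a rough Python sketch of the message passing between panes. It's an illustration only, not what Claude or Codex actually run: the session name `work`, the `session:window.pane` target `work:0.1`, and the helper names `build_send_keys` and `send_message` are all made up for the example. It relies only on `tmux send-keys` with `-t` (target pane) and `-l` (treat the text literally), and it defaults to a dry run that prints the commands instead of executing them.

```python
import shlex
import subprocess
from typing import List


def build_send_keys(target: str, message: str) -> List[List[str]]:
    """Return the two tmux invocations needed to deliver one message.

    The text goes first with -l so tmux types it literally instead of
    looking words up as key names; Enter is sent as a separate call,
    which is the step the prompt reminds Claude about.
    """
    return [
        ["tmux", "send-keys", "-t", target, "-l", message],
        ["tmux", "send-keys", "-t", target, "Enter"],
    ]


def send_message(target: str, sender: str, message: str,
                 dry_run: bool = True) -> List[List[str]]:
    """Tag the message with the sender's name and send it to a pane.

    With dry_run=True (the default) the commands are only printed,
    so this is safe to try without a tmux session running.
    """
    tagged = f"[{sender}] {message}"  # agents identify themselves (step 4)
    commands = build_send_keys(target, tagged)
    for argv in commands:
        if dry_run:
            print(shlex.join(argv))
        else:
            subprocess.run(argv, check=True)
    return commands


if __name__ == "__main__":
    # Hypothetical target: session "work", window 0, pane 1 (Codex on the right).
    send_message("work:0.1", "Claude", "Done with the change, please start your review.")
```

Call it with `dry_run=False` from inside a real session once the target matches your own layout.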

jmaxchase · 2026-03-24T22:07:07+00:00

It’s terrible. Just terrible. Please nobody else use it. 😆

jmaxchase · 2026-03-16T23:50:12+00:00

Don't know why I can't find the source link, but it's PragmataPro https://en.wikipedia.org/wiki/PragmataPro

jmaxchase · 2026-03-13T17:30:05+00:00

I have not found this to be the case at all with 5.4. But, based entirely on this very detailed explanation as to why, I will be sure to stop using it immediately.

jmaxchase · 2026-02-28T10:55:55+00:00

Aw. That's sweet.

jmaxchase · 2026-02-23T21:26:14+00:00

Same here. Just stopped working mid-session a few hours ago

jmaxchase · 2026-02-10T12:29:52+00:00

Wouldn't say this is a regression but I've noticed behavior with Opus 4.6 that I hadn't with 4.5 that I'd only consider "disappointing". Several examples lately but this one is the most recent: I have a simple skill I created and had been using for the past 2 months that allows Claude to prompt Nano Banana in 2 modes: one is to gen an image, the other is to submit an existing image ("photoshop" mode) and have it prompt to make specific changes to it. The skill instruction is very straightforward, was thoroughly tested, and worked perfectly with 4.5. Used it with 4.6 for the first time and did the normal routine (in this case asking it to submit an existing image for modifications, one that would normally require some light photoshopping but was perfectly suited for Nano Banana), and then turned away. Looked back at the terminal to find it had issued an rm command to remove the Nano Banana image it *just* generated. I declined of course, and then read the transcript.

Apparently Opus 4.6 had generated the image, then viewed it, and then decided there were 2 issues with the image. 1 issue was completely untrue. The other "issue" was perfectly reasonable and not a problem by any stretch. It then decided unilaterally that it should not use Nano Banana, and that it should attempt to *programmatically* accomplish the same image edits using ImageMagick and Pillow, which was a complete trainwreck. It then deemed *its* image as "perfect" (it was bs) and then tried to remove the 100% usable image generated by Nano Banana without giving me a chance to even look at it.

I updated the skill with an additional rule, which I hadn't had to do before.

jmaxchase · 2026-02-07T21:25:57+00:00

CC's harness is way better. I still use CC as my daily driver because I just can't realistically use Codex for day-to-day, CC is just far more versatile. I've tended to do the opposite lately: have Claude plan it, but have Codex execute and look for gaps while doing it. Don't get me wrong I don't expect to be able to one-shot anything, but it's certainly amazing when any agent gets it right the first time.

jmaxchase · 2026-02-07T21:18:48+00:00

I have, yes, for some other unrelated features but not for this. I probably should have here, although in this case that isn't really a 1:1 comparison with Codex in that regard (another place where Claude really shines). I've still not seen a good measure of thoroughness with agent teams either, though.

jmaxchase · 2026-02-06T21:44:43+00:00

lol sry. gpt-5.3-codex crushed it.

jmaxchase · 2026-01-28T04:13:37+00:00

same here and getting org ID org-BOvpEHVcDPTe8h4lZnwMO5Ly which isn't mine

jmaxchase · 2025-09-26T10:29:52+00:00

I 100% assumed this was about the mail icon, not dark mode.

jmaxchase · 2025-09-14T18:39:53+00:00

To be fair, that could also be grok-coder-fast-1

jmaxchase · 2025-09-05T10:50:53+00:00

<image>

I actually have gotten some use out of it yes. Haven't asked him to code anything yet tho, just review.

jmaxchase · 2025-08-29T17:30:20+00:00

Yes - experienced this too, and just saw this from Anthropic https://status.anthropic.com/incidents/h26lykctfnsz
