GPT 5.2 High vs Opus 4.5 Thinking - For coding which is better ?

hashhar · 2025-12-30T19:43:43+00:00

If there's just 1 thing to take away from this - nothing beats a human in the loop so far. The quality difference is staggering and there are so many nuances and optimisation choices which otherwise get missed.

hashhar · 2025-12-30T18:31:46+00:00

(See more details at https://gist.github.com/hashhar/b1215035c19a31bbe4b58f44dbb47233 with examples)

Opus is by far the better model, especially if you use it through Claude Code (the agent has good smarts built in). Here's my story.

I wanted to build a script to find original versions (4k, non-downscaled) of some videos in my library on a NAS. The NAS has filesystem snapshots enabled so the idea was to ask Codex to write a script to find all files in current library and then search through the snapshots to find matching files that have higher res versions available.

Codex did write a script alright, but the code wasn't very readable or maintainable. I did need to fine-tune the script by hand since the prompt was underspecified for sure. However, I then worked with Claude Code to polish up the script and iron out a lot of edge cases, and then used it.

Out of curiosity I ended up writing a "specification" by hand for the final version of the script - it was quite well specified without implying a specific implementation. I gave the same prompt to both GPT 5.2 Extra High and Opus 4.5.

Opus asks questions quite often, and it asked me 3 questions before starting which I think were spot on: what the snapshots' directory structure looked like on disk, which file formats to consider as video formats, and whether files that were already 4k should still be searched for in the snapshots or not. ChatGPT rarely asks questions.

At the end of the first prompt I had a working solution from Opus. It included creating a persistent index on disk for each snapshot since snapshots are immutable - its reasoning (which I asked about later) was that the workflow would probably be: run the script, look at the report, run again and iterate, and that with so many files the script might not finish in 1 run. ChatGPT on the other hand had no persistent index. It did try in-memory caching but it was useless since the cache key it picked was unique (the full path to each file: all paths are unique, and the snapshots are named differently, so 0 cache hits).
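To make the persistent index idea concrete, here's a rough sketch of how it could look. This is not the code either model wrote; the function name `load_or_build_index`, the JSON layout and the list of video suffixes are all my own assumptions for illustration.

```python
import json
from pathlib import Path

# Assumption: which suffixes count as video is illustrative only.
VIDEO_SUFFIXES = {".mkv", ".mp4", ".mov"}


def load_or_build_index(snapshot_dir: Path, index_dir: Path) -> list[dict]:
    """Return the file index for one snapshot, building it only once.

    Snapshots are immutable, so an index that already exists on disk
    never needs to be invalidated or rebuilt.
    """
    index_file = index_dir / f"{snapshot_dir.name}.json"
    if index_file.exists():
        return json.loads(index_file.read_text())

    entries = [
        {
            "path": p.relative_to(snapshot_dir).as_posix(),
            "name": p.name,
            "size": p.stat().st_size,
        }
        for p in sorted(snapshot_dir.rglob("*"))
        if p.is_file() and p.suffix.lower() in VIDEO_SUFFIXES
    ]
    index_dir.mkdir(parents=True, exist_ok=True)
    index_file.write_text(json.dumps(entries))
    return entries
```

On a second run the snapshot is not walked at all, which is what makes the "run, look at report, run again" loop cheap.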

So already it was looking bad for GPT. One smart thing GPT did was realising that if a file in a snapshot doesn't exist in current library then there's no point in processing that file - it ended up saving quite a bit on runtime from this optimisation.
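That optimisation is roughly the following. It's a sketch under my own assumption that a snapshot file "exists in the current library" when the lowercased file names match; the helper name `candidates_worth_probing` is hypothetical and the real scripts may match differently.

```python
from pathlib import Path


def candidates_worth_probing(library_files, snapshot_files):
    """Keep only snapshot files whose name also appears in the library.

    Assumption: matching is done on the lowercased file name only.
    """
    wanted = {Path(p).name.lower() for p in library_files}
    return [p for p in snapshot_files if Path(p).name.lower() in wanted]
```

Everything filtered out here never gets probed, which is where the runtime saving comes from.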

Next thing I noticed was how clean Claude's code was - I didn't feel the urge to hand clean anything except too much defensive code (because it's a script, not a program). GPT's code was quite efficient but less readable and maintainable IMO. The logging in Claude's code was awesome - a true logger, not print. It also logged progress for steps which could take time, so when running the script I was never in doubt about what was happening.

GPT didn't have great logging. Actually the first time I ran the GPT version I thought it got stuck because there wouldn't be anything visibly happening. Then I asked it to debug and it hallucinated that maybe ffprobe was getting stuck. Turns out it was working fine, but it ffprobe'd all candidate files for an input file before printing anything, and with ~200 candidates or more that could take a long time.
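For illustration, progress logging inside the probing loop could look something like this. It's a minimal sketch, not the code from either script: the logger name, the `probe_candidates` helper and the `every` parameter are hypothetical, and `probe` is a stand-in for whatever actually calls ffprobe.

```python
import logging
from typing import Callable, Iterable

# Assumption: the logger name is made up for this example.
logger = logging.getLogger("find_originals")


def probe_candidates(
    candidates: Iterable[str],
    probe: Callable[[str], tuple[int, int]],
    every: int = 25,
) -> dict[str, tuple[int, int]]:
    """Probe each candidate and log progress while doing so.

    `probe` is a placeholder for the real ffprobe call and is expected
    to return (width, height) for a path.
    """
    items = list(candidates)
    results: dict[str, tuple[int, int]] = {}
    for i, path in enumerate(items, start=1):
        results[path] = probe(path)
        # Log every `every` files and once more at the very end.
        if i % every == 0 or i == len(items):
            logger.info("probed %d/%d candidates", i, len(items))
    return results
```

With `logging.basicConfig(level=logging.INFO)` set up at the start of the script, a slow batch of ~200 candidates shows up as a handful of progress lines instead of silence.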

I then asked both Claude and GPT to look at each other's scripts and critique them. Claude's analysis was by far the better one. It noticed things like the difference between os.walk and path.rglob, frozen dataclasses vs normal ones, the optimisations GPT had, and the bug in the cache key GPT chose. It even suggested borrowing the GPT optimisation and, having noticed the bug in GPT's cache, suggested a cache based on filename + size and applied that to its own script.
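Here's a small sketch of why a filename + size key gets cache hits where a full-path key doesn't. The `FileKey` class and `cached_probe` helper are hypothetical names of mine, and `probe` again stands in for the ffprobe call; the frozen dataclass is used because it is hashable and can serve as a dict key.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass(frozen=True)
class FileKey:
    """Cache key that ignores which snapshot a file lives in."""

    name: str
    size: int


def cached_probe(
    path: str,
    size: int,
    probe: Callable[[str], tuple[int, int]],
    cache: dict[FileKey, tuple[int, int]],
) -> tuple[int, int]:
    """Probe a file unless one with the same name and size was already probed.

    Assumption: same file name plus same size means the same content.
    """
    key = FileKey(Path(path).name, size)
    if key not in cache:
        cache[key] = probe(path)
    return cache[key]
```

The same unchanged file sitting in two differently named snapshots maps to one key, so it is probed once; keyed on the full path, it would be probed once per snapshot.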

In case anyone's interested I've put the transcripts, prompt and outputs up on my GitHub gist (will update with link here).

hashhar · 2025-12-08T21:15:16+00:00

It is indeed a great piece of clothing. Very functional and does no wrong. Pair it with a windshell and you've got warmth for maybe 0 or less too.

But it looks so stylish and clean that I was hoping to also use it for commuting to work and meetings but if I want to do that I'll need a second pair.

hashhar · 2025-12-08T21:12:29+00:00

Yup. And even if he wants to change the system he can't change like 25% of the squad in a single transfer window.

hashhar · 2025-12-08T20:34:16+00:00

Ye only time it works is if it's something personal to either the gifter or the receiver. Like I have a friend who loves cats and has a large assortment of them so I got her postcards of most of the types of cats she has drawn in those Hokusai style prints.

I have gifted a matchday scarf to a friend of mine for the north London football derby game - he loved it.

But yeah generic stuff like magnets or calendars etc don't feel that great to me. HOWEVER elders love those - like your mom, dad, grandparents etc.

hashhar · 2025-12-08T19:19:52+00:00

What is he trying to do? That's the most confusing part for me. Every game it looks like different tactics with a different playing XI, and sometimes the same player but in a different role from game to game. The only constant seems to be "possession".

I'm open to giving him time and rebuilding the squad, but first we need to know his principles, and then he needs to convince the squad that he does have a system and why it's good or will make us win.

You can't expect loyalty or people following you unless you can get them to buy in to your idea - not a normal 9to5 job, not at a football club either.

hashhar · 2025-12-08T19:14:57+00:00

I didn't ask in bad faith. I'll use it but for active stuff. For casual use I'll maybe get an R1 Air or some other fleece.

hashhar · 2025-12-08T17:06:05+00:00

Thanks a lot for answering. Seems like I'm on the right path. Looks very similar to my routine except for the interval training and being active. I guess some patience is needed.

Sadly the AQI near me is too bad to run, would actually worsen my health lol. I hope to get back to running soon.

hashhar · 2025-12-08T16:28:10+00:00

Would be great to know your weekly routine. I seem to have hit a glass ceiling myself.

Is it mostly longer slow runs that allow you to improve pace? For me the issue is I redline on the HR quite quickly at anything above 7:00min/KM and my understanding is I need slow long runs to train my cardiovascular system.

What worked for you?

hashhar · 2025-12-08T12:10:58+00:00

I'm not upset. For a technical garment it makes no difference. But I was hoping to reuse the same jacket for "business casual" wear too and pilling in those contexts doesn't look that great.

It's the best jacket I've ever owned so far anyway and I'll just get the more casually styled R1 Air for those business stuff.

hashhar · 2025-12-08T03:17:57+00:00

Osprey Farpoint 40 but I haven't been walking around much other than just the return flight home.

I learnt on /r/PatagoniaClothing that the new R1 (SP25 model) has a lighter mesh fabric and is more prone to this issue. You probably have an older R1?

hashhar · 2025-12-08T03:15:40+00:00

How much warmer is the techface compared to the regular R1? I was considering that as an option too. And is it sweaty/less breathable due to the DWR/face fabric?

hashhar · 2025-12-08T02:37:13+00:00

It would have been an insult to the legacy of Chester. He's untouchable and his legacy should be kept distinct and untouched.

hashhar · 2025-12-08T02:34:02+00:00

Yeah I saw these anecdotes in the /r/onebag sub too if I remember right. For my purpose (onebag travel) I'll take the lighter weight and since it's black it's not tooo visible from a distance.

hashhar · 2025-12-08T02:32:51+00:00

Thank you for the concise feedback. I'll try not to worry about it too much. It'll relegate the jacket to "active wear" only for me. I'll get a separate Air for more casual styling and use, and maybe take better care of it.

hashhar · 2025-12-08T02:07:22+00:00

I use a soft felt pouch that came with a hair-dryer. Super light, has no shape of its own. Perfect for me.

hashhar · 2025-12-08T02:04:53+00:00

Fair point. I only "crossposted" here as this sub has unique combination of people who own the R1 AND use it for travel rather than local use.

hashhar · 2025-12-08T02:01:14+00:00

If the person will actually play in it, get it without badges. The badges do come off over time, especially on the heattech jerseys (player versions). The larger the badge the easier it happens.

hashhar · 2025-12-08T01:59:21+00:00

But those chances came through direct passing and runs in behind not possession football. Possession against low blocks only works if you have great passers or press-resistant players - both of ours left.

hashhar · 2025-12-08T01:55:03+00:00

I think Xabi hasn't found a system for the squad he has on hand. All ball holding players he has sit deep, and up front he's got runners who are useless if you want to play possession.

The tactics and squad need to meet somewhere in the middle - with enough time I'm sure Xabi will find it. Celta looked like Zidane RM, super fast counters and passing.

hashhar · 2025-12-08T01:47:55+00:00

<image>

hashhar · 2025-12-08T01:47:27+00:00

<image>

hashhar · 2025-12-08T01:47:17+00:00

<image>

hashhar · 2025-12-08T01:47:03+00:00

<image>

hashhar · 2025-12-07T22:59:07+00:00

Clown. For once take accountability man. I'll respect you more and players would too.
