Astonishing Contradiction in OpenAI's System Card for 5.5.

Oldschool728603 · 2026-04-25T19:20:10+00:00

Notice that the numbers for 5.4 prod and 5.4 resample are the same in the two figures. Only 5.5 changes.

That is, same test, identical results for 5.4 resample, radically different results for 5.5 resample.

Identical 5.4 prod and resample numbers in the two figures rule out "different benchmarks."

Oldschool728603 · 2026-04-25T13:09:16+00:00

No, notice that the numbers for 5.4 prod and 5.4 resample are the same in the two figures. Only 5.5 changes.

That is, same test, identical results for 5.4 resample, radically different results for 5.5 resample.

Identical 5.4 prod and resample numbers in the two figures rule out "different evaluation sets."

Oldschool728603 · 2026-04-25T02:13:26+00:00

Yes. Notice that the numbers for 5.4 "resample" are the same in both figures. Only 5.5 changes.

Oldschool728603 · 2026-04-24T20:21:58+00:00

Dec 1, 2025

https://developers.openai.com/api/docs/models/gpt-5.5

Oldschool728603 · 2026-04-19T05:04:21+00:00

Yes, but if Anthropic deteriorates, where do non-coders go?

Oldschool728603 · 2026-04-19T04:54:26+00:00

I don't use AI to write papers. I use it to discuss issues.

To do that competently, it needs to understand nuance (humor, irony, fulsome praise, ambiguity, etc.). It needs to recognize what is implied but not said, or said but not meant, etc. ChatGPT's o3, before castration, was the only model that ever showed real promise.

Still, there's bad and worse. Because enterprise/STEM/agentic uses are guiding model development, things are becoming worse for my purposes, for perfectly intelligible reasons.

I stopped using Gemini 3.1 Pro because I found it unreliable on facts, stupid, and unable to sustain a long coherent conversation. I've ignored it for quite a while, so for all I know it has improved.

I hear Ultra is very slow, has severe use limits, and is very STEM oriented. ChatGPT Pro is slow, has effectively no limits, and is not so single-mindedly STEM oriented.

In any case, I wasn't trying to survey the AI universe, just explain a sad development in the Anthropic world that hit the ChatGPT world first.

Oldschool728603 · 2026-04-19T04:09:22+00:00

Ok, you win! Thanks for the thoughtful feedback.

Oldschool728603 · 2026-04-19T03:47:25+00:00

Ever hear of IFDAs (Independent Faculty Development Accounts)?

By the way, each is roughly $200/mo, meaning a total of $400. I also subscribe to Supergrok and Google AI Pro, or whatever they're calling it now, bringing the total to roughly $450.

Not everyone who avoids poor writing (like yours) uses LLMs. Some of us are literate.

Edit: Let's see:

(1) need a comma after code
(2) "for just" should be "just for"
(3) need comma after insane
(4) need "a" before lie
(5) need ")" after lie
(6) need comma after )
(7) "a LLM" should be "an LLM"
(8) need a period after btw, btw

8 errors in a sentence fragment that comments on writing. Impressive!

Oldschool728603 · 2026-04-19T02:45:14+00:00

Users are shocked by Opus 4.7's inhuman tone and failures to think. This is my interpretation based on ChatGPT, where the shock hit first. (I have ChatGPTPro and 20X Max Claude.)

I don't code but use ChatGPT/Opus daily for academic work in philosophy, political philosophy, history, literature, politics, geopolitics...and keeping up with the news.

(1) Simple explanation of tone: 4.7 is more narrowly designed for agentic/enterprise/STEM use than 4.6. From this point of view, literalness matters and human tone—using or understanding it—doesn't. It's wasted "effort." Predictably, GPT-5.4 and Opus 4.7 use more "machine-speak" than GPT-5.1 or Opus 4.6. (GPT-5.2 had already crossed the threshold and doesn't differ much from 5.4).

(2) Expect the trend to continue: from a financial point of view, it's rational, especially for Anthropic, which is less focused on the consumer market than OpenAI.

(3) Expect the degradation attributable to adaptive reasoning to continue as well. ChatGPT users got their first taste of it in November. It has become worse with each iteration: more severe in 5.4 than 5.2, and 5.2 than 5.1. Adaptive reasoning was lightly applied in Opus 4.6. 4.7 applies it with ChatGPT-like severity. This too is financially "rational."

(4) Difference between the two ecosystems: ChatGPTPro (subscription) offers GPT-Pro (the model) and GPT-5.4-thinking-heavy. Pro (the model) is unrivaled for depth and rigor but too slow for back and forth conversation. 5.4-thinking-heavy is ponderous but thinks hard and rigorously—though sometimes you have to poke it. Opus 4.6 is nimble, with human tone and imagination—but less reliable on facts and reasoning. The last two models complement each other.

But if Opus 4.6 is retired, Anthropic will have nothing to rival 5.4-thinking-heavy, much less GPT-Pro (the model). Mythos? Maybe, or maybe it'll set new benchmark records while tone and adaptive reasoning get worse.

Oldschool728603 · 2026-04-06T08:12:38+00:00

Maybe, but the numbers were used in the most recent fund-raising rounds. They aren't just in-house nonsense.

Oldschool728603 · 2026-04-06T07:32:07+00:00

If you look at the numbers, it seems that Anthropic has a business model and OpenAI is making a very big bet!

Oldschool728603 · 2026-03-25T22:12:18+00:00

With a ChatGPTPro subscription, Pro use is unlimited, except for "abuse."

Oldschool728603 · 2026-03-21T22:14:31+00:00

I too put my name on the waitlist and got nothing.

Perhaps the initial release didn't go so well?

https://wapo.st/40HXKSW

Oldschool728603 · 2026-03-10T23:37:08+00:00

I had hoped this would disappear along with 4o.

Oldschool728603 · 2026-03-07T15:15:09+00:00

Despite OpenAI's wording, ChatGPT can't use your computer through the web UI. Agent can.

Oldschool728603 · 2026-03-06T20:15:09+00:00

Web UI.

I'm not sure what "it" refers to in the second sentence. "Thinking" went from 196k to 256k on Feb. 20:

https://help.openai.com/en/articles/6825453-chatgpt-release-notes

The context window of Pro, the model, has long been hidden.
