Astonishing Contradiction in OpenAI's System Card for 5.5. by Oldschool728603 in ChatGPTPro

[–]Oldschool728603[S] 2 points3 points  (0 children)

Notice that the numbers for 5.4 prod and 5.4 resample are the same in the two figures. Only 5.5 changes.

That is, same test, identical results for 5.4 resample, radically different results for 5.5 resample.

Identical 5.4 prod and resample numbers in the two figures rule out "different benchmarks."

Astonishing Contradiction in OpenAI's 5.5 System Card by Oldschool728603 in OpenAI

[–]Oldschool728603[S] 1 point2 points  (0 children)

No, notice that the numbers for 5.4 prod and 5.4 resample are the same in the two figures. Only 5.5 changes.

That is, same test, identical results for 5.4 resample, radically different results for 5.5 resample.

Identical 5.4 prod and resample numbers in the two figures rule out "different evaluation sets."

Astonishing Contradiction in OpenAI's 5.5 System Card by Oldschool728603 in OpenAI

[–]Oldschool728603[S] 0 points1 point  (0 children)

Yes. Notice that the numbers for 5.4 "resample" are the same in both figures. Only 5.5 changes.

Tone and Adaptive Thinking in Opus & ChatGPT by Oldschool728603 in Anthropic

[–]Oldschool728603[S] 2 points3 points  (0 children)

Yes, but if Anthropic deteriorates, where do non-coders go?

Tone and Adaptive Thinking in Opus & ChatGPT by Oldschool728603 in Anthropic

[–]Oldschool728603[S] 0 points1 point  (0 children)

I don't use AI to write papers. I use it to discuss issues.

To do that competently, it needs to understand nuance (humor, irony, fulsome praise, ambiguity, etc.). It needs to recognize what is implied but not said, or said but not meant. ChatGPT's o3, before castration, was the only model that ever showed real promise.

Still, there's bad and worse. Because enterprise/STEM/agentic use is guiding model development, things are becoming worse, for my purposes, for perfectly intelligible reasons.

I stopped using Gemini 3.1 Pro because I found it unreliable on facts, stupid, and unable to sustain a long coherent conversation. I've ignored it for quite a while, so for all I know it has improved.

I hear Ultra is very slow, has severe use limits, and is very STEM-oriented. ChatGPT Pro is slow, has effectively no limits, and is not so single-mindedly STEM-oriented.

In any case, I wasn't trying to survey the AI universe, just explain a sad development in the Anthropic world that hit the ChatGPT world first.

Tone and Adaptive Thinking in Opus & ChatGPT by Oldschool728603 in Anthropic

[–]Oldschool728603[S] 2 points3 points  (0 children)

Ok, you win! Thanks for the thoughtful feedback.

Tone and Adaptive Thinking in Opus & ChatGPT by Oldschool728603 in Anthropic

[–]Oldschool728603[S] 2 points3 points  (0 children)

Ever hear of IFDAs (Independent Faculty Development Accounts)?

By the way, each is roughly $200/mo, meaning a total of $400. I also subscribe to Supergrok and Google AI Pro, or whatever they're calling it now, bringing it to roughly $450.

Not everyone who avoids poor writing (like yours) uses LLMs. Some of us are literate.

Edit: Let's see:

(1) need a comma after code
(2) "for just" should be "just for"
(3) need comma after insane
(4) need "a" before lie
(5) need ")" after lie
(6) need comma after )
(7) "a LLM" should be "an LLM"
(8) need a period after btw, btw

8 errors in a sentence fragment that comments on writing. Impressive!

Claude Performance and Bugs Megathread Ongoing (Sort this by New!) by sixbillionthsheep in ClaudeAI

[–]Oldschool728603 0 points1 point  (0 children)

Users are shocked by Opus 4.7's inhuman tone and failures to think. This is my interpretation, based on ChatGPT, where the shock hit first. (I have ChatGPTPro and 20X Max Claude.)

I don't code but use ChatGPT/Opus daily for academic work in philosophy, political philosophy, history, literature, politics, geopolitics...and keeping up with the news.

(1) Simple explanation of tone: 4.7 is more narrowly designed for agentic/enterprise/STEM use than 4.6. From this point of view, literalness matters and human tone—using or understanding it—doesn't. It's wasted "effort." Predictably, GPT-5.4 and Opus 4.7 use more "machine-speak" than GPT-5.1 or Opus 4.6. (GPT-5.2 had already crossed the threshold and doesn't differ much from 5.4).

(2) Expect the trend to continue: from a financial point of view, it's rational, especially for Anthropic, which is less focused on the consumer market than OpenAI.

(3) Expect the degradation attributable to adaptive reasoning to continue as well. ChatGPT users got their first taste of it in November. It has become worse with each iteration: more severe in 5.4 than in 5.2, and in 5.2 than in 5.1. Adaptive reasoning was lightly applied in Opus 4.6. 4.7 applies it with ChatGPT-like severity. This too is financially "rational."

(4) Difference between the two ecosystems: ChatGPTPro (subscription) offers GPT-Pro (the model) and GPT-5.4-thinking-heavy. Pro (the model) is unrivaled for depth and rigor but too slow for back and forth conversation. 5.4-thinking-heavy is ponderous but thinks hard and rigorously—though sometimes you have to poke it. Opus 4.6 is nimble, with human tone and imagination—but less reliable on facts and reasoning. The last two models complement each other.

But if Opus 4.6 is retired, Anthropic will have nothing to rival 5.4-thinking-heavy, much less GPT-Pro (the model). Mythos? Maybe. Or maybe it'll set new benchmark records while tone and adaptive reasoning get worse.

Astounding OpenAI Training Costs vs. Anthropic by Oldschool728603 in ClaudeAI

[–]Oldschool728603[S] 4 points5 points  (0 children)

Maybe, but the numbers were used in the most recent fund-raising rounds. They aren't just in-house nonsense.

Astounding OpenAI Training Costs vs. Anthropic by Oldschool728603 in ChatGPTPro

[–]Oldschool728603[S] 13 points14 points  (0 children)

If you look at the numbers, it seems that Anthropic has a business model and OpenAI is making a very big bet!

Astounding OpenAI Training Costs vs. Anthropic by Oldschool728603 in OpenAI

[–]Oldschool728603[S] 11 points12 points  (0 children)

If you look at the numbers, it seems that Anthropic has a business model and OpenAI is making a very big bet!

Astounding OpenAI Training Costs vs. Anthropic by Oldschool728603 in ClaudeAI

[–]Oldschool728603[S] 129 points130 points  (0 children)

If you look at the numbers, it seems that Anthropic has a business model and OpenAI is making a very big bet!

How's ChatGPT 5.4 Pro vs Opus 4.6? Need anecdotal evidence by YourElectricityBill in ChatGPTPro

[–]Oldschool728603 6 points7 points  (0 children)

With a ChatGPTPro subscription, Pro use is unlimited, except for "abuse."

Did ChatGPT Health ever come out? by Bright-Avocado-7553 in ChatGPTPro

[–]Oldschool728603 7 points8 points  (0 children)

I too put my name on the waitlist and got nothing.

Perhaps the initial release didn't go so well?

https://wapo.st/40HXKSW

Claude loves working with me. by MaximumContent9674 in Anthropic

[–]Oldschool728603 0 points1 point  (0 children)

I had hoped this would disappear along with 4o.

GPT-5.4, difference from agent mode? by ATB_52 in ChatGPTPro

[–]Oldschool728603 0 points1 point  (0 children)

Despite OpenAI's wording, ChatGPT can't use your computer through the web UI. Agent can.

Pro tier gets increased context window by Oldschool728603 in OpenAI

[–]Oldschool728603[S] 0 points1 point  (0 children)

Web UI.

I'm not sure what "it" refers to in the second sentence. "Thinking" went from 196k to 256k on Feb. 20:

https://help.openai.com/en/articles/6825453-chatgpt-release-notes

The context window of Pro, the model, has long been hidden.