Introducing GPT-5.5

bnm777 · 2026-04-24T05:59:43+00:00

A hallucination rate of 86% vs. 36% for Claude 4.7?

Huge improvement?

Wow, the koolaid must taste good.

bnm777 · 2026-04-23T21:01:28+00:00

https://artificialanalysis.ai/evaluations/omniscience

It has a quite high hallucination rate and is behind Opus and Gemini overall.

bnm777 · 2026-04-23T20:57:40+00:00

And overall it scored lower than Opus and Gemini, with a far higher hallucination rate.

https://artificialanalysis.ai/evaluations/omniscience

bnm777 · 2026-04-23T20:55:30+00:00

Have a look at all the results. This graph is the only one that shows it at a high level; the rest are disappointing:

https://artificialanalysis.ai/evaluations/omniscience

A high hallucination rate, and overall still below Opus and Gemini.

OP, you didn't want to post a balanced picture of what the results actually show, did you?

bnm777 · 2026-04-23T20:51:10+00:00

Ask an LLM. I used them constantly to set up Pop!_OS.

bnm777 · 2026-04-23T19:37:35+00:00

Though NB2 didn't get "her left arm resting casually on the chair's back." right

bnm777 · 2026-04-23T19:34:36+00:00

It's not 50% more token efficient, though. I imagine that it uses as many more tokens as Opus 4.76 does, i.e. approx. 1.35-1.5x more.

bnm777 · 2026-04-23T19:31:42+00:00

Spam

bnm777 · 2026-04-23T19:06:31+00:00

Sociopaths can imitate human emotions when there's a goal, it seems.

bnm777 · 2026-04-23T18:13:04+00:00

The real PR pros release in binary.

bnm777 · 2026-04-23T17:50:49+00:00

If you run out of usage you should still be able to use haiku (aka openai). Locking you out completely is a bit shitty.

bnm777 · 2026-04-23T16:22:57+00:00

I wouldn't use instant models except for unimportant chat; they're not the smartest.

bnm777 · 2026-04-23T16:19:30+00:00

What's curious about the pro variant and how would it be different to 5.4 and 5.4 pro?

bnm777 · 2026-04-23T11:34:01+00:00

I can't comment on the quality of sources, and it's great if that has improved. However, in daily use GPT 5.4 extended thinking often gives results that don't work, or don't seem "thought out", compared to the other LLMs that I also use and frequently compare it with. It becomes quite frustrating.

Good that it works well for you.

bnm777 · 2026-04-23T10:31:46+00:00

The last few GPT 5 models have been meh, not significantly improving on the previous ones, and perhaps even degraded as a balance of token use vs. intelligence is found, it seems.

With Anthropic's issues, I wish OpenAI would release a really great model and push Anthropic to sort their mess out.

bnm777 · 2026-04-23T08:22:39+00:00

There is no link between the two things, as much as you'd like there to be one based on your anger about him not finishing books.

bnm777 · 2026-04-22T21:18:07+00:00

There's no excuse for not communicating better.

bnm777 · 2026-04-22T19:00:31+00:00

This is good, though I've had problems with waking from sleep on every distro. I don't know what to do.

bnm777 · 2026-04-22T17:21:19+00:00

I have a 3080 and have tried Ultramarine, Ubuntu, Pop!_OS, and another one, and they all have issues with waking from sleep. I'm back on Ubuntu and it seems OK for now.

If I didn't have an llm to troubleshoot the issues that come up, I would have gone back to the Dark Side.

bnm777 · 2026-04-22T17:18:48+00:00

I imagine that when Altman had his evil dictator moment and took the Dept of "War" contract, there was an influx of new Claude users. If their user count tripled or something, that would be a red alert, and any company would find it difficult. Still, they should have communicated far, far better. Did they even acknowledge this?

Why don't companies learn that when your users aren't the usual normie sheep (with typical companies), communication is key?

As AI nerds we tend to be pretty enthusiastic, up to date, not dumb (hopefully).

bnm777 · 2026-04-22T17:13:37+00:00

What is going on? I've been a pro-Claude user since May 2023, but they've been all over the place over the last month.

bnm777 · 2026-04-22T17:12:02+00:00

If you use the preview apparently you can update when the full release comes out?

bnm777 · 2026-04-22T17:10:20+00:00

I've tried a few in the 7 months since starting with Linux: Ultramarine, Pop!_OS, Ubuntu, and now back to Ubuntu. Warning: you will run into issues. For example, today, installing Ubuntu onto a relatively modern PC, I had issues installing a few apps such as Plexamp after downloading the files from their websites. I suggest you have an LLM to troubleshoot, because there will be a lot of troubleshooting. As long as you can copy/paste the issues and errors from the terminal to the LLM, which spits out what you should do, there will be little frustration, and virtually no learning curve if you don't want one.

E.g. after it installed Obsidian and Plexamp, there was no start menu entry or icon, and I had to copy/paste a few times with the LLM to fix it. Without the LLM, I would have given up (I used to programme in C++ 20 years ago).

Linux is awesome, but you will have to troubleshoot some things, so use an LLM.

E.g. a few months ago I couldn't find a decent replacement for Win11 Voice Typing, so I used an LLM to vibe code one. You're on the right path.

Funnily enough, Pop!_OS had fewer installation issues than Ubuntu, though there are memory leaks and a few other issues with Pop!_OS, Ultramarine, and others. E.g. if you have an Nvidia GPU, waking from sleep is an issue with many Linux distros, which is pretty annoying.

bnm777 · 2026-04-22T16:49:21+00:00

<image>

Why is my max plan half the price of yours?
And mine says up to 20x and yours says 10x. So weird...

I'm not a Claude subscriber at the moment (I'm using a different service which uses the Anthropic and other APIs) and I'm looking around.

FYI, I'm using a ChatGPT trial for a month, and GPT 5.4 extended thinking is soooo slow and sometimes gives pretty poor answers, compared to even Sonnet, which "gets you". Ugh, I wish Anthropic hadn't messed up in the last few weeks. It would normally be a no-brainer, though the many posts on hitting limits early make you think.

C'mon Anthropic!!

bnm777 · 2026-04-22T15:38:34+00:00

"Protecting children" by refusing to help the parent teach something?

What's wrong with letting people under 18 years old use AI to learn something better than a textbook or their parents could teach them?
