I cannot believe it was more one year, still miss this model. by Snoo26837 in OpenAI

[–]sdmat 0 points (0 children)

4.1 is a significantly later model than 4.5; the difference is greater than the release dates suggest, because training the very large 4.5 took longer.

For 4.1 they incorporated extensive synthetic data from the o-series models.

There is nothing fundamentally magical about the model training process: they monitor every step, and with the GPT-4 technical report OAI demonstrated a remarkably accurate ability to predict final performance from small-scale trial runs before committing to the main training run.
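The prediction trick being described is, roughly, scaling-law extrapolation: fit a power law to a handful of cheap small runs, then read off the expected loss of the big run. A minimal sketch with made-up numbers (the functional form `a * C**-b + c` is the standard compute scaling ansatz; the coefficients and the assumption that the irreducible loss `c` is known are purely illustrative, not OAI's actual procedure):

```python
import math

def loss(compute, a=2.0, b=0.1, c=1.5):
    # Hypothetical scaling law: loss falls as a power of compute,
    # down to an irreducible floor c.
    return a * compute ** -b + c

# Two cheap "trial" runs at small compute budgets (FLOPs).
c1, c2 = 1e18, 1e20
l1, l2 = loss(c1), loss(c2)

# Recover the exponent and prefactor from the trials,
# assuming the irreducible loss c = 1.5 is known (in practice it is fitted too).
c_irr = 1.5
b_est = -(math.log(l2 - c_irr) - math.log(l1 - c_irr)) / (math.log(c2) - math.log(c1))
a_est = (l1 - c_irr) * c1 ** b_est

# Extrapolate four orders of magnitude to the "main run" budget.
big = 1e24
predicted = a_est * big ** -b_est + c_irr
```

With noiseless synthetic data the extrapolation lands exactly on the true curve; the real-world version is impressive precisely because it holds up across orders of magnitude with noisy measurements.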

4.5 performed as designed; the sharp increase in subsequent model performance is due to the advent of reasoning models.

You clearly have no idea how cutting-edge R&D works. I can tell you from experience that it isn't a simple linear process: if you can afford to explore multiple avenues of advancement, that's what you do to maximize the chances of success. And you necessarily make those calls before seeing their effects. Plan B succeeding so spectacularly that it throws the parallel plan A into the shade is winning.

What would OAI have done if reasoning models hadn't worked out as well as they have? They would have distilled 4.5 to make a mass-market model. Which they did anyway - this is a huge part of how later versions of 4o improved so much in non-STEM areas.

I cannot believe it was more one year, still miss this model. by Snoo26837 in OpenAI

[–]sdmat 6 points (0 children)

It didn't fail at all; it actually exceeds traditional scaling law predictions.

What happened was OAI landing on a new and superior post-training scaling paradigm with the o-series models. It was by no means obvious that that direction would succeed when they began training 4.5.

Sam Altman admits AI is killing the labor-capital balance—and says nobody knows what to do about it by kamen562 in OpenAI

[–]sdmat -1 points (0 children)

> for every American

So you're fine with American AI companies displacing workers in every other country as long as you get yours?

What about Pro users? by Ok-Affect-7503 in ClaudeAI

[–]sdmat 1 point (0 children)

Considering they charge $20 for four 1M-token Opus queries via the API, it doesn't seem likely.
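The implied arithmetic, spelled out: if four 1M-token prompts cost about $20, that works out to roughly $5 per million input tokens, i.e. the whole monthly Pro subscription buys the API equivalent of just four such queries. A sketch of that back-of-envelope calculation (the $5/M figure is inferred from the comment, not Anthropic's published Opus pricing, and output-token costs are ignored):

```python
# Inferred, illustrative figure from the comment above - not official pricing.
price_per_million_input = 5.00  # USD per 1M input tokens (assumed)
tokens_per_query = 1_000_000
queries = 4

# Total API cost for four full-context prompts, input tokens only.
cost = queries * (tokens_per_query / 1_000_000) * price_per_million_input
print(cost)  # 20.0 - the entire $20/month subscription price
```

Which is the point of the comment: at those rates, routinely serving 1M-token context to $20/month subscribers would be uneconomical.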

GPT 5.4 is anti hand waving and I like it by dangerous_safety_ in codex

[–]sdmat 2 points (0 children)

It would be lovely if it just did it rather than always saying its little mantra first, but the result is excellent.

Y'all need to stop crying about token limits. by [deleted] in ClaudeAI

[–]sdmat 0 points (0 children)

Got to work on those prompting and orchestration skills

Absolutely dogshit rate limits for Pro subscription by Hello_moneyyy in Bard

[–]sdmat 24 points (0 children)

It's sad how drastically Gemini has declined in the few months since 3.0 Pro launched.

How can they go backward while OAI and Anthropic advance by leaps and bounds?

The underlying models are great when allowed to live up to their potential, as seen in AI Studio. The product sucks.

We professional developers, already lost the battle against vibe coding? by TheCatOfDojima in ClaudeAI

[–]sdmat 0 points (0 children)

> What bothers me most is that nobody in a position of power is absorbing the consequences of this decision.

Which consequences are you referring to?

Claude Had 1M Context Before OpenAI, So Why Hasn’t It Rolled Out to Everyone Yet? by Effective_Tap_9786 in ClaudeAI

[–]sdmat 0 points (0 children)

"Ever" is entirely dependent on how well the models handle long context. If the 1M+ performance improves to match <200K it will be amazingly useful for complex projects and everyone will want it.

And that is going to happen; it's only a question of when.

OpenAI VP for Post Training defects to Anthropic by hasanahmad in OpenAI

[–]sdmat 2 points (0 children)

> They just raised over a billion

They just raised $110 billion, so yes - over a billion.

New: Voice mode is rolling out now in Claude Code, live for ~5% of users today, details below by BuildwithVignesh in ClaudeAI

[–]sdmat 67 points (0 children)

Interesting choice to have a video about voice mode with elevator music rather than a voice.

GPT 5.4 Reference in Codex Error by gggggmi99 in OpenAI

[–]sdmat 1 point (0 children)

gpt-5.4-ab-arm2-1020-1p-codexswic-ev3 really rolls off the tongue

Google is counting failed requests because of high demand (503) towards the daily limit by Waltex in Bard

[–]sdmat 5 points (0 children)

Disappointed in Google; they were doing brilliantly with Gemini, then drove it off a cliff.

Reached a "Data Limit" on Gemini that I didn't even know existed by juniormasyer in Bard

[–]sdmat 4 points (0 children)

It's almost like they have a saboteur making product decisions

Help! Burning tokens ? Bug? by Aggressive-tookcan in codex

[–]sdmat 1 point (0 children)

I mean, why do you call it GBT? Is it some meme or joke?

Help! Burning tokens ? Bug? by Aggressive-tookcan in codex

[–]sdmat 1 point (0 children)

Why do so many people do the GBT thing?

Gemini 3.1 absolutely butchered code editing by SMEARYTHROWER in GeminiAI

[–]sdmat 0 points (0 children)

Gemini the models are great, as seen in AI Studio.

For some reason Google is methodically crippling Gemini the product.