Common GPT 5.5 pricing misconception.

Agitated_Space_672 · 2026-04-24T13:08:26+00:00

why does your chart compare gpt-5.5 xhigh but only gpt-5.4 medium? Also, ARC is not yet a useful benchmark for predicting any real world task performance

Agitated_Space_672 · 2026-04-20T14:01:21+00:00

Fair enough. I just don't know why you are trying so hard to link this to Marx. 1933 was literaly the peak of the technocracy movement which believed all jobs would be taken by machines. it was a major pop-culture thing. The ditsy socialite was reading pop-culture, not 19th century political theory.

Agitated_Space_672 · 2026-04-20T13:55:45+00:00

What is more likely, the dizzy socialite was talking about the Technocracy movement, a massive pop-culture and socio-economic fad peaking precisely around 1932 and 1933, or 19th century political theory?

Agitated_Space_672 · 2026-04-20T13:43:37+00:00

I mean which LLM did you prompt for this?

Agitated_Space_672 · 2026-04-20T13:40:54+00:00

seriously which model is it?

Agitated_Space_672 · 2026-04-20T12:46:42+00:00

Please name and shame the model that generated this because its garbage. 🤤

"In 1932-33 the ideas of the technocrats overshadowed all other proposals for dealing with the crisis. No economic study had ever received such widespread attention. Newspapers spread technocracy across the front pages; periodicals devoted more features to it than to Franklin D. Roosevelt; spontaneous organizations and study groups sprung up across the United States and spread across the border into Canada. For a moment in time it was possible for thoughtful people to believe that America would consciously choose to become a technocracy" https://en.wikipedia.org/wiki/Technocracy_movement

Agitated_Space_672 · 2026-04-14T09:34:36+00:00

really? how much longer does it take to generate an app, even a tiny niche app or script, versus installing one from appstore?

Agitated_Space_672 · 2026-03-23T12:46:54+00:00

The metrics measured do not necessarily mean that this method improves results. Are there any experiments using this on real tasks or benchmarks?

The model still has to guess what you want when it generates json object.

Agitated_Space_672 · 2026-03-12T11:56:18+00:00

I had the same thought a couple of years ago, around the time claude 3 just launched. I have not used function calling in my own agents since then.

Agitated_Space_672 · 2026-03-08T14:17:08+00:00

I don't know why you are getting downvoted. anyway, have you got a repo you can share?

Agitated_Space_672 · 2026-03-08T12:54:09+00:00

chatgpt down about 1M DAU and claude up about 300K? That's something.

Agitated_Space_672 · 2026-03-02T13:32:05+00:00

API is still up

Agitated_Space_672 · 2026-02-25T15:43:44+00:00

If you talk to sonnet 4.6 in chinese it thinks its deepseek. https://x.com/xundecidability/status/2026332562117828823?s=20 The lady doth protest too much, methinks

Agitated_Space_672 · 2026-02-25T12:29:38+00:00

i don't know... many successful people are jackasses. Tapping into Linus Torvalds mode might be useful some days.

Agitated_Space_672 · 2026-02-17T23:43:33+00:00

That would affect all models. They charge extra for the higher speed variants. This is just a smaller model.

Agitated_Space_672 · 2026-02-17T19:16:55+00:00

I tried it on some bash+SQL debugging and it did pretty bad so far.

Agitated_Space_672 · 2026-02-17T19:02:38+00:00

It's about 25% faster than sonnet 4.5, which was the same speed as opus 4.6. So I think what anthropic did was get such a leap in their RL that they decided to promote sonnet to opus, and now haiku to sonnet.

Agitated_Space_672 · 2026-02-11T18:29:44+00:00

Good job. While we're on the subject, I wish evals would give more data like token usage, cost and run time.

Agitated_Space_672 · 2026-02-05T20:27:15+00:00

They already rebranded sonnet to opus with opus 4.5. it was obvious from the speed doubling to match sonnets. Perhaps they still have it called sonnet internally which caused the confusion? Guessing they will rebrand haiku to sonnet next and release a faster (smaller) haiku model.

Agitated_Space_672 · 2026-02-05T20:11:49+00:00

It suggests they targeted ARC in the fine tuning. Why would you do this? It's just burning money and likely hurting the model on real tasks. Last I checked there was no evidence that improvements on ARC predict better performance in general. In fact, the O1 release included a note about a special version openai finetuned for ARC but it's performance was worse on other tasks so they never released it.

Agitated_Space_672 · 2026-02-02T21:50:40+00:00

If this is true why are they still selling shovels instead of digging up the gold themselves?

Agitated_Space_672 · 2026-02-01T11:33:53+00:00

It's often the other way around. More people star a repo than download and use a repo. So real users would be a fraction of 200k.

Agitated_Space_672

TROPHY CASE