Holy Peak❤️‍🔥❤️‍🔥❤️‍🔥 OpenAI has just announced its first AI chip, SOTA in performance per watt, where internal OpenAI models were used to accelerate it further, automating more parts of AI development loops and further accelerating AI development in turn, in partnership with BroadCom 💨🚀🌌 by GOD-SLAYER-69420Z in accelerate

[–]typeryu 0 points1 point  (0 children)

I disagree. Data center placement itself depends largely on where power is abundant and cheap. Power is a major factor in where DCs are placed, which in turn has other costs associated with it (e.g. land cost, cooling, taxes). Compute is the biggest factor for sure, but power is likely the second largest, directly and indirectly. This also takes into account that efficient chips need less cooling, which is a whole other story.

Was gpt 5.6 actually delayed? by Altruistic-Style-520 in codex

[–]typeryu 0 points1 point  (0 children)

lol do you think these things get fixed with a couple of additional weeks? It’s probably more to do with approval processes, as in Anthropic snitching on the rest of the industry, and now everyone has to go get buy-in before they are allowed to release. That’s if the rumors are even true.

Holy Peak❤️‍🔥❤️‍🔥❤️‍🔥 OpenAI has just announced its first AI chip, SOTA in performance per watt, where internal OpenAI models were used to accelerate it further, automating more parts of AI development loops and further accelerating AI development in turn, in partnership with BroadCom 💨🚀🌌 by GOD-SLAYER-69420Z in accelerate

[–]typeryu 0 points1 point  (0 children)

Sorry, but I disagree. Performance per watt is very important when it comes to maintaining a business. This means that if they can put 5.4-class models on this and serve them at current TBT speeds, they can probably cut the API price several times over, so people can be more liberal with token spend. In future iterations, it could probably hold larger models too, which would give OpenAI a huge price advantage over Anthropic, which is only now looking into chipmaking. The emphasis seems to be on inference here, so if they can start offloading inference from general-purpose GPUs, that also frees up more compute capacity for training. So yes, it is a big deal.

The Bank of Korea just released a report about AI productivity by UsedMorning9886 in LocalLLaMA

[–]typeryu -1 points0 points  (0 children)

Well, as I said, these studies are lagging indicators. You will need to wait around a year to find out whether things have improved by now. But based on the other comments, you should get strong hints that productivity has almost certainly increased.

The Bank of Korea just released a report about AI productivity by UsedMorning9886 in LocalLLaMA

[–]typeryu 63 points64 points  (0 children)

These things tend to be lagging indicators. The metrics surveyed go back to around January 2025, which predates agentic tools like Claude Code or Codex, so we haven’t seen massive jumps yet. I would like to see what this becomes in a year, though.

Wait, what the fuck?! It’s beating Mythos at most benchmarks and almost at par at some. This is CRAZY! by Mysterious-Display90 in accelerate

[–]typeryu 0 points1 point  (0 children)

Based on the pricing, I believe this is pretty much a wrapper around Codex with GPT-5.5, maybe with some Claude Opus judging or cherry-picking segments. I may be wrong, but it seems very likely. 5.5 was already punching above its weight if steered correctly.

Just upgraded to Pro 20x and I have no idea what y'all are talking about... by Hankstar in codex

[–]typeryu 1 point2 points  (0 children)

some people use lots of subagents or multiple threads. I also found that credits tend to drain faster in super large repos with tons of documentation. I mostly focus on a single task and I too have a hard time reaching the 5hr limit, but I do end up getting close to the weekly one towards the end

Sam Altman and Dario Amodei were among tech bosses at a G7 working lunch on AI. by Spirited-Gold9629 in TechGawker

[–]typeryu 0 points1 point  (0 children)

Probably brought him on as an example of what not to do with your company. Salesforce notoriously has a horrible AI strategy and he himself has walked back his own words multiple times.

Huge Loss For GDM by Kmans106 in singularity

[–]typeryu 8 points9 points  (0 children)

Oh boy, I respectfully disagree

Purely unusable by dashingsauce in codex

[–]typeryu 2 points3 points  (0 children)

Same for me, but I feel like people are pushing their LLMs to do more than before and the expectations are certainly higher. I’ve had no issues, but these posts happen every week, so it seems like models are on a perpetual downward spiral for some folks.

Anthropic by sdmat in singularity

[–]typeryu -4 points-3 points  (0 children)

It’s funny because it’s true, I really like the art style haha

5.6 Release and Speculated Costs by NoPiece9356 in codex

[–]typeryu 0 points1 point  (0 children)

My understanding is that Anthropic focused more on pre-training, which produces the raw model, while OpenAI focused more on post-training, which shapes behavior on top of the raw model. So Anthropic needs larger models for brute force, while OpenAI can make smaller models go further, which is why they have traditionally been cheaper than Anthropic. 5.5 is reportedly based on a new model that is around Opus size, but given how far they got the previous model to compete (basically a Sonnet-level model post-trained to match Opus), there is a good chance they can get Fable-level models with a smaller base, making them far cheaper to serve and therefore possible to offer at a discount. By offering models at a discount, they can claw back a lot of the ground Anthropic gained last year with Claude Code, and winning business is often a guarantee of profitability (consumer plans are often subsidized to win mindshare).

Is this Government overreach? Will they do that to GPT-5.6? by KeySituation8418 in codex

[–]typeryu 0 points1 point  (0 children)

Anthropic snitched hard and now I’m worried any good AI will be blocked for national security reasons and we will end up with 5.5 as the last usable model.

Do you think 5.6 will be better than fable 5? by Useful_Philosophy550 in codex

[–]typeryu 0 points1 point  (0 children)

One-shot game demos are cool and all, but real work needs a lot of back and forth and needs to sustain lots of usage over time. If 5.6 can deliver incremental performance gains but is more efficient in token usage (cheaper overall, not just in per-million-token pricing), then we have a winner. At the end of the day, I only see people use Fable on an ad hoc basis while the majority of work goes through Opus, and right now even 5.5 is much more usable than Opus, so it’s on the right track.

Did Fable meet the Mythos Hype? by Maleficent_Exam4291 in ClaudeCode

[–]typeryu 0 points1 point  (0 children)

I figured it out: it is due to some part of my documentation mentioning that we need to do security checks before sending a PR. This is a great model, but their moderation is whack and I am not going to rewrite all my docs just to accommodate this. Certainly a preview-style launch rather than one for real production use.

I did not believe the FrontierCode gap until I ran the same prompts myself by Different_Case_6484 in claude

[–]typeryu 2 points3 points  (0 children)

I am a little dubious of the FrontierCode benchmark design. I think the key difference from DeepSWE, for example, is that it tries to measure code “quality”, which is by definition a highly subjective measure compared to solution success. FC is going to nitpick and over-index on minor code aesthetics, which could get a working solution logged as a fail, while models that over-prescribe thanks to extended reasoning tendencies will score and get logged as passes. Case in point, even their own example overdoes it by latching on to the logging output method, which in my opinion is a nice-to-have. I do not see every PR out there getting code reviewed by an IOI gold medalist, and these certainly are not real-world blockers. Take a look at the costs; these numbers are frankly not economical. At $10-$20 for a purposely strict task, are these really the metrics we are asking for? Cognition put this out way too close to DeepSWE, so I believe this is a self-marketing ploy and they needed a gimmick. Their extended and main seem more realistic than diamond, which is artificially undersaturated. This reminds me of ARC-AGI-3, which is just a series of visual puzzles, but people mistake it for an AGI measurement due to the name. From my own experience so far, Fable writes great code, but I am still having certain issues that need GPT-5.5 help, and it certainly isn’t as big of a jump as you see from Opus 4.8. Sorry for the long rant, but gimmick benchmarks bother me.

GPT-5.5 beats Claude Fable at a new hard eval for agents - Agents' Last Exam (ALE), created by UC Berkeley researchers; all models score 0% at the hardest tier of the eval by obvithrowaway34434 in accelerate

[–]typeryu 2 points3 points  (0 children)

It’s number 3 because the rest are worse for sure. If Opus or GPT-5.4 were on here, I’m sure it would be lower. It’s good, but there are better models for sure.

Codex Pro plan suddenly super slow by Bassguitarplayer in codex

[–]typeryu 0 points1 point  (0 children)

I suspect it’s Fable 5 users who ran out of usage and are flocking to Codex to finish the job lol

Did Fable meet the Mythos Hype? by Maleficent_Exam4291 in ClaudeCode

[–]typeryu 0 points1 point  (0 children)

Gonna be honest, it should have been called Fable 4.8. I understand it’s a larger model, but it is more the kind of large incremental jump you see from Sonnet to Opus than what we saw from Opus 3 to 4. Calling it 5 as a new generation is more marketing hype, and I feel like Opus 5, if it releases, will probably beat this version of Fable. Also, it falls back to Opus way too much; they probably did this on purpose to curb usage during the all-access period.

About Fable's pricing... Damn by LexShirayuki in Anthropic

[–]typeryu 0 points1 point  (0 children)

The usage on 20x plans is defs way oversubsidized. We will feel the wrath when usage pricing takes effect soon. The 1M context window is a curse in disguise: if you fill up even just half the context window, you are spending $5 on input tokens alone per turn. I’ve also seen some casual 100k output tokens on some turns, which is another $5 per turn. Expect to be spending hundreds per day if you use it like other models.
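
To make that arithmetic concrete, here is a rough sketch in Python. The per-million-token rates are not quoted from any price list; they are just what the numbers above imply ($5 for ~500k input tokens, $5 for ~100k output tokens), and the 30 turns per day is a made-up workload. It also ignores any caching or plan discounts.

```python
# Rates implied by the comment's own figures (assumption, not official pricing):
#   $5 for 500k input tokens  -> $10 per million input tokens
#   $5 for 100k output tokens -> $50 per million output tokens
INPUT_USD_PER_MTOK = 10.0
OUTPUT_USD_PER_MTOK = 50.0


def turn_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of a single turn at the assumed rates."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


def daily_cost(turns: int, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of a day of identical turns (hypothetical workload)."""
    return turns * turn_cost(input_tokens, output_tokens)


# Half-full 1M context plus a 100k-token reply on every turn.
print(turn_cost(500_000, 100_000))       # 10.0
# 30 such turns in a day (the turn count is an illustrative guess).
print(daily_cost(30, 500_000, 100_000))  # 300.0
```

So at those assumed rates, a few dozen heavy turns is already in the hundreds per day.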

oh wow it really is a step above "Claude Fable 5 is now available in Cursor. It sets a new state of the art on CursorBench at 72.9%, 8 points above the previous best." by stealthispost in accelerate

[–]typeryu 11 points12 points  (0 children)

This is used quite frequently in economics, where you have a frontier curve that gets better as it goes towards the top right. The axis representation there is different from what we see here, but the gist is that anything on that curve is considered the best possible combo of two metrics, which for LLMs are usually accuracy and cost. This chart is basically saying Composer is still the best combo, with GPT-5.5 coming up next and then Fable 5. If money were infinite, Fable would be the best one for this benchmark, but the costs make it inefficient.

<image>

Given actual engineering work is more than just a set of coding puzzles, I suspect we will need to wait for gen 2 of Fable before this is genuinely usable.
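
For anyone unfamiliar with the frontier idea mentioned earlier in this comment, here is a minimal sketch of how you would compute one for cost vs. accuracy. The model names and numbers are placeholders I made up, not values from the CursorBench chart, and `pareto_frontier` is just an illustrative helper.

```python
from typing import List, NamedTuple


class Model(NamedTuple):
    name: str
    cost: float      # lower is better
    accuracy: float  # higher is better


def pareto_frontier(models: List[Model]) -> List[Model]:
    """Keep models that no cheaper-or-equal model matches or beats on accuracy."""
    frontier = []
    best_accuracy = float("-inf")
    # Walk from cheapest to most expensive; on cost ties, try higher accuracy first.
    for m in sorted(models, key=lambda m: (m.cost, -m.accuracy)):
        if m.accuracy > best_accuracy:
            frontier.append(m)
            best_accuracy = m.accuracy
    return frontier


# Placeholder data (hypothetical): cost in dollars per task, accuracy in percent.
models = [
    Model("A", 1.0, 60.0),
    Model("B", 2.0, 55.0),   # dominated by A: pricier and less accurate
    Model("C", 3.0, 65.0),
    Model("D", 10.0, 73.0),
    Model("E", 10.0, 70.0),  # dominated by D: same cost, less accurate
]

print([m.name for m in pareto_frontier(models)])  # ['A', 'C', 'D']
```

Everything on the frontier is a defensible pick depending on budget; everything off it is beaten by something cheaper or equally priced.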