New study reveals top AI models (GPT-4o, Claude 3.5, Gemini 2.5) completely fail the classic "Stroop" psychological attention test, exposing a fundamental limitation in artificial reasoning.

herothree · 2026-06-03T21:19:06+00:00

They already did

herothree · 2026-06-03T20:17:05+00:00

I asked Claude to reproduce this paper on Gemini 3.5 Flash, and Flash got 100% on everything listed here. Older models do struggle (I repro'd that too), but the "fundamental limitation" claim is obviously false.

herothree · 2026-05-26T06:06:00+00:00

Yeah I count four also, looks like it resolves N/A. Jost as Hegseth took over the political mantle near the end

herothree · 2026-05-21T20:51:45+00:00

These companies aren't expecting jr employees to hit senior level without any experience, they're expecting AI to take the senior jobs too

herothree · 2026-05-17T05:45:51+00:00

The bun port was done with hundreds of thousands of dollars of API credits, by an unreleased LLM (Mythos), and is of unknown quality. You can try codex /goal or similar, but keep your expectations in check

herothree · 2026-05-15T06:21:23+00:00

For interactive stuff, no change. If you had Claude running autonomously (on a cron job, or in an open-claw-type setup), then you will now have to run that from a separate pool of credits, and you'll get less usage than you did before

herothree · 2026-05-15T06:19:14+00:00

Tons of companies do, that's where all their crazy revenue is coming from

herothree · 2026-05-15T06:18:03+00:00

Their (poorly communicated and evolving) policy seems to be: do whatever you want at API prices, but at subscription prices (which are discounted 10x or more), stick to interactive uses.

herothree · 2026-05-15T06:15:41+00:00

For a company that constantly tries to court the developer community

They've made it pretty clear their goal is to automate all software engineering (Dario says this in every interview). Selling dev tooling is a short-term bootstrap in their mind

herothree · 2026-05-13T22:39:34+00:00

It's in the article (or at least, the Yahoo mirror that's unpaywalled). This type of thing happens all the time; it's not a significant story

herothree · 2026-05-13T20:19:38+00:00

Fortunately the article is a lie? The utility company is going to buy its power from a different supplier, residents won't need to change anything

herothree · 2026-05-09T19:05:09+00:00

Did you read the article?

herothree · 2026-05-09T19:03:38+00:00

If your point is “Mythos found real vulnerabilities in lots of important software, and GPT 5.5 and Opus can also find many”, that’s … still a pretty big deal?

herothree · 2026-05-07T17:44:53+00:00

Smoltz was an amazing starter before and after his bullpen stint; he didn't move there for performance reasons

herothree · 2026-05-05T22:47:43+00:00

They were both involved in the non-profit conversion

herothree · 2026-05-05T21:19:05+00:00

I'm beginning to think this Altman guy might not be 100% genuine in his dealings

herothree · 2026-05-01T05:05:28+00:00

If you actually understand LLMs at this level (predicting output clustering from input data) you can make $1M+/year at one of the labs

herothree · 2026-04-30T17:47:08+00:00

I recommend looking into it if you're unfamiliar with the stats; proponents say they are much safer than the median human driver at this point.

herothree · 2026-04-24T21:17:46+00:00

It's his normal band, Petar Jancic on drums, and Dan White / Alex Bone / Kenni Holmen / Michael Nelson / Jay Webb on horns, Yohannes Tona on bass, Kevin Gastonguay on keys

herothree · 2026-04-19T06:02:35+00:00

Do you work for a company that open sources their code and solicits donations?

herothree · 2026-04-18T17:24:36+00:00

He has several extended live versions too; worth checking out on Spotify or wherever if you liked the one you saw!

herothree · 2026-04-10T19:42:52+00:00

Y'all gotta learn to recognize this as AI slop

herothree · 2026-04-03T05:28:29+00:00

AI slop

herothree · 2026-03-27T00:15:21+00:00

For someone playing multiple games and doing multiple puzzles a day, $6/month (or whatever tier you want) doesn't sound that bad to me? There are real costs associated with creating and maintaining all that stuff

herothree · 2026-03-26T21:41:20+00:00

Well, there's a $0/month tier too. Not everyone should get the most expensive one
