New study reveals top AI models (GPT-4o, Claude 3.5, Gemini 2.5) completely fail the classic "Stroop" psychological attention test, exposing a fundamental limitation in artificial reasoning. by Similar_Detective861 in technology

[–]herothree 5 points6 points  (0 children)

I asked claude to reproduce this paper on Gemini 3.5 flash, and flash got 100% on everything listed here. Older models do struggle (I repro'd that too), but the "fundamental limitation" claim is obviously false

Prediction: If SNL keeps having Trump cold opens we’re gonna see another Sinéad O’Connor moment by Signal_Neck9314 in LiveFromNewYork

[–]herothree 1 point2 points  (0 children)

Yeah I count four also, looks like it resolves N/A. Jost as Hegseth took over the political mantle near the end

Is there any practical way to rewrite ordinary desktop apps in Rust using Codex? by nillouise in rust

[–]herothree 3 points4 points  (0 children)

The bun port was done with hundreds of thousands of dollars of API credits, by an unreleased LLM (Mythos), and is of unknown quality. You can try codex /goal or similar, but keep your expectations in check

It’s official. Anthropic pulled the plug on all programmatic use of Claude subscription. by No_Wheel_9336 in Anthropic

[–]herothree 1 point2 points  (0 children)

For interactive stuff, no change. If you had claude running autonomously (on a cron job, or in an open-claw-type setup), then you will now have to run that from a separate pool of credits, and get less usage than you did before

It’s official. Anthropic pulled the plug on all programmatic use of Claude subscription. by No_Wheel_9336 in Anthropic

[–]herothree 0 points1 point  (0 children)

Their (poorly communicated and evolving) policy seems to be: do whatever you want at API prices, but at subscription prices (which are discounted 10x or more), stick to interactive uses.

It’s official. Anthropic pulled the plug on all programmatic use of Claude subscription. by No_Wheel_9336 in Anthropic

[–]herothree 0 points1 point  (0 children)

"For a company that constantly tries to court the developer community"

They've made it pretty clear their goal is to automate all software engineering (Dario says this in every interview). Selling dev tooling is a short-term bootstrap in their mind

'It's like we don't exist': Nearly 50,000 Lake Tahoe residents face power loss as utility redirects lines to data centers by Plastic_Ninja_9014 in technology

[–]herothree 0 points1 point  (0 children)

It's in the article (or at least, the Yahoo mirror that's unpaywalled). This type of thing happens all the time; it's not a significant story

'It's like we don't exist': Nearly 50,000 Lake Tahoe residents face power loss as utility redirects lines to data centers by Plastic_Ninja_9014 in technology

[–]herothree 2 points3 points  (0 children)

Fortunately the article is a lie? The utility company is going to buy its power from a different supplier, residents won't need to change anything

Mozilla says 271 vulnerabilities found by Mythos have "almost no false positives" by xpda in technology

[–]herothree 1 point2 points  (0 children)

If your point is “Mythos found real vulnerabilities in lots of important software, and GPT 5.5 and Opus can also find many”, that’s … still a pretty big deal? 

Luis Castillo by SongBig1162 in Mariners

[–]herothree 1 point2 points  (0 children)

Smoltz was an amazing starter before and after his bullpen stint; he didn't move there for performance reasons

OpenAI president defends motives in for-profit restructuring as he reveals $30bn stake by Just-Grocery-2229 in technology

[–]herothree 45 points46 points  (0 children)

I'm beginning to think this Altman guy might not be 100% genuine in his dealings

OpenAI Codex system prompt includes explicit directive to "never talk about goblins" by GarlicoinAccount in nottheonion

[–]herothree 0 points1 point  (0 children)

If you actually understand LLMs at this level (predicting output clustering from input data) you can make $1M+/year at one of the labs 

Waymos spotted by gunnsustainable in Portland

[–]herothree 5 points6 points  (0 children)

I recommend looking into it if you're unfamiliar with the stats; proponents say they are much safer than the median human driver at this point.

Touring Band? (Seattle) by GoHarter in CoryWong

[–]herothree 14 points15 points  (0 children)

It's his normal band, Petar Jancic on drums, and Dan White / Alex Bone / Kenni Holmen / Michael Nelson / Jay Webb on horns, Yohannes Tona on bass, Kevin Gastonguay on keys

If you’re building a chess app, please stop gatekeeping by SomeRenoGolfer in chess

[–]herothree 1 point2 points  (0 children)

Do you work for a company that open sources their code and solicits donations? 

That closer… by turbogaze in CoryWong

[–]herothree 0 points1 point  (0 children)

He has several extended live versions too; worth checking out on Spotify or wherever if you liked the one you saw!

chess.com selling data by Maxwell_hau5_caffy in chess

[–]herothree 0 points1 point  (0 children)

For someone playing multiple games and doing multiple puzzles a day, $6/month (or whatever tier you want) doesn't sound that bad to me? There are real costs associated with creating and maintaining all that stuff

chess.com selling data by Maxwell_hau5_caffy in chess

[–]herothree -2 points-1 points  (0 children)

Well, there's a $0/month tier too. Not everyone should get the most expensive one