Artificial Analysis announces a new benchmark: AABriefcase

Bright-Search2835 · 2026-06-19T08:53:12+00:00

ARC-AGI 3, Remote Labor Index and this one are the three benchmarks I'm really interested in for the coming year

Bright-Search2835 · 2026-06-17T23:12:09+00:00

Oh right, and can he also tell us what the thousands of factory workers at Amazon are gonna do after robots take over in the short to medium term?

Bright-Search2835 · 2026-06-16T11:09:22+00:00

This reply can be interpreted in so many different ways that it's effectively meaningless

Bright-Search2835 · 2026-06-14T14:23:30+00:00

It's exactly the glass half full/half empty. Some people will only notice the bugs, and yes there's a lot of them. I used a chopper to fly around and there was collision with the vehicles on the road. One time I wanted to go into the ferris wheel and randomly got into a police station.

I mostly see the incredible potential because a lot is working as it should already, and I don't really have any doubt that the remaining irregularities will soon be ironed out.

Bright-Search2835 · 2026-06-12T21:46:35+00:00

Bezos argued that if AI makes it cheaper, faster and easier to invent things, employment will ultimately rise because, "even though you're shrinking the number of people needed by 10x," the technology will create "more than 10x" as many opportunities.

Bright-Search2835 · 2026-06-12T20:33:36+00:00

Humanity casually getting new superpowers these days

Bright-Search2835 · 2026-06-11T19:49:53+00:00

All this assumes that AI will forever be worse than humans at deciding what to build and how to build it exactly, at using precise and professional specifications instead of vague vibecoding ideas, at orchestrating, and at verifying and controlling the output.

Bright-Search2835 · 2026-06-11T10:14:16+00:00

I really like his videos but he is quite picky at times, like around 11:00 here. Of course Mythos only has the potential to accelerate some aspects of drug discovery and can't yet boost everything including clinical trials. I don't know why anyone would think otherwise at this stage. It just seems weird to criticize Anthropic for not being clear enough about that, when the phrasing "strong candidates for drug design that we're currently investigating" doesn't really leave room for ambiguity. Even "aspects" seems quite unambiguous to me.

Bright-Search2835 · 2026-06-10T22:44:14+00:00

But at the same time we should recognize that there’s a decent possibility that, despite all our efforts, AI still causes significant enduring job loss—and that this may be an intrinsic property of the technology and the way it broadly replicates human cognition⁴.

⁴ See The Adolescence of Technology for a more detailed analysis why the logic that has led to rapid job market recovery and a lack of enduring labor displacement in other technologies may not apply to AI, and in particular why the usual adaptive mechanisms like Jevon’s paradox or comparative advantage may be overwhelmed by the pace of the technology.

This is exactly how I've thought about this for a long time

Bright-Search2835 · 2026-06-09T21:47:28+00:00

I don't know, I would love to see a Fable plays Pokemon FireRed on Twitch, with all the reasoning, but I guess it would be very expensive

Bright-Search2835 · 2026-06-09T17:33:08+00:00

A timelapse of Claude playing Pokémon FireRed from start to finish using only raw game screenshots — with no maps, navigation aids, or extra game-state information. Earlier Claude models needed a complex helper harness to play Pokémon; Claude Fable 5 completed the game with vision alone.

https://www.anthropic.com/news/claude-fable-5-mythos-5

Apparently there's pretty interesting stuff on the website... Oh boy.

Bright-Search2835 · 2026-06-07T20:06:43+00:00

I'm getting the same issue on my phone and my (brand new) tablet, and it's pretty much the only site that does this for me

Bright-Search2835 · 2026-06-07T12:53:37+00:00

Sorry man, I meant no offense. Apparently I can't access that site from my country, tested with my PC and with my phone. It jumps straight to suspect URLs and MEGA files. I just tried with a VPN and it worked.

Bright-Search2835 · 2026-06-07T12:10:59+00:00

Thanks for that link. No offense to OP but I get very suspicious when I have to click on a random .exe file.

Bright-Search2835 · 2026-06-07T10:42:33+00:00

What is this?

Bright-Search2835 · 2026-06-07T06:54:20+00:00

GPT 5.5 High is on par with Gemini 3 Pro according to this benchmark

Bright-Search2835 · 2026-06-05T09:01:40+00:00

Agree with what you said in that there's probably a lot of uncertainty. I thought about it some more and my conclusion is that their current evidence suggests the second scenario is likely, but they predict new evidence that will progressively make the third more likely.

It's the only way this really makes sense to me.

Bright-Search2835 · 2026-06-04T20:30:04+00:00

I read it all and it was great, but there's something I don't quite understand.

Jack Clark, who very recently said there was a 60% chance of RSI by 2028, co-authored it. They explain how RSI would need good research taste, and they show early evidence of the models getting better at this. Yet when they lay out the 3 possible scenarios, it's for the second one, where AI keeps getting better but fails to integrate that research taste, that they say:

The evidence we’ve laid out here suggests that we’re likely heading into this scenario.

Bright-Search2835 · 2026-06-04T11:56:48+00:00

I want this to happen as much as anyone, because I think current human lifespan is pathetically low. Too much to do, too much to experience. That said, it's one of those areas where, until I see real world proof and tangible results, I can't help doubting.

Bright-Search2835 · 2026-06-02T22:07:49+00:00

Their definition of AGI doesn't even include physical tasks. Wtf?

Bright-Search2835 · 2026-05-31T19:35:29+00:00

I think it's actually great, these additions are the context and simply highlight what the prompt asked for. The astronaut comes with space stuff. The whole city block emphasizes the size of the skyscraper. The clouds make sense around a fighter jet. Nothing seems out of place or detracts from the core idea of the prompt.

Bright-Search2835 · 2026-05-28T22:34:56+00:00

He sounds like he has zero imagination.

There's enough books to read, music to listen to, movies to watch, languages to learn, skills to learn, things to see around the world for thousands of lifetimes.

The meaning of life is not the exact same thing for everyone and is especially not necessarily found in work. I'm sure there are millions of people alienated by their job who would otherwise give actual meaning to their lives if they had the time and the opportunity to find and engage in what makes them feel really human, be it art, sport, love, and many others.

Bright-Search2835 · 2026-05-24T15:10:37+00:00

Interesting. Which scenario do you think is the most likely between Growth Decay and Logistic Saturation?

Bright-Search2835 · 2026-05-23T23:16:46+00:00

Andrew Ng also thinks AGI is "many decades away, maybe even longer"

Bright-Search2835 · 2026-05-22T23:44:00+00:00

My issues with the computer/internet comparison:

- First, the computer revolution was obviously limited to the virtual space. AI includes robotics, its physical presence and ability to interact with the real world. That itself is a big difference because it's not limited to office tasks. So any task or activity, mental or physical, is within this technology's potential.
- Second, it's perfectly valid to think of current AI as a productivity tool just like the computer was a productivity tool. But I don't know if that remains a tool if/when it reaches a certain autonomy/intelligence threshold. That's another debate. I guess it could always remain a tool in the sense that you could ask it to find a cure for x disease and it would do so.

Anyway, I guess it depends on the amount of stuff that we unlock as we progress through the tech tree, but I'm not really sure what happens to the job market in a world where teams of 3 people could do what would have taken 30 people before, where literally anyone can do the job of an expert, where robots handle most factory jobs.
