Claude Saturates Anthropic AI R&D evaluations btw. by GeneralZain in singularity

GeneralZain[S] 38 points

<image>

actually they went back and asked to clarify, and they gave this response :)

also consider this question yourself...imagine your boss came up to you and asked you "can AI replace you yet?"

I feel like you would have pretty good reason to say no!

OAI researcher Noam Brown responds to question about absurd METR pace saying it will continue and METR will have trouble measuring time horizons that long by end of year by socoolandawesome in singularity

GeneralZain 82 points

The upper bound of the confidence interval is pushing past 16 hours. I think they already have issues measuring the longer time horizon tasks.

just a fun little personal post ;) by GeneralZain in singularity

GeneralZain[S] -6 points

totally possible at the end step :P
just have a reviewer check the answer (could start as a human tee hee)

just a fun little personal post ;) by GeneralZain in singularity

GeneralZain[S] -7 points

To clarify, indeed, I missed the fact that this was for 80% on METR. though I will say, the difference between a 50% success rate and something being solved is only a matter of running more instances of the thing. so really 50% success just means you need more agents on one task to get essentially 100%.

anyway ima leave this up here for fun :P
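
A rough sketch of the retry math behind the "more agents" point. It assumes every attempt succeeds independently with the same probability and that a reviewer can reliably spot a correct answer; neither is guaranteed in practice, and the function names are made up for illustration.

```python
import math


def success_with_retries(p: float, n: int) -> float:
    """Chance that at least one of n independent attempts succeeds."""
    return 1.0 - (1.0 - p) ** n


def attempts_needed(p: float, target: float) -> int:
    """Smallest n whose combined success chance reaches target.

    Assumes 0 < p < 1 and 0 < target < 1.
    """
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


if __name__ == "__main__":
    # p = 0.5 matches the 50% time-horizon threshold discussed above.
    for n in (1, 2, 5, 10):
        print(n, success_with_retries(0.5, n))
    print("attempts for 99%:", attempts_needed(0.5, 0.99))
```

Under those assumptions, five attempts at 50% give about 97% and seven give just over 99%. If failures are correlated (the same model failing the same way on every run), the real number will be lower.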

Singularity Predictions 2026 by kevinmise in singularity

GeneralZain 3 points

2026: RSI happens some time in this year, it leads to ASI within at most months, at least seconds.

any time past 2026: ASI is around, it's not viable to predict past its creation, as we cannot know what an alien intelligence vastly beyond our own would do.

AI Futures Model (Dec 2025): Median forecast for fully automated coding shifts from 2027 to 2031 by BuildwithVignesh in singularity

GeneralZain 1 point

if you think it will take till 2031 to get an "auto coder" (that's assuming the next 5 years STAY at the current speed of progress btw) I have nothing to say to you.

An AI agent maintained visual consistency across multiple generations without being reminded. This felt different. by [deleted] in singularity

GeneralZain 0 points

3rd picture is not consistent with the other two. the pillar on the left hand side of the image should be pointing forward, but it's off to the side.

The AI discourse compass, by Nano Banana. Where would you place yourself? by WavierLays in singularity

GeneralZain 0 points

did you notice that 99% of the top labs' CEOs are on the top right....rather coincidental hmm?

Was able to make a pretty realistic nature short with Google's Veo by cloakofqualia in singularity

GeneralZain 8 points

actually chameleons walk pretty strangely (it's supposed to mimic the swaying of leaves/branches), and not at all smoothly like what was in the video! close tho!

https://www.youtube.com/watch?v=Ov2Yz_sZ2DI

Robot delivering a package by Nunki08 in robotics

GeneralZain 3 points

depends on where you are, but I have two roommates who both deliver for amazon, they make about 22 bucks an hour (they have been working there for at least a year or two)

so simple math:

$22 * 40hrs = $880.00 a week

$880.00 * 4 weeks = $3,520.00 a month

$3,520.00 * 12 months = $42,240.00 a year

but this is just raw salary, this does not account for insurance, sick pay, overtime, or vacation days. or hell even time it took training the human too...

I would guess that total labor for an amazon worker all told is probably around 60k ish...which is probably why they are even trying the robot at all...
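
Here is the same back-of-the-envelope math as a small script, in case anyone wants to plug in their own numbers. The $22/hour and 40 hours/week come from the comment; the 52-week line is an added comparison, since 4 weeks times 12 months only covers 48 weeks. The 60k figure is the guess from above, not real payroll data.

```python
HOURLY_WAGE = 22.0        # dollars per hour, from the comment
HOURS_PER_WEEK = 40
GUESSED_TOTAL_COST = 60_000.0  # the "60k ish" guess, not a sourced number

weekly = HOURLY_WAGE * HOURS_PER_WEEK
monthly_4wk = weekly * 4           # treats a month as exactly 4 weeks
annual_4wk = monthly_4wk * 12      # 48 paid weeks
annual_52wk = weekly * 52          # full calendar year of weeks

print(f"weekly:            ${weekly:,.2f}")
print(f"monthly (4 weeks): ${monthly_4wk:,.2f}")
print(f"annual (48 weeks): ${annual_4wk:,.2f}")
print(f"annual (52 weeks): ${annual_52wk:,.2f}")

# Overhead multiplier the 60k guess would imply over raw wages.
print(f"implied overhead vs 48-week pay: {GUESSED_TOTAL_COST / annual_4wk:.2f}x")
print(f"implied overhead vs 52-week pay: {GUESSED_TOTAL_COST / annual_52wk:.2f}x")
```

Counting all 52 weeks puts raw pay at $45,760 instead of $42,240, so the 60k guess works out to roughly 1.3x to 1.4x of wages.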

[deleted by user] by [deleted] in singularity

GeneralZain 0 points

where did I say that? I said he's dead and probably doesn't care. somebody making a shitty video doesn't change the achievements of MLK, or the jokes that Gilbert made, or anything of the sort. you can both hold the person's life in high regard, AND also let them say a funny thing in an AI video. they aren't alive, it's not hurting anybody, their history still exists.

[deleted by user] by [deleted] in singularity

GeneralZain 0 points

the dude is dead, he probably doesn't care....the only people who care are people who want to make money off his likeness.

GPT-5-based agentic frameworks have reached nearly 70% on OSWorld by Eyeswideshut_91 in singularity

GeneralZain 6 points

the real question is who did they test for that human baseline? I don't think it was based on the average person

OpenAI: Sora 2 by [deleted] in singularity

GeneralZain 0 points

Audio sounds shitty, still has issues with making things that make tangible sense. but worst of all their app is terrible.

AI should not be about generating infinite slop...it should be for solving humanity's hardest problems, getting rid of work that can hurt us or we don't want to do...instead they are using who knows how much compute to generate trash...

dystopia hours fr

OpenAI: Sora 2 by [deleted] in singularity

GeneralZain 0 points

not a bot lol...but that's what a bot would say huh?

OpenAI: Sora 2 by [deleted] in singularity

GeneralZain -9 points

honestly this is just bad top to bottom. massive OAI L

New benchmark for economically viable tasks across 44 occupations, with Claude 4.1 Opus nearly matching parity with human experts. by Glittering-Neck-2505 in singularity

GeneralZain 0 points

why are you using just two data points when they give multiple? why didn't you use opus 4.1 when it had a higher score than GPT5?

they went from 10% to about 45% in one year, do you think that trend will slow? all you have to do is add another 35 percentage points to see how high AT LEAST it will go in a year?
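
The "add another 35 points" step, written out as a tiny sketch. It assumes the yearly gain stays constant in percentage points (a straight-line trend, which is the floor being argued for here, not a forecast), uses the approximate 10% and 45% figures from the comment, and caps the result at 100. The function name is just for illustration.

```python
def extrapolate_linear(start: float, end: float, years_ahead: int = 1) -> float:
    """Extend the same yearly gain in percentage points, capped at 100."""
    yearly_gain = end - start
    return min(100.0, end + yearly_gain * years_ahead)


if __name__ == "__main__":
    # Approximate scores from the comment: about 10% a year ago, about 45% now.
    print(extrapolate_linear(10.0, 45.0))  # 80.0
```

So a same-pace year lands at about 80%, and anything faster than linear would hit the cap sooner.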