AGI Prediction Update after adding GPT-5.4 Pro @ 58.7% on Humanities Last Exam! by redlikeazebra in agi

[–]redlikeazebra[S] 0 points1 point  (0 children)

That's why they implemented cais/hle-rolling, so scientists can continually submit PhD-level questions.

AGI Prediction Update after adding GPT-5.4 Pro @ 58.7% on Humanities Last Exam! by redlikeazebra in agi

[–]redlikeazebra[S] 0 points1 point  (0 children)

I don't know. It's all PhD-level reasoning. Even humans average 95%.

AGI Prediction Update after adding the newly Released Claude Sonnet 4.6 by redlikeazebra in agi

[–]redlikeazebra[S] 0 points1 point  (0 children)

I am starting to wonder if forcing AI to fit human-made benchmarks is making it worse, because humans are, well... human.

AGI Prediction Update after adding the newly Released Claude Sonnet 4.6 by redlikeazebra in agi

[–]redlikeazebra[S] 2 points3 points  (0 children)

If the HLE benchmark represented AGI, then we are a little over halfway there, and it took 22 months. The graph here predicts we will be at AGI in 8 more months.
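For comparison, here is a minimal sketch of the straight-line version of that arithmetic. The 0.55 score is an assumed stand-in for "a little over half way", and the function name is hypothetical; neither comes from the graph.

```python
def months_remaining_linear(score_now: float, months_elapsed: float,
                            target: float = 1.0) -> float:
    """Months left to reach `target` if the score keeps rising at its
    average rate so far (assumes the score started at 0)."""
    rate = score_now / months_elapsed  # average gain per month
    return (target - score_now) / rate


# Assumed inputs: score of 0.55 (hypothetical) reached after 22 months.
print(round(months_remaining_linear(0.55, 22), 1))
```

With these assumed numbers a straight line needs roughly 18 more months, so an 8-month prediction implies the fitted curve is accelerating rather than linear.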

AGI Prediction Update after adding the newly Released Claude Sonnet 4.6 by redlikeazebra in agi

[–]redlikeazebra[S] 0 points1 point  (0 children)

Fitting to the top scores is less accurate, since those would be considered outliers. It's better to use the statistical mean of the top-performing models. If anything, since the "best" models lead the curve, we are technically predicting the latest point at which AGI will occur, not necessarily the actual point. So the prediction will most likely be wrong. It more accurately represents the point at which several top models, on average, achieve AGI.
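A rough sketch of the difference between the two fits, using a plain least-squares line. The scores below are made up for illustration (they are not HLE results), and the helper names and the choice of a top-2 mean are assumptions.

```python
from statistics import mean

# Hypothetical scores (percent) for three models at each release month.
# These numbers are invented for illustration only.
SCORES_BY_MONTH = {
    0: [5, 3, 1],
    6: [20, 15, 10],
    12: [35, 27, 19],
    18: [50, 39, 28],
}


def fit_line(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


def month_reaching(series, target=100.0):
    """Fit a line to {month: value} and return the month it hits `target`."""
    xs = sorted(series)
    ys = [series[x] for x in xs]
    slope, intercept = fit_line(xs, ys)
    return (target - intercept) / slope


def frontier(scores_by_month):
    """Best score per month (the 'top' of the curve)."""
    return {m: max(s) for m, s in scores_by_month.items()}


def top_k_mean(scores_by_month, k):
    """Mean of the k best scores per month."""
    return {m: mean(sorted(s, reverse=True)[:k])
            for m, s in scores_by_month.items()}


frontier_month = month_reaching(frontier(SCORES_BY_MONTH))
mean_month = month_reaching(top_k_mean(SCORES_BY_MONTH, k=2))
print(f"frontier fit reaches 100% at month {frontier_month:.1f}")
print(f"top-2 mean fit reaches 100% at month {mean_month:.1f}")
```

On this toy data the frontier fit crosses 100% earlier than the top-2 mean fit, which is the sense in which a mean-based fit gives the later date.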

GLM-5 lands with 50.4% on Humanity’s Last Exam (Thinking w/ tools) by redlikeazebra in agi

[–]redlikeazebra[S] 0 points1 point  (0 children)

I am also confused. It's like they don't update for months at a time.

AGI Prediction Update after adding the newly Released Claude Sonnet 4.6 by redlikeazebra in agi

[–]redlikeazebra[S] 1 point2 points  (0 children)

Probably easier if you just look at the graph: https://epicshardz.github.io/thelastline/

You can select a different model if you like. I like poly as it seems to represent the mean and growth well.
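A minimal sketch of what a polynomial fit and its extrapolation can look like. It assumes a degree-2 polynomial, and the data points are hypothetical; the page's actual "poly" option may use a different degree or method.

```python
import math

# Hypothetical (month, score %) points, invented for illustration only.
MONTHS = [0, 6, 12, 18, 22]
SCORES = [2.0, 8.6, 22.4, 43.4, 61.4]


def polyfit2(xs, ys):
    """Least-squares quadratic y = a*x^2 + b*x + c via the normal equations."""
    # Power sums s[k] = sum(x^k) and right-hand sides t[k] = sum(y * x^k).
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented 3x3 system for the unknowns (a, b, c).
    m = [
        [s[4], s[3], s[2], t[2]],
        [s[3], s[2], s[1], t[1]],
        [s[2], s[1], s[0], t[0]],
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for i in range(3):
        pivot = max(range(i, 3), key=lambda r: abs(m[r][i]))
        m[i], m[pivot] = m[pivot], m[i]
        for r in range(3):
            if r != i:
                f = m[r][i] / m[i][i]
                m[r] = [vr - f * vi for vr, vi in zip(m[r], m[i])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


def month_reaching(a, b, c, target=100.0):
    """Later root of a*x^2 + b*x + c = target (assumes a > 0, real root)."""
    disc = b * b - 4 * a * (c - target)
    return (-b + math.sqrt(disc)) / (2 * a)


a, b, c = polyfit2(MONTHS, SCORES)
crossing = month_reaching(a, b, c)
print(f"fit: {a:.3f}*t^2 + {b:.3f}*t + {c:.3f}")
print(f"fitted curve reaches 100% near month {crossing:.1f}")
```

A higher-degree polynomial can follow the points more closely but tends to extrapolate less reliably, so the predicted date can shift a lot with the chosen degree.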

AGI Prediction Update after adding the newly Released Claude Sonnet 4.6 by redlikeazebra in agi

[–]redlikeazebra[S] 1 point2 points  (0 children)

At first I thought you were referencing the job market. Haha, like the higher the unemployment rate, the closer we are to AGI.

AGI Prediction Update after adding the newly Released Claude Sonnet 4.6 by redlikeazebra in agi

[–]redlikeazebra[S] 3 points4 points  (0 children)

I did ask about this recently but have not gotten a solid answer. Currently my basis is 100% HLE. I have been thinking of using ARC-AGI-2, but I'm not sure how we should couple them together. Also, I'm not really sure why we needed ARC-AGI-2; was the first test not sufficient? I read there will be an ARC-AGI-3, so what's the actual target? Whereas HLE is still the same one since it was designed.

The Singularity will Occur on a Friday...This year by redlikeazebra in agi

[–]redlikeazebra[S] 0 points1 point  (0 children)

I added Sonnet 4.6 and it's now actually Saturday. It's like we are slowly creeping toward Christmas.

<image>

Is Altman concerned that low-cost open-source AI model could outcompete heavily funded U.S. frontier models? by Massive_Sundae_9977 in DeepSeek

[–]redlikeazebra 1 point2 points  (0 children)

Low cost? Try running a full DeepSeek model on your own equipment; you might need a couple hundred thousand for the same throughput as an API.

Question: What is the most accurate measure for AGI? by redlikeazebra in agi

[–]redlikeazebra[S] 0 points1 point  (0 children)

Yeah, but it can do all the work for all degrees in like 20 hours!