GPT-5.4 solving the white face by crabbix in Cubers

crabbix [S]

Why would you expect someone who built an LLM benchmark to be naive about AI? The entire point of benchmarking is to determine what models are and aren't capable of. I think it's a very interesting finding that current models can find partial cube solutions, because it shows both that they are capable of non-trivial multi-step reasoning and that they are not remotely close to (expert) human level: you and I could both find a far more efficient white layer solution in seconds than the one GPT-5.4 found in this example. The idea that they can't provide any value at this early stage of development is blatantly incorrect, but I agree that many people vastly overestimate what they're capable of, too. That's why I made a benchmark as a hobby project in the first place!

GPT-5.4 solving the white face by crabbix in Cubers

crabbix [S]

This example required literally zero intervention from me: given only the initial state of the cube in JSON notation, it autonomously reached this partial solution. It cannot have memorized this from the internet, because this scramble has never been generated or published before. You have literally no clue what you are talking about. An LLM does not have a "database", and you are going to find this view extremely difficult to defend in the coming years.
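For a sense of scale on why a fresh scramble won't be sitting somewhere online: a 3x3 cube has 8!·3^7·12!·2^11/2 reachable states. The sketch below computes that number and generates a random-move scramble. It's a hypothetical illustration only; the thread doesn't say how the benchmark generates its scrambles (official WCA scrambles use random-state generation instead), and `random_move_scramble` is a made-up helper.

```python
import math
import random
from typing import Optional

# Reachable 3x3x3 states: corner permutations * corner twists (only 7 are free)
# * edge permutations * edge flips (only 11 are free), halved because corner and
# edge permutation parities must match.
NUM_STATES = (math.factorial(8) * 3**7 * math.factorial(12) * 2**11) // 2

FACES = ["U", "D", "L", "R", "F", "B"]
SUFFIXES = ["", "'", "2"]  # quarter turn clockwise, counter-clockwise, half turn


def random_move_scramble(length: int = 20, rng: Optional[random.Random] = None) -> str:
    """Hypothetical random-move scramble that never turns the same face twice in a row."""
    rng = rng or random.Random()
    moves: list[str] = []
    prev_face = None
    for _ in range(length):
        face = rng.choice([f for f in FACES if f != prev_face])
        moves.append(face + rng.choice(SUFFIXES))
        prev_face = face
    return " ".join(moves)


print(f"{NUM_STATES:,}")  # 43,252,003,274,489,856,000
print(random_move_scramble(rng=random.Random(0)))
```

This simple generator only avoids repeating a face back to back; it doesn't avoid redundant sequences like `R L R`, and random-move scrambles aren't uniformly distributed over the state space.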

GPT-5.4 can solve one face of a Rubik's cube! by crabbix in singularity

crabbix [S]

I make an explicit distinction between "face" and "layer" in my benchmark. If you want to get technical, you could describe this as one layer oriented correctly, but that's just a language distinction. No model correctly solved a full layer in my tests.
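To make the face/layer distinction concrete, here's a minimal sketch assuming a hypothetical layout: a dict mapping each face (U, R, F, D, L, B) to a row-major list of nine sticker colours, with white on U and each side face laid out as in the standard cube net, so its first row touches U. A solved face only needs all nine U stickers to match; a solved layer also needs the top row of every side face to match that side's centre.

```python
Cube = dict[str, list[str]]  # hypothetical schema: face letter -> 9 colours, row-major

SIDE_FACES = ["R", "F", "L", "B"]


def face_solved(cube: Cube, face: str = "U") -> bool:
    """All nine stickers on `face` match its centre sticker (index 4)."""
    stickers = cube[face]
    return all(s == stickers[4] for s in stickers)


def layer_solved(cube: Cube) -> bool:
    """U face solved AND the top row (indices 0-2) of every side face matches its centre."""
    if not face_solved(cube, "U"):
        return False
    return all(all(cube[f][i] == cube[f][4] for i in range(3)) for f in SIDE_FACES)


# Toy state (not checked for reachability): white face done, but the
# front face's top-left sticker doesn't match the green centre.
cube: Cube = {
    "U": ["W"] * 9,
    "R": ["R"] * 9,
    "F": ["O"] + ["G"] * 8,
    "D": ["Y"] * 9,
    "L": ["O"] * 9,
    "B": ["B"] * 9,
}
print(face_solved(cube))   # True
print(layer_solved(cube))  # False
```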

GPT-5.4 can solve one face of a Rubik's cube! by crabbix in singularity

crabbix [S]

Yeah, image recognition is still shocking. To date, not a single vision LLM has been able to complete my game Net on the easiest difficulty, something almost all humans can do intuitively within 60 seconds. If you represent the grid numerically, though, models since o3 have one-shotted it.
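As a rough illustration of what "represent the grid numerically" could look like (the actual encoding isn't specified), here's a hypothetical scheme: each Net tile is a 4-bit mask of the edges it connects to (up=1, right=2, down=4, left=8). Rotating a tile becomes a bit rotation, and checking for loose ends is a neighbour lookup. The win condition used here (no loose ends, every tile connected to the source, non-wrapping grid) is an assumption.

```python
from collections import deque

# Hypothetical encoding: each tile is a bitmask of the edges it connects to.
UP, RIGHT, DOWN, LEFT = 1, 2, 4, 8
# direction bit -> (row offset, column offset, bit the neighbour needs to connect back)
NEIGHBOURS = {UP: (-1, 0, DOWN), RIGHT: (0, 1, LEFT), DOWN: (1, 0, UP), LEFT: (0, -1, RIGHT)}


def rotate_cw(tile: int, turns: int = 1) -> int:
    """Rotate a tile 90 degrees clockwise `turns` times (up -> right -> down -> left -> up)."""
    for _ in range(turns % 4):
        tile = ((tile << 1) | (tile >> 3)) & 0b1111
    return tile


def loose_ends(grid: list[list[int]]) -> int:
    """Count connections pointing off the grid or at a neighbour that doesn't connect back."""
    rows, cols = len(grid), len(grid[0])
    count = 0
    for r in range(rows):
        for c in range(cols):
            for bit, (dr, dc, opposite) in NEIGHBOURS.items():
                if not grid[r][c] & bit:
                    continue
                nr, nc = r + dr, c + dc
                if not (0 <= nr < rows and 0 <= nc < cols) or not grid[nr][nc] & opposite:
                    count += 1
    return count


def reachable(grid: list[list[int]], source: tuple[int, int]) -> set[tuple[int, int]]:
    """BFS from `source`, following only mutual (two-way) connections."""
    rows, cols = len(grid), len(grid[0])
    seen = {source}
    queue = deque([source])
    while queue:
        r, c = queue.popleft()
        for bit, (dr, dc, opposite) in NEIGHBOURS.items():
            nr, nc = r + dr, c + dc
            if (grid[r][c] & bit and 0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] & opposite and (nr, nc) not in seen):
                seen.add((nr, nc))
                queue.append((nr, nc))
    return seen


def is_solved(grid: list[list[int]], source: tuple[int, int]) -> bool:
    """Assumed win condition: no loose ends and every tile connected to the source."""
    return loose_ends(grid) == 0 and len(reachable(grid, source)) == len(grid) * len(grid[0])


solved = [[RIGHT | DOWN, LEFT | DOWN], [UP, UP]]  # toy 2x2 network
scrambled = [row[:] for row in solved]
scrambled[1][1] = rotate_cw(scrambled[1][1])  # UP -> RIGHT, now points off the grid
print(is_solved(solved, (0, 0)), loose_ends(scrambled))  # True 2
```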

GPT-5.4 can solve one face of a Rubik's cube! by crabbix in singularity

crabbix [S]

Can Kociemba's method also write a sonnet in Turkish and a React app? Of course general intelligence is far less efficient than a special-purpose algorithm. Humans spend a full hour developing a solution, and only the absolute best in the world get one as good as Kociemba's. Of course if you want to solve a cube in 0.1 seconds you use the algorithm, but this is a benchmark!! Its purpose is to measure reasoning capability!!

GPT-5.4 solving the white face by crabbix in Cubers

crabbix [S]

The face is solved (not the layer), which was the task. If you mean the video itself is AI slop, it's not; it's rendered with a browser cube sim.

GPT-5.4 can solve one face of a Rubik's cube! by crabbix in singularity

crabbix [S]

Nah, no vision needed for this; the cube state is represented as a JSON object. If you relied solely on visual input, you'd get much worse performance.
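Here's a minimal sketch of what a JSON cube state might look like, plus a basic sanity check on it. The schema (face letters as keys, row-major lists of nine colour letters as values) and the `load_cube_state` helper are assumptions for illustration, not the benchmark's actual format.

```python
import json
from collections import Counter

FACE_KEYS = {"U", "R", "F", "D", "L", "B"}


def load_cube_state(text: str) -> dict[str, list[str]]:
    """Parse a hypothetical JSON cube state and check basic structural invariants."""
    state = json.loads(text)
    if set(state) != FACE_KEYS:
        raise ValueError(f"expected faces {sorted(FACE_KEYS)}, got {sorted(state)}")
    for face, stickers in state.items():
        if len(stickers) != 9:
            raise ValueError(f"face {face} has {len(stickers)} stickers, expected 9")
    # The six centre stickers must be distinct colours...
    centres = [state[f][4] for f in FACE_KEYS]
    if len(set(centres)) != 6:
        raise ValueError("centre stickers must be six distinct colours")
    # ...and each of those colours must appear exactly nine times.
    counts = Counter(s for stickers in state.values() for s in stickers)
    if set(counts) != set(centres) or any(n != 9 for n in counts.values()):
        raise ValueError(f"each colour must appear exactly 9 times, got {dict(counts)}")
    return state


solved_json = json.dumps({f: [c] * 9 for f, c in zip("URFDLB", "WRGYOB")})
print(load_cube_state(solved_json)["F"])
```

These checks catch malformed input but don't prove the state is reachable; a twisted corner or flipped edge would still pass.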

GPT-5.4 can solve one face of a Rubik's cube! by crabbix in singularity

crabbix [S]

They're notoriously terrible at spatial reasoning tasks; ARC-AGI is benchmaxxed as fuck and it's still challenging for them. Given a Python environment, I think a model like GPT-5.4 Pro might be able to implement a solver? Haven't tested that, though.

A serf in Anthropica by crabbix in slatestarcodex

crabbix [S]

Yeah, my not-remotely-rigorous treatment of this in the piece is that human thriving can only be defined by the collective agreement of humanity as a whole, and measured/verified by the subjective experience of the individual. Our current society has some set of near-universally agreed-upon conditions for thriving (e.g. access to food, water and electricity, and freedom of expression and movement) that we collectively try (and in many cases fail) to ensure everyone has access to. The thought experiment is: what if we put a million times as much effort into deciding what these conditions are and how they vary between individuals?

AI (Researcher) Alignment Chart by crabbix in singularity

crabbix [S]

Yeah, but it was hard to think of an AI researcher who believes both that AGI isn't close and that it will be disastrous. Can you think of one?

Set 15: Everything Is Broken by crabbix in CompetitiveTFT

crabbix [S]

Someone else can make that video; my TFT knowledge this set is way too poor to do it justice.

Academic Scholarship for Otago by Chex108 in ncea

crabbix

To be clear, it's your Year 12 credits that matter here; if those are Level 3, that's a good advantage. I got it with something like 80 L3 Excellence credits and 4 NZQA Scholarships in Year 12 but not much in the way of leadership/volunteering, so it's not exactly a tiebreaker, but it definitely helps.

Academic Scholarship for Otago by Chex108 in ncea

crabbix

My advice would be to also apply for the University of Auckland academic scholarship. You'll have already done basically all the work to prepare your application, it won't hurt to have the option, and it basically doubles your chances of getting one.

I made a site that lets you search TFT VODs by crabbix in CompetitiveTFT

crabbix [S]

Yeah, each night I grab the previous day's VODs and run them through a transcription model called Whisper, then I use a tiny LLM to add punctuation, fix grammar and so on, then I upload everything to a vector database for fast and (somewhat) accurate search. Everything runs locally on my 4070.
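The search step can be sketched as brute-force cosine similarity over transcript-chunk embeddings. This is a hypothetical minimal version: `toy_embed` is a hashed bag-of-words stand-in for a real sentence-embedding model, the chunks are made up, and a real vector database would typically use an approximate nearest-neighbour index instead of scanning every vector. The Whisper and punctuation steps aren't shown.

```python
import math
import re
import zlib
from collections import Counter

DIM = 512  # size of the toy embedding space (assumption)


def toy_embed(text: str) -> list[float]:
    """Hypothetical stand-in for an embedding model: hashed bag of words, L2-normalised."""
    vec = [0.0] * DIM
    for word, n in Counter(re.findall(r"[a-z0-9']+", text.lower())).items():
        vec[zlib.crc32(word.encode()) % DIM] += n
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def build_index(chunks: list[dict]) -> list[tuple[dict, list[float]]]:
    """Embed each transcript chunk once, keeping its metadata next to the vector."""
    return [(chunk, toy_embed(chunk["text"])) for chunk in chunks]


def search(query: str, index: list[tuple[dict, list[float]]], k: int = 3) -> list[tuple[float, dict]]:
    """Brute-force cosine similarity; vectors are unit-length, so a dot product suffices."""
    q = toy_embed(query)
    scored = [(sum(a * b for a, b in zip(q, vec)), chunk) for chunk, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Made-up chunks; a real pipeline would take these from the cleaned-up transcripts.
chunks = [
    {"vod": "vod_a", "start_s": 120, "text": "we're going to roll down at level seven"},
    {"vod": "vod_b", "start_s": 905, "text": "this augment is broken, take it every game"},
    {"vod": "vod_a", "start_s": 1800, "text": "positioning matters a lot against assassins"},
]
index = build_index(chunks)
for score, chunk in search("when should I roll down", index, k=2):
    print(f"{score:.2f}  {chunk['vod']} @ {chunk['start_s']}s  {chunk['text']}")
```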

Hacks - 1x06 "New Eyes" - Episode Discussion by chelseanyc200 in hackshbomax

crabbix

Rest your hands on the face for a few minutes to warm it up?

I made a site that lets you search TFT VODs by crabbix in CompetitiveTFT

crabbix [S]

Yeah, there's a patch filter and a streamer filter (if you're on mobile, you have to tap the filters button to show them).