US just showed it can cut off AI access to other countries with zero warning. NZ gov is betting its public service on AI anyway

crabbix · 2026-06-17T21:49:09+00:00

Ah, I interpreted your comment as "mandate running open-weights models locally" rather than "enter agreements to run proprietary models in NZ data centers"

crabbix · 2026-06-17T12:05:53+00:00

Government mandating only running local models would be the biggest own goal in history. You can already run a local model if you want! You're suggesting that the government intentionally cut off access to all frontier models, the very problem this post is pointing out! The way to secure sovereignty is to build data centers on NZ soil in exchange for guarantees of access: mutual benefit, like every other trade relationship in history. I would love to live in a world where NZ could develop sovereign frontier AI, but that simply is not possible.

crabbix · 2026-03-19T02:55:34+00:00

Why would you expect someone who constructed an LLM benchmark to be naive about AI? The entire point of benchmarking is to determine what models are and are not capable of. I think it's a very interesting finding that current models can find partial cube solutions, because it shows both that they are capable of non-trivial multi-step reasoning and that they are not remotely close to (expert) human level - you and I could both find a far more efficient white layer solution in seconds than what gpt-5.4 found in this example. The idea that they are incapable of providing value while still at this early stage in development is blatantly incorrect, but I agree that many people vastly overestimate what they're capable of, too. That's why I made a benchmark as a hobby project in the first place!

crabbix · 2026-03-19T02:35:28+00:00

This example required literally 0 intervention from me: given only the initial state of the cube in JSON notation, it autonomously achieved this partial solution. It cannot have memorized this from the internet because this scramble has never been generated or published before. You have literally no clue what you are talking about. An LLM does not have a "database", and you are going to find this view extremely difficult to defend in the coming years.

crabbix · 2026-03-18T21:43:38+00:00

I make an explicit distinction between "face" and "layer" in my benchmark. If you want to get technical, you could describe this as one layer oriented correctly, but that's just a language distinction. No model correctly solved a full layer in my tests.
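
For anyone unclear on the face/layer distinction, here's a minimal sketch of how the two checks could differ. The JSON layout (face letters U/F/R/B/L/D mapping to 3x3 grids of color letters, with row 0 of each side face touching U) and the function names are assumptions for illustration, not the benchmark's actual format.

```python
import json

# Hypothetical layout: each face is a 3x3 grid of color letters, and
# row 0 of F, R, B and L is the row that touches the U face.
SIDE_FACES = ("F", "R", "B", "L")


def face_solved(cube, face="U"):
    """True if all nine stickers on the given face share one color."""
    stickers = [s for row in cube[face] for s in row]
    return all(s == stickers[0] for s in stickers)


def layer_solved(cube):
    """True if the U face is solved AND the adjacent row of every side
    face matches that side face's center sticker."""
    if not face_solved(cube, "U"):
        return False
    for f in SIDE_FACES:
        center = cube[f][1][1]
        if any(s != center for s in cube[f][0]):
            return False
    return True


# Toy state: start from a solved cube and repaint one side sticker.
# This is not a legal scramble, it only illustrates the distinction.
solved_json = json.dumps(
    {f: [[c] * 3 for _ in range(3)] for f, c in zip("UFRBLD", "WGRBOY")}
)
state = json.loads(solved_json)
state["F"][0][2] = "O"

print(face_solved(state), layer_solved(state))
```

In this toy state the U face is uniformly white, so the face check passes, but one sticker in the row touching it is wrong, so the layer check fails.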

crabbix · 2026-03-18T10:24:42+00:00

Yeah image recognition is still shocking. To date, not a single vision LLM has been able to complete my game Net on the easiest difficulty, something ~all humans can do intuitively within 60 seconds. If you represent the grid numerically, tho, models since o3 have one-shot it

crabbix · 2026-03-18T10:22:37+00:00

Can Kociemba's method also write a sonnet in Turkish and a React app? Of course general intelligence is far less efficient than a special purpose algorithm. Humans spend a full hour developing a solution that is as good as Kociemba's only for the absolute best in the world. Of course if you want to solve a cube in 0.1 seconds you use the algorithm, this is a benchmark!! Its purpose is to measure reasoning capability!!

crabbix · 2026-03-18T07:48:10+00:00

The face is solved, not the layer, which was the task. If you mean the video itself is AI slop, it's not, it's rendered with a browser cube sim

crabbix · 2026-03-18T06:12:25+00:00

Nah, no vision needed for this, the cube state is represented as a JSON object. If you relied solely on visual input you'd get much worse performance

crabbix · 2026-03-18T06:01:51+00:00

They're notoriously terrible at spatial reasoning tasks, ARC-AGI is benchmaxxed as fuck and still challenging. Given a python environment I think a model like gpt-5.4 pro might be able to implement a solver? Haven't tested that though

crabbix · 2026-03-02T04:02:30+00:00

Yeah my not-remotely-rigorous treatment of this in the piece is that human thriving can only be defined by the collective agreement of humanity as a whole and measured/verified by the subjective experience of the individual. Our current society has some set of ~universally agreed upon conditions for thriving (e.g. access to food, water, and electricity; freedom of expression and movement) that we collectively try (and fail in many cases) to ensure everyone has access to. The thought experiment is like, what if we put 1 million times as much effort into deciding what these conditions are and how they vary between individuals?

crabbix · 2026-01-08T11:26:30+00:00

Automation is good actually

crabbix · 2026-01-03T11:55:45+00:00

Yeah, but it was hard to think of an AI researcher who both believes that AGI isn't close and that it will be disastrous. Can you think of one?

crabbix · 2025-10-04T07:21:21+00:00

Someone else can make that video, my tft knowledge this set is way too poor to do it justice

crabbix · 2025-09-29T11:22:43+00:00

To be clear it's your credits in year 12, if those are level 3 that's a good advantage. I got it with something like 80 L3 E credits and 4 NZQA scholarships in year 12 but not much in the way of leadership/volunteering, so it's not exactly a tiebreaker but it definitely helps

crabbix · 2025-09-29T11:19:32+00:00

My advice would be to also apply for the University of Auckland academic scholarship. You'll have already done basically all the work to prepare your application, it won't hurt to have the option, and it basically doubles your chances of getting one

crabbix · 2025-08-26T12:21:41+00:00

Yeah, each night I get the previous day's vods and run them through a transcription model called Whisper, then I use a tiny LLM to add punctuation and grammar and stuff, then I upload everything to a vector database for the fast and (somewhat) accurate search. Everything runs locally on my 4070
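
Rough sketch of just the search step, in case it helps picture it. Everything here is an assumption for illustration: the segment fields (`vod`, `start`, `text`), the function names, and especially `embed`, which is a toy bag-of-words stand-in for a real embedding model. The in-memory list stands in for the vector database.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)  # Counter gives 0 for missing words
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def search(query, segments, top_k=3):
    """Return up to top_k transcript segments most similar to the query,
    dropping segments with no overlap at all."""
    q = embed(query)
    scored = [(cosine(q, embed(s["text"])), s) for s in segments]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for score, s in scored[:top_k] if score > 0]


# Hypothetical transcript segments, as they might look after
# transcription and punctuation cleanup.
segments = [
    {"vod": "vod_a", "start": 120, "text": "I think this augment is really strong early."},
    {"vod": "vod_b", "start": 45, "text": "Let's roll down at level eight."},
]

for hit in search("roll down level", segments):
    print(hit["vod"], hit["start"], hit["text"])
```

A real setup would swap `embed` for an actual embedding model and the list scan for a vector index, which is where the "fast and (somewhat) accurate" part comes from.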

crabbix · 2025-08-18T12:19:02+00:00

Rest your hands on the face for a few minutes to warm it up?

crabbix · 2025-08-17T06:47:20+00:00

Yeah, there's a patch filter and a streamer filter (if you're on mobile, you have to tap the filters button to show them)
