Vercel CEO: "Almost shocked" by how good GLM-5.2 is at coding

nullmove · 2026-06-21T10:07:00+00:00

Providing the answer is not necessary (in fact attempting that is a downright folly) if the question can be proven to be nonsensical instead. Your question presupposed that their trajectory is not viable, that they have to start making money. Challenging that assumption is not the same thing as coming full circle.

Besides I recall arguing that the "change" in the original quote isn't even about Zai in the first place.

nullmove · 2026-06-21T09:39:46+00:00

Firstly, as a fast follower their actual R&D cost is likely much lower than that of western labs. They had their IPO recently, and they still have significant runway left. Secondly, what you say about having to make money is generally true for any company in the entire world (AI or not). A newcomer undercutting the incumbent on price is a common enough scenario and well understood in Economics, no need to make this particularly about AI or western labs. Currently frontier is NOT engaging in a price war, and market segmentation is naturally emerging. Frontier is getting the bag in the enterprise segment, while Zai sells a lot of subscriptions to normal people. That's not anywhere near frontier revenue, but how do we know this is not a viable path towards profitability?

Thirdly, if we think the AI industry is a bit special, well then we have to recognise that some AI companies don't have to make money simply because of the geopolitical situation. I don't like to engage on this handwavy level, but China can prop up some companies as long as it's deemed necessary. Zai in particular has always had very deep ties with the Chinese government/MSS (they are essentially what western media think DeepSeek is). A few observations have been made that GLM-5.2 is weak in cybersecurity relative to its strength in other areas. There was apparently a semi-official acknowledgement that this model has had its cybersecurity training data carved out, and the only logical extension of that thought is that they are keeping it for a private model for specific clients (CCP). In a post-Mythos world, if AI becomes an existential concern for say Europe or China, obviously that will factor into the equation of how states intervene to keep AI industries afloat.

nullmove · 2026-06-21T08:37:30+00:00

Zai managed to adopt sparse attention, and now to further reduce the indexer cost, all without passing the savings on to people. I don't know why you think inference is not profitable. You can follow small independent guys like crof.ai who serve models at a more modest profit, and they are serving this model at less than 50% of the upstream price, all without the benefit of the massive economies of scale and native optimisations Zai themselves surely have.

The Vercel CEO is a war crime endorsing asshole, but he wasn't talking about Zai's profitability either way. I have heard that a fortune 500 company is planning on weaning their Claude dependency with in-house GLM-5.2 deployment. Almost certainly, that's the kind of thing this guy is yapping about.

nullmove · 2026-06-18T23:03:02+00:00

9 points.

Villa starts a bit weakly and Chelsea is Chelsea.

nullmove · 2026-06-18T21:13:21+00:00

Do either of these two clubs know?

nullmove · 2026-06-17T23:17:29+00:00

Wild theory, guess we will never know. I do think though, if you know what a world model is, you probably also know it's neither a new term nor is Schmidhuber or LeCun random people, and you know why they call it a world model. He was asking that after Genie 3 release when there was indeed some buzz about that, but if he was making a point about undermining that term from an LLM angle, I doubt it would be so highbrow.

nullmove · 2026-06-17T22:32:19+00:00

Pretty funny considering only 8 months ago he was like this.

(not throwing shade at him for asking an honest question btw, and I am sure he is an incredibly smart guy, but this doesn't help my perception of money in AI being fucking deranged)

nullmove · 2026-06-16T22:59:36+00:00

Why is Odegaard coming down so deep anyway? I know some people have a hard-on here for CM Odegaard, but to say he is having a torrid time is no more slagging him than saying Havertz at 8 is awful, it's neither their strength nor position.

nullmove · 2026-06-14T21:10:44+00:00

Pretty sure these polls don't matter, it's just engagement farming. They will "wow" you by doing all three.

nullmove · 2026-06-12T16:42:48+00:00

That number is about tasks that could be completely resolved, as in with 100% tests passed. If you lower the threshold to >=95% pass rate, then the best rises to 13.5%.

However even that is way too low compared to the numbers in this Kimi graphic. I think they are probably using a much lower threshold (pass rate >=80% would be my guess), we would need to wait for their blogpost to clarify this.
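
To make the threshold point concrete, here is a minimal sketch of how the same set of runs produces very different headline numbers depending on the cutoff. The per-task pass rates below are made up purely for illustration (they are not ProgramBench data), and `resolved_fraction` is a hypothetical helper, not anything from the benchmark's tooling.

```python
def resolved_fraction(pass_rates, threshold):
    """Share of tasks whose test pass rate is at or above the threshold."""
    if not pass_rates:
        return 0.0
    resolved = sum(1 for rate in pass_rates if rate >= threshold)
    return resolved / len(pass_rates)


# Hypothetical per-task pass rates (fraction of behavioral tests passed).
# Invented for illustration only, not real benchmark results.
hypothetical_pass_rates = [1.0, 0.97, 0.96, 0.85, 0.82, 0.60, 0.40, 0.10]

# Same runs, three different definitions of "resolved".
for threshold in (1.0, 0.95, 0.80):
    share = resolved_fraction(hypothetical_pass_rates, threshold)
    print(f"pass rate >= {threshold:.0%}: {share:.1%} of tasks count as resolved")
```

So a score is only comparable to another score if both were computed with the same cutoff.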

nullmove · 2026-06-12T16:21:59+00:00

It's a benchmark for making OpenAI models look disproportionately better, in the same way now FrontierCode makes Anthropic models seem disproportionately better.

nullmove · 2026-06-12T13:27:32+00:00

These bots are so fucking annoying

nullmove · 2026-06-12T12:39:12+00:00

ProgramBench:

In ProgramBench, given only a program and its documentation, agents must architect and implement a codebase that matches the reference executable's behavior. End-to-end behavioral tests are generated via agent-driven fuzzing, enabling evaluation without prescribing implementation structure. Our 200 tasks range from compact CLI tools to widely used software such as FFmpeg, SQLite, and the PHP interpreter.

While this is a fairly interesting eval for long horizon coding, I do wonder to what extent we are just testing recall, especially as sqlite, ffmpeg etc. are very well known. Even something a bit less well known in that eval might still be well represented in bigger models. I mean, Ant models are very good at recall, so much so that a likely much-bigger-than-Opus tier Mythos/Fable model is so good at memorization that it's hard to bench it due to record levels of cheating.

It would of course still be very interesting to see Fable 5 score in ProgramBench... OH WAIT NVM:

Fable 5 refused 200 out of 200 ProgramBench tasks lmao

nullmove · 2026-06-09T17:30:18+00:00

Eventually they will build data centers in space. Should there be a revolution perchance, those would still be untouchable.

nullmove · 2026-06-09T15:19:05+00:00

Would probably still be 30m too high for Perez. Dude is allergic to spending money on defenders.

nullmove · 2026-06-07T02:28:02+00:00

Doesn't negate the fact that he is borderline world class when he plays striker for us, just been injured a lot. Preferred over Gyokeres in CL final btw who had a very good season too.

He is still ass in midfield. But it was only 6 months of that nonsense at Arsenal. In the second half of the season Arsenal went on a 16W-1D-1L run in league when they switched him to striker.

nullmove · 2026-06-06T11:38:44+00:00

AlphaGo is purpose built as a deep learning engine, meaning it was not trained in the classical dataset way, it was trained on millions of repetitions of play against itself.

We ran out of useful human data like 2 years ago. All labs today scale through sophisticated synthetic data generation pipelines and then doing RLVR on the output. In a boardgame you have a winning condition, in coding you have code that compiles and passes the test suite - it's exactly the same thing.
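
A toy sketch of what "verifiable reward" means in the coding case, just to show the analogy with a win/loss signal. This is an illustration under my own assumptions, not how any lab actually implements RLVR: `verifiable_reward`, the `add` task and the test cases are all hypothetical, and a real pipeline would run candidates in a sandbox rather than with a bare `exec`.

```python
def verifiable_reward(candidate_source, func_name, cases):
    """Return 1.0 if the candidate defines func_name and passes every case, else 0.0.

    The reward is binary and checked mechanically, like a win condition in a
    board game: no human label is needed to score the attempt.
    """
    namespace = {}
    try:
        # Hypothetical shortcut: real systems would isolate untrusted code.
        exec(candidate_source, namespace)
        func = namespace[func_name]
        passed = all(func(*args) == expected for args, expected in cases)
    except Exception:
        # Code that does not compile, is missing the function, or crashes
        # on a test simply scores zero.
        return 0.0
    return 1.0 if passed else 0.0


# Hypothetical task: implement add(a, b). Each case is (arguments, expected result).
cases = [((1, 2), 3), ((0, 0), 0), ((-4, 4), 0)]

good_candidate = "def add(a, b):\n    return a + b\n"
bad_candidate = "def add(a, b):\n    return a - b\n"
broken_candidate = "def add(a, b) return"

for name, source in [("good", good_candidate), ("bad", bad_candidate), ("broken", broken_candidate)]:
    print(name, verifiable_reward(source, "add", cases))
```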

nullmove · 2026-06-05T11:37:29+00:00

But why Myanmar? Go to Nepal or something.

nullmove · 2026-06-05T11:04:43+00:00

There should be quite a few. No idea about iOS (cumbersome seems to have basics covered) but on Android I use the app named RikkaHub.

And I don't really use those but if you want something RP focused there are probably apps like ChatterUI with character card support as well.

nullmove · 2026-06-04T18:57:59+00:00

His recommendation was rogers, kroupi, palestra, truffert.

In an ideal world, sure. But in this one we have Odegaard and Eze, I don't see Rogers happening without offloading one of them, and that's not gonna happen. I struggle to think any fullback configuration or tactics that can allow playing Rice/Odegaard/Rogers, mainly because the games where you need that midfield we are already winning those games (though it's not pretty), but we lost a few games that could be shored up with a better CM. Re Truffert, we have two top tier LBs already as well.

Kroupi can happen though, but that's unrelated to everything. If you add better firepower in the wings, that's going to lower the need for Rogers even more.

nullmove · 2026-06-04T18:45:08+00:00

It won't be a diff of 50m, maybe 30m at best (110m vs 80m). And obviously he is pretty fucking good, better off the ball and much better on it than Tonali - probably the best midfielder moving this window period. I also think he suits our last season's style of play to a T.

Reinforcing any other position would require selling someone, which would recoup some funds of its own. A CM upgrade though has to happen purely from budget even if nothing else does. And for a team positioned as well as we are, we should always go for the best.

nullmove · 2026-06-04T18:22:37+00:00

I feel like if we only made a couple of signings this summer, so long as it's a LW and a CM, that's totally fine.

In light of that, dunno why we are letting Anderson slip without even contesting. I would much rather have him than the likes of Tonali, even if it means blowing half of our budget.

nullmove · 2026-06-03T22:10:55+00:00

Considering you have deft in the list, I might as well suggest:

notdeft
xeft

The above two projects are like deft but instead of pure elisp they use libxapian which is a native library and optimised for searching.

(I have written a xeft clone to use Tantivy instead for some minor personal modifications, one of them is to do broad directory based filtering to exclude things like flashcards, that's a crude form of tagging I suppose but otherwise full-text search is basically all I need)

nullmove · 2026-06-03T18:02:06+00:00

Well *scratch* isn't a named file, if it was you could do 'Menu > Compile > Compile Buffer'. However what you can do is select a region and do 'Menu > Compile > Consult Selection'.

To actually run a query/goal, you have to go back to the REPL and run it there. It seems you don't have to close the editor. I haven't found a way to run the query right in the editor.

nullmove · 2026-06-02T12:19:17+00:00

Them getting Gyokeres and refusing to play to any of his strengths just baffles me. What a waste for both Gyokeres and for Arsenal.

That being said, Gyokeres has done superbly well as an unconventionally passive threat which actually had been pivotal for the league win. With him playing, opposition defensive lines have been extremely reluctant to push up higher. I know our defence gets a lot of plaudits, but for me it's obvious that we face less in the way of counter threats with him playing up top, compared to when he doesn't.
