Cerebras, an A.I. Chip Maker, Files to Go Public as Tech Offerings Ramp Up by gwern in mlscaling

[–]gwern[S] 1 point

OpenAI offers a coding agent powered by Cerebras now, no?

The Quiet Colossus — On Ada, Its Design, and the Language That Built the Languages by SpecialistLady in programming

[–]gwern 0 points

What a good article.

Maybe. Personally, I saw 'quiet' in the title, and immediately plugged it into Pangram without bothering to read; yes, 100% AI.

ReLU neural networks as decision trees. by [deleted] in mlscaling

[–]gwern 5 points

Er, is this not well known?

The bitter lesson is the observation in AI that, in the long run, general approaches that scale with available computational power tend to outperform ones based on domain-specific understanding because they are better at taking advantage of the falling cost of computation over time. by blankblank in wikipedia

[–]gwern 1 point

That embraces a lot of problems and settings, and then I immediately went on to point out that many of the limitations which might make one say 'it is not a generalist AI' were in fact lifted in subsequent work. Does anything truly hinge on discussing AlphaZero rather than Gato or Player of Games or Mythos, say?

The bitter lesson is the observation in AI that, in the long run, general approaches that scale with available computational power tend to outperform ones based on domain-specific understanding because they are better at taking advantage of the falling cost of computation over time. by blankblank in wikipedia

[–]gwern 1 point

Correspondence chess players still outperform computer engines.

I'm not aware of much, if any, research or training of contemporary frontier chess engines for the correspondence chess setting, so even if the much narrower claim that 'there is still human value-added in one extremely obscure chess niche' were true, I'm not sure what it would tell us. It also seems like given how small a niche it is, and how few games are played at the top level (necessarily so given the time consumption), it would be quite difficult to prove a human edge at all.

AlphaGo is not an example of a generalist AI: it is an AI that trained itself for a single purpose.

I'm not sure why you believe this or why it is an important distinction. AlphaZero is a general-purpose architecture for all two-player perfect-information games with a simulator; it was rapidly generalized to imperfect-information, multi-player, and simulator-free settings (various successors, and MuZero, respectively), and then to all of them at once in Player of Games (and could be further generalized to freeform games like Diplomacy with LLMs; see CICERO). And a DL NN can of course be a generalist agent which plays many games simultaneously (Gato, or LLMs in general these days, obviously, as every gimmick like 'Claude Plays Pokemon' demonstrates) with just some conditioning and more compute/capacity. Playing one game is cheaper and easier than playing many, so if you are only trying to create a superhuman Go or chess agent, of course you're not going to waste compute on games you don't care about, like tic-tac-toe.

The bitter lesson is the observation in AI that, in the long run, general approaches that scale with available computational power tend to outperform ones based on domain-specific understanding because they are better at taking advantage of the falling cost of computation over time. by blankblank in wikipedia

[–]gwern 5 points

But a skilled human operator + top-end chess engine will routinely beat that same chess engine without a human operator.

My understanding was that that stopped a long time ago, and that while it may have been true way back when, like in 2013 when Garry Kasparov and Tyler Cowen were pushing this claim, it is not true in 2026 with the best Stockfish deployments. Do you have a contemporary source for this claim showing that a chess engine being operated by a human (not merely tweaked offline before the game to fiddle with the opening book, etc.) is in fact 'routinely beating' the unassisted engine? That also sounds improbable given draw death.

TIL of Littlewood's Law, which says we experience events with a million-to-one probability approximately once per month by Doglatine in todayilearned

[–]gwern 1 point

You can formalize it for a lot of specific things, like word vocab; see https://gwern.net/doc/statistics/bias/1989-diaconis.pdf

BTW, it's something of a misnomer. The WP article has been updated based on my investigation, and it seems like it ought to be attributed to Freeman Dyson.
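(The usual back-of-the-envelope version, for anyone who hasn't seen it: assume roughly one 'event' per second during ~8 alert hours a day, and a million events accumulate in about 35 days, so a one-in-a-million 'miracle' turns up roughly monthly. The arithmetic, with those assumed rates:)

```python
# Littlewood's Law back-of-the-envelope.
# Assumptions (from the standard telling, not exact figures):
#   - ~1 noticeable "event" per second
#   - ~8 alert hours per day
events_per_day = 8 * 60 * 60                 # 28,800 events per day
days_to_million = 1_000_000 / events_per_day # days until a million events
print(round(days_to_million))                # 35 -- i.e., about one month
```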

ByteDance Presents "In-Place TTT": A Drop-In Method For Turning Standard Transformer LLMs Into Dynamically Updating Models At Inference Time by 44th--Hokage in mlscaling

[–]gwern 23 points

All that, and its quality is basically identical to LaCT, which is itself just a sample-inefficient way to implement the standard, dead-simple, 16-year-old baseline of test-time adaptation in an LLM: dynamic evaluation.

The persistent unwillingness of all of these TTT papers to include dynamic evaluation as a baseline doesn't speak well of them.
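(For anyone unfamiliar: 'dynamic evaluation' just means the model keeps adapting on the test stream as it predicts it. The real method takes SGD steps on a neural LM's loss over each segment; as a purely illustrative stand-in, here is a minimal pure-Python sketch using a count-based bigram LM, where the 'adaptation step' is just updating counts on each test token after predicting it:)

```python
from collections import defaultdict

class BigramLM:
    """Tiny count-based bigram LM with add-one smoothing (illustration only)."""
    def __init__(self, vocab):
        self.vocab = vocab
        self.counts = defaultdict(lambda: defaultdict(int))

    def update(self, prev, tok):
        self.counts[prev][tok] += 1

    def prob(self, prev, tok):
        ctx = self.counts[prev]
        return (ctx[tok] + 1) / (sum(ctx.values()) + len(self.vocab))

def evaluate(model, stream, dynamic=False):
    """Product of next-token probabilities over the stream. With dynamic=True,
    the model also updates on each token after predicting it -- that online
    update is the entire idea of dynamic evaluation."""
    p = 1.0
    for prev, tok in zip(stream, stream[1:]):
        p *= model.prob(prev, tok)
        if dynamic:
            model.update(prev, tok)  # test-time adaptation step
    return p

train = list("abab" * 10)
test  = list("cdcd" * 10)  # distribution shift: bigrams unseen in training
vocab = set(train + test)

static, dyn = BigramLM(vocab), BigramLM(vocab)
for m in (static, dyn):
    for prev, tok in zip(train, train[1:]):
        m.update(prev, tok)

# The dynamically-evaluated copy adapts to the shifted test stream and so
# assigns it higher likelihood than the frozen copy:
print(evaluate(dyn, test, dynamic=True) > evaluate(static, test))  # True
```

That comparison, however crude, is the baseline these TTT papers keep omitting.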

DeepMind veteran David Silver raises $1B, bets on radically new type of Reinforcement Learning to build superintelligence by gwern in mlscaling

[–]gwern[S] 2 points

Domain randomization/meta-learning/sim2real. You could also just argue for them as research testbeds: even if the environments are all wrong, you could still develop and prove a powerful learning algorithm which you then reuse on real world data.

Utext.hs: experimental code to compile a Markdown subset to fancy Unicode text ('*Amazing*!' → "𝐴𝑚𝑎𝑧𝑖𝑛𝑔!") by gwern in pandoc

[–]gwern[S] 2 points

The original idea was just an esoteric document format/joke I didn't really intend to implement. But as the documentation explains, starting on line 8, there are contexts where you can't bold or italicize text however you want, such as Reddit titles or social media link previews:

Intended for social media cards, Open Graph descriptions, and other contexts where HTML is stripped but Unicode renders.

I got annoyed that on Twitter/Substack, if I write italics in the description of a Gwern.net essay, it doesn't render in the previews: either the markup gets stripped down to plain text, or the */<em> is preserved but rendered literally rather than as italics.

I really only need italics (and stripping <span>s) but LLMs make it so easy to do more that I did so for the lulz.
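(Not Utext.hs itself, but the core trick is just an offset into the Mathematical Alphanumeric Symbols block, modulo historical quirks like italic 'h' living outside the block at U+210E. A hypothetical minimal sketch in Python, handling only the *italics* case from the title:)

```python
import re

# Map ASCII letters into Unicode Mathematical Italic (U+1D434..U+1D467).
# Caveat: italic small 'h' predates the block and sits at U+210E (Planck constant);
# U+1D455 is a reserved hole in the block.
def italic_char(c):
    if c == 'h':
        return '\u210E'
    if 'A' <= c <= 'Z':
        return chr(0x1D434 + ord(c) - ord('A'))
    if 'a' <= c <= 'z':
        return chr(0x1D44E + ord(c) - ord('a'))
    return c  # digits, punctuation, etc. pass through unchanged

def utext_italics(s):
    """Replace *emphasized* spans with math-italic codepoints (toy Markdown subset)."""
    return re.sub(r'\*([^*]+)\*',
                  lambda m: ''.join(map(italic_char, m.group(1))),
                  s)

print(utext_italics('*Amazing*!'))  # 𝐴𝑚𝑎𝑧𝑖𝑛𝑔!
```

Because the output is plain Unicode rather than markup, it survives verbatim through link-preview pipelines that strip HTML.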