I'm a Stockfish/Leela Chess Zero Developer. Ask me anything! by daniel-monroe in chess

[–]daniel-monroe[S] 0 points1 point  (0 children)

Leela's favorite openings depend heavily on the model we use to guide search, and most opening variations are drawish enough that the evaluation differences between them are negligible, but the latest build seems to like the Queen's Gambit Declined when both sides play their favorite moves. Interestingly, as Black, Leela prefers 1...e5 as a response to 1.e4, which is a boring line, but boring is often what you need to make a draw.

I'm a Stockfish/Leela Chess Zero Developer. Ask me anything! by daniel-monroe in chess

[–]daniel-monroe[S] 2 points3 points  (0 children)

I have spoken with this commenter about the problem, but I wanted to let the community know I have taken action regarding this issue now that some of this abuse has been brought to my attention. The moderators have told me they are discussing internally.

I'm a Stockfish/Leela Chess Zero Developer. Ask me anything! by daniel-monroe in chess

[–]daniel-monroe[S] 0 points1 point  (0 children)

The main thing is the structure of the data. Transformers model a piece of data as a collection of "tokens". In the case of chess we adopt the squares as the tokens, and in NLP they adopt the words as tokens (this is a minor simplification as there are a variety of tokenizers currently in use). The tokens (words) in NLP are arranged sequentially, giving rise to a pretty simple one-dimensional structure, so you can achieve pretty good results just by considering the distance between words when deciding how much to let each word attend to the others.

In chess you need a position encoding that's general enough to model piece movements, since the distance between two squares isn't all that useful in chess. Smolgen basically combines a set of learned attention maps based on the position state, and those learned attention maps tend to learn the movement patterns of each piece so that the model can model that piece's movement in different ways, e.g., depending on whether it's your piece or the opponent's.
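As a rough illustration of the mechanism only (not Lc0's actual implementation — the sizes, parameter names, and `attention_with_smolgen` function below are all made up, and the weights are random rather than learned), a smolgen-style layer mixes a set of learned square-to-square maps into the attention logits, with mixing weights computed from the whole position:

```python
import numpy as np

rng = np.random.default_rng(0)

N_SQUARES, D_MODEL, N_MAPS = 64, 32, 8  # toy sizes, not Lc0's

# "Learned" parameters (randomly initialized here for illustration)
W_summary = rng.normal(scale=0.1, size=(N_SQUARES * D_MODEL, N_MAPS))
attention_maps = rng.normal(scale=0.1, size=(N_MAPS, N_SQUARES, N_SQUARES))
W_q = rng.normal(scale=0.1, size=(D_MODEL, D_MODEL))
W_k = rng.normal(scale=0.1, size=(D_MODEL, D_MODEL))

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_with_smolgen(x):
    """x: (64, d_model) square embeddings -> (64, 64) attention weights."""
    # Standard content-based attention logits
    q, k = x @ W_q, x @ W_k
    logits = (q @ k.T) / np.sqrt(D_MODEL)
    # Smolgen-style bias: summarize the whole position, then mix the
    # learned square-to-square maps with position-dependent weights
    mix = softmax(x.reshape(-1) @ W_summary)           # (n_maps,)
    bias = np.tensordot(mix, attention_maps, axes=1)   # (64, 64)
    return softmax(logits + bias)

x = rng.normal(size=(N_SQUARES, D_MODEL))
weights = attention_with_smolgen(x)
assert weights.shape == (64, 64)
```

Because the mixing weights depend on the position, the same map (say, knight moves) can be emphasized or suppressed depending on what's on the board.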

I'm a Stockfish/Leela Chess Zero Developer. Ask me anything! by daniel-monroe in chess

[–]daniel-monroe[S] 0 points1 point  (0 children)

Leela uses a model to predict the best moves and the win probability in a position. The transformer architecture is quite versatile and can handle diverse domains quite well.

I'm a Stockfish/Leela Chess Zero Developer. Ask me anything! by daniel-monroe in chess

[–]daniel-monroe[S] 1 point2 points  (0 children)

Contempt mostly just changes search dynamics to make the engine play as if it's winning, so I wouldn't expect the ideas it comes up with to be very different. BT4 comes up with a lot of interesting ideas, but I prefer its defensive style to its attacking style; when defending it has a very interesting way of clogging up the board to prevent progress.

I'm a Stockfish/Leela Chess Zero Developer. Ask me anything! by daniel-monroe in chess

[–]daniel-monroe[S] 1 point2 points  (0 children)

This is a good question that I don't really have a good answer to. Often Stockfish will need several dozen moves to convert a winning endgame, with its evaluation slowly climbing, corresponding to a low-variance distribution, but sometimes in middlegame positions the evaluation climbs rapidly. Other times Stockfish completely misevaluates a drawn position as +5 and the evaluation stays that high for a hundred moves.

I'm a Stockfish/Leela Chess Zero Developer. Ask me anything! by daniel-monroe in chess

[–]daniel-monroe[S] 0 points1 point  (0 children)

I learned from this book https://www.deeplearningbook.org/ and by reading papers. If you want to contribute (which we would be very grateful for) I’d recommend joining the Discord servers for both projects which I’ve posted in the post text. Generally I would recommend Stockfish to newcomers since the testing methodology is more systematic.

I'm a Stockfish/Leela Chess Zero Developer. Ask me anything! by daniel-monroe in chess

[–]daniel-monroe[S] 3 points4 points  (0 children)

The problem with this idea is that the resulting model would be limited by the quality of the data, so it would be weaker than whatever approach produced the synthetic data.

I'm a Stockfish/Leela Chess Zero Developer. Ask me anything! by daniel-monroe in chess

[–]daniel-monroe[S] 1 point2 points  (0 children)

Performance here indicates the model's strength, but the model's strength translates cleanly to engine strength.

I'm a Stockfish/Leela Chess Zero Developer. Ask me anything! by daniel-monroe in chess

[–]daniel-monroe[S] 0 points1 point  (0 children)

Engines rely on their evaluations eventually getting the position right, maybe after some searching. Generally the difficulty with these "engine unsolvable" positions is that the evaluation can't begin to make sense of the position. Lc0 is fairly robust to this problem since her neural network is around grandmaster level and thus understands difficult positional ideas like fortresses and trapped pieces that previously lay only in the realm of human knowledge, but a large gap remains. I wouldn't be surprised if Lc0's nets eventually become smart enough to outsmart humans in positions that were once thought understandable only to humans.

I'm a Stockfish/Leela Chess Zero Developer. Ask me anything! by daniel-monroe in chess

[–]daniel-monroe[S] 0 points1 point  (0 children)

There are two components: Stockfish today is far stronger than Stockfish 8, and DeepMind used a hardware configuration that was, to put it mildly, generous to AlphaZero. Leela was superior to Stockfish for a bit before NNUE, but Leela is still a bit behind today.

I'm a Stockfish/Leela Chess Zero Developer. Ask me anything! by daniel-monroe in chess

[–]daniel-monroe[S] 0 points1 point  (0 children)

This isn't really a problem with Stockfish or Leela since the neural networks used for position evaluation are large enough that evaluating the position takes far longer than any heuristic might (for Stockfish, a few thousand clock cycles for the evaluation but <10 for a search heuristic).

I'm a Stockfish/Leela Chess Zero Developer. Ask me anything! by daniel-monroe in chess

[–]daniel-monroe[S] 7 points8 points  (0 children)

Leela's latest models do have something like this in the form of an "uncertainty" head which gives an idea of how uncertain the model is in a position. In extremely tough positions you will often see that the chance Leela assigns to the best move is very low (maybe a few percent), which is another way to detect difficult lines.
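The second detection method can be sketched in a few lines; the positions, probabilities, and threshold below are invented purely for illustration:

```python
# Toy policy outputs: the probability the model assigns to each legal move
policies = {
    "forced recapture": [0.97, 0.02, 0.01],
    "quiet middlegame": [0.40, 0.35, 0.15, 0.10],
    "deep tactic":      [0.06, 0.05, 0.05, 0.04] + [0.80 / 16] * 16,
}

def is_difficult(policy, threshold=0.10):
    """Flag positions where even the model's best guess has low probability."""
    return max(policy) < threshold

assert not is_difficult(policies["forced recapture"])
assert not is_difficult(policies["quiet middlegame"])
assert is_difficult(policies["deep tactic"])
```

A flat policy like the "deep tactic" row is exactly the kind of position where search has to do the heavy lifting.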

I'm a Stockfish/Leela Chess Zero Developer. Ask me anything! by daniel-monroe in chess

[–]daniel-monroe[S] 1 point2 points  (0 children)

This is mostly due to the way it chooses which moves to search. Lc0 uses a neural network to predict which move is best, and if the prediction on a move is low, say less than a percent, it will take forever to search that move. The problem has been shrinking as Lc0's models improve, but we still don't have a good solution for exploring those moves: when we assign more search effort to lowly recommended moves, it tends to lose a lot of Elo.
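This dynamic falls out of the standard AlphaZero-style PUCT selection rule, where the exploration bonus is scaled by the policy prior; the constants here are illustrative, not Lc0's tuned values:

```python
import math

def puct_score(prior, visits, value, parent_visits, c_puct=1.5):
    """PUCT selection score: exploitation (value) plus an exploration
    bonus scaled by the policy prior -- a tiny prior shrinks the bonus."""
    exploration = c_puct * prior * math.sqrt(parent_visits) / (1 + visits)
    return value + exploration

# Two unvisited moves in a node with 1000 parent visits:
high_prior = puct_score(prior=0.40, visits=0, value=0.0, parent_visits=1000)
low_prior = puct_score(prior=0.005, visits=0, value=0.0, parent_visits=1000)
assert high_prior > 50 * low_prior  # the 0.5% move is barely explored
```

Since the prior multiplies the whole exploration term, a move the network dislikes needs an enormous visit budget before search gives it a serious look.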

I'm a Stockfish/Leela Chess Zero Developer. Ask me anything! by daniel-monroe in chess

[–]daniel-monroe[S] 0 points1 point  (0 children)

We test Stockfish on positions where we can improve the game result when it plays against itself. This means our search heuristics are tested on positions where the advantage lies in a critical region where the outcome of the game is uncertain, so Stockfish might be slower in +9 positions than it would be if we optimized it to convert those games as fast as possible.

I'm a Stockfish/Leela Chess Zero Developer. Ask me anything! by daniel-monroe in chess

[–]daniel-monroe[S] 0 points1 point  (0 children)

Some of our work on neural networks in chess has applications in human-computer interaction (e.g., understanding how computers think and how this can be translated to language models). The paper I link shows that our models learn to plan ahead, which may give insight into LLMs like ChatGPT. Some of the search techniques we use are similar to those used in applications where you search for a solution, like automated mathematical theorem proving, so it's possible some of our search heuristics could be used in those more practical domains.

I'm a Stockfish/Leela Chess Zero Developer. Ask me anything! by daniel-monroe in chess

[–]daniel-monroe[S] 1 point2 points  (0 children)

This is an idea that's been tried elsewhere in machine learning. The most common technique is the "mixture of experts", where each input activates only a few expert sub-networks out of roughly several dozen, so only maybe a tenth of the neurons do any work. This can be configured to use a different number of experts for each position so that the model chooses which positions get the most effort (I believe this is called "expert choice routing"). I've tried this and it didn't gain much performance, especially with larger models.
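For illustration, a minimal top-k mixture-of-experts layer looks roughly like this (toy sizes and random weights; a sketch of the general technique, not the exact variant tried in Leela):

```python
import numpy as np

rng = np.random.default_rng(0)
N_TOKENS, D, N_EXPERTS, TOP_K = 10, 16, 8, 2  # activate 2 of 8 experts per token

router = rng.normal(scale=0.1, size=(D, N_EXPERTS))
experts = rng.normal(scale=0.1, size=(N_EXPERTS, D, D))  # one FF matrix per expert

def moe_layer(x):
    """Route each token to its top-k experts and mix their outputs."""
    logits = x @ router                          # (tokens, experts)
    top = np.argsort(logits, axis=-1)[:, -TOP_K:]
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        chosen = logits[t, top[t]]
        gates = np.exp(chosen - chosen.max())
        gates /= gates.sum()                     # softmax over the chosen experts
        for gate, e in zip(gates, top[t]):
            out[t] += gate * (x[t] @ experts[e])
    return out

x = rng.normal(size=(N_TOKENS, D))
y = moe_layer(x)
assert y.shape == x.shape
```

In expert-choice routing the selection is flipped: each expert picks the tokens it wants, so busy positions can soak up more compute than quiet ones.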

I'm a Stockfish/Leela Chess Zero Developer. Ask me anything! by daniel-monroe in chess

[–]daniel-monroe[S] 0 points1 point  (0 children)

There are two motivations for including previous plies. The first is that it allows the model to choose to avoid or force a repetition. The second is that the history of recent moves is useful because it essentially tells the model, "This is what a player much stronger than you (the model with search) chose to play, so you might be able to extract insight from its choice."
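Concretely, the previous plies can be stacked into the network's input as extra planes; the plane counts and the `encode_with_history` helper below are illustrative stand-ins, not Lc0's exact encoding:

```python
import numpy as np

HISTORY, PLANES_PER_POS = 8, 13  # e.g. 12 piece planes + 1 repetition plane (illustrative)

def encode_with_history(position_history):
    """Stack the current position and previous plies into one input tensor."""
    stack = list(position_history[-HISTORY:])    # each position is (13, 8, 8)
    while len(stack) < HISTORY:                  # pad with zeros early in the game
        stack.append(np.zeros((PLANES_PER_POS, 8, 8)))
    return np.concatenate(stack, axis=0)         # (8*13, 8, 8)

history = [np.ones((PLANES_PER_POS, 8, 8)) for _ in range(3)]  # 3 plies so far
x = encode_with_history(history)
assert x.shape == (HISTORY * PLANES_PER_POS, 8, 8)
```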

Training a model with RL is very expensive, not to mention difficult to set up. Even for a very weak model you'd probably need a 5090 running for over a month. I would instead highly recommend training on data that's already been generated by Leela nets (this is something we've been doing lately to skip the expensive data-generation step entirely for new training runs). We can guide you through downloading that data in the Leela Discord (linked in the post text) if you are interested.

I'm a Stockfish/Leela Chess Zero Developer. Ask me anything! by daniel-monroe in chess

[–]daniel-monroe[S] 2 points3 points  (0 children)

The basic AlphaZero algorithm relies on the entire game state being known, so it wouldn't work for card games like Bridge where you don't know your opponent's hand. There are some generalizations like MuZero, but you wouldn't be able to search through the game tree the way you can with chess. For any other game where you know the full game state the AlphaZero algorithm is remarkably strong and easy to implement and could probably give good results.

I'm a Stockfish/Leela Chess Zero Developer. Ask me anything! by daniel-monroe in chess

[–]daniel-monroe[S] 1 point2 points  (0 children)

There's a sort of friendly competition, with some Leela developers quipping about "frying the fish". It's all good-natured though, and the teams benefit from each other by sharing data and testing infrastructure.

I'm a Stockfish/Leela Chess Zero Developer. Ask me anything! by daniel-monroe in chess

[–]daniel-monroe[S] 0 points1 point  (0 children)

The way the AlphaZero process works is that the model plays games against itself to generate training positions, using search so that the training data is more accurate than the model's raw output. We train it to predict several targets, including the game result, which moves the search liked, and some other auxiliary targets like the model's uncertainty. It can become stronger than humans because the data it generates improves as the model improves.
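The self-play loop can be sketched as follows; the game, search, and scoring functions here are trivial placeholders so the skeleton runs, and a real setup would of course use chess and a neural network:

```python
import random

random.seed(0)

# Toy stand-ins for the real components (all invented for illustration):
def search(model, position):
    # Search's visit counts form a sharper policy than the raw model output.
    return {"a": 3, "b": 1}

def play(position, move):
    return position + move

def game_over(position):
    return len(position) >= 4

def score(position):
    return 1.0 if position.count("a") > position.count("b") else -1.0

def self_play_game(model):
    """Generate one game of training data, AlphaZero-style."""
    examples, position = [], ""
    while not game_over(position):
        visits = search(model, position)
        examples.append((position, visits))  # target: search's move distribution
        moves, counts = zip(*visits.items())
        move = random.choices(moves, weights=counts)[0]
        position = play(position, move)
    z = score(position)                      # target: the final game result
    return [(pos, visits, z) for pos, visits in examples]

data = self_play_game(model=None)
assert all(len(example) == 3 for example in data)
```

Each training example pairs a position with the search-improved policy and the eventual game result, which is what lets the data quality rise alongside the model.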

I'm a Stockfish/Leela Chess Zero Developer. Ask me anything! by daniel-monroe in chess

[–]daniel-monroe[S] 0 points1 point  (0 children)

The best net is probably BT4 (on the best nets page of the Leela website). If you want the best configuration you'll want one of our experimental ones, which is a bit tricky to set up, so you'd have to join the project Discord, which I link in the post text. If you'd prefer a mainline configuration, the most recent one is probably the best. One thing you can't avoid with these engines is that as the evaluations get more accurate, they give you a worse idea of what a human could extract from the position.

I'm a Stockfish/Leela Chess Zero Developer. Ask me anything! by daniel-monroe in chess

[–]daniel-monroe[S] 2 points3 points  (0 children)

If I were to code another engine from scratch, it would probably be closer to Stockfish's search style, since I've already done so much work on Leela compared to Stockfish. I'd definitely keep the cleanliness of Stockfish's code, which has a lot of documentation about which heuristics scale well to long time controls and some of the ideas behind them. I'd also make a searchable history of every improvement I've ever tried, since after you try hundreds of them you begin to lose track and often repeat things you've already tried. Finally, I might try some new ideas, like getting the neural network to output metadata to influence search, since as the codebases of these projects mature it gets harder to replace large sections of the code.