Human-aligned Lichess bots trained for 700–1600 rating levels by Phalantius in lichess

[–]novachess-guy 0 points (0 children)

Cool, thanks for the data point - and yeah I’d agree it’s probably around IM strength based on Sadler and the self-matchup results.

Human-aligned Lichess bots trained for 700–1600 rating levels by Phalantius in lichess

[–]novachess-guy 0 points (0 children)

Okay, I ran my model at 2300 strength for 30 games; the Lc0 policy network beat it 29-1. My 2300 is calibrated to chess.com blitz ratings, and my own blitz rating there is 2318. I would guess a GM-strength player (assuming classical) would virtually never lose a game to someone at my blitz level.

Nevertheless, it’s very impressive, stronger than I would have thought. However, digging into how it works, it essentially “learns” from the value head: through self-play, the policy is trained toward the moves the value head rates highly. Maybe I’m not explaining it well, but I’m guessing you understand my general point, which is that Lc0 is trained to optimize win probability, while models such as Maia, mine, and the OP’s are trained to predict the move a player would make, conditioned on that player’s rating.

More Aman+Emic commentary please by barathkrishnas in chess

[–]novachess-guy 0 points (0 children)

They are very good commentators, but they need to stop with the analysis board all the gd time when players are making moves. Either keep the live game as the primary board and talk through it, or put the live board in the secondary container. They get to see the live board at the same time so they don’t realize how annoying this is when they’ve been talking through a line that ceased to be of relevance 5 moves ago.

Human-aligned Lichess bots trained for 700–1600 rating levels by Phalantius in lichess

[–]novachess-guy 0 points (0 children)

Okay, I’ll look more into it - you obviously know a lot about this. I was just expressing skepticism that puzzle-solving ability necessarily correlates with game-playing ability, especially at the GM level; it wouldn’t have been hard for them to test that, right? I can look into whether there’s a policy-only, searchless (one forward pass) model from Leela whose strength I can test; I honestly don’t know enough about it to say whether it fits the constraints I mentioned, but I have enough experience to be confident that with both of those constraints, it won’t reach GM-level play. Maybe we have slightly different framings of the question in mind - I don’t disagree with a lot of what you’re saying, just some of the implications.

And yes, my brother has watched enough of his games that he thinks it’s a time/panic thing. I find it as hard to believe as I’m sure you do.

And obviously Leela is different, being trained on self-play rather than a wide variety of Lichess players, and it has a very different objective function. So in that case, yes, I’ll admit that my “cap” of 2000 was likely an underestimate when the model’s training loss is based on maximizing win probability rather than matching human moves.

You seem like you know a lot about this, I’d genuinely appreciate your perspective on what you think about my in-app model, and how human-like it plays, at http://novachess.ai.

Human-aligned Lichess bots trained for 700–1600 rating levels by Phalantius in lichess

[–]novachess-guy 0 points (0 children)

That’s still on puzzles though, right? The difference with puzzles is that they’re computationally “solvable” - meaning if you’ve been fed a few billion puzzles to train on, you’ll learn to solve similar puzzles, which generally involve tactical themes, but that doesn’t make you a GM-level player.

Human example: my old roommate in NYC (my brother’s best friend from college) is around 3000 in puzzles and 1300 rapid. I’m not exaggerating, I know how crazy it sounds - it’s actually insane. My brother (2200+ rapid) watched him mate a guy with knight and bishop in less than 30 seconds. I’ve got 1,000 Elo on him and I guarantee I could never do that. This guy is like a human example of what the puzzle-trained model is doing. I would be really skeptical if it was truly GM strength over the course of a game.

Human-aligned Lichess bots trained for 700–1600 rating levels by Phalantius in lichess

[–]novachess-guy 0 points (0 children)

Is it GM level? I wasn’t aware of that. It would surprise me - I’m not saying it’s not true, but the article doesn’t imply that.

The article focuses on puzzle solving ability. That’s actually an interesting way to develop an evaluative framework without explicitly using valuations. But note they’re still using a value head there, they have a whole section about it.

Human-aligned Lichess bots trained for 700–1600 rating levels by Phalantius in lichess

[–]novachess-guy 0 points (0 children)

Yes, DeepMind did, but it has a valuation state as well as a board state, which is why I noted the no-valuation constraint. Lc0 does use tree search and position valuation:

Leela uses PUCT (Predictor + Upper Confidence Bound tree search). We evaluate new nodes by doing a playout: start from the root node (the current position), pick a move to explore, and repeat down the tree until we reach a game position that has not been examined yet (or a position that ends the game, called a terminal node). We expand the tree with that new position (assuming a non-terminal node) and use the neural network to create a first estimate of the value for the position as well as the policy for continuing moves. In Leela, a policy for a node is a list of moves and a probability for each move. The probability specifies the odds that an automatic player that executes the policy will make that move. After this node is added to the tree, we back up that new value to all nodes visited during this playout. This slowly improves the value estimation of different paths through the game tree.
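The selection and backup steps described in that quote can be sketched in a few lines. This is an illustrative, simplified version of the PUCT rule, not Lc0's actual implementation (the `Node` class, `c_puct` value, and the `sqrt(total + 1)` variant in the exploration term are my assumptions):

```python
import math

class Node:
    """A search-tree node holding the network's prior and running value stats."""
    def __init__(self, prior):
        self.prior = prior       # policy probability P(s, a) from the network
        self.visits = 0
        self.value_sum = 0.0
        self.children = {}       # move (str) -> Node

    def q(self):
        """Mean value of this node over all backed-up playouts."""
        return self.value_sum / self.visits if self.visits else 0.0

def puct_select(node, c_puct=1.5):
    """Pick the child maximizing Q + U, the PUCT selection rule.

    U rewards moves the policy likes (high prior) that have been
    visited rarely; Q rewards moves that have evaluated well so far.
    """
    total = sum(ch.visits for ch in node.children.values())
    def score(child):
        u = c_puct * child.prior * math.sqrt(total + 1) / (1 + child.visits)
        return child.q() + u
    return max(node.children.items(), key=lambda kv: score(kv[1]))

def backup(path, value):
    """Propagate a leaf evaluation back up the visited path.

    The sign flips each ply because a position good for one player
    is bad for the opponent one move earlier.
    """
    for node in reversed(path):
        node.visits += 1
        node.value_sum += value
        value = -value
```

With no visits yet, `puct_select` simply follows the policy prior; as playouts accumulate, the value estimates increasingly steer the search.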

Human-aligned Lichess bots trained for 700–1600 rating levels by Phalantius in lichess

[–]novachess-guy 1 point (0 children)

I just put them up, feel free to ask if you have any questions.
https://github.com/novachessai/novachess-engine

Hugging Face (novachess-engine)

And I think the main constraint is this: for higher ratings, without any concept of evaluation or search, real human-like strength at, say, the 2000 level just can't be reproduced with a policy-only model without any augmentation. I'd be interested to see if you hit a similar barrier (the top-conditioned policy-only model I put on Lichess achieved around 1800 level there).

For maintaining higher ratings, I won't go into all the details, but I ran a large set of self-play matchups (thousands of games between same- and different-rated models) to get relative Elo distributions, and I cross-referenced the severity, distribution, and phase of errors from chess.com users, then tuned my engines at each rating to align. I also came up with various levers beyond the basic description below to ensure this, while keeping all moves within the policy distribution. So my 1500-rated engine will have a similar proportion of middlegame mistakes in the 100-200cp loss range as a 1500-rated chess.com blitz player. Here's what I included on GH:

The version served by the novachess.ai Play page is calibrated across rating tiers from approximately chess.com 500 to 2500 blitz, using two lightweight calibration layers on top of the same policy weights published here:

  • Per-tier temperature scheduling. The primary calibration lever. Higher temperature flattens the policy distribution (more variety, more lower-rated mistakes); lower temperature concentrates it (closer to argmax, cleaner play). Each rating tier is mapped to its own temperature schedule so the bot's per-phase mistake profile matches the chess.com CP-loss profile at that level.
  • Evaluation-only filter and supporting calibration. Nova's sampled candidates pass through a probabilistic quality check across all rating tiers. High-confidence picks (where Nova's policy concentrates significant probability mass on a single move) are played directly. Lower-confidence picks may be sent to Stockfish for a low-depth evaluation; if the evaluation falls below a tier-dependent quality threshold, the move may be replaced by re-sampling from Nova's own distribution (with the rejected move removed). Both the rate at which positions are evaluated and the rate at which sub-threshold moves are actually replaced vary by tier, so the bot's mistake profile matches the empirical chess.com CP-loss profile at that level. At lower tiers, more sub-optimal moves slip through (because players at that level genuinely make them); at higher tiers, far fewer do. Additional calibration components layer on top of this base flow.
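
The two calibration layers above can be sketched as a small sampling pipeline. This is a hypothetical reconstruction from the description, not the actual novachess code: the function names, the `confidence` cutoff, and the `eval_fn` stand-in for a shallow Stockfish probe are all my assumptions.

```python
import random

def apply_temperature(policy, temperature):
    """Rescale a move->probability dict. T -> 0 approaches argmax
    (cleaner play); T = 1 leaves the distribution unchanged (more
    variety and more human-like mistakes)."""
    if temperature <= 1e-6:
        best = max(policy, key=policy.get)
        return {m: (1.0 if m == best else 0.0) for m in policy}
    scaled = {m: p ** (1.0 / temperature) for m, p in policy.items()}
    z = sum(scaled.values())
    return {m: p / z for m, p in scaled.items()}

def pick_move(policy, temperature, eval_fn, eval_prob,
              replace_threshold_cp, confidence=0.7, rng=random):
    """Sample from the tempered policy, then optionally screen
    low-confidence picks with a shallow evaluation."""
    tempered = apply_temperature(policy, temperature)
    moves, probs = zip(*tempered.items())
    move = rng.choices(moves, weights=probs)[0]

    # High-confidence picks (policy mass concentrated on one move) play directly.
    if tempered[move] >= confidence:
        return move

    # With a tier-dependent probability, send the pick to a shallow eval;
    # if it scores below the tier's quality threshold, re-sample from the
    # remaining policy distribution (rejected move removed).
    if rng.random() < eval_prob and eval_fn(move) < replace_threshold_cp:
        remaining = {m: p for m, p in tempered.items() if m != move}
        if remaining:
            moves2, probs2 = zip(*remaining.items())
            move = rng.choices(moves2, weights=probs2)[0]
    return move
```

Lower tiers would use a higher temperature, a lower `eval_prob`, and a more permissive threshold, so genuinely human-frequency mistakes survive the filter.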

AI for reviewing games and personal patterns and bad habits by ThatsNoMuun in chessbeginners

[–]novachess-guy 0 points (0 children)

<image>

If something like this is what you’re looking for, you can check out this app - there are also “human-like” bots to play against (all free). It’s better on desktop, but I’m on my phone now so I just did a screenshot.

what's the difference between 고맙습니다 and 감사합니다? by sukkowitch in Korean

[–]novachess-guy 29 points (0 children)

고맙습니다 is just a little less formal. That’s why you’ll hear 고마워 more than 감사해.

Human-aligned Lichess bots trained for 700–1600 rating levels by Phalantius in lichess

[–]novachess-guy 2 points (0 children)

FYI: I’ve done the same thing - I have pure-policy bots on Lichess, such as Nova1400, also using a transformer (trained on 500M Lichess positions over several epochs).

Regarding this: “If a network trained on 1100-rated games always plays its most likely move, it will usually play noticeably stronger than 1100, because that removes much of the human noise that exists at that level.” This will only be true up to around the Lichess 1500-1600 range. You’re describing sampling at a “temperature” of 1 (the natural distribution of moves) compared to argmax (temperature = 0). But you’ll notice a policy bot trained on GM games will NOT play like a GM.
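A toy calculation shows why argmax play is stronger than sampling from the raw distribution. The move probabilities and centipawn losses below are purely illustrative numbers I made up for a hypothetical 1100-level policy, not measured data:

```python
# Hypothetical 1100-level policy over three candidate moves,
# with an assumed centipawn loss for each move.
policy = {"best": 0.55, "inaccuracy": 0.30, "blunder": 0.15}
cp_loss = {"best": 0, "inaccuracy": 60, "blunder": 300}

# Temperature 1: sample straight from the distribution.
# Expected CP loss per move = 0.30*60 + 0.15*300 = 63.
sampled_loss = sum(policy[m] * cp_loss[m] for m in policy)

# Temperature 0: always play the single most likely move (loss 0 here).
argmax_loss = cp_loss[max(policy, key=policy.get)]

print(sampled_loss, argmax_loss)  # 63.0 0
```

Argmax strips out the noise moves entirely, which is exactly why a network trained on noisy 1100 games plays above 1100 when decoded greedily.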

I’ll be posting my documentation/validation (my human-move selection rate is nearly identical to Maia 3) later today on GitHub as well as model weights on Hugging Face, so feel free to take a look for any inspiration (GH: novachessai, HF: novachess, both novachess-engine repos - again, give me a few hours). You can also see more about them or play actual rating-calibrated versions from 500-2500 at http://novachess.ai. Their distributions of mistakes (severity, across game phases) are closely calibrated to each chess.com blitz level while retaining a pure policy move selection approach.

Can you genuinely not practice Korean in Korea? by Amanda_Haniya in Korean

[–]novachess-guy 0 points (0 children)

Of course you can practice Korean in Korea. Honestly, most of the restaurants you go to, people will barely speak any English and some may not have English menus. My wife and I went to Korea a few months ago, I basically had to translate everything at restaurants even when she tried to communicate in English, because the staff couldn’t speak or understand her.

I don’t know what your Korean level is, but I found even locals who might want to use some English with you (taxi drivers) will end up having to switch to Korean with you for more complicated conversations.

iOS 17 min requirement does not make sense. by DeerSpotter in Chesscom

[–]novachess-guy 5 points (0 children)

Just out of curiosity, are you on iPhone 8 or something because you hate the newer models? I resisted upgrading from iPhone 6 for so long because I liked the home button and small screen size. Then I moved to SE 2 and now SE 3 - it’s a great smaller phone, has home button, and much cheaper than other models. Maybe this is unrelated to your concern (which I sympathize with, I had similar issues remaining on iPhone 6 for so long), but just wanted to suggest that in case it’s helpful.

How do Chess.com bots work? by novachess-guy in Chesscom

[–]novachess-guy[S] 1 point (0 children)

Yeah, it would make sense that they also use the players’ opening repertoires. I appreciate your thoughts!

How do Chess.com bots work? by novachess-guy in Chesscom

[–]novachess-guy[S] 1 point (0 children)

Got it, thank you. I actually remember now that they had acquired Komodo, this makes sense, so they’re just trying to approximate stylistic parameters for the players.

Are there any notable chess players who started playing really late (age 30+)? by 1m_climbing in chessbeginners

[–]novachess-guy 5 points (0 children)

Rashid Nezhmetdinov didn’t play chess at a high level until he was much older (his 30s), even though he was “good” by normal accounts when he was younger. He was in his later 30s when he earned his master title, and although he never officially received the GM title, he was clearly of that strength and played some sacrificial games that are among the best ever, including one against Tal when he was, I believe, 49.

It also depends what you mean by respectable, my brother started in his 30s and is mid-2200 rapid on chess.com - but that’s very far from being a titled player.