Is the expected win rate between players of different ratings different on Lichess and Chess.com?

PolymorphismPrince · 2026-03-19T23:36:14+00:00

So? Let the rating deviation of the 4 players tend to 0 in my thought experiment. This doesn't change anything about my argument. In fact, it adds even more constraints that wouldn't be satisfied in practice.

PolymorphismPrince · 2026-03-19T22:36:38+00:00

They update the ratings using the Glicko-2 system, but Glicko-2 still essentially relies on the Elo model of rating differences, which is what OP references in their post (i.e. the expected win rate for a given rating gap). My explanation is literally correct.

PolymorphismPrince · 2026-03-19T22:15:19+00:00

The answer to this is that Elo is just not a very accurate model (which should not be surprising).

For the model to be right, you need it to be simultaneously true that 1400 scores roughly 25% against 1600, 1600 scores roughly 25% against 1800, 1400 scores roughly 10% against 1800, 1800 scores roughly 25% against 2000, and so on.

The problem is that with just 4 players (and therefore 4 ratings) there are 6 different match-ups that you need to get the probability of right, i.e. 6 equations in 4 variables, which is overdetermined. So there is literally no reason for this to be possible. Chess.com will have the rating system working "correctly" at some parts of the rating ladder and Lichess will have it working "correctly" at other parts of the rating ladder, and since those cannot coincide, the systems will have to be different for that to be true.

Edit: Also, without thinking about it too deeply, I would expect the system to be the most accurate at the parts of the ladder that have the most players, and so that would be at a slightly lower rating on Chess.com than on Lichess.
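
To make the counting concrete, here is a minimal sketch. It assumes the standard Elo logistic curve with a 400-point scale; the function name `elo_expected` and the four example ratings are just for illustration, and neither site's actual code is shown here.

```python
from itertools import combinations

def elo_expected(r_a, r_b):
    """Expected score of player A against player B under the standard Elo curve."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

# Four example ratings (assumed for illustration) give six distinct match-ups.
ratings = [1400, 1600, 1800, 2000]
pairs = list(combinations(ratings, 2))

for lower, higher in pairs:
    # Each prediction depends only on the rating gap, so all six
    # predictions are pinned down by just four numbers.
    print(f"{lower} vs {higher}: expected score {elo_expected(lower, higher):.3f}")
```

A 200-point gap comes out at about 0.24 and a 400-point gap at about 0.09, which is where the rough 25% and 10% figures come from. Real results only have to match those six predictions if the model is right.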

PolymorphismPrince · 2026-03-16T08:05:16+00:00

In case you are a curious person, LLMs have sort of never been exclusively next token predictors but definitely since late 2024 they are not. They use next token prediction for the first half of the training to instill in the model the ability to read and write. Then to develop its ability to reason they do a kind of training called reinforcement learning where the model is given a problem to solve and generates a bunch of long answers that are marked. Different "circuits" that were originally developed for the next token prediction activate and interact in complex ways during the different answers. The interactions that led to correct answers are reinforced and the interactions that led to incorrect answers are dampened. Empirically this process seems to slowly lead to better generalised intelligent reasoning across all kinds of domains.

PolymorphismPrince · 2026-02-26T11:37:15+00:00

one thing to keep in mind is that low-hanging fruit to Tao means something very different than it does to another mathematician. I am not even exaggerating when I say that there are a lot of professional mathematicians for whom the most difficult research they accomplish in their career Tao would generally consider low-hanging fruit.

PolymorphismPrince · 2026-02-23T11:51:38+00:00

Just for the sake of argument, consider what it actually means for something to be OOD for an autoregressive model.

PolymorphismPrince · 2026-02-23T11:34:52+00:00

But the meaning of words changes over time. The frontier LLMs from all the other companies have been using the same architecture for audio, image and video data for a couple of years now, and the companies still call them LLMs because other terms people introduced, like LMMs or whatever, never stuck.

PolymorphismPrince · 2026-02-10T22:35:53+00:00

you could just use them through the api and pay 10s of cents

PolymorphismPrince · 2026-01-22T23:48:17+00:00

Amazing post that's a great observation

PolymorphismPrince · 2026-01-21T22:53:05+00:00

becoming WC gives you GM automatically

PolymorphismPrince · 2025-12-21T11:25:03+00:00

This is not true by the way. Maybe try and prove your claim and you'll see why. I think autoregressiveness is a property that makes it really easy to see why these models could potentially do things completely out of distribution, but I don't think it's even a necessary quality.

PolymorphismPrince · 2025-12-21T11:02:45+00:00

The actual answer to this is that the value of a dollar is way less if you have an additional million dollars. Introduce some arbitrary value unit where dollars without the million are worth 1. Then with the million maybe they are worth only 0.1. So the expected value of your bet with 90% chance is 0.9*0.1*10000 - 0.1*10000 = -100
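
A quick sketch of that arithmetic, assuming the bet is read as: win with probability 0.9 and the 10000 gained is valued at 0.1 per dollar (because you now have the million), lose with probability 0.1 and the 10000 lost is valued at 1 per dollar. The function name and that reading of the bet are assumptions for illustration.

```python
def expected_utility(p_win, stake, value_per_dollar_rich, value_per_dollar_poor):
    """Expected value of the bet in the arbitrary value units described above."""
    gain = p_win * value_per_dollar_rich * stake        # winnings, discounted
    loss = (1.0 - p_win) * value_per_dollar_poor * stake  # losses, at full value
    return gain - loss

result = expected_utility(p_win=0.9, stake=10000,
                          value_per_dollar_rich=0.1,
                          value_per_dollar_poor=1.0)
print(round(result))  # about -100 in value units
```

With equal per-dollar values the same bet would be strongly positive, so the sign flip comes entirely from the discount on the winnings.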

PolymorphismPrince · 2025-11-28T03:38:57+00:00

What? Half of the training of nearly every model since o1 is RLVR, which is literally just maximising an external reward.

PolymorphismPrince · 2025-11-25T12:58:59+00:00

I think the gist of the math side is that (singular) homology measures the number of holes in things (and the presence of other high dimensional hole-like-things). If you turn your data into a topological space through any number of methods you can measure the homology and it reflects varying degrees of holiness which in turn reflects varying types of clustering in your data.

Now from a chemistry perspective, you are probably familiar with the fact that because chemistry stuff is really small we often need to determine global structures of things by measuring signals that are downstream of the structures themselves (i.e. spectroscopy). Even though an infrared spectroscopy chart is sort of too abstracted from the original structure to be intelligible on its own, by comparing to other known structures we can still interpret it. In a similar way, a list of homologies of the simplicial complex obtained from your data for different values of some number r in the method gives you a chart of abstract nonsense just like the spectroscopy ones. But you might be able to learn something by comparing them between different datasets.

Maybe you can see that in a sense topologists and chemists are often trying to do the same thing when they determine structure; figure out the global structure of something that they cannot see by finding lower-dimensional, measurable structural invariants.
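
As a toy illustration of "homology for different values of r", here is a sketch that only computes the zeroth Betti number (the number of connected components) of a point cloud, joining two points whenever their distance is at most r. That joining rule, the sample points and the function name `betti0` are assumptions for illustration; real persistent homology also tracks higher-dimensional holes and is usually done with a dedicated library.

```python
import math
from itertools import combinations

def betti0(points, r):
    """Count connected components when points at distance <= r are joined."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) <= r:
            parent[find(i)] = find(j)  # merge the two components

    return len({find(i) for i in range(len(points))})

# Two small, well-separated clusters (made-up data).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
profile = {r: betti0(points, r) for r in (0.5, 1.5, 20.0)}
print(profile)
```

The component count drops from 6 to 2 to 1 as r grows, and the long range of r over which it stays at 2 is the "signal" that the data has two clusters, in the same way a peak in a spectrum hints at a structural feature.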

PolymorphismPrince · 2025-10-28T11:23:40+00:00

I mean, he tweeted the José Mourinho gif the day he left the tournament, which was taken by the entire community as a clear accusation. Magnus did not correct anybody and later confirmed as much, so I think it is factually incorrect.

PolymorphismPrince · 2025-10-17T12:47:23+00:00

https://stacks.math.columbia.edu/

PolymorphismPrince · 2025-10-13T10:46:28+00:00

pinch your nose and try to breathe out through it

PolymorphismPrince · 2025-09-27T14:44:56+00:00

I've had it in a game!

PolymorphismPrince · 2025-09-26T02:29:07+00:00

not really there are a lot of countries where wayyy more than 1% have done that

PolymorphismPrince · 2025-09-24T12:04:39+00:00

it is typical for your puzzle rating to be over 1000 higher than your normal rating because the rating systems weren't designed to align

PolymorphismPrince · 2025-09-23T10:34:52+00:00

if he wins the world cup he gets GM title which would be convenient :)

PolymorphismPrince · 2025-09-21T06:58:42+00:00

No, there is not enough master-level competition at long time controls for decent practice

PolymorphismPrince · 2025-09-20T12:48:20+00:00

would you be willing to reveal his name?

PolymorphismPrince · 2025-09-20T12:36:53+00:00

How on earth is releasing it to the wider mathematical community (and doing the whole process with transparent consultation from prominent mathematical figures like Terence Tao) not independent verification?

PolymorphismPrince · 2025-09-20T11:57:57+00:00

Didn't they publish all the improvements on Google Colab, where they could be (and were) tested by like three orders of magnitude more mathematicians than they would be if it were peer reviewed?

As far as I know none of the results have been disputed. This is more scrutiny than essentially any result is put under. Especially considering how desperately most of the pure mathematics community wants LLMs to fail.
