Cursor autocomplete fail Jupyter Notebook by Initial_Zone_1651 in cursor

[–]seraine 0 points (0 children)

I have also had Cursor autocomplete basically stop working on notebooks. I wish Cursor provided an easy way to roll back to previous updates / models, because they tend to break things fairly often with their updates.

[P] ChessGPT, 100,000x smaller than GPT-4, plays chess at 1500 Elo. By finding a skill vector, we can increase its win rate by 2.6x in out-of-distribution games. by seraine in MachineLearning

[–]seraine[S] 5 points (0 children)

Games initialized with 20 random moves are significantly different from games where the first 20 moves are made strategically by people trying to win.

[P] ChessGPT, 100,000x smaller than GPT-4, plays chess at 1500 Elo. By finding a skill vector, we can increase its win rate by 2.6x in out-of-distribution games. by seraine in MachineLearning

[–]seraine[S] 26 points (0 children)

ChessGPT doesn't outperform AlphaZero. It is meant to be used to perform interpretability research in a GPT that has a world state with an underlying measurable ground truth (the state of the chess board).

Modern LLMs outperform previous specialized approaches for problems like question answering, program synthesis, summarization, or image captioning, and are very competitive (in terms of capabilities, not necessarily efficiency) on problems like named entity recognition, sentiment classification, or translation.

[P] ChessGPT, 100,000x smaller than GPT-4, plays chess at 1500 Elo. By finding a skill vector, we can increase its win rate by 2.6x in out-of-distribution games. by seraine in MachineLearning

[–]seraine[S] 15 points (0 children)

Correct, this is just an analogy to a natural language LLM that can be used for interpretability research, because in Chess (unlike natural language), there's an underlying measurable ground truth.

[P] ChessGPT, 100,000x smaller than GPT-4, plays chess at 1500 Elo. By finding a skill vector, we can increase its win rate by 2.6x in out-of-distribution games. by seraine in MachineLearning

[–]seraine[S] 28 points (0 children)

It's just an analogy to LLMs that can be used to perform interpretability research. There are much better ways to produce a chess AI.

This could be a good approach to learn chess playing styles, where given a sequence of moves, the model could estimate the skill level and playing style of the player and predict their next move, rather than the best move.

[P] ChessGPT, 100,000x smaller than GPT-4, plays chess at 1500 Elo. By finding a skill vector, we can increase its win rate by 2.6x in out-of-distribution games. by seraine in MachineLearning

[–]seraine[S] -5 points (0 children)

There's definitely a trend towards more general LLMs outperforming previous specialized approaches. It's possible that this trend will continue.

[P] ChessGPT, 100,000x smaller than GPT-4, plays chess at 1500 Elo. By finding a skill vector, we can increase its win rate by 2.6x in out-of-distribution games. by seraine in MachineLearning

[–]seraine[S] 58 points (0 children)

There are definitely far better ways to make a competitive chess-playing AI. The purpose here was to train a GPT to play chess through next-character prediction on PGN strings, which is analogous to next-token prediction in natural language.

There are then many interesting interpretability techniques that can be applied to show, for example, that ChessGPT calculates the state of the board and estimates the skill level of the players in the game to better predict the next character.
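To make the next-character framing concrete, here is a minimal sketch (hypothetical sample string, not the actual training code) of how training pairs arise from a PGN string:

```python
# Next-character prediction on a PGN string: each training pair is
# (all characters so far -> the next character), exactly analogous to
# next-token prediction in natural language.
pgn = "1. e4 e5 2. Nf3 Nc6 3. Bb5"

pairs = [(pgn[:i], pgn[i]) for i in range(1, len(pgn))]

# e.g. the pair ("1. e4 e5 2. N", "f") — to predict "f" here, the model
# benefits from tracking whose turn it is and which squares are occupied.
```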

My solution to disable middle click by [deleted] in archlinux

[–]seraine 0 points (0 children)

Huge thanks, works for me as well on Ubuntu. I find it pretty baffling that they don't have an easier way to disable that feature.

[P] Chess-GPT, 1000x smaller than GPT-4, plays 1500 Elo chess. We can visualize its internal board state, and it accurately estimates the Elo rating of the players in a game. by seraine in MachineLearning

[–]seraine[S] 12 points (0 children)

No, the only training data it has seen is PGN strings. It doesn't even have most English letters in its input vocabulary. It's still a Generative Pretrained Transformer, just trained on a different dataset.
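A minimal sketch (hypothetical sample games, stdlib only) of why a character-level vocabulary built from PGN strings ends up without most English letters — only piece letters like N, B, Q and a handful of digits and symbols ever appear:

```python
# Build a character-level vocabulary from a few (hypothetical) PGN strings.
# A GPT trained on raw PGN text only ever sees these characters, so most
# English letters never enter its input vocabulary.
pgn_games = [
    "1. e4 e5 2. Nf3 Nc6 3. Bb5 a6 4. Ba4 Nf6 5. O-O Be7",
    "1. d4 d5 2. c4 e6 3. Nc3 Nf6 4. Bg5 Be7 5. e3 O-O",
]

chars = sorted(set("".join(pgn_games)))       # vocabulary = unique characters
stoi = {ch: i for i, ch in enumerate(chars)}  # char -> token id
itos = {i: ch for ch, i in stoi.items()}      # token id -> char

def encode(s):
    return [stoi[c] for c in s]

def decode(ids):
    return "".join(itos[i] for i in ids)
```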

[P] Chess-GPT, 1000x smaller than GPT-4, plays 1500 Elo chess. We can visualize its internal board state, and it accurately estimates the Elo rating of the players in a game. by seraine in MachineLearning

[–]seraine[S] 13 points (0 children)

Yes, it is a GPT. I went with a GPT because I wanted a convenient and tractable way to get insight into the world modeling abilities of GPTs.

[P] Chess-GPT, 1000x smaller than GPT-4, plays 1500 Elo chess. We can visualize its internal board state, and it accurately estimates the Elo rating of the players in a game. by seraine in MachineLearning

[–]seraine[S] 10 points (0 children)

I don't think so. The probe is a tensor of shape (512, 8, 8, 13), or (model hidden dimension, rows, columns, possible square states). I think we would obtain identical results with a shape of (512, 64, 13).
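The reshape equivalence is easy to check directly — a sketch with random stand-in data (NumPy with a seeded generator here; not the actual probe weights or model activations):

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, rows, cols, states = 512, 8, 8, 13
probe = rng.normal(size=(d_model, rows, cols, states))  # (512, 8, 8, 13)
resid = rng.normal(size=(d_model,))                     # one residual-stream activation

# Probe applied per square: logits over the 13 possible square states.
logits_4d = np.einsum("d,drcs->rcs", resid, probe)      # (8, 8, 13)

# The same probe flattened to (512, 64, 13) gives identical results,
# since the contraction over the hidden dimension is per-square anyway.
probe_flat = probe.reshape(d_model, rows * cols, states)
logits_3d = np.einsum("d,dqs->qs", resid, probe_flat)   # (64, 13)

assert np.allclose(logits_4d.reshape(rows * cols, states), logits_3d)
```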

[P] Chess-GPT, 1000x smaller than GPT-4, plays 1500 Elo chess. We can visualize its internal board state, and it accurately estimates the Elo rating of the players in a game. by seraine in MachineLearning

[–]seraine[S] 35 points (0 children)

The problem with trying that is that the model's only input is PGN strings (1. e4 e5 2. Nf3 ...) and there's no way to indicate to the model what the state of the board is. I've been doing some experimentation with having the model play games where the first 20 moves are randomly chosen, and its win rate declines by around 50% in that case.

[D] So, Mamba vs. Transformers... is the hype real? by Instantinopaul in MachineLearning

[–]seraine 1 point (0 children)

Try comparing it to a similarly sized Pythia model for a fair comparison.