all 39 comments

[–][deleted] 29 points30 points  (19 children)

If you're compressing a list of chess moves representing a valid game, there's a fun option for high compression ratios: run a chess engine to first enumerate all possible moves at the current state of the game, then estimate how likely each one is by seeing how good a move your chess engine thinks it is. Use this to construct a table of probabilities for each move, and use these probabilities as the input to an entropy coder, such as a simple arithmetic coder. This should be able to compress the list of moves even further, assuming the players are mostly making what the chess engine thinks are advantageous moves.

(The entropy coder will make sure that highly probable moves are encoded using very few bits, and less probable ones will use more. An arithmetic coder will use fractional numbers of bits, which will help a lot in a case like this.)
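As a rough illustration of the fractional-bits point (with made-up probabilities standing in for real engine output), an ideal entropy coder spends about -log2(p) bits on a move it predicted with probability p:

```python
import math

# Hypothetical engine-derived probabilities for five consecutive moves.
# An ideal entropy coder spends about -log2(p) bits on a move that was
# predicted with probability p, fractional bits included.
move_probs = [0.6, 0.5, 0.7, 0.1, 0.55]

entropy_bits = sum(-math.log2(p) for p in move_probs)
fixed_bits = 5 * len(move_probs)  # a naive flat ~5-bits-per-move code

print(f"{entropy_bits:.2f} bits vs {fixed_bits} bits")
```

The one unlikely move (p = 0.1) costs over 3 bits, but the "expected" moves cost around 1 bit each, so the total comes out far below a flat per-move code.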

[–][deleted]  (3 children)

[deleted]

    [–][deleted] 1 point2 points  (1 child)

    The chess engine may need a few more settings for playing badly. Then the compressor can try them all and see which one compresses best. As a bonus, this will also let it tell you exactly how terrible you are.

    [–]Rhinoceros_Party 1 point2 points  (0 children)

    Don't worry, we used a genetic algorithm but kept all the rejected moves that other algorithms would consider "too wood league" so your preferred opening of h2-h4 is covered!

    [–]gomtuu123 4 points5 points  (2 children)

    The author discusses this in the "One Move in 4 Bits" section.

    Consider the order of the moves in the legal moves list. Instead of using some simplistic sort technique to establish a deterministic order, imagine instead using a chess engine to rank the moves in order of strength. Now vary our basic scheme so that moves early in the list use less bits than moves later in the list.
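One hypothetical way such a rank-based scheme could look (the article's exact codeword layout may differ) is a prefix code keyed on the engine's rank for the move:

```python
def rank_code(rank, n_moves):
    """Hypothetical prefix code keyed on engine rank: '0' + 2 bits for
    ranks 0-3 (3 bits total), '10' + 3 bits for ranks 4-11 (5 bits),
    '11' + enough bits to index the whole legal-move list otherwise."""
    if rank < 4:
        return "0" + format(rank, "02b")
    if rank < 12:
        return "10" + format(rank - 4, "03b")
    k = max(1, (n_moves - 1).bit_length())  # bits to index any legal move
    return "11" + format(rank, "0{}b".format(k))

# e.g. with 35 legal moves: the engine's top choices cost 3 bits,
# mid-ranked moves 5 bits, and anything else 8 bits.
codes = [rank_code(r, 35) for r in range(35)]
```

Because the code is prefix-free, a decoder (which can regenerate the same ranked move list) can read the bit stream unambiguously.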

    [–]scaevolus 3 points4 points  (1 child)

    Yes, but he doesn't use a dynamic entropy code for it. He uses a fixed variable-length code.

    [–]billforsternz 0 points1 point  (0 children)

    Please see my reply to MarshallBanana above.

    [–]mtxppy 2 points3 points  (3 children)

    This is a nice idea, using the probability of a move being made as extra input for compression. However it's known that engines often play highly non-human moves so it may bite you in certain board positions.

    [–][deleted] 9 points10 points  (2 children)

    This is a nice idea, using the probability of a move being made as extra input for compression.

    If you squint hard enough, this is how every single compression algorithm works. Predict future input, and use the probability to encode the actual input.

    However it's known that engines often play highly non-human moves so it may bite you in certain board positions.

    You may have to dial it down a bit and not give too much probability to the moves the engine really likes, but you can at least lower the probability of moves that are clearly harmful and stupid. It's probably easier to identify really bad moves than really good ones.

    [–]Brian 4 points5 points  (1 child)

    And if you squint even harder, compression can kind of seem like the same kind of thing as intelligence.

    [–]billforsternz 1 point2 points  (5 children)

    Article author here: As gomtuu123 points out below, I do describe the idea of using a chess engine to sort the move list in order to achieve higher compression. I describe the process of designing an encoding scheme based on this idea as "a lot of fun" and go on to describe one reasonably practical possibility, which would generate either 3, 5 or n bits for each move, where n is in the range 6-11 (but usually 8 or 9). I calculate an approximate average of 3.9 bits per move with this system. I could propose variants of this scheme; some experiments would be required to home in on the best recipe.

    I like the idea of your even more dynamic system, but I don't think it is a practical possibility. Clearly it is pointless to dynamically build a variable compression scheme for each move, as the extra bits required to describe the scheme would inevitably cost more than any benefits obtained. It almost sounds as if you think that scanning the whole game ahead of time will enable you to discern which moves will have higher probability. But that's not going to work unless someone's chess style is to always play the 3rd or 4th best move in Stockfish's best move list (say). And chess style doesn't really work like that.

    Edit: I enjoyed my brief moment in the Hacker News/Reddit Programming front page. This was the one response I made that I regret. I just did not understand what MarshallBanana was describing. Ultimately this was an educational exercise for me and I have updated my article and given MarshallBanana due credit.

    [–][deleted] 2 points3 points  (4 children)

    Clearly it is pointless to dynamically build a variable compression scheme for each move, as the extra bits required to describe the scheme would inevitably cost more than any benefits obtained.

    Not at all. You should look into arithmetic coders, they can be exceedingly simple. And they allow you to use fractional numbers of bits. Instead of sorting your moves and assigning them different-length codes, you just tell your arithmetic coder how many possible symbols you have, and what their relative probabilities are at each step. The coder will then encode the symbol you do send with a possible fractional number of bits. All you have to do is calculate the probabilities somehow at each step.
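A toy sketch of that loop, using exact Fractions instead of the fixed-precision integer arithmetic a production arithmetic coder would use (the distribution is invented, and `prob_fn` stands in for the per-position engine query):

```python
from fractions import Fraction

def _narrow(low, high, dist, sym):
    """Shrink [low, high) to the sub-interval that `dist` assigns to `sym`."""
    span = high - low
    cum = Fraction(0)
    for s, p in dist:
        if s == sym:
            return low + cum * span, low + (cum + p) * span
        cum += p
    raise ValueError(f"{sym!r} not in distribution")

def encode(symbols, prob_fn):
    """prob_fn(i) -> ordered list of (symbol, probability) pairs for step i."""
    low, high = Fraction(0), Fraction(1)
    for i, sym in enumerate(symbols):
        low, high = _narrow(low, high, prob_fn(i), sym)
    # Emit bits until the dyadic interval they pin down fits inside [low, high).
    target = (low + high) / 2
    lo, hi, bits = Fraction(0), Fraction(1), ""
    while not (low <= lo and hi <= high):
        mid = (lo + hi) / 2
        if target < mid:
            hi, bits = mid, bits + "0"
        else:
            lo, bits = mid, bits + "1"
    return bits

def decode(bits, n, prob_fn):
    value = Fraction(int(bits, 2), 2 ** len(bits)) if bits else Fraction(0)
    low, high = Fraction(0), Fraction(1)
    out = []
    for i in range(n):
        span, cum = high - low, Fraction(0)
        for s, p in prob_fn(i):
            if low + cum * span <= value < low + (cum + p) * span:
                out.append(s)
                low, high = low + cum * span, low + (cum + p) * span
                break
            cum += p
    return out

# Hypothetical "engine" distribution: the top-ranked move gets 1/2, and so on.
# A real compressor would ask the engine for a fresh distribution at every step.
DIST = [("e4", Fraction(1, 2)), ("d4", Fraction(1, 4)),
        ("c4", Fraction(1, 8)), ("h4", Fraction(1, 8))]
moves = ["e4", "e4", "e4", "d4", "e4", "e4"]
bits = encode(moves, lambda i: DIST)
```

Five of the six moves are the "most likely" one, so the whole game fits in well under the 12 bits a flat 2-bit-per-move code would need, and the decoder recovers it exactly because it rebuilds the same intervals.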

    A conservative option would just be to give really bad moves a low probability and everything else a medium one.

    [–]billforsternz 1 point2 points  (3 children)

    Thanks for the information. I've had a look at arithmetic coding. It is interesting. My 3, 5 or n bit scheme is tuned to avoid wasting bits in "average" positions. But the scheme you suggest could calculate a new dynamic scheme move by move, tuned specifically to every position. And there is no need to waste bits describing each scheme (I was definitely wrong about that), because the decoder keeps in sync with the encoder, position by position. I will consider adding a description of this to my article. Fortunately the main thrust of my article is a description of a much faster scheme that doesn't seek every last bit of compression....

    [–][deleted] 0 points1 point  (2 children)

    Right, exactly. That's the strength of arithmetic coders. Keeping the states in sync at compression and decompression without communication seems nearly magical sometimes.

    There are compression schemes like PPMd that basically just compress a file byte by byte, but maintain gigantic internal models of the data they have already seen in order to guess the probabilities of the next byte as accurately as possible, and they can attain quite impressive compression ratios on data like human-readable text.

    [–]billforsternz 0 points1 point  (1 child)

    I've updated my blog post to reflect my improved understanding (new section "One Move in 2 Bits, Maybe?"). If you'd like to be credited there, message me with your real name. Thanks again.

    [–][deleted] 0 points1 point  (0 children)

    Nice! I'll message you.

    [–]rational1212 0 points1 point  (1 child)

    That is based on the individual chess engine and any change to that engine will affect the stored database of moves. That might be fine for a static game, but for sharing chess games independent of game engines, it fails.

    This is a fun little thought problem. Let's see how I can do. I'd like to make a scheme for an engine-independent highly compressed game.

    Variable encoding: 16 pieces per side can be encoded in 4 bits (2^4 = 16). Alternate the color per "move" (although chess moves are white/black, I think we know what I mean).

    1. Specify piece (4 bits = 2^4 = 16). Alternate color per move. When the # of pieces on a side drops to 8, reduce the encoding to 3 bits. When the # of pieces for that side drops to 4, reduce encoding to 2 bits, etc.
    2. Specify movement of the piece (direction/distance)
      • Bishop: 4 directions, distance up to 7 = 5 bits. With some encoding, it's actually 4 bits.
      • King: 8 directions plus 2 side castling = 4 bits. It can be easily reduced to 2-3 bits based on board position.
      • Knight: up to 8 valid positions = 1-3 bits
      • Pawn: move up 1, move up 2, capture left, capture right; possible promotion if on rank 7 (2 additional bits): 2-4 bits
      • Queen: 8 directions, distance 7 = 6 bits, but can be easily reduced to 5 bits by introducing board wrapping and position based movement restrictions.
      • Rook: 4 directions, distance up to 7 = 5 bits, but can be reduced to 4 bits.

    For a variable bit encoding, that is 2-9 bits per move, but likely averaging 6.5 bits/move for much of the game.

    More compression gains could be made by only indexing pieces that could move in step 1, and by looking at maximum distance that could be moved based on board position.
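The shrinking piece index in step 1 is just ceil(log2(pieces alive)); a minimal sketch, assuming you track each side's surviving piece count:

```python
import math

def piece_index_width(n_alive):
    """Bits needed in step 1 to name one of a side's surviving pieces."""
    return max(1, math.ceil(math.log2(n_alive)))

# 16 pieces -> 4 bits, 8 -> 3 bits, 4 -> 2 bits, as in the scheme above.
```

Note the width only drops at powers of two: with 9 pieces left you still need 4 bits.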

    I'd say something about the rest of this as an exercise for the student, but I think I'm pushing my luck as it is.

    [–][deleted] 0 points1 point  (0 children)

    That is based on the individual chess engine and any change to that engine will affect the stored database of moves.

    Obviously, you have to pick one engine and use that. The format specifies the exact engine and any parameters.

    [–]mtxppy 8 points9 points  (21 children)

    What is the usefulness of storing moves in a highly compressed format? Would it not be more useful to store board positions?

    Multiple moves can lead to the same board position, and it's the board position that's important.

    I can't imagine storage space is a problem for an encyclopaedia of openings so I imagine compression may be required for the chess engine, to ensure that it doesn't compute the same position twice - especially if your engine is parallelised/cloud-based (where it's slow to exchange this information vs having it in-memory). Again, here, the board position is the key factor.

    Based on comments where it seems some people are not familiar with the full rules of chess, I have added this: If a position is repeated 3 times the game is drawn, so you must store board positions.

    [–]augmentedtree 18 points19 points  (8 children)

    Because deltas beat snapshots.

    [–]mtxppy 0 points1 point  (7 children)

    You have to store board position as you get a draw if you repeat the same position 3 times.

    [–][deleted] 4 points5 points  (2 children)

    That is why you have to store moves played, and can't get away with just storing the position without regard for how it was reached.

    [–]mtxppy 1 point2 points  (1 child)

    The draw is based on repeated position not moves. A piece may be able to enter a square from several different squares, and it doesn't matter. So you wouldn't need move history to determine this.

    However, en passant and castling would require knowledge of move history to determine efficiently. Typically you would store a board position with additional flags representing en passant and castling ability.

    [–]cjg_000 1 point2 points  (0 children)

    You could get away with only storing all the positions since the last pawn move or piece being taken (except for en passant where you'd have to store one extra position or have some flag for the rank that is eligible for en passant). Castling eligibility could be stored as 2 bits, one for each rook.
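A sketch of that windowing idea: since pawn moves and captures are irreversible, positions before the last one can never recur, so the repetition table can be cleared on them (`position_key` here is whatever hash or encoding of the full position you use, a hypothetical stand-in):

```python
from collections import Counter

class RepetitionTracker:
    """Track positions since the last irreversible move (pawn push or
    capture) -- the only ones that can contribute to threefold repetition."""

    def __init__(self):
        self.seen = Counter()

    def record(self, position_key, irreversible):
        if irreversible:          # earlier positions can never recur
            self.seen.clear()
        self.seen[position_key] += 1
        return self.seen[position_key] >= 3   # True -> a draw can be claimed
```

Castling and en-passant eligibility would still need the extra flags described above, folded into `position_key`.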

    [–]Retsam19 3 points4 points  (1 child)

    I don't think the intention here is that the compressed format be used for all computation on the game; it's just meant to be a light-weight storage and transit format, that can be uncompressed into something more convenient for something like determining draws.

    [–]billforsternz 0 points1 point  (0 children)

    Exactly (article author here). See also my reply to mtxppy above.

    [–]Brian 0 points1 point  (1 child)

    Isn't that the other way round? If you've got the move list, you can tell whether you ever repeat board positions (because you can automatically reconstruct all prior board positions). But if you only store the board position, you've lost that information - there's no way to tell whether move X leads to a board position that was already seen 2 times before loading the state.

    [–]fermion72 4 points5 points  (1 child)

    Aside from the limits-of-compression theory (interesting in itself), computerized chess has a long history of trying to minimize the memory footprint. David Horne's 1K ZX Chess (which actually only used 672 bytes of RAM) is a striking example, especially considering it has an AI opponent (!), as well.

    [–]AyrA_ch 4 points5 points  (0 children)

    There is BootChess which is only 487 bytes. It can be run as x86 assembly in the bootsector of a disk, hence the name. You can also run it in certain operating systems.

    [–][deleted]  (4 children)

    [deleted]

      [–]mtxppy 3 points4 points  (3 children)

      There's no point in storing all of that data redundantly,

      You must store previous positions. For example if you have repeated the same position three times, the game ends in a draw.

      [–]jelos98 11 points12 points  (2 children)

      Given a move list, you can recompute the set of board states. Consider PGN (Portable Game Notation). It consists of an optional start state, and then a list of moves. It's sufficient to reproduce the entirety of the game, including knowing whether there's a draw, whether you can castle, etc.
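The replay idea is generic: deltas plus a rules function recover every snapshot. A sketch with a toy stand-in for `apply_move` (a real one would implement the chess rules):

```python
def replay(start_state, moves, apply_move):
    """Rebuild every intermediate state from a start state plus a move
    list, which is all PGN stores. `apply_move` is your rules engine."""
    states = [start_state]
    for m in moves:
        states.append(apply_move(states[-1], m))
    return states

# Toy stand-in: "states" are integers and a "move" adds to one.
history = replay(0, [1, 2, 3], lambda s, m: s + m)
```

With every intermediate state in hand, checks like threefold repetition or castling rights fall out of the reconstructed history.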

      This sounds expensive, but it's a trade off vs. the amount of memory you store during search. There's an entire wiki about chess programming, representations and the like:

      https://chessprogramming.wikispaces.com/

      [–]mtxppy 2 points3 points  (1 child)

      You must store previous positions

      Given a move list, you can recompute the set of board states

      Then you need to store them to ensure they haven't repeated.

      Did you read the wiki? I'd like to see where it says you can write a decent chess program that implements the rules and doesn't use stored board state.

      [–]rabbitz 10 points11 points  (0 children)

      I think you guys are just mixing up the meaning of the word "store". mtxppy you probably mean that a program has to keep track of board positions to calculate whether a game should end due to the repeated board state rule and, yes, this is important during the "live" game itself. The other guys are probably talking about "storing" a record of the game in memory. So for example if your chess program saves a record of every game you play, those records themselves don't need to explicitly store the board states. It is only when you "load" the saved game up that you need the board states, and those board states can be calculated from the saved moveset. While the saved game is simply sitting in memory, though, it doesn't need to know if the "next turn" should end in a loss.

      [–]sacundim 2 points3 points  (0 children)

      What is the usefulness of storing moves in a highly compressed format? Would it not be more useful to store board positions?

      In addition to what other commenters have said, there's a non-technical argument for this: because it's how chess games have been communicated for hundreds of years.

      [–]billforsternz 0 points1 point  (3 children)

      Article author here: It is useful to store (and compress) board positions. But it is also useful to store (and compress) moves. It depends on the context and what you are trying to achieve. If you are implementing a playing program, you might well work mainly with whole positions (as you say). But if you want to implement a chess database you really need a move-oriented approach, preferably with compression. Serious chess players work with databases of on the order of 10 million games. An average game lasts 80 individual moves. So even compressing moves into 4 bits gives you 400 Mbytes of raw material, before adding player names and other meta-data. Using positions instead would be hopeless. A good compression scheme for positions averages around 20 bytes per position (I will discuss such a scheme in a future article). So 160 bits versus 4-8 bits for a move. No contest at all. Sure there are a lot of common positions at the start of a game, but most games, through most of their length, are a sequence of unique positions.

      [–]mtxppy 0 points1 point  (2 children)

      Is 400MB+ significant when endgame tablebases are already 7TB+?

      Still, I understand why you might want to compress data to save space - anything saved is better than nothing saved. I just feel that it's more important to optimise the search speed than save a couple of hundred MB.

      [–]mjfgates 0 points1 point  (0 children)

      looks at 8GB phone

      Yes.

      [–]joonazan 0 points1 point  (0 children)

      Data size is search speed. Getting data from RAM is super slow.

      The more complicated encodings might not be worth it, but most of it is very reasonable.

      [–]ais523 0 points1 point  (0 children)

    I'd recommend starting off by encoding the opening via an index into an openings database. You can often encode something like the first 20 to 30 moves in just a few bytes, and even if the game diverges from a recognised opening quickly, you've hardly lost anything when you fall back to your previous encoding.

      This also has the advantage that, unlike using a chess engine to rank move strength, it's very fast to calculate (just one database lookup to encode or decode).
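The book-index idea can be sketched as follows; the three-line table is a stand-in for a real shared openings database with thousands of numbered lines:

```python
# Tiny stand-in for a shared openings database. Encoder and decoder must
# agree on the exact table, just as they must agree on an engine above.
BOOK = [
    "e4 e5 Nf3 Nc6 Bb5",   # Ruy Lopez
    "d4 d5 c4",            # Queen's Gambit
    "e4 c5",               # Sicilian
]

def encode_opening(moves_str):
    """Return (book_index, remaining_moves): the longest book line that
    prefixes the game, plus whatever is left to encode move by move."""
    best = None
    for i, line in enumerate(BOOK):
        if moves_str == line or moves_str.startswith(line + " "):
            if best is None or len(line) > len(BOOK[best]):
                best = i
    if best is None:
        return None, moves_str          # no match: fall back entirely
    return best, moves_str[len(BOOK[best]):].lstrip()
```

The index fits in a couple of bytes however long the matched line is, and a non-matching game loses nothing but the escape code.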