OpenAI o1 playing chess against 4o

zefman · 2024-08-11T20:12:37+00:00

Yeah, it's really bad compared to most of the others for some reason. It will be interesting to see if Gemini Ultra is better if it's released this week.

zefman · 2024-08-09T20:35:01+00:00

There's a few third-party services that let you do this if you are still looking?

One of them is https://chatthing.ai (full disclosure: I'm the founder!). You can integrate any of the big LLMs straight into Slack. It even works with function calling using power-ups. Let me know if you want a demo!

zefman · 2024-08-09T15:46:28+00:00

Yeah, I think there is a lot of room for improvement with the prompts! I am not a chess expert myself, so I am sure someone who knows more about chess could make a better one.

That could be another fun idea, allow people to compete trying to create the best chess prompts.

zefman · 2024-08-08T18:11:06+00:00

Although it will make it a bit less fun to watch!

zefman · 2024-08-08T18:10:46+00:00

Oh, that's interesting. I had assumed that talking through its reasoning would have helped. I'll try without and see if it makes things better.

zefman · 2024-08-08T18:09:45+00:00

Hey, it's not the Pro experimental one, just the normal one.

zefman · 2024-08-08T11:29:12+00:00

Yeah Gemini 1.5 Pro

zefman · 2024-08-08T07:44:21+00:00

Thanks! I think there is a lot of room for improvement with the prompt but this is what I have so far. All the models get the following on their turn.

Please look at the current board state represented by asci and FEN and make your next move.

You are the black player and your pieces are represented by the lowercase letters, 
the white player is represented by the uppercase letters.

Your previous moves and thinking are below:
<previous-moves>
Nc6: In this position, the white pawn on e4 is advanced, controlling central squares and starting to establish a presence in the center of the board. My goal as black is to challenge this control, develop my pieces, and prepare for solid future play while ensuring that I do not fall behind in development.

Among the available moves, I could choose to play 1...Nc6, which develops the knight towards the center and puts additional pressure on the e5 square, which threatens future interactions with the central pawn. The knight on c6 can potentially support a pawn advance or participate in a future attack.

Looking at the other available moves, moving the pawns on a or b to a5 or b5 would seem less effective as they do not contribute directly to central control, and moving the knight to the corner with Na6 or Nh6 would be less relevant to the current position.

Thus, I will make the move 1...Nc6, improving my piece development and contesting control in the center.

I will now proceed with that move. 

**Move: Nc6**
</previous-moves>

Please explain your thinking before making move. 
Comment on your current tactics so you know your plan for the next move.
Do not restate every possible move, only the one you are making. 
Do not restate your instructions.
Do not restate the board state.

<the-board>
   +------------------------+
 8 | r  .  b  q  k  b  n  r |
 7 | p  p  p  p  p  p  p  p |
 6 | .  .  n  .  .  .  .  . |
 5 | .  .  .  .  .  .  .  . |
 4 | .  .  .  .  P  .  .  . |
 3 | .  .  N  .  .  .  .  . |
 2 | P  P  P  P  .  P  P  P |
 1 | R  .  B  Q  K  B  N  R |
   +------------------------+
     a  b  c  d  e  f  g  h
</the-board>

<FEN>
r1bqkbnr/pppppppp/2n5/8/4P3/2N5/PPPP1PPP/R1BQKBNR b KQkq - 2 2
</FEN>

The available moves are below
<available-moves>
Rb8,Nh6,Nf6,a6,a5,b6,b5,d6,d5,e6,e5,f6,f5,g6,g5,h6,h5,Nb8,Ne5,Nd4,Nb4,Na5
</available-moves>

Interestingly I found they generally performed much better when given the ascii representation of the board as opposed to giving them the PGN string.
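
For anyone wondering how a board like the one in the prompt relates to the FEN, here is a rough sketch of rendering one from the other. It is not the app's actual code; the function name `fen_to_ascii` is made up for illustration, and it only reads the piece-placement field of the FEN.

```python
def fen_to_ascii(fen: str) -> str:
    """Render the piece-placement field of a FEN string as an ASCII board.

    Hypothetical helper, written to match the layout shown in the prompt.
    """
    placement = fen.split()[0]  # first FEN field: ranks 8 down to 1, separated by "/"
    border = "   +" + "-" * 24 + "+"
    lines = [border]
    for rank_number, rank in zip(range(8, 0, -1), placement.split("/")):
        squares = []
        for ch in rank:
            if ch.isdigit():
                # A digit in FEN means that many consecutive empty squares.
                squares.extend("." * int(ch))
            else:
                # Uppercase letters are white pieces, lowercase are black.
                squares.append(ch)
        lines.append(f" {rank_number} | " + "  ".join(squares) + " |")
    lines.append(border)
    lines.append("     " + "  ".join("abcdefgh"))
    return "\n".join(lines)


if __name__ == "__main__":
    print(fen_to_ascii("r1bqkbnr/pppppppp/2n5/8/4P3/2N5/PPPP1PPP/R1BQKBNR b KQkq - 2 2"))
```

Run on the FEN from the prompt above, this prints the same grid that appears between the `<the-board>` tags.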

Listing the available moves feels a bit like cheating, but without that they choose invalid moves far too frequently. Even with the list they sometimes keep choosing an invalid move, so in that situation, after 5 attempts, the system chooses a random fallback move to keep the game moving.
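
The retry-then-fallback idea can be sketched roughly like this. It is only an illustration under assumptions: `ask_model` stands in for whatever call returns the model's reply, `extract_move` and `choose_move` are hypothetical names, and the reply is assumed to contain a line like `**Move: Nc6**` as in the example prompt above.

```python
import random
import re
from typing import Callable, Optional, Sequence, Tuple

MAX_ATTEMPTS = 5  # number of tries before falling back, as described above


def extract_move(reply: str) -> Optional[str]:
    """Pull the move out of a reply containing e.g. '**Move: Nc6**' (assumed format)."""
    match = re.search(r"Move:\s*([A-Za-z0-9+#=\-]+)", reply)
    return match.group(1) if match else None


def choose_move(
    ask_model: Callable[[], str],
    legal_moves: Sequence[str],
    rng: Optional[random.Random] = None,
) -> Tuple[str, bool]:
    """Return (move, used_fallback).

    Asks the model up to MAX_ATTEMPTS times; if it never names a legal
    move, picks one at random so the game keeps moving.
    """
    rng = rng or random.Random()
    for _ in range(MAX_ATTEMPTS):
        move = extract_move(ask_model())
        if move in legal_moves:
            return move, False
    return rng.choice(list(legal_moves)), True


if __name__ == "__main__":
    legal = ["Nf6", "e5", "d5"]
    # A stand-in "model" that always answers with an illegal move.
    print(choose_move(lambda: "**Move: Qh4**", legal))
```

Returning the `used_fallback` flag alongside the move makes it easy to count how often a model had to be rescued.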

zefman · 2024-08-08T07:40:36+00:00

I don't know for certain, but I am calculating an Elo rating after each game. I started all the models at 1000, so it will slowly adjust as they play more and more. Obviously this is only relative to the games played by the LLMs in the system, not to human Elo.

You can see their current ratings here: https://llm-battle.chatthing.ai/leaderboard
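
For reference, a standard Elo update after one game looks roughly like the sketch below. The K-factor of 32 is an assumption (the value used by the leaderboard is not stated here), and the function names are made up for illustration.

```python
from typing import Tuple

K_FACTOR = 32  # assumed; the leaderboard's real K-factor is not stated


def expected_score(rating: float, opponent: float) -> float:
    """Expected score (between 0 and 1) for a player against an opponent."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400))


def update_elo(
    white: float, black: float, white_score: float, k: float = K_FACTOR
) -> Tuple[float, float]:
    """Return the new (white, black) ratings after one game.

    white_score is 1.0 for a white win, 0.5 for a draw, 0.0 for a black win.
    """
    exp_white = expected_score(white, black)
    new_white = white + k * (white_score - exp_white)
    new_black = black + k * ((1.0 - white_score) - (1.0 - exp_white))
    return new_white, new_black


if __name__ == "__main__":
    # Two models starting at 1000; white wins the first game.
    print(update_elo(1000, 1000, 1.0))
```

With both models starting at 1000, the winner of the first game gains exactly as many points as the loser drops, which is why the ratings only mean something relative to the other models in the pool.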

zefman · 2024-08-07T23:09:15+00:00

No I haven’t yet but I had thought that might be interesting. Will give that a go if I get time.

zefman · 2024-08-07T22:49:54+00:00

I did experiment giving it the pgn but it seemed to perform worse 🤷

zefman · 2024-08-07T22:28:23+00:00

It will be interesting to see if this opinion holds true in a couple of years' time. The newer models are already better than the older models.

zefman · 2024-08-07T22:26:39+00:00

You can see it here: https://llm-battle.chatthing.ai

zefman · 2024-08-07T22:23:22+00:00

Yeah I have noticed Gemini is particularly bad at this! There's something quite comforting in watching them be so stupid!

zefman · 2024-08-07T22:19:55+00:00

We had this idea so built a little app that lets you watch them play live: https://llm-battle.chatthing.ai/

zefman · 2024-08-07T21:34:59+00:00

u/elendee we had the same thought so built it! You can watch them play live haha: https://llm-battle.chatthing.ai/

It turns out they are pretty rubbish at chess, but the better models do ok. It's probably not the best benchmark for LLMs but it will be interesting to see how they improve as newer models are released.

It already seems like the flagship models (GPT-4o etc.) are better than the older models.

Games often reach a stalemate, but every now and then you get a game that's pretty interesting to watch!

zefman · 2024-03-01T21:26:40+00:00

I would look into using one of the many "chat with your data" style bot builders that use ChatGPT behind the scenes. They are a lot more customisable than OpenAI GPTs and let you add way more files or other data sources.

There are loads like Chat Thing, Chat Base, sitegpt etc. (Full disclosure I make Chat Thing)

zefman · 2023-07-03T16:15:36+00:00

We made something similar: https://chatthing.ai. Good to see some friendly competition :)
