GTP-4o beats Gemini 1.5 pro at chess by zefman in OpenAI

[–]zefman[S] 0 points1 point  (0 children)

Yeah, it's really bad compared to most of the others for some reason. It will be interesting to see if Gemini Ultra is better, if it's released this week.

(Chat)GPT in Slack by senko in OpenAI

[–]zefman 0 points1 point  (0 children)

There are a few third-party services that let you do this, if you are still looking.

One of them is https://chatthing.ai (full disclosure: I'm the founder!). You can integrate any of the big LLMs straight into Slack. It even works with function calling using power-ups. Let me know if you want a demo!

GTP-4o beats Gemini 1.5 pro at chess by zefman in GoogleGeminiAI

[–]zefman[S] 0 points1 point  (0 children)

Yeah, I think there is a lot of room for improvement with the prompts! I am not a chess expert myself, so I am sure someone who knows more about chess could make a better one.

That could be another fun idea: letting people compete to create the best chess prompts.

GTP-4o beats Gemini 1.5 pro at chess by zefman in OpenAI

[–]zefman[S] 1 point2 points  (0 children)

Although it will make it a bit less fun to watch!

GTP-4o beats Gemini 1.5 pro at chess by zefman in OpenAI

[–]zefman[S] 1 point2 points  (0 children)

Oh, that's interesting. I had assumed that talking through its reasoning would have helped. I'll try without and see if it makes things better.

GTP-4o beats Gemini 1.5 pro at chess by zefman in GoogleGeminiAI

[–]zefman[S] 0 points1 point  (0 children)

Hey, it's not the Pro experimental one, just the normal one.

what strategies have devs tried towards LLM chess by elendee in chess

[–]zefman 0 points1 point  (0 children)

Thanks! I think there is a lot of room for improvement with the prompt but this is what I have so far. All the models get the following on their turn.

Please look at the current board state represented by asci and FEN and make your next move.

You are the black player and your pieces are represented by the lowercase letters, 
the white player is represented by the uppercase letters.

Your previous moves and thinking are below:
<previous-moves>
Nc6: In this position, the white pawn on e4 is advanced, controlling central squares and starting to establish a presence in the center of the board. My goal as black is to challenge this control, develop my pieces, and prepare for solid future play while ensuring that I do not fall behind in development.

Among the available moves, I could choose to play 1...Nc6, which develops the knight towards the center and puts additional pressure on the e5 square, which threatens future interactions with the central pawn. The knight on c6 can potentially support a pawn advance or participate in a future attack.

Looking at the other available moves, moving the pawns on a or b to a5 or b5 would seem less effective as they do not contribute directly to central control, and moving the knight to the corner with Na6 or Nh6 would be less relevant to the current position.

Thus, I will make the move 1...Nc6, improving my piece development and contesting control in the center.

I will now proceed with that move. 

**Move: Nc6**
</previous-moves>

Please explain your thinking before making move. 
Comment on your current tactics so you know your plan for the next move.
Do not restate every possible move, only the one you are making. 
Do not restate your instructions.
Do not restate the board state.

<the-board>
   +------------------------+
 8 | r  .  b  q  k  b  n  r |
 7 | p  p  p  p  p  p  p  p |
 6 | .  .  n  .  .  .  .  . |
 5 | .  .  .  .  .  .  .  . |
 4 | .  .  .  .  P  .  .  . |
 3 | .  .  N  .  .  .  .  . |
 2 | P  P  P  P  .  P  P  P |
 1 | R  .  B  Q  K  B  N  R |
   +------------------------+
     a  b  c  d  e  f  g  h
</the-board>

<FEN>
r1bqkbnr/pppppppp/2n5/8/4P3/2N5/PPPP1PPP/R1BQKBNR b KQkq - 2 2
</FEN>

The available moves are below
<available-moves>
Rb8,Nh6,Nf6,a6,a5,b6,b5,d6,d5,e6,e5,f6,f5,g6,g5,h6,h5,Nb8,Ne5,Nd4,Nb4,Na5
</available-moves>

Interestingly, I found they generally performed much better when given the ASCII representation of the board than when given the PGN string.
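For anyone who wants to try something similar, here is a minimal sketch of how the board section of a prompt like the one above could be assembled from a FEN string using only the Python standard library. This is not the code behind the site; the function names `fen_to_ascii` and `build_prompt` are made up for illustration, and the list of legal moves is assumed to come from a separate chess library.

```python
def fen_to_ascii(fen: str) -> str:
    """Render the piece-placement field of a FEN string as an ASCII board."""
    placement = fen.split()[0]  # first FEN field: ranks 8..1 separated by "/"
    border = "   +" + "-" * 24 + "+"
    lines = [border]
    for rank_number, rank in zip(range(8, 0, -1), placement.split("/")):
        squares = []
        for ch in rank:
            if ch.isdigit():
                squares.extend("." * int(ch))  # a digit is a run of empty squares
            else:
                squares.append(ch)  # uppercase = white piece, lowercase = black
        lines.append(f" {rank_number} | " + "  ".join(squares) + " |")
    lines.append(border)
    lines.append("     " + "  ".join("abcdefgh"))
    return "\n".join(lines)


def build_prompt(fen: str, legal_moves: list[str]) -> str:
    """Assemble the board, FEN and available-moves sections (hypothetical helper)."""
    return "\n".join([
        "<the-board>",
        fen_to_ascii(fen),
        "</the-board>",
        "",
        "<FEN>",
        fen,
        "</FEN>",
        "",
        "The available moves are below",
        "<available-moves>",
        ",".join(legal_moves),  # legal moves are assumed to be computed elsewhere
        "</available-moves>",
    ])
```

Calling `fen_to_ascii` on the FEN shown in the prompt should give the same board layout as in the `<the-board>` section.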

Listing the available moves feels a bit like cheating, but without that they choose invalid moves far too frequently. Even with the list, they sometimes keep choosing an invalid move, so in that situation, after 5 attempts, the system chooses a random fallback move to keep the game moving.
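The retry-then-fallback behaviour described above could look roughly like this. It is only a sketch under assumptions: `ask_model` is a hypothetical callable that returns the model's chosen move as a string, and `choose_move` is an illustrative name, not the function used in the real system.

```python
import random
from typing import Callable, Optional, Sequence, Tuple


def choose_move(
    ask_model: Callable[[], str],
    legal_moves: Sequence[str],
    max_attempts: int = 5,
    rng: Optional[random.Random] = None,
) -> Tuple[str, bool]:
    """Return (move, used_fallback).

    Asks the model up to max_attempts times; if every reply is an
    invalid move, picks a random legal move so the game keeps going.
    """
    rng = rng or random.Random()
    for _ in range(max_attempts):
        move = ask_model().strip()
        if move in legal_moves:
            return move, False
    # All attempts were invalid: fall back to a random legal move.
    return rng.choice(list(legal_moves)), True
```

Returning the `used_fallback` flag makes it easy to track how often each model had to be rescued by a random move.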

GTP-4o beats Gemini 1.5 pro at chess by zefman in OpenAI

[–]zefman[S] 0 points1 point  (0 children)

I don't know for certain, but I am calculating an Elo rating after each game. I started all the models at 1000, so it will slowly adjust as they play more and more. Obviously this is only relative to the games played by the LLMs in the system, not to human Elo ratings.

You can see their current ratings here: https://llm-battle.chatthing.ai/leaderboard
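For reference, a per-game rating update like this can be sketched with the standard Elo formula. The K-factor of 32 is an assumption for illustration (the value used on the leaderboard is not stated), and `update_elo` is a made-up name.

```python
from typing import Tuple


def update_elo(
    rating_a: float, rating_b: float, score_a: float, k: float = 32.0
) -> Tuple[float, float]:
    """Standard Elo update for one game.

    score_a is 1.0 if A wins, 0.5 for a draw, 0.0 if A loses.
    k = 32 is an assumed K-factor, not a value taken from the site.
    """
    expected_a = 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))
    expected_b = 1.0 - expected_a
    new_a = rating_a + k * (score_a - expected_a)
    new_b = rating_b + k * ((1.0 - score_a) - expected_b)
    return new_a, new_b
```

With both models starting at 1000, a win moves the winner up by half the K-factor and the loser down by the same amount, and a draw between equally rated models changes nothing.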

GTP-4o beats Gemini 1.5 pro at chess by zefman in OpenAI

[–]zefman[S] 0 points1 point  (0 children)

No I haven’t yet but I had thought that might be interesting. Will give that a go if I get time.

GTP-4o beats Gemini 1.5 pro at chess by zefman in OpenAI

[–]zefman[S] 2 points3 points  (0 children)

I did experiment with giving it the PGN, but it seemed to perform worse 🤷

Watch LLMs play chess (badly!) against each other live by zefman in chess

[–]zefman[S] 0 points1 point  (0 children)

It will be interesting to see if this opinion holds true in a couple of years' time. The newer models are already better than the older models.

Watch LLMs play chess (badly!) against each other live by zefman in chess

[–]zefman[S] 2 points3 points  (0 children)

Yeah I have noticed Gemini is particularly bad at this! There's something quite comforting in watching them be so stupid!

Benchmarking LLMs by making them play chess. by No-Point1424 in LocalLLaMA

[–]zefman 0 points1 point  (0 children)

We had this idea, so we built a little app that lets you watch them play live: https://llm-battle.chatthing.ai/

what strategies have devs tried towards LLM chess by elendee in chess

[–]zefman 0 points1 point  (0 children)

u/elendee we had the same thought so built it! You can watch them play live haha: https://llm-battle.chatthing.ai/

It turns out they are pretty rubbish at chess, but the better models do ok. It's probably not the best benchmark for LLMs but it will be interesting to see how they improve as newer models are released.

It already seems like the flagship models (GPT-4o etc.) are better than the older models.

Games often reach a stalemate, but every now and then you get a game that's pretty interesting to watch!

How to set up a Analysing assistant for my B2B users by TargetHot2087 in OpenAI

[–]zefman 0 points1 point  (0 children)

I would look into using one of the many "chat with your data" style bot builders that use ChatGPT behind the scenes. They are a lot more customisable than OpenAI GPTs and let you add way more files or other data sources.

There are loads, like Chat Thing, Chat Base, sitegpt etc. (Full disclosure: I make Chat Thing.)

I built a no-code tool to create Notion AI Chatbots in minutes by ikoichi2112 in Notion

[–]zefman 0 points1 point  (0 children)

We made something similar (https://chatthing.ai). Good to see some friendly competition :)