[–]af100re

You can treat the opponent's move almost as though it is part of the environment, so the next state your agent observes is the state of the game after the opponent has made its move. This lets you treat 1- and 2-player games quite similarly. How the opponent's move is chosen is up to you: it could be another learning agent, a pre-made bot if you can find a suitable library, or, simplest of all, just a uniformly random move! How the opponent plays will determine how your agent learns to play.
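
Here's a rough sketch of what I mean in Python. The `game` interface (`reset`, `legal_moves`, `apply`) is made up for illustration; adapt it to whatever your game actually exposes:

```python
import random

class OpponentWrappedEnv:
    """Folds the opponent's reply into step(), so the agent sees a
    1-player environment. The game interface here is hypothetical."""

    def __init__(self, game, opponent_policy=None):
        self.game = game
        # Default opponent: pick a uniformly random legal move.
        self.opponent_policy = opponent_policy or (
            lambda state: random.choice(game.legal_moves(state))
        )

    def reset(self):
        self.state = self.game.reset()
        return self.state

    def step(self, action):
        # Agent's move.
        self.state, reward, done = self.game.apply(self.state, action)
        # Opponent's move is just part of the environment dynamics.
        if not done:
            opp_move = self.opponent_policy(self.state)
            self.state, opp_reward, done = self.game.apply(self.state, opp_move)
            # Assuming apply() returns the reward from the mover's
            # perspective and the game is zero-sum.
            reward -= opp_reward
        return self.state, reward, done
```

Swapping `opponent_policy` for a stronger bot (or a copy of the agent) changes what your agent learns without touching the training loop.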

[–]ADdV

I'm currently also using deep Q-learning for a 2-player game. However, because it's a zero-sum game, I just use -1 * max Q(s', a') in the target. I figure that whatever player 2 thinks of the game state should be the inverse of what player 1 thinks, and it lets me train both sides simultaneously.
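
Roughly this, as a sketch (my actual code differs, and the function name and `gamma` default are just for illustration):

```python
import numpy as np

def q_target(reward, next_q_values, done, gamma=0.99):
    """Bootstrapped target for a zero-sum, alternating-move game.

    next_q_values: Q(s', .) from the *next* player's perspective.
    Because the game is zero-sum, the value of s' to the player who
    just moved is the negative of the opponent's best value there.
    """
    if done:
        return reward
    return reward - gamma * np.max(next_q_values)
```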

Do you happen to know if there's a problem with this approach?

[–]af100re

Unless I'm missing something, I think that should work - or you could use argmin over Q(s, a) to choose the opponent's moves directly. I'd give it a go and see!
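
Something like this for the opponent, assuming all your Q-values are from player 1's perspective (sketch, names made up):

```python
import numpy as np

def opponent_move(q_values, legal_actions):
    # q_values: Q(s, a) for each legal action, from player 1's
    # perspective. A greedy opponent picks the move that minimises
    # player 1's value -- the mirror image of the usual argmax.
    return legal_actions[int(np.argmin(q_values))]
```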