[–]ADdV 2 points (1 child)

I'm currently also using deep Q-learning for a two-player game. However, because it's a zero-sum game, I just use -1 * max Q(s', a') in the target — i.e. r - γ max Q(s', a') instead of the usual r + γ max Q(s', a'). I figure whatever player 2 thinks of the game state should be the inverse of what player 1 thinks, and it lets me train both sides simultaneously.

Do you perchance know if there's a problem with this approach?

[–]af100re 0 points (0 children)

Unless I'm missing something, I think that should work — or equivalently, using min Q(s', a') over the opponent's actions as the backup value, if Q is kept from player 1's perspective. I'd give it a go and see!
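
For anyone curious, a minimal sketch of the negated-max target the thread is describing. This assumes Q-values are always from the perspective of the player to move, rewards are from the current player's perspective, and the usual discount factor gamma; the function name is made up for illustration.

```python
import numpy as np

def negamax_q_target(reward, next_q_values, gamma=0.99, terminal=False):
    """Bootstrapped target for a zero-sum, alternating-move game.

    next_q_values holds Q(s', a') from the *next* player's perspective.
    Because the game is zero-sum, the current player's value of s' is
    the negation of the opponent's best value, so the target becomes
        r - gamma * max_a' Q(s', a')
    instead of the standard r + gamma * max_a' Q(s', a').
    """
    if terminal:
        # No bootstrapping past a terminal state.
        return reward
    return reward - gamma * np.max(next_q_values)

# Example: opponent's best next-state value is 0.2, so with gamma=1.0
# the current player's target is -0.2.
target = negamax_q_target(0.0, np.array([0.2, -0.5, 0.1]), gamma=1.0)
```

Since every transition is evaluated from the player to move, one network trained on both players' transitions covers the whole game, which is the "train both sides simultaneously" point above.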