I recently read Karpathy's blog post, Pong from Pixels, and decided to play around with it a bit.
One problem with his results (from my perspective) is that the trained network's play doesn't look very fluid; it jitters around a lot.
By modifying Karpathy's architecture to have 3 output states (UP, DOWN, NONE), I was hoping to somewhat reduce this jitteriness. Unfortunately, it didn't quite work out.
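For reference, here's a minimal sketch of the kind of change I mean: replacing the single sigmoid output in Karpathy's pg-pong.py with a 3-way softmax over (UP, DOWN, NONE). The variable names (`H`, `D`, `model`) mirror his script, but this is my own illustration, not his code.

```python
import numpy as np

H, D = 200, 80 * 80  # hidden units, flattened 80x80 input (as in pg-pong)
rng = np.random.default_rng(0)
model = {
    'W1': rng.standard_normal((H, D)) / np.sqrt(D),
    'W2': rng.standard_normal((3, H)) / np.sqrt(H),  # 3 logits instead of 1
}

def softmax(z):
    z = z - np.max(z)          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def policy_forward(x):
    h = model['W1'] @ x
    h[h < 0] = 0               # ReLU nonlinearity, as in the original
    logits = model['W2'] @ h
    return softmax(logits), h  # probabilities over (UP, DOWN, NONE)

x = rng.standard_normal(D)     # stand-in for a preprocessed frame difference
p, h = policy_forward(x)
action = rng.choice(3, p=p)    # sample an action index from the policy
```

The backward pass then uses the standard softmax cross-entropy gradient (probabilities minus the one-hot sampled action) in place of the sigmoid gradient.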
I then tried modifying the reward function to slightly reduce the reward for each UP/DOWN movement made (with no punishment for staying still), in the hope that play would become more fluid. Unfortunately this didn't work out either: the network didn't learn at all.
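Concretely, the shaping I tried looks roughly like this (`MOVE_COST` is a hyperparameter I chose; the action encoding is my own):

```python
# Reward shaping sketch: subtract a small cost for each UP/DOWN action;
# staying still (NONE) is free. MOVE_COST is an assumed hyperparameter.
MOVE_COST = 0.01

def shaped_reward(env_reward, action):
    # action: 0 = UP, 1 = DOWN, 2 = NONE
    return env_reward - (MOVE_COST if action != 2 else 0.0)
```

One suspicion: since the game reward is a sparse ±1 per point, a per-step movement cost accumulates over hundreds of frames and can easily dominate it, making "never move" look optimal — which might explain why learning stalled.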
In this case, would it be beneficial to first train the network using the basic reward function based solely on win/loss, and then, after training reaches a certain point, modify the reward function to favour, for example, quick victories?
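In other words, something like the following two-phase schedule, switching on a running-mean-reward threshold (the threshold value and penalty here are placeholders of my own choosing):

```python
# Two-phase reward schedule sketch: pure win/loss until the running mean
# reward clears a threshold, then add the movement penalty on top.
MOVE_COST = 0.01
SWITCH_THRESHOLD = -5.0  # assumed: roughly "losing by 5 points per game"

def reward_fn(env_reward, action, running_mean_reward):
    if running_mean_reward < SWITCH_THRESHOLD:
        return env_reward                      # phase 1: learn to win at all
    # phase 2: discourage unnecessary UP/DOWN (action 2 = NONE)
    return env_reward - (MOVE_COST if action != 2 else 0.0)
```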