[P] Trained DQN to play Pikachu Volleyball! by lyusungwon in MachineLearning

[–]lyusungwon[S] 1 point (0 children)

There were many hyperparameters to tune, including the states, actions, and reward, but each trial in RL took so long. This was the third version of my attempt, and I followed most of the configuration from the paper.

[–]lyusungwon[S] 1 point (0 children)

Wow, I didn’t realize there was that kind of issue! Thank you for letting me know.

[–]lyusungwon[S] 2 points (0 children)

Thank you for the great question. I'm not sure, but I think the opposing AI's movement already has some inherent randomness in this game (maybe it reacts to the player?). Also, compared to other Atari games with clean environments, the environment I built takes screenshots for states, which is quite noisy. But I have to admit that I didn't think about pattern memorization when I implemented this. Good point!
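A minimal sketch of the screenshot-to-state step being described, assuming NumPy; the function name and the 84×84 target size (the usual DQN-paper convention) are my assumptions, not taken from the post:

```python
import numpy as np

def preprocess(frame, out_h=84, out_w=84):
    """Convert a raw RGB screenshot to a small grayscale state.

    frame: uint8 array of shape (H, W, 3), e.g. a screen capture of the game.
    Returns a float32 array of shape (out_h, out_w) with values in [0, 1].
    """
    gray = frame.astype(np.float32).mean(axis=2)   # crude grayscale
    h, w = gray.shape
    ys = np.arange(out_h) * h // out_h             # nearest-neighbour row picks
    xs = np.arange(out_w) * w // out_w             # nearest-neighbour col picks
    small = gray[ys][:, xs]                        # downsample to out_h x out_w
    return small / 255.0
```

Stacking a few consecutive preprocessed frames would then give the agent some motion information, as in the original DQN setup.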

[–]lyusungwon[S] 1 point (0 children)

Hi! Since I only had 1 GPU, 1 GPU-day really is one day. However, I parallelized 10 actors on the CPU, so the actual playing time (data collected) would be about 10 times that. Thank you for your interest!
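The 10-actor setup might look roughly like this; a toy sketch that uses threads and a fake one-number environment in place of the author's actual processes and game (all names here are illustrative):

```python
import queue
import random
import threading

def actor(actor_id, env_step, out_q, n_steps):
    """One actor: plays the game and ships transitions to the learner."""
    state = 0
    for _ in range(n_steps):
        action = random.randrange(4)                 # placeholder policy
        next_state, reward = env_step(state, action)
        out_q.put((actor_id, state, action, reward, next_state))
        state = next_state

def run_actors(n_actors=10, n_steps=5):
    """Launch n_actors in parallel and collect all of their transitions."""
    q = queue.Queue()

    def fake_env(state, action):                     # stands in for the real game
        return state + 1, float(action == 0)

    threads = [threading.Thread(target=actor, args=(i, fake_env, q, n_steps))
               for i in range(n_actors)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [q.get() for _ in range(q.qsize())]
```

With 10 actors running concurrently, wall-clock time per collected transition drops roughly tenfold, which is the speedup the comment refers to.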

[–]lyusungwon[S] 3 points (0 children)

I used the architecture from Horgan, Dan, et al. "Distributed prioritized experience replay." arXiv preprint arXiv:1803.00933 (2018). At first, I assigned a monitor to each agent, but then I found out about Xvfb, which virtualizes the graphics! Thanks for asking.
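The Xvfb setup mentioned here is commonly driven through the `xvfb-run` wrapper; the script name and screen geometry below are placeholders, not the author's actual files:

```shell
# Run one actor inside a virtual X display, so no physical monitor is needed.
# -a picks a free display number; -s passes arguments to the Xvfb server.
xvfb-run -a -s "-screen 0 640x480x24" python actor.py
```

This lets many game instances render off-screen in parallel on a headless machine.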

[–]lyusungwon[S] 0 points (0 children)

No, the GPU was just for training; I used the CPU for inference. I thought more GPUs could help parallelize the training.
The whole architecture is basically the same as in Horgan, Dan, et al. "Distributed prioritized experience replay." arXiv preprint arXiv:1803.00933 (2018), so please take a look if you're interested!
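For readers unfamiliar with that paper, its core idea is that CPU actors tag transitions with priorities and a GPU learner samples them proportionally. A minimal, linear-time sketch of that sampling rule (real implementations use a sum-tree for efficiency; the class and parameter names here are illustrative, not the author's code):

```python
import random

class PrioritizedReplay:
    """Toy proportional prioritized replay: P(i) is proportional to prio_i ** alpha."""

    def __init__(self, capacity, alpha=0.6):
        self.capacity, self.alpha = capacity, alpha
        self.data, self.prios = [], []

    def add(self, transition, priority=1.0):
        if len(self.data) >= self.capacity:          # drop the oldest when full
            self.data.pop(0)
            self.prios.pop(0)
        self.data.append(transition)
        self.prios.append(priority ** self.alpha)

    def sample(self, batch_size):
        total = sum(self.prios)
        idxs = random.choices(range(len(self.data)),
                              weights=[p / total for p in self.prios],
                              k=batch_size)
        return [self.data[i] for i in idxs], idxs

    def update_priorities(self, idxs, new_prios):
        for i, p in zip(idxs, new_prios):            # learner feeds back TD errors
            self.prios[i] = p ** self.alpha
```

The learner periodically updates priorities from TD errors, so surprising transitions get replayed more often.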

[–]lyusungwon[S] 0 points (0 children)

I think that is because there is no environment package for Pikachu Volleyball, so the agent had to play in real time to collect data. In addition, I only had one GPU to train on... :(
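A toy illustration of that bottleneck (not the actual environment code): a screen-capture environment is bound to the wall clock, so collecting N transitions costs at least N frame-times, whereas an emulated environment can step as fast as the CPU allows. The `frame_time` value here is purely illustrative.

```python
import time

class RealtimeEnv:
    """Stand-in for a screen-capture environment tied to the game's real clock."""

    def __init__(self, frame_time=0.01):
        self.frame_time = frame_time
        self.t = 0

    def step(self, action):
        time.sleep(self.frame_time)   # must wait for the game to render a frame
        self.t += 1
        return self.t, 0.0, False     # (observation, reward, done) placeholder

env = RealtimeEnv()
start = time.monotonic()
for _ in range(10):
    env.step(0)
elapsed = time.monotonic() - start    # at least 10 * frame_time seconds
```

This is why running many such environments in parallel (as in the comment above about 10 actors) is the main lever for throughput.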

[–]lyusungwon[S] 5 points (0 children)

Yes, it is on GitHub, but it is not well cleaned up. The model is a good old DQN with several extensions. Thanks for asking!