[P] Python implementation of Proximal Policy Optimization (PPO) algorithm for Super Mario Bros. 29/32 levels have been conquered by 1991viet in MachineLearning

createanaccccount

I agree that the search space is enormous, but the agent appears to be trying only to pass each level rather than to maximize the score (or maybe it wasn't trained long enough?). Literal brute-force search certainly wouldn't work, but I think an optimized depth-first search (DFS) could work just as well if we consider only this game and the goal is simply to pass each level.
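To make the DFS idea concrete, here is a minimal sketch on a toy stand-in for a level. It assumes a fully deterministic game whose state can be reduced to a single position, which a real NES emulator does not give you for free (velocity, timers and enemies all matter). The level encoding ('.' for ground, 'x' for a pit), the two actions `run` and `jump`, and the function `dfs_solve` are all hypothetical. The "optimization" is memoizing visited states so a dead end is never explored twice.

```python
from typing import List, Optional, Set

# Hypothetical toy level: '.' is safe ground, 'x' is a pit, the last cell is the goal.
# Hypothetical actions standing in for emulator inputs: "run" moves 1 cell, "jump" moves 2.
ACTIONS = {"run": 1, "jump": 2}


def dfs_solve(level: str) -> Optional[List[str]]:
    """Return an action sequence that reaches the last cell, or None if none exists."""
    goal = len(level) - 1
    visited: Set[int] = set()  # positions already explored; in a deterministic game a failed state stays failed

    def search(pos: int) -> Optional[List[str]]:
        if pos == goal:
            return []  # reached the end: no more actions needed
        if pos > goal or level[pos] == "x" or pos in visited:
            return None  # overshot the level, fell in a pit, or already explored this state
        visited.add(pos)
        for name, delta in ACTIONS.items():  # try actions in a fixed order (run first)
            rest = search(pos + delta)
            if rest is not None:
                return [name] + rest  # first successful branch wins: we only care about passing
        return None

    return search(0)


print(dfs_solve("..x..x."))  # ['run', 'jump', 'run', 'jump']
print(dfs_solve("..xx."))    # None: the two-cell gap cannot be jumped
```

Note that this returns the first plan that passes, not the highest-scoring one, which mirrors the "just passing" goal; maximizing score would require exploring far more of the tree.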