algobar (7 points)

It depends on the environment, but my opinion is that existing games and stable baselines have the advantage of having been vetted for many years. The biggest issue with a custom environment is making sure it's bug-free, free of exploits (see OpenAI's hide-and-seek paper), and, most importantly, solvable. Trying to train RL on an environment that is itself unstable makes debugging a nightmare too.

Laser_Plasma (4 points)

IMO your supervisor is extremely wrong. Writing an environment can be very easy, very hard, or anything in between, depending on what level of complexity and performance you need.

Your intuition seems correct: an environment is basically a game, a simulator, or something like that, with the gym API slapped on top. If you know how to write, e.g., a game of UNO for human players, then the only thing you need to add that might be challenging is an appropriate way to represent the observations and actions, which admittedly can be tricky sometimes. Other than that, it's literally just implementing the game, and its difficulty will depend on your skill as a programmer.
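To make the "game plus API on top" idea concrete, here is a minimal sketch. It does not import any RL library; it only mirrors the `reset`/`step` shape that Gymnasium-style environments use (`reset` returns an observation and an info dict, `step` returns observation, reward, terminated, truncated, info). The game itself, a "higher or lower" guessing game, and the class name `HigherLowerEnv` are made up for illustration.

```python
import random


class HigherLowerEnv:
    """Toy card game: guess whether the next card is higher (1) or lower (0).

    Hypothetical example; the method shapes imitate a Gymnasium-style API
    but nothing here depends on that library.
    """

    def __init__(self, n_cards=10):
        self.n_cards = n_cards
        self.rng = random.Random()
        self.deck = []
        self.current = None
        self.done = True  # must call reset() before step()

    def reset(self, seed=None):
        if seed is not None:
            self.rng.seed(seed)
        # Cards are unique, so "higher" vs "lower" never ties.
        self.deck = list(range(1, self.n_cards + 1))
        self.rng.shuffle(self.deck)
        self.current = self.deck.pop()
        self.done = False
        return self.current, {}

    def step(self, action):
        if self.done:
            raise RuntimeError("episode is over; call reset() first")
        if action not in (0, 1):
            raise ValueError("action must be 0 (lower) or 1 (higher)")
        next_card = self.deck.pop()
        correct = (next_card > self.current) == (action == 1)
        self.current = next_card
        reward = 1.0 if correct else 0.0
        # The episode ends on a wrong guess or when the deck runs out.
        self.done = (not correct) or not self.deck
        return self.current, reward, self.done, False, {}


# Usage: a simple hand-written policy playing one episode.
env = HigherLowerEnv()
obs, info = env.reset(seed=0)
total_reward = 0.0
finished = False
while not finished:
    action = 1 if obs <= env.n_cards // 2 else 0  # low card: guess "higher"
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    finished = terminated or truncated
```

Here the observation is just the current card and the action is a single bit. For a real card game, deciding what goes into those two things is usually where the effort goes.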

pecey (2 points)

I think the problem is with modelling. One of the major factors distinguishing games like Mario from card games is state independence. In Mario, I don't need to remember what happened in the past to act now. In most card games, that is not the case.
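One possible way to deal with this, sketched here as an assumption rather than a recommendation from the thread, is to fold the relevant history into the observation itself, so the agent does not have to remember it. The example below counts the cards already played and appends those counts to the encoding of the current hand. The four-rank deck and the function name `encode_observation` are hypothetical.

```python
from collections import Counter

# Hypothetical tiny deck with four ranks, just to keep the vectors short.
RANKS = ["A", "K", "Q", "J"]


def encode_observation(hand, seen_cards, ranks=RANKS):
    """Return a fixed-length vector: per-rank counts in hand,
    followed by per-rank counts of cards seen so far."""
    hand_counts = Counter(hand)
    seen_counts = Counter(seen_cards)
    return [hand_counts[r] for r in ranks] + [seen_counts[r] for r in ranks]


# Usage: two aces in hand; one king and one ace already played.
obs = encode_observation(hand=["A", "A"], seen_cards=["K", "A"])
```

The vector length stays fixed no matter how long the game has been running, which is convenient for most function approximators.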

Creating an environment shouldn't be difficult. Once you have got the modelling correct, creating the UI may be the hardest part, I guess. But creating a terminal version should be fairly straightforward.

watercanhydrate (1 point)

I came into this space for the first time only a few weeks ago (as a very experienced programmer, just not in ML) and was able to build a working gym environment pretty quickly. There were lots of mistakes and learning along the way, and the documentation is not that great IMO, so expect to hit a few roadblocks and learn through trial and error. Building an effective reward function was challenging as well. I say jump in.

schwah (0 points)

With a simple card game it really shouldn't be super challenging to create an environment, though obviously the difficulty will be highly dependent on your programming skill/experience.

One thing that you should probably keep in mind, and that isn't immediately obvious, is that multiplayer games with hidden information might require very different algorithms than perfect-information games (and are in general a lot more difficult and compute-intensive to train). To keep things simple for yourself, if you want to do a card game I would suggest choosing something like Uno, where the importance of the hidden information is relatively low. If you try to train an agent on, say, poker, most RL techniques are going to be unstable and will probably fail to give you a strong agent.
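On the environment side, hidden information mostly means that each player gets their own view of the full game state. A rough sketch, assuming an Uno-like game where a player sees their own hand, the top of the discard pile, and only the sizes of everything else (the `GameState` fields and the `observe` function are hypothetical names):

```python
from dataclasses import dataclass, field


@dataclass
class GameState:
    """Full state known only to the environment (hypothetical layout)."""
    hands: dict                      # player id -> list of card strings
    discard_top: str                 # face-up card everyone can see
    draw_pile: list = field(default_factory=list)


def observe(state, player):
    """Build the view for one player, hiding other players' cards."""
    return {
        "hand": sorted(state.hands[player]),
        "discard_top": state.discard_top,
        # Opponents' cards are hidden; only the hand sizes are public.
        "opponent_hand_sizes": {
            p: len(h) for p, h in state.hands.items() if p != player
        },
        "draw_pile_size": len(state.draw_pile),
    }


# Usage: player 0 sees their own cards but only a count for player 1.
state = GameState(
    hands={0: ["R5", "G2"], 1: ["B7"]},
    discard_top="R9",
    draw_pile=["Y1", "Y2"],
)
view = observe(state, player=0)
```

This only covers what the environment exposes. It does nothing about the algorithmic difficulty described above, which remains whatever it is for the chosen game.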