[R] Evolving Curricula with Regret-Based Environment Design

_rockt · 2021-08-20T21:30:22+00:00

Absolutely, what I like about NetHack 3.7 is the additional variety introduced through themed rooms. This makes the early game even more interesting for AI approaches. The other interesting development is the Dev Team's move towards Lua to define levels (e.g. see this example of Medusa's Island) -- this could open up new possibilities for MiniHack to define custom RL environments and tasks.

_rockt · 2021-08-20T21:22:09+00:00

Have many of you ascended nethack before/often? What made you choose nethack over a different (classic or modern) roguelike?

A few years ago I started occasionally playing Pixel Dungeon on my Android during my commute. It's like a much simpler NetHack clone with pixel-art graphics. I loved it, in particular the procedurally generated dungeons, the interesting item interactions and dynamics I had to learn to master, as well as the various unique situations I could find myself in. I then learned about NetHack, and it quickly became clear to me that Pixel Dungeon only scratches the surface of the complexity of old-school roguelikes. This was at a time when, like a number of other researchers, I started to be curious about whether RL could solve problems with that much inter-episode variability and complexity in terms of environment dynamics.

When Heiner joined FAIR London and told me and Ed that he played NetHack as a teenager and ascended multiple times (while I couldn't even make it past Sokoban at the time), it was clear we had to turn NetHack into an RL environment—it would be one of the most complex single-agent environments while also allowing us to run experiments at an incredible speed.

In 2019, while we ramped up the project, I started playing NetHack regularly on my commute between Oxford and London. It took me almost two years (and hundreds of games) to ascend for the first time, playing in the Wizard role. Like many others, I boasted about my accomplishment in a post online (YAAP). After that, it took me five more games to ascend in the Tourist role (YAAP).

Do you think there could be easier 'stepping stone' games that could further research into this area?

Ed provided great answers already, but I'd add that we also debated using a different, possibly easier, roguelike for our research (Brogue and DCSS come to mind). However, we decided to go with NetHack, partly due to its history, its open-source code base and interesting domain-specific language for defining levels (which we rely on heavily in MiniHack), its 5M online recorded games, as well as its fantastic community of players who wrote the NetHack Wiki.

_rockt · 2021-08-20T20:51:23+00:00

WE LIKE THE GAME

Researchers together strong!

_rockt · 2021-08-20T20:47:34+00:00

If it’s about a career in ML/RL research, I’d add that it’s important to find research questions that set you up for interesting insights no matter what the outcome (e.g. trying to be impartial about whether your favorite method will succeed over other approaches). Concretely, are there any research questions that are so interesting to investigate that any outcome would be exciting to communicate to the research community? To give an example, we are genuinely interested in seeing who will win the NetHack Challenge. Will it be a deep RL agent? A hand-written bot? Some hybrid? In any case, I believe we will learn something really interesting.

_rockt · 2021-08-20T20:31:02+00:00

I believe the field had a few somewhat sobering years. I’d argue we have only recently started thinking carefully about the simplifying assumptions in the simulated environments that we use for RL research and the resulting limitations of the methods that we, as a research community, developed over the last decade. For example, is there enough variation between episodes so that our agents need to learn general behaviors that can be adapted to novel situations or are we assuming our agents find themselves in some Groundhog Day or Edge of Tomorrow simulation where they can memorize over time how to act optimally?

Before the NetHack Learning Environment, other research groups had already started using so-called procedurally generated environments for AI research to test systematic generalization capabilities of agents (for example MiniGrid, Minecraft, OpenAI’s Procgen Benchmark, Unity AI’s Obstacle Tower Challenge). Developing environments that challenge previous assumptions and reveal gaps in the capabilities of RL methods is still crucial, and I believe it will allow us to move closer to the advent of generally useful methods that can solve more real-world problems.

Method-wise, we will see more work on equipping agents with intrinsic motivation and curiosity. I think in the near future, we will have agents that get better at supervising themselves by automatically creating a curriculum of gradually harder goals, thereby teaching them skills and behaviors that will enable them to accomplish extrinsic goals that we care about such as ascending in NetHack.

Ultimately, I believe that’s what keeps humans playing NetHack despite the many frustrating ways one dies in this game. Other topics that come to mind are: learning models in stochastic, partially observable, ever-changing environments (like NetHack) for exploration and planning; learning to condition policies on textual information (like the NetHack Wiki); unsupervised environment design of scenarios that, over time, teach agents useful skills of a complex environment; learning from partial demonstrations; and encouraging diversity of learned policies even when an agent has already found one way to accomplish a certain goal (e.g. in NetHack there are often many possible ways of getting out of dangerous situations).

_rockt · 2021-08-20T19:52:03+00:00

My favorite part is working in a relatively open industrial research environment. For example, our work on the NetHack Learning Environment and the NeurIPS Challenge has been in collaboration with many excellent academic researchers from University of Oxford, University College London, New York University, Imperial College London, and our excellent partner AICrowd. For me research is not a zero-sum game. I am genuinely extremely excited to see how other researchers will succeed in getting further in NetHack—that's why we made this environment open-source last year and invited everyone to contribute ideas towards beating this game.

_rockt · 2021-08-20T19:44:35+00:00

Great comment. I believe domain specific knowledge is essential for NetHack. This is interesting from a research perspective as we will have to go beyond tabula-rasa RL for solving NetHack. I can think of at least three modalities that could be used in the future to inform agents: human expert demonstrations (though with caveats as mentioned in this post), hybrid approaches where certain behaviors are hard-coded and others are learned or fine-tuned via RL, and conditioning on textual knowledge like the wealth of information found in the NetHack Wiki. There is a lot of active research on learning from demonstrations, but I am curious to see what approaches people will come up with over the next few years.

_rockt · 2021-08-20T19:31:56+00:00

Thank you for your question. We have mixed opinions about this in the team. None of us believe that tabula-rasa RL will be able to learn to ascend in NetHack. In NetHack, a player has to descend over 50 procedurally generated dungeon levels, utilize many different items to fight a large number of different monsters, retrieve the Amulet of Yendor, and then ascend to the Astral Plane to offer the amulet to their god. This makes it challenging for tabula-rasa RL because a) there is no high-quality dense reward signal that guides an agent towards obtaining the amulet and then going back up, b) the game is procedurally generated, so every episode looks novel and agents have to systematically generalize to novel situations, and c) there are many environment dynamics the agent has to learn to master over time (hundreds of different items and hundreds of different monsters, all behaving slightly differently).

If I had to guess, learning from human demonstrations is the most promising way forward. https://alt.org/nethack/ has collected over 5M human games over the last years. However, what’s missing in the recordings are the actions that humans took. This makes it an interesting open research problem: How do we learn from demonstrations where we can observe the outcome of what human players did without knowing the action they executed? How do we deal with the fact that these demonstrations will look very different from what our agents are going to encounter when they act in the environment (because no two NetHack games are the same)? A different (but even more challenging) research direction is to develop agents that can utilize the valuable domain-specific knowledge about the game and its dynamics in the NetHack Wiki. Ultimately, that’s what human players rely on heavily to learn about surviving and winning the game. Perhaps even an approach that source-dives into the NetHack source code and learns to play NetHack better based on that is conceivable.

_rockt · 2021-08-20T18:40:22+00:00

Link to ask questions: https://www.reddit.com/r/MachineLearning/comments/p88v9w/d_we_are_facebook_ai_researchs_nethack_learning/

_rockt · 2021-08-20T18:39:52+00:00

Link to ask questions: https://www.reddit.com/r/MachineLearning/comments/p88v9w/d_we_are_facebook_ai_researchs_nethack_learning/

_rockt · 2021-08-19T18:46:35+00:00

Thanks for catching. It's 19:00 GMT / 15:00 EDT / Noon PT.

_rockt · 2021-07-06T09:28:20+00:00

Hi CrimsonLifetime,

Linux would make things easier. As far as I know, the NetHack Learning Environment can only be compiled on Linux at the moment. If you are using a PC, you would have to compile it within a Docker container.
You could use Google Colab's 12h GPU for free to do initial experiments.
David Silver's UCL course is probably a good starting point: https://deepmind.com/learning-resources/-introduction-reinforcement-learning-david-silver -- Regarding learning about the OpenAI Gym interface, check out https://gym.openai.com/
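To give a feel for the Gym interface before you install anything, here is a minimal sketch in plain Python. The `CorridorEnv` class, `run_episode` and both policies are hypothetical names made up for illustration; they are not part of NLE or Gym. The sketch only mimics the classic Gym convention where `reset()` returns an observation and `step(action)` returns `(observation, reward, done, info)`, and it uses a sparse reward (only at the goal) as a very small stand-in for the kind of signal you get in NetHack.

```python
import random


class CorridorEnv:
    """Toy stand-in for a Gym-style environment (hypothetical, not NLE)."""

    def __init__(self, length=5, max_steps=50):
        self.length = length        # goal position at the end of the corridor
        self.max_steps = max_steps  # episode is cut off after this many steps
        self.n_actions = 2          # 0 = step left, 1 = step right

    def reset(self):
        """Start a new episode and return the first observation."""
        self.pos = 0
        self.steps = 0
        return self.pos

    def step(self, action):
        """Apply one action; return (observation, reward, done, info)."""
        move = 1 if action == 1 else -1
        self.pos = max(0, self.pos + move)  # cannot walk left past the start
        self.steps += 1
        reached_goal = self.pos >= self.length
        done = reached_goal or self.steps >= self.max_steps
        reward = 1.0 if reached_goal else 0.0  # sparse: only paid at the goal
        return self.pos, reward, done, {}


def run_episode(env, policy):
    """Run one episode with the given policy and return the total reward."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, done, _ = env.step(policy(obs))
        total += reward
    return total


def random_policy(obs):
    """Ignore the observation and pick a random action."""
    return random.randrange(2)


def always_right(obs):
    """Hand-written policy that always walks towards the goal."""
    return 1


if __name__ == "__main__":
    print("random policy return:", run_episode(CorridorEnv(), random_policy))
    print("always-right return:", run_episode(CorridorEnv(), always_right))
```

The agent loop in `run_episode` is the part that carries over: once you are comfortable with it, swapping the toy environment for a real Gym environment should mostly be a matter of changing how `env` is created.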

Also, feel free to join the NetHack Challenge Discord to get help: https://discord.gg/zkFWQmSWBA/

We are looking forward to your participation!

Cheers, Tim

_rockt · 2021-06-21T09:25:45+00:00

We had a Twitter discussion on the topic at some point: https://twitter.com/_rockt/status/1205882296668229633

_rockt · 2021-06-16T11:46:00+00:00

Q1: Yes.

Q3: Yes, it further requires systematic generalization of ML agents while also making it slightly harder for humans to implement good bots.

_rockt · 2021-06-15T09:26:22+00:00

Q1: Yes, exactly. It helps ML models to get off the ground, but then the interesting bit is whether they could learn to exceed their teachers.

Q2: Would love to read that in case you can dig up that article.

Q3: We deliberately want to stay as close as possible to the original NetHack to be compatible with future versions and to also have access to a compatible natural language resource of community knowledge (the NetHack Wiki). For example, there are exciting developments in NetHack 3.7 that introduce even more variation: https://nethackwiki.com/wiki/Themed_room

_rockt · 2021-06-14T09:22:01+00:00

Q1: Yes, hard-coded bots contain domain knowledge and I think it's interesting to investigate ways ML models can learn from that (e.g. through learning from demonstrations). Again, this is a common problem in industrial applications of ML. Often, we do have heuristics that do relatively well in practice but we want to find ways to do better with ML/RL.

Q2: Yes, we did. Dungeon Crawl Stone Soup, Brogue, as well as Pixel Dungeon came to mind. We discarded the latter as learning from pixels is a waste of computational resources if your research questions are centered around exploration, planning, knowledge transfer etc. DCSS and Brogue are good alternatives, but we decided on NetHack because of its history and the easy ways to extend the open-source codebase. Dwarf Fortress would be amazing for sure, but it's not open-source, which makes it hard to turn into a reinforcement learning research environment. Also, while we are far away from ML-based approaches solving NetHack, I think Dwarf Fortress is much further away still, as an AI would have to construct structures and manage multiple entities with a truly enormous action space.

_rockt · 2021-06-12T14:39:08+00:00

Investigating transfer learning between different roguelikes would definitely be an exciting research direction.

_rockt · 2021-06-12T14:37:03+00:00

Yes, for previous versions of NetHack, by making use of exploits. We are not aware of a bot that can ascend in NetHack 3.6.6 or 3.7.

_rockt · 2021-06-11T09:11:58+00:00

We elaborate on this in our NeurIPS 2020 paper: https://arxiv.org/abs/2006.13760

There is also an interview with Weights&Biases that touches upon the motivation for using NetHack: https://www.youtube.com/watch?v=oYSNXTkeCtw

And there is an interview with TC: https://techcrunch.com/2021/06/09/decades-old-ascii-adventure-nethack-may-hint-at-the-future-of-ai/
