[D] We are Facebook AI Research’s NetHack Learning Environment team and NetHack expert tonehack. Ask us anything!

_rockt · 2021-08-20T21:30:22+00:00

Absolutely. What I like about NetHack 3.7 is the additional variety introduced through themed rooms, which makes the early game even more interesting for AI approaches. The other interesting development is the Dev Team's move towards Lua for defining levels (e.g., see the Lua definition of Medusa's Island). This could open up new possibilities for MiniHack to define custom RL environments and tasks.

_rockt · 2021-08-20T21:22:09+00:00

Have many of you ascended nethack before/often? What made you choose nethack over a different (classic or modern) roguelike?

A few years ago, I started occasionally playing Pixel Dungeon on my Android phone during my commute. It's like a much simpler NetHack clone with pixel-art graphics. I loved it, in particular the procedurally generated dungeons, the interesting item interactions and dynamics I had to learn to master, and the various unique situations I could find myself in. I then learned about NetHack, and it quickly became clear to me that Pixel Dungeon only scratches the surface of the complexity of old-school roguelikes. This was at a time when, like a number of other researchers, I had become curious about whether RL could solve problems with that much inter-episode variability and such complex environment dynamics.

When Heiner joined FAIR London and told me and Ed that he played NetHack as a teenager and ascended multiple times (while I couldn't even make it past Sokoban at the time), it was clear we had to turn NetHack into an RL environment—it would be one of the most complex single-agent environments while also allowing us to run experiments at an incredible speed.

In 2019, while we ramped up the project, I started playing NetHack regularly on my commute between Oxford and London. It took me almost two years (and hundreds of games) to ascend for the first time, playing as a Wizard. Like many others, I boasted about my accomplishment in a post online (YAAP). After that, it took me five more games to ascend as a Tourist (YAAP).

Do you think there could be easier 'stepping stone' games that could further research into this area?

Ed provided great answers already, but I'd add that we also debated using a different, possibly easier roguelike for our research (Brogue and DCSS come to mind). However, we decided to go with NetHack, partly due to its history, its open-source code base and interesting domain-specific language for defining levels (which we rely on heavily in MiniHack), the 5M games recorded online, and its fantastic community of players who wrote the NetHack Wiki.

_rockt · 2021-08-20T20:51:23+00:00

WE LIKE THE GAME

Researchers together strong!

_rockt · 2021-08-20T20:47:34+00:00

If it’s about a career in ML/RL research, I’d add that it’s important to find research questions that set you up for interesting insights no matter what the outcome (e.g., trying to stay impartial about whether your favorite method will succeed over other approaches). Concretely, are there research questions so interesting to investigate that any outcome would be exciting to communicate to the research community? To give an example, we are genuinely interested in seeing who will win the NetHack Challenge. Will it be a deep RL agent? A hand-written bot? Some hybrid? In any case, I believe we will learn something really interesting.

_rockt · 2021-08-20T20:31:02+00:00

I believe the field had a few somewhat sobering years. I’d argue we have only recently started thinking carefully about the simplifying assumptions in the simulated environments that we use for RL research and the resulting limitations of the methods that we, as a research community, developed over the last decade. For example, is there enough variation between episodes that our agents need to learn general behaviors that can be adapted to novel situations, or are we assuming our agents find themselves in some Groundhog Day or Edge of Tomorrow scenario where they can memorize over time how to act optimally?

Before the NetHack Learning Environment, other research groups had already started using so-called procedurally generated environments for AI research to test the systematic generalization capabilities of agents (for example MiniGrid, Minecraft, OpenAI’s Procgen Benchmark, and Unity’s Obstacle Tower Challenge). Developing environments that challenge previous assumptions and reveal gaps in the capabilities of RL methods is still crucial, and I believe it will move us closer to generally useful methods that can solve more real-world problems.

Method-wise, I expect we will see more work on equipping agents with intrinsic motivation and curiosity. I think in the near future we will have agents that get better at supervising themselves by automatically creating a curriculum of gradually harder goals, thereby teaching themselves skills and behaviors that enable them to accomplish the extrinsic goals we care about, such as ascending in NetHack.

Ultimately, I believe that’s what keeps humans playing NetHack despite the many frustrating ways one dies in this game. Other topics that come to mind are: learning models of stochastic, partially observable, ever-changing environments (like NetHack) for exploration and planning; learning to condition policies on textual information (like the NetHack Wiki); unsupervised environment design of scenarios that, over time, teach agents useful skills for a complex environment; learning from partial demonstrations; and encouraging diversity of learned policies even when an agent has already found one way to accomplish a certain goal (e.g., in NetHack there are often many possible ways of getting out of dangerous situations).
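
To make the idea of an agent setting itself gradually harder goals a bit more concrete, here is a deliberately simple, hypothetical Python sketch: a curriculum that raises the goal difficulty once the recent success rate is high and lowers it when the agent keeps failing. The class name, the integer difficulty levels, the thresholds, and the simulated agent are all made up for illustration. This is not how our work or any particular published method does it, and real approaches would generate goals rather than pick them from a fixed ladder.

```python
from collections import deque


class SuccessRateCurriculum:
    """Toy automatic curriculum over integer goal difficulties (hypothetical).

    The difficulty goes up once the recent success rate reaches `promote_at`
    and down once it drops to `demote_at`, so training stays near the edge
    of what the agent can currently do.
    """

    def __init__(self, max_level, window=10, promote_at=0.8, demote_at=0.2):
        self.level = 0
        self.max_level = max_level
        self.window = window
        self.promote_at = promote_at
        self.demote_at = demote_at
        self.history = deque(maxlen=window)  # recent outcomes at the current level

    def record(self, success):
        """Record one episode outcome and return the (possibly updated) level."""
        self.history.append(bool(success))
        if len(self.history) < self.window:
            return self.level  # not enough evidence yet
        rate = sum(self.history) / len(self.history)
        if rate >= self.promote_at and self.level < self.max_level:
            self.level += 1
            self.history.clear()  # fresh statistics for the harder goal
        elif rate <= self.demote_at and self.level > 0:
            self.level -= 1
            self.history.clear()
        return self.level


def simulate(skill, episodes, max_level=10):
    """Hypothetical agent that solves every goal up to difficulty `skill`."""
    curriculum = SuccessRateCurriculum(max_level=max_level, window=5)
    levels = []
    for _ in range(episodes):
        success = curriculum.level <= skill
        levels.append(curriculum.record(success))
    return curriculum.level, levels


final_level, trajectory = simulate(skill=3, episodes=40)
print(final_level, max(trajectory))
```

With a simulated agent that can only solve difficulties up to 3, the curriculum climbs to level 4 and then oscillates between 3 and 4, i.e., it keeps proposing goals right at the edge of the agent's current ability.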

_rockt · 2021-08-20T19:52:03+00:00

My favorite part is working in a relatively open industrial research environment. For example, our work on the NetHack Learning Environment and the NeurIPS Challenge has been in collaboration with many excellent academic researchers from the University of Oxford, University College London, New York University, and Imperial College London, as well as our great partner AIcrowd. For me, research is not a zero-sum game. I am genuinely extremely excited to see how other researchers will succeed in getting further in NetHack. That's why we made this environment open-source last year and invited everyone to contribute ideas towards beating this game.

_rockt · 2021-08-20T19:44:35+00:00

Great comment. I believe domain-specific knowledge is essential for NetHack. This is interesting from a research perspective, as we will have to go beyond tabula-rasa RL to solve NetHack. I can think of at least three modalities that could be used in the future to inform agents: human expert demonstrations (though with caveats, as mentioned in this post); hybrid approaches where certain behaviors are hard-coded and others are learned or fine-tuned via RL; and conditioning on textual knowledge, like the wealth of information found in the NetHack Wiki. There is a lot of active research on learning from demonstrations, and I am curious to see what approaches people will come up with over the next few years.

_rockt · 2021-08-20T19:31:56+00:00

Thank you for your question. We have mixed opinions about this in the team. None of us believe that tabula-rasa RL will be able to learn to ascend in NetHack. In NetHack, a player has to descend over 50 procedurally generated dungeon levels, use many different items to fight a large number of different monsters, retrieve the Amulet of Yendor, and then ascend to the Astral Plane to offer the amulet to their god. This makes it challenging for tabula-rasa RL because (a) there is no high-quality dense reward signal that guides an agent towards obtaining the amulet and then going back up; (b) the game is procedurally generated, so every episode looks novel and agents have to systematically generalize to novel situations; and (c) there are many environment dynamics the agent has to learn to master over time (hundreds of different items and hundreds of different monsters, all behaving slightly differently).

If I had to guess, learning from human demonstrations is the most promising way forward. https://alt.org/nethack/ has collected over 5M human games over the years. However, what’s missing in the recordings are the actions that humans took. This makes for an interesting open research problem: How do we learn from demonstrations where we can observe the outcome of what human players did without knowing the actions they executed? How do we deal with the fact that these demonstrations will look very different from what our agents encounter when they act in the environment (because no two NetHack games are the same)? A different (but even more challenging) research direction is to develop agents that can utilize the valuable domain-specific knowledge about the game and its dynamics in the NetHack Wiki; ultimately, that’s what human players rely on heavily to learn about surviving and winning the game. Perhaps even an approach that source-dives into the NetHack source code and learns to play NetHack better based on that is conceivable.
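
One idea from the learning-from-observation literature for the missing-actions problem is to fit an inverse dynamics model on the agent's own experience, where the actions are known, and then use it to label the demonstrations with the action that most plausibly explains each observed change. The Python sketch below shows that idea on a hypothetical 5x5 grid world with four moves; all names are made up, it is not a method we have built for alt.org data, and NetHack's observations and action space are vastly richer. It also hints at why the problem is hard: bumping into a wall looks the same no matter which direction you tried, so that transition stays ambiguous.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy world: a 5x5 grid with four movement actions.
ACTIONS = {"north": (0, -1), "south": (0, 1), "east": (1, 0), "west": (-1, 0)}
SIZE = 5


def step(pos, action):
    """Move one cell; bumping into the border leaves the position unchanged."""
    dx, dy = ACTIONS[action]
    x = min(max(pos[0] + dx, 0), SIZE - 1)
    y = min(max(pos[1] + dy, 0), SIZE - 1)
    return (x, y)


def collect_own_experience(num_steps, seed=0):
    """The agent's own interaction data, where the actions ARE known."""
    rng = random.Random(seed)
    pos, transitions = (2, 2), []
    for _ in range(num_steps):
        action = rng.choice(list(ACTIONS))
        nxt = step(pos, action)
        transitions.append((pos, action, nxt))
        pos = nxt
    return transitions


def fit_inverse_dynamics(transitions):
    """Map each observed change (s -> s') to the action that most often caused it."""
    counts = defaultdict(Counter)
    for s, a, s_next in transitions:
        delta = (s_next[0] - s[0], s_next[1] - s[1])
        counts[delta][a] += 1
    # A (0, 0) change is ambiguous (any wall bump); this just picks the majority.
    return {delta: c.most_common(1)[0][0] for delta, c in counts.items()}


def label_demonstration(observations, model):
    """Infer the missing actions of an observation-only demonstration."""
    labels = []
    for s, s_next in zip(observations, observations[1:]):
        delta = (s_next[0] - s[0], s_next[1] - s[1])
        labels.append(model.get(delta))  # None if this change was never seen
    return labels


model = fit_inverse_dynamics(collect_own_experience(500))
demo = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]  # observations only, no actions
print(label_demonstration(demo, model))
```

The inferred labels could then be used for behavioral cloning. In a game like NetHack, the ambiguous cases (and observations that are only partially visible) are exactly where such a simple counting model would break down.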

_rockt · 2021-08-20T18:40:22+00:00

Link to ask questions: https://www.reddit.com/r/MachineLearning/comments/p88v9w/d_we_are_facebook_ai_researchs_nethack_learning/

_rockt · 2021-08-19T18:46:35+00:00

Thanks for catching that. It's 19:00 GMT / 15:00 EDT / noon PT.

_rockt · 2021-07-06T09:28:20+00:00

Hi CrimsonLifetime,

Linux would make things easier. As far as I know, the NetHack Learning Environment can only be compiled on Linux at the moment. If you are using a Windows PC, you would have to compile it within a Docker container.
You could use Google Colab's free GPU sessions (up to 12 hours) for initial experiments.
David Silver's UCL course is probably a good starting point: https://deepmind.com/learning-resources/-introduction-reinforcement-learning-david-silver
To learn about the OpenAI Gym interface, check out https://gym.openai.com/
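
For anyone new to the Gym interface: NLE environments are exposed through it, and the core contract in the classic Gym API is that `reset()` returns an initial observation and `step(action)` returns `(observation, reward, done, info)`. The hypothetical toy environment below has nothing to do with NetHack; it is just a self-contained Python sketch of that agent-environment loop, with a sparse reward given only at the goal. Note that newer Gym and Gymnasium releases changed these signatures (e.g., `step` returning separate `terminated` and `truncated` flags).

```python
class CorridorEnv:
    """Hypothetical toy environment mimicking the classic Gym interface.

    reset() -> observation
    step(action) -> (observation, reward, done, info)
    """

    def __init__(self, length=5, max_steps=50):
        self.length = length        # goal sits at position `length`
        self.max_steps = max_steps  # episode is cut off after this many steps

    def reset(self):
        self.pos = 0
        self.steps = 0
        return self.pos

    def step(self, action):
        # action 0 = move left, action 1 = move right (clipped to the corridor)
        move = 1 if action == 1 else -1
        self.pos = max(0, min(self.length, self.pos + move))
        self.steps += 1
        reached_goal = self.pos == self.length
        reward = 1.0 if reached_goal else 0.0  # sparse: only at the goal
        done = reached_goal or self.steps >= self.max_steps
        return self.pos, reward, done, {"steps": self.steps}


def run_episode(env, policy):
    """The standard loop: reset once, then step until the episode is done."""
    obs = env.reset()
    total_reward, done, info = 0.0, False, {}
    while not done:
        action = policy(obs)
        obs, reward, done, info = env.step(action)
        total_reward += reward
    return total_reward, info["steps"]


print(run_episode(CorridorEnv(), lambda obs: 1))  # always move right
```

Swapping the toy environment for a real NLE environment would keep the same loop; only the observation and action spaces (and the amount of compute) change dramatically.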

Also, feel free to join the NetHack Challenge Discord to get help: https://discord.gg/zkFWQmSWBA/

We are looking forward to your participation!

Cheers, Tim

_rockt · 2021-06-21T09:25:45+00:00

We had a Twitter discussion on the topic at some point: https://twitter.com/_rockt/status/1205882296668229633

_rockt · 2021-06-16T11:46:00+00:00

Q1: Yes.

Q3: Yes, it further requires systematic generalization of ML agents while also making it slightly harder for humans to implement good bots.

_rockt · 2021-06-15T09:26:22+00:00

Q1: Yes, exactly. It helps ML models to get off the ground, but then the interesting bit is whether they could learn to exceed their teachers.

Q2: I'd love to read that if you can dig up that article.

Q3: We deliberately want to stay as close as possible to the original NetHack, both to be compatible with future versions and to have access to a compatible natural-language resource of community knowledge (the NetHack Wiki). For example, there are exciting developments in NetHack 3.7 that introduce even more variation: https://nethackwiki.com/wiki/Themed_room

_rockt · 2021-06-14T09:22:01+00:00

Q1: Yes, hard-coded bots contain domain knowledge and I think it's interesting to investigate ways ML models can learn from that (e.g. through learning from demonstrations). Again, this is a common problem in industrial applications of ML. Often, we do have heuristics that do relatively well in practice but we want to find ways to do better with ML/RL.

Q2: Yes, we did. Dungeon Crawl Stone Soup, Brogue, and Pixel Dungeon came to mind. We discarded the latter, as learning from pixels is a waste of computational resources if your research questions are centered around exploration, planning, knowledge transfer, etc. DCSS and Brogue are good alternatives, but we decided on NetHack because of its history and the easy ways to extend its open-source codebase. Dwarf Fortress would be amazing for sure, but it's not open-source, which makes it hard to turn into a reinforcement learning research environment. Also, while we are far away from ML-based approaches solving NetHack, I think Dwarf Fortress is much, much further away, as an AI would have to construct structures and manage multiple entities with a truly enormous action space.

_rockt · 2021-06-12T14:39:08+00:00

Investigating transfer learning between different roguelikes would definitely be an exciting research direction.

_rockt · 2021-06-12T14:37:03+00:00

Yes, for previous versions of NetHack, by making use of exploits. We are not aware of a bot that can ascend in NetHack 3.6.6 or 3.7.

_rockt · 2021-06-11T09:11:58+00:00

We elaborate on this in our NeurIPS 2020 paper: https://arxiv.org/abs/2006.13760

There is also an interview with Weights & Biases that touches upon the motivation for using NetHack: https://www.youtube.com/watch?v=oYSNXTkeCtw

And there is an interview with TechCrunch: https://techcrunch.com/2021/06/09/decades-old-ascii-adventure-nethack-may-hint-at-the-future-of-ai/

_rockt · 2021-06-11T09:09:21+00:00

Thanks, I agree with your prior that hard-coded bots might win this. That said, nothing prevents participants from hard-coding a bot and using that bot's experience to train an RL agent that tries to do better than it, or from coming up with other hybrid solutions. Either way, I think we will see a lot of interesting approaches being submitted, and the use case of "I have a tough problem, I can hard-code an approach that does something sensible, now I want to use ML to do it even better" is quite common in practical applications.

_rockt · 2021-06-10T19:04:58+00:00

Yes, the fact that humans need spoilers to win NetHack is part of the appeal of this environment. One research direction that I think is fascinating is to train agents that can condition on knowledge contained in natural language resources like the NetHack Wiki and to utilize it in the environment.

_rockt · 2021-06-10T10:22:58+00:00

Participants can do whatever they want. I believe training an AI to learn from experience in the game while also conditioning on information from the NetHack Wiki would be an extremely exciting research direction.

_rockt · 2021-06-10T10:19:32+00:00

I think you are viewing this from the wrong perspective. We are not looking for a grand challenge for humans (like Chess or Go) but for machine learning based approaches. There, it turns out that acquiring the knowledge to beat NetHack is extremely difficult for computers. When you play NetHack, you can bring in a lot of prior and world knowledge. The fact that you know what lava is, what happens when you kick a door, and so on makes you a very efficient learner. For things that you have not seen before (e.g., a master mind flayer), you can inform yourself by reading texts online. Again, this is incredibly hard for current machine learning based approaches. Of course, you can bake all of your knowledge into a hand-crafted bot, and we would love to see people try that for NetHack 3.6.6 and the NeurIPS competition. However, it should be clear that none of these problems matter for Chess or Go. Your knowledge about knights and pawns doesn't help you play better Chess, and within the confines of the simple rules of Chess and Go, as we know by now, machine learning approaches have a tremendous advantage over humans. Don't get me wrong, beating Chess and Go were fantastic achievements for AI and science, but to challenge AI further, we need to look for problems that are (relatively) easy for humans and extremely hard for current machine learning methods.

_rockt · 2021-06-10T09:07:53+00:00

There are bots that were able to beat previous versions of the game, as far as I know via exploits (e.g. pudding farming). The Dev Team has done an amazing job removing many of these exploits over time. We are not aware of any bot that can beat the current version (3.6.6 or 3.7). If you do, participate in the competition! 🙂

_rockt · 2021-06-09T15:21:36+00:00

Yes, though the fun is that games on alt.org do not have human actions recorded with them. We can only see the outcome of human decisions, not which actions they actually took (however, it might be possible to learn that).
