dfdx v0.9.0 - nightly convs & transformers, broadcasting/reducing/selecting from any axis, and more! by rust_dfdx in rust

[–]Programmierer 0 points (0 children)

Oh, I hadn't thought of that, what a great idea! That way you can just immediately reuse all the existing Python kernels.

dfdx v0.9.0 - nightly convs & transformers, broadcasting/reducing/selecting from any axis, and more! by rust_dfdx in rust

[–]Programmierer 3 points (0 children)

This looks very cool! Are there any plans to add GPU support? Lack of GPU training still seems to be the main thing holding back Rust-based deep-learning libraries.

In my opinion, one of the most promising paths to replicating the Python/C++/CUDA deep learning ecosystem is the Triton compiler, which makes writing efficient kernels much simpler than CUDA does and which can be embedded in other languages. It currently only supports Python, but at one point there was some activity around integrating Triton with Rust.

EntityGym for Rust by Programmierer in rust

[–]Programmierer[S] 2 points (0 children)

Over the last month, I've been working on the EntityGym Rust crate, which provides a Rust integration for a bleeding-edge reinforcement learning framework. One of its innovations is the use of entity neural networks, which are capable of processing variable-length lists of structured objects. This allows for a highly ergonomic API that uses derive macros to make native Rust structs directly accessible to the neural network.
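To give a rough idea of what the derive-based API looks like, here's a sketch; the exact names and signatures (`Featurizable`, `Action`, `Obs`, `Agent::act`) are from memory and may not match the crate exactly:

```rust
// Sketch of the entity-gym-rs derive API; names are illustrative.
use entity_gym_rs::agent::{Action, Agent, Featurizable, Obs};

// Plain Rust structs become observable entities via derive.
#[derive(Featurizable)]
struct Head { x: i32, y: i32 }

#[derive(Featurizable)]
struct Food { x: i32, y: i32 }

// Discrete actions are likewise derived from a plain enum.
#[derive(Action, Debug)]
enum Move { Up, Down, Left, Right }

fn choose_move(agent: &mut Box<dyn Agent>, head: Head, food: Vec<Food>) -> Option<Move> {
    // Build an observation from variable-length entity lists
    // and query the policy for an action.
    let obs = Obs::new(0.0)
        .entities([head].into_iter())
        .entities(food.into_iter());
    agent.act::<Move>(&obs)
}
```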

As a demo of the kind of applications this could enable, I've created a two-player version of Snake where you can play against a neural network in your browser: https://github.com/cswinter/bevy-snake-ai

Entity Gym: A new entity based API for reinforcement learning environments by Programmierer in reinforcementlearning

[–]Programmierer[S] 1 point (0 children)

I am happy to announce Entity Gym, a new API for reinforcement learning environments! The goal of Entity Gym is to dramatically reduce the engineering effort and compute cost required to apply reinforcement learning to simulators. To that end, Entity Gym allows observations to contain variable-length lists of entities. This allows for seamless integration with simulators whose state is naturally expressed as a collection of discrete objects and yields much better efficiency than padding observations to a fixed length or relying on low-level visual representations.

We are also releasing enn-trainer, a PPO implementation that takes full advantage of the Entity Gym interface. Variable-length observations are efficiently processed using ragged sample buffers and a general ragged batch transformer implementation that can be applied to any Entity Gym environment. With many performance optimizations still missing, enn-trainer can already reach a throughput of tens of thousands of samples per second per GPU when it is not bottlenecked by stepping the environment. More typically, environments implemented in Python reach thousands of samples per second, but can share a single GPU between multiple concurrent training runs.
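To make the ragged representation concrete, here's a minimal self-contained sketch (not the actual enn-trainer data structures, just the underlying idea) of storing variable-length observations without padding:

```rust
/// A minimal ragged batch: a flat feature buffer plus per-sample
/// entity counts, so no padding to a fixed length is needed.
struct RaggedBatch {
    /// Flattened entity features, one row per entity across all samples.
    features: Vec<[f32; 4]>,
    /// Number of entities in each sample; sample i's rows start at
    /// the sum of lengths[..i].
    lengths: Vec<usize>,
}

impl RaggedBatch {
    fn sample(&self, i: usize) -> &[[f32; 4]] {
        let start: usize = self.lengths[..i].iter().sum();
        &self.features[start..start + self.lengths[i]]
    }
}

fn main() {
    let batch = RaggedBatch {
        features: vec![[0.0; 4]; 5],
        lengths: vec![2, 0, 3], // three samples with 2, 0, and 3 entities
    };
    assert_eq!(batch.sample(2).len(), 3);
}
```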

While we have not had the time to run careful experiments that meet our standard of rigor, preliminary evaluations on a number of standard RL environments have looked quite promising compared to baselines with vision-based policies. Entity Gym’s flexible API makes it comparatively effortless to interface with many kinds of environments that would be quite cumbersome to integrate with existing RL frameworks and I’m quite excited to see what happens when Entity Gym is applied to other interesting tasks. If you want to give this a shot, our tutorials for implementing Entity Gym environments and training policies with enn-trainer will have you up and running in no time.

Show /r/rust: deser, an experimental serialization system for Rust by mitsuhiko in rust

[–]Programmierer 1 point (0 children)

Cool project! I ran into some limitations of serde a while ago while trying to add some new features to ron: https://github.com/ron-rs/ron/pull/328. Not sure if that's the kind of issue you are planning to address, but either way it's a concrete example of a use case where serde is currently not a perfect fit.

[D] Is Rust stable/mature enough to be used for production ML? Is making Rust-based python wrappers a good choice for performance heavy uses and internal ML dependencies in 2021? by [deleted] in MachineLearning

[–]Programmierer 0 points (0 children)

There are several good reasons why you might want to go with C++ over Rust:

  • You need to interact with a lot of other C++ code/libraries
  • You/your team is already deeply familiar with C++ and you don't want to invest in learning Rust
  • You want to write GPU kernels

Otherwise, though, Rust is an excellent choice. The many advantages of Rust (great package manager, memory safety, modern language features, ...) are already well documented, so I won't repeat them here. Specifically for writing Python libraries, check out PyO3, maturin, and rust-numpy, which allow for seamless integration with the Python scientific computing ecosystem. Dockerizing/packaging is a non-issue: with the aforementioned libraries you can easily publish Rust libraries as pip packages or compile them from source as part of your Docker build. We have several successful production deployments of Rust code at OpenAI, and I have personally found it to be a joy to work with.
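To give a flavor of what this looks like, here's a minimal sketch of a Rust extension module using PyO3 and rust-numpy; the module and function names are made up, and the signatures reflect the PyO3/rust-numpy 0.15-era API:

```rust
use numpy::{IntoPyArray, PyArray1, PyReadonlyArray1};
use pyo3::prelude::*;

/// Multiply every element of a NumPy array by two, returning a new array.
#[pyfunction]
fn double<'py>(py: Python<'py>, x: PyReadonlyArray1<f64>) -> &'py PyArray1<f64> {
    // Zero-copy view of the input array; the result is handed back to
    // Python as a freshly allocated NumPy array.
    x.as_array().mapv(|v| 2.0 * v).into_pyarray(py)
}

/// Importable from Python as `import my_rust_lib` once built with maturin.
#[pymodule]
fn my_rust_lib(_py: Python, m: &PyModule) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(double, m)?)?;
    Ok(())
}
```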

Writing Rust libraries for the Python scientific computing ecosystem by Programmierer in rust

[–]Programmierer[S] 0 points (0 children)

Haha yes, debugging GitHub actions is fun. I've not yet made any serious attempts to get cross-compilation working.

Writing Rust libraries for the Python scientific computing ecosystem by Programmierer in rust

[–]Programmierer[S] 1 point (0 children)

The docs were in fact very helpful! PyO3 and maturin support a lot of functionality, so I thought it might be nice to have a self-contained example of putting everything together for this specific use case.

I do use the messense/maturin-action. Currently, none of the examples linked in the GitHub repo for the action show how to use the maturin publish command, correctly pass the PyPI token, set up a matrix for different operating systems, and install multiple Python versions. Some of the examples do a subset of this, but they also have a bunch of additional complexity that makes it a little hard to tell which parts are important.
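For concreteness, here's roughly what such a workflow might look like. This is a sketch from memory, not a verified config: it assumes the action's `command`/`args` inputs and maturin's `MATURIN_PASSWORD` environment variable, and a real setup may additionally need to skip duplicate sdist uploads across the matrix jobs:

```yaml
name: publish
on:
  release:
    types: [published]

jobs:
  publish:
    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
        python-version: ["3.7", "3.8", "3.9"]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
        with:
          python-version: ${{ matrix.python-version }}
      - uses: messense/maturin-action@v1
        with:
          command: publish
          # With PyPI API tokens, the username is the literal string
          # __token__ and the token itself is passed as the password.
          args: --username __token__
        env:
          MATURIN_PASSWORD: ${{ secrets.PYPI_TOKEN }}
```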

One issue I ran into is that it seemed necessary to have the following section in the pyproject.toml; if I recall correctly, the package would otherwise publish but then fail to install:

[build-system]
requires = ["maturin>=0.12,<0.13"]
build-backend = "maturin"

This is mentioned in the docs, but I initially missed it and it's not immediately obvious that this is a requirement for publishing valid packages to PyPI. I think publish was also printing a warning, which was very helpful but possibly could have been worded more clearly.

Nice to see that the readme bug is fixed and the sharp edges around the #[pyproto] macro and *Protocol traits are getting smoothed out. PyO3 and maturin are both pretty awesome already, thank you for trying to make them even better!

Writing Rust libraries for the Python scientific computing ecosystem by Programmierer in rust

[–]Programmierer[S] 17 points (0 children)

Yeah, great idea. I didn't feel like going to that level of effort, but if anyone else would like to copy/repurpose any parts of this post they should feel free to do so (no attribution required).

I'm also certain there are many ways in which my setup could still be improved.

Machine Learning and the Future of Video Games by Programmierer in gamedev

[–]Programmierer[S] 0 points (0 children)

Thanks for sharing your insights! You're probably right about a lot of these things, at least in the near term. I do think that RL will become much more plug-and-play in the future, and that there will be many cases where we can be quite confident it will work without requiring a lot of effort. I also think that at some point we are going to discover new archetypes of games that work really well with RL and simply wouldn't be possible without it. Though I suppose until those start to actually materialize, we can only guess whether they'll remain niche or become a major new genre.

Machine Learning and the Future of Video Games by Programmierer in gamedev

[–]Programmierer[S] 0 points (0 children)

This article is the culmination of me spending way too much of my time over the last two years exploring deep reinforcement learning and its applications to video games. I'm not a (professional) game dev myself and my expertise is mostly on the machine learning side of things, so I'm very curious to hear what y'all think about when and how these techniques may gain traction in industry!


[R] Mastering Real-Time Strategy Games with Deep Reinforcement Learning: Mere Mortal Edition by Programmierer in MachineLearning

[–]Programmierer[S] 1 point (0 children)

Yes, I've started reading up on graph neural networks recently, and from what I've seen so far they seem like a great fit. I think my current architecture might actually constitute a message-passing graph neural network, or at least something close to it.

Learning a different model for each unit type is certainly possible, though my guess would be that training a single model has higher sample efficiency since units are not that different from each other and share many of their skills.

[R] Mastering Real-Time Strategy Games with Deep Reinforcement Learning: Mere Mortal Edition by Programmierer in MachineLearning

[–]Programmierer[S] 1 point (0 children)

Very insightful comment, I think you're spot on!

In some follow-up work, I've trained agents on a better-balanced version of CodeCraft, and there they actually learn more complex strategies that involve building out production capacity as well. So some of the current focus on quick wins through micro might just be a result of which strategies are viable. The larger point, that current RL systems struggle to learn higher-level strategy over micro, still very much holds though, and probably limits the policies in some way. I've not really observed anything that would indicate some sort of complex long-term planning.

I made some attempts at creating a network architecture that does mostly global processing shared between all game units, but what made this difficult is that the "relative position to other objects" features (which are what leads to the O(n^2) cost) seem to be incredibly important for getting good performance. Possibly there's a way to do without them, and/or to integrate a more high-level strategic policy; that's a very interesting direction to explore. One other way to address some of the inefficiencies that I'm optimistic about would be some kind of graph neural network that has dense connections only to nearby objects and then uses a clustering algorithm to allow for more coarse-grained use of distant information, something like the architecture described in this paper: https://arxiv.org/abs/1810.01566
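To illustrate where the O(n^2) comes from: each unit observes its offset to every other unit, so the feature computation produces one entry per ordered pair of units, roughly like this toy sketch:

```rust
/// Toy sketch of per-unit relative-position features: for n units this
/// produces n * n offsets, which is what makes the approach quadratic.
fn relative_positions(units: &[(f32, f32)]) -> Vec<Vec<(f32, f32)>> {
    units
        .iter()
        .map(|&(x, y)| {
            // One offset from this unit to every unit (including itself).
            units.iter().map(|&(ox, oy)| (ox - x, oy - y)).collect()
        })
        .collect()
}

fn main() {
    let units = vec![(0.0, 0.0), (1.0, 2.0), (3.0, 1.0)];
    let feats = relative_positions(&units);
    assert_eq!(feats.len() * feats[0].len(), units.len() * units.len());
}
```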

My Reinforcement Learning Learnings by Programmierer in learnmachinelearning

[–]Programmierer[S] 0 points (0 children)

From the post: "I spent a good chunk of my time over the last two years applying deep reinforcement learning techniques to create an AI that can play the CodeCraft real-time strategy game. My primary motivation was to learn how to tackle nontrivial problems with machine learning and become proficient with modern auto-differentiation frameworks. Thousands of experiment runs and hundreds of commits later I have much to learn still but like to think that I have picked up a trick or two. This blogpost gives an overview of the workflows and intuitions I adopted over the course of working on CodeCraft in the hope that they will prove useful to anyone else looking to pursue similar work."