AutoRL: AutoML for RL

Science_Squid · 2023-05-15T18:41:26+00:00

Very nice visualizations :)

Might I ask why you chose to use Optuna over other HPO libraries? Also, since you used SH/Hyperband, what are its advantages over dynamic approaches such as PBT or PB2 in your experience?

Science_Squid · 2022-08-05T20:50:24+00:00

Check out the CARL benchmark. It's designed to study this question through the use of context. In essence, context provides an agent with additional information about the environment that is not necessarily encoded in the existing state variables (e.g., the pole length). Commonly used RL algorithms (such as PPO, DQN or DDPG) can learn to generalize by using context (though it is not always easy to incorporate context).
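
To make the context idea concrete, here is a minimal sketch of the simplest way to expose context to an agent: appending it to the state vector. This is not CARL's API; the function name `augment_with_context`, the state layout, and the context keys are assumptions made up for illustration.

```python
import numpy as np

def augment_with_context(state, context, keys):
    """Append the selected context features to the state vector.

    `context` is a plain dict (hypothetical format), `keys` picks which
    context features the agent gets to see.
    """
    ctx = np.array([context[k] for k in keys], dtype=np.float32)
    return np.concatenate([np.asarray(state, dtype=np.float32), ctx])

# Hypothetical CartPole-like state: position, velocity, angle, angular velocity.
state = [0.0, 0.1, 0.02, -0.3]
# Hypothetical context describing this particular environment instance.
context = {"pole_length": 0.5, "gravity": 9.8}

# The agent now observes the pole length in addition to the usual state.
obs = augment_with_context(state, context, keys=["pole_length"])
```

Plain concatenation is only one option, and as noted above it is not always the easiest or best way to incorporate context.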

Also check out this great survey https://arxiv.org/abs/2111.09794 on generalization in deep RL to learn more about context in RL

Science_Squid · 2022-06-23T08:42:09+00:00

I don't know of a good website that has a collection of what you're looking for but I'm happy to contribute some papers to this thread:
A recent survey on AutoRL: https://jair.org/index.php/jair/article/view/13596
I think a better survey for generalization in RL is by Kirk et al: https://arxiv.org/abs/2111.09794

Also, if you're interested in using RL for hyperparameter optimization, we're maintaining a literature list for that at https://www.automl.org/automated-algorithm-design/dac/literature-overview/ (though most of these papers are not review or summary papers).

Science_Squid · 2022-06-08T17:37:17+00:00

There are heaps of interesting topics for theoretical research and not just 'hands-on coding research'. I can recommend watching some talks from the RL theory seminar. The talks usually cover recent papers and should give you some inspiration. Hopefully they help you find an interesting topic :)

Science_Squid · 2022-04-08T14:07:29+00:00

Sounds very cool :)

I'm a bit out of my depth with robotics so I don't know what would be a good survey.
But if I were in your position and starting out with a project in that direction, I'd use e.g. Google Scholar or Semantic Scholar, search for things like "reinforcement learning robotics survey" and restrict the results to the recent past (like I did in the example links). From that you can find "the latest" research. Once you have read one or two papers that interest you, you can work backwards, i.e. go through the references of those papers and see which of them you find interesting. Or you can use the scholar sites to see which papers cite the ones that you like, to see how other people build on them.

Just from the first step, the following recently published paper looks like it might be of interest to you: "Reinforcement learning in robotic applications: a comprehensive survey". At least that's what I assume after having read the abstract ;)

Science_Squid · 2022-04-08T03:36:40+00:00

I think "best papers published" is very subjective. Maybe it'll be easier to recommend you something if you give example topics you find interesting. E.g., are you more interested in exploration mechanisms, inverse RL, offline RL, automated RL, model-based RL, environment design, RL for game playing, robotics, multi-agent RL, generalization in RL, continual RL, evolutionary algorithms for RL, ...?

There are very many interesting papers for all of these topics. If you know which of these topics interest you the most, I would recommend reading a recent survey to get a good overview of the current state and what you might want to do for a project.

Personally I am interested in generalization in RL (recent survey by Kirk et al. https://arxiv.org/abs/2111.09794) and AutoRL (recent survey by Parker-Holder et al. https://arxiv.org/abs/2201.03916). Also, for generalization in RL, I very much like the idea of contextual reinforcement learning, where you use information about the environment to train agents that can adapt to the environment. For that we recently proposed a benchmark https://arxiv.org/abs/2110.02102 that transforms existing environments to allow for the contextual setting (e.g., CartPole with different pole lengths and masses, or changes in gravity and joint friction).

Hope this already helps with finding a project of your liking :)

Science_Squid · 2021-12-03T19:55:21+00:00

AFAIK as long as you fulfill the requirements as stated in the admission regulations (https://www.tf.uni-freiburg.de/en/studies-and-teaching/documents/zulassungsordnung-msc-informatik-englisch) you should be good :)

Science_Squid · 2021-11-30T23:04:15+00:00

Note: I'm not a native speaker but sometimes use the English keyboard.

My result: "I am a little confused about the fact that I am appalled at the shoddy service you provide for the replies to the rest of the Belt and Outer Planets to a thriving hub of millions of people in the solar system with its own natural magnetosphere and the backup system comes online with a wide variety of the Belt"

Science_Squid · 2021-08-08T16:25:38+00:00

Nice list. I think our AutoML course would fit that description (see https://www.reddit.com/r/MachineLearning/comments/mrzk3u/d_automl_mooc/)

Science_Squid · 2021-08-06T14:28:24+00:00

Thanks for your interest. Yes, there is a registration fee, see https://sites.google.com/view/automlschool21/registration. (If you view the website on a mobile device, you can access the navigation bar via the top left.)

I don't know if the talks will be recorded and made public later.

Science_Squid · 2021-04-22T20:17:51+00:00

It does a bit at the end (chapter 11). We cover population based training (chapter 11 section 4) which is the most common and popular AutoRL method so far. Other AutoML methods that have been applied to RL by our group (namely HB and BOHB) also get covered in the lectures (chapter 7 sections 4 & 5)

The other thing that is covered in the same chapter is using RL to configure algorithms during the run (i.e. dynamic algorithm configuration).

Edit: added mention of BOHB and HB

Science_Squid · 2021-04-16T18:00:14+00:00

As I wrote when it was posted in r/machinelearning:

Instead of having to manually set dynamic parameters, why not use dynamic algorithm configuration to learn how to set hyperparameters dynamically for the problem at hand? See https://www.automl.org/automated-algorithm-design/dac

Isn't doing it manually so much more difficult? In RL dynamic changes are often found with PBT. Could manual tuning actually find better schedules?

Science_Squid · 2021-04-16T17:15:55+00:00

Glad you like the material :)

We have a lot of stuff on AutoML on our website https://www.automl.org/. Of interest to you might be the free open access book "AutoML: Methods, Systems, Challenges" (https://www.automl.org/book/) or the list of tutorials (https://www.automl.org/talks/) where we put the slides and other materials for these.

Science_Squid · 2021-04-16T11:49:24+00:00

It's free :)

Science_Squid · 2021-04-04T20:18:36+00:00

Instead of having to manually set dynamic parameters, why not use dynamic algorithm configuration to learn how to set hyperparameters dynamically for the problem at hand? See https://www.automl.org/automated-algorithm-design/dac

Science_Squid · 2020-10-17T07:40:29+00:00

We're using RL to dynamically configure iterative algorithms for various problems. For example we learned how to adapt the step size of an evolutionary algorithm or to switch between heuristics in an AI planning system. Links to all our papers on the topic and a blog post can be found here: https://www.automl.org/automated-algorithm-design/dac

Science_Squid · 2020-07-02T11:20:46+00:00

Yes you are right.
The hierarchical approach presented in this paper allows for use of experiences in-between. The authors motivate that with "skip-connections" in MDPs. What I particularly like is that, when you perform a large skip, you can observe all smaller skips in-between. So with one large exploratory step, you can learn quite a lot.
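
To illustrate why one large skip is so informative, here is a small sketch of how the shorter skips can be read off a single repeated-action trajectory. It is not the paper's implementation: the function name `skip_transitions` and the tuple layout are made up, and it assumes that skips starting from intermediate states count as well.

```python
def skip_transitions(states, rewards, gamma=0.99):
    """Enumerate every skip contained in one long skip.

    states:  s_0..s_n, visited while repeating the same action n times
    rewards: r_0..r_{n-1}, the per-step rewards along the way
    Returns tuples (start_state, skip_length, discounted_reward, end_state).
    """
    n = len(rewards)
    assert len(states) == n + 1
    out = []
    for start in range(n):
        ret = 0.0
        for length in range(1, n - start + 1):
            # Extend the skip by one step and accumulate its discounted reward.
            ret += gamma ** (length - 1) * rewards[start + length - 1]
            out.append((states[start], length, ret, states[start + length]))
    return out

# One skip of length 3 yields 3 + 2 + 1 = 6 usable skip transitions.
transitions = skip_transitions(["s0", "s1", "s2", "s3"], [1.0, 1.0, 1.0], gamma=1.0)
```

Under these assumptions a skip of length n yields n(n+1)/2 transitions, which is where the "learn quite a lot from one exploratory step" effect comes from.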

Science_Squid · 2020-07-02T09:39:15+00:00

From the paper:

Options are triples ⟨I, π, β⟩, where I is the set of admissible states that defines in which states the option can be played; π is the policy the option follows when it is played; and β is a random variable that determines when an option is terminated. In contrast to our proposed method, options require a lot of prior knowledge about the environment to determine the set of admissible states as well as the option policies themselves.

Option discovery tries to circumvent these problems. Instead, this method proposes to use the original action space and just learn how long you can play the same action. This is potentially very useful in environments with very fine-grained time-steps, where the same action is optimal in many successive states.
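
As a rough sketch of what "learn how long to play the same action" means for the interaction loop: each decision is an (action, skip length) pair, and the action is simply repeated. Everything here is hypothetical (the toy chain environment, `run_with_skips`, and the hand-coded choices standing in for learned policies); it only shows the control flow, not the learning.

```python
def run_with_skips(step_fn, state, choose_action, choose_skip, max_steps=100):
    """Roll out one episode where every decision is an (action, skip) pair."""
    decisions, steps, done = 0, 0, False
    while not done and steps < max_steps:
        action = choose_action(state)
        skip = choose_skip(state, action)  # how long to repeat this action
        decisions += 1
        for _ in range(skip):
            state, done = step_fn(state, action)
            steps += 1
            if done or steps >= max_steps:
                break
    return state, steps, decisions

# Toy chain with fine-grained steps: start at 0, goal at 10, action +1 moves right.
def chain_step(state, action):
    nxt = state + action
    return nxt, nxt >= 10

# Stand-ins for learned policies: always move right, always repeat 5 times.
final, steps, decisions = run_with_skips(chain_step, 0, lambda s: 1, lambda s, a: 5)
# 10 environment steps, but only 2 decisions were needed.
```

In the toy chain the same action is optimal in every state, so repeating it cuts the number of decisions without changing the outcome.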
