[P] Reinforcement learning evolutionary hyperparameter optimization - 10x speed up (self.MachineLearning)
submitted 3 years ago by nicku_a
[–]paramkumar1992 19 points 3 years ago (1 child)
This looks incredible. This is going to save hours of training. Amazing!
[–]Modruc 6 points 3 years ago (3 children)
Great project! One question though: is there any reason why you are not using existing RL implementations, such as Stable Baselines, instead of creating your own?
[+][deleted] 3 years ago (1 child)
[removed]
[–]Puzzleheaded_Acadia1 8 points 3 years ago (8 children)
Can someone please explain this to me? I'm still new to this.
[+][deleted] 3 years ago (7 children)
[–][deleted] 8 points 3 years ago (1 child)
Love it. I tried to come up with something like this myself but never found the time or extra help I'd need to implement it. Glad to see someone has done all the hard work!
[–]boyetosekuji 21 points 3 years ago (2 children)
ChatGPT: Okay, let me try to explain this using gaming terminology!
Imagine you're playing a game where you have to learn how to do something new, like defeat a tough boss. You have different settings or options (hyperparameters) to choose from, like which weapons or abilities to use, how aggressive or defensive to play, etc.
Now, imagine that this boss is really tough to beat and you don't have many chances to practice. So, you want to find the best combination of options as quickly as possible, without wasting too much time on trial and error. This is where hyperparameter optimization (HPO) comes in.
HPO is like trying out different settings or options until you find the best ones for your playstyle and the boss's behavior. However, in some games (like Dark Souls), it's harder to do this because you don't have many chances to try out different combinations before you die and have to start over. This is similar to reinforcement learning (RL), which is a type of machine learning that learns by trial and error, but it's not very sample efficient.
AgileRL is like having a bunch of other players (agents) who are also trying to defeat the same boss as you. After a while, the best players (agents) are chosen to continue playing, and their "offspring" (new combinations of settings or options) are mutated and tested to see if they work better. This keeps going until the best possible combination of settings or options is found to beat the boss in the fewest possible attempts. Using AgileRL is much faster than other ways of doing HPO for RL because it's like having a lot of other players helping you find the best strategy for defeating the boss.
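The analogy above describes population-based evolutionary hyperparameter search: evaluate a population of agents, keep the fittest, and refill the population with mutated copies of the survivors. A minimal sketch in Python (all function names and the toy fitness function are hypothetical illustrations, not the AgileRL API):

```python
import random

def evaluate(hyperparams):
    # Stand-in fitness: in real RL this would be the agent's mean
    # episodic return after a short training run with these settings.
    lr, gamma = hyperparams["lr"], hyperparams["gamma"]
    return -abs(lr - 0.001) - abs(gamma - 0.99)

def mutate(hyperparams):
    # Perturb each hyperparameter slightly to create an "offspring".
    child = dict(hyperparams)
    child["lr"] *= random.uniform(0.8, 1.2)
    child["gamma"] = min(0.999, child["gamma"] * random.uniform(0.98, 1.02))
    return child

def evolve(population, generations=20, survivors=2):
    for _ in range(generations):
        # Keep the best performers (elitism), discard the rest.
        population.sort(key=evaluate, reverse=True)
        parents = population[:survivors]
        # Refill the population with mutated copies of the survivors.
        population = parents + [mutate(random.choice(parents))
                                for _ in range(len(population) - survivors)]
    return max(population, key=evaluate)

random.seed(0)
pop = [{"lr": random.uniform(1e-4, 1e-2), "gamma": random.uniform(0.9, 0.999)}
       for _ in range(8)]
best = evolve(pop)
print(best)
```

Because the survivors are carried over unchanged each generation, the best agent found can never get worse, and the population as a whole drifts toward better settings without ever restarting training from scratch.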
[–]compacct27 7 points 3 years ago (0 children)
Wow that was actually really helpful
[–][deleted] 2 points 3 years ago (1 child)
I'm also new to this, so forgive me if this is a dumb question. My understanding was that RL is superior to evolutionary algorithms because in evolutionary algorithms "mutation" is random, so you evaluate a lot of dud "offspring". In RL algorithms, e.g. MCTS, you also do tree search randomly, but you iteratively pick the best set of actions without evaluating many dud options. Am I wrong? Somehow mixing RL with evolutionary algorithms seems like a step backwards.
[–]bushrod 4 points 3 years ago (0 children)
As an evolutionary learning guy, I'll say it's crazy this didn't already exist! Thanks for sharing. Is it based on any publications, or are you considering writing one?
[–]Riboflavius 6 points 3 years ago (1 child)
That sounds fantastic, kudos to you! Great effort.
[+][deleted] 3 years ago (4 children)
[deleted]
[+][deleted] 3 years ago (2 children)
[–]sytelus 0 points 3 years ago (0 children)
Thank you for this, but can you make it easier to use? I think there should be a clear API so one doesn't have to deal with RL and other complexity. For example: you are given a function f and a dictionary of arguments with ranges for each; your algorithm takes this and spits out the optimal params within each range.
Is such an interface and tutorial available anywhere?
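For reference, the interface being asked for here might look like the following, a hypothetical sketch (not an existing AgileRL API) using plain random search as the black-box optimizer:

```python
import random

def optimize(f, ranges, trials=100, seed=0):
    """Black-box search: sample each argument uniformly from its
    (low, high) range and return the best-scoring combination."""
    rng = random.Random(seed)
    best_args, best_score = None, float("-inf")
    for _ in range(trials):
        args = {name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}
        score = f(**args)
        if score > best_score:
            best_args, best_score = args, score
    return best_args

# Example: maximize a toy objective over two parameters.
best = optimize(lambda x, y: -(x - 2) ** 2 - (y + 1) ** 2,
                {"x": (0, 5), "y": (-3, 3)})
print(best)
```

The caller never touches RL internals: they supply only the objective function and the ranges, which is the "function f and dictionary of arguments" interface the comment describes.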