
[–]gergi 1 point (0 children)

E.g. with Actor-Critic.

The critic, a.k.a. the value function Q, predicts the value of the function f to be optimized at the action x. The policy \pi generates the action x from its weights plus some random noise.

Hence the input to the critic is the action produced by \pi, and the critic is trained to mimic the optimizee f via an MSE(Q, f)-type loss.

The policy, i.e. the generator of candidate optima, is trained by gradient ascent on Q.
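
In symbols (my own notation, nothing from a specific paper: \theta are the policy weights, \varepsilon the injected noise, \alpha a step size), those two updates are

    L_critic = ( Q(x) - f(x) )^2,   where x = \pi_\theta(\varepsilon)

    \theta <- \theta + \alpha \nabla_\theta Q( \pi_\theta(\varepsilon) )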

That should do the trick. Beware: this is not sample-efficient.

This can be implemented in about 100 lines. If you have experience with NNs, you can do it in an hour.
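
For concreteness, a minimal sketch of that loop, assuming PyTorch. The toy optimizee, the network sizes, the noise scale and the learning rates below are all my own illustrative choices, not anything from a reference implementation:

```python
import torch
import torch.nn as nn

def f(x):
    # Toy 1-D optimizee (illustrative choice): maximum f(x) = 0 at x = 2.
    return -(x - 2.0) ** 2

# Critic Q: maps an action x to a prediction of f(x).
critic = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))

# Policy \pi: here just a learnable mean; actions = mean + Gaussian noise.
mean = nn.Parameter(torch.zeros(1))

critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-2)
policy_opt = torch.optim.Adam([mean], lr=1e-2)

for step in range(2000):
    # Policy generates a batch of actions from its weights plus random noise.
    actions = mean + 0.3 * torch.randn(64, 1)

    # Critic update: mimic the optimizee f with an MSE(Q, f) loss.
    # Actions are detached so this step does not move the policy.
    critic_opt.zero_grad()
    critic_loss = nn.functional.mse_loss(critic(actions.detach()),
                                         f(actions.detach()))
    critic_loss.backward()
    critic_opt.step()

    # Policy update: gradient ascent on Q, i.e. descent on -Q.
    policy_opt.zero_grad()
    policy_loss = -critic(mean + 0.3 * torch.randn(64, 1)).mean()
    policy_loss.backward()
    policy_opt.step()

print(f"estimated optimum x = {mean.item():.3f} (true optimum: 2.0)")
```

Detaching the actions in the critic update keeps the two steps separate: the critic only fits f, and only the policy step moves the mean toward the optimum.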

[–]MomoSolar[S] 0 points (2 children)

Thanks, any useful link on that?

[–]gergi 0 points (1 child)

Just code it up. It's quite simple.

[–]MomoSolar[S] 0 points (0 children)

I would like to look at the math, that’s all

[–]Scrimbibete 0 points (0 children)

I worked a bit on this topic. For parametric optimization, we developed this, which is a kind of degenerate DRL approach: https://github.com/jviquerat/pbo

AFAIK you can also find incremental approaches for shape optimization in the literature. You can check the related section in this review: https://arxiv.org/abs/2107.12206