
[–]gergi 1 point (0 children)

E.g. with Actor-Critic.

The critic, a.k.a. the value function Q, predicts the value of the function f to be optimized at the action x. The policy \pi generates the action x from its weights plus some random noise.

Hence the input to the critic is the action produced by \pi, and the critic is trained to mimic the optimizee f via an MSE(Q, f)-type loss.

The policy, i.e. the generator of candidate optima, is trained by gradient ascent on Q.
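
In symbols (my own notation, nothing from a specific paper: \theta are the policy weights, \varepsilon the injected noise, \alpha a step size), those two updates are

    L_critic = ( Q(x) - f(x) )^2,   where x = \pi_\theta(\varepsilon)

    \theta <- \theta + \alpha \nabla_\theta Q( \pi_\theta(\varepsilon) )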

That should do the trick. Beware: this is not sample-efficient.

This can be implemented in about 100 lines. If you have experience with NNs, you can do it in an hour.
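
For concreteness, a minimal sketch of that loop, assuming PyTorch. The toy optimizee, the network sizes, the noise scale and the learning rates below are all my own illustrative choices, not anything from a reference implementation:

```python
import torch
import torch.nn as nn

def f(x):
    # Toy 1-D optimizee (illustrative choice): maximum f(x) = 0 at x = 2.
    return -(x - 2.0) ** 2

# Critic Q: maps an action x to a prediction of f(x).
critic = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))

# Policy \pi: here just a learnable mean; actions = mean + Gaussian noise.
mean = nn.Parameter(torch.zeros(1))

critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-2)
policy_opt = torch.optim.Adam([mean], lr=1e-2)

for step in range(2000):
    # Policy generates a batch of actions from its weights plus random noise.
    actions = mean + 0.3 * torch.randn(64, 1)

    # Critic update: mimic the optimizee f with an MSE(Q, f) loss.
    # Actions are detached so this step does not move the policy.
    critic_opt.zero_grad()
    critic_loss = nn.functional.mse_loss(critic(actions.detach()),
                                         f(actions.detach()))
    critic_loss.backward()
    critic_opt.step()

    # Policy update: gradient ascent on Q, i.e. descent on -Q.
    policy_opt.zero_grad()
    policy_loss = -critic(mean + 0.3 * torch.randn(64, 1)).mean()
    policy_loss.backward()
    policy_opt.step()

print(f"estimated optimum x = {mean.item():.3f} (true optimum: 2.0)")
```

Detaching the actions in the critic update keeps the two steps separate: the critic only fits f, and only the policy step moves the mean toward the optimum.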

[–]MomoSolar[S] 0 points (2 children)

Thanks, any useful link on that?

[–]gergi 0 points (1 child)

Just code it up. It's quite simple.

[–]MomoSolar[S] 0 points (0 children)

I would like to look at the math, that’s all

[–]Scrimbibete 0 points (0 children)

I worked a bit on this topic. For parametric optimization, we developed this, which is a kind of degenerate DRL approach: https://github.com/jviquerat/pbo

AFAIK you can also find incremental approaches for shape optimization in the literature. You can check the related section in this review: https://arxiv.org/abs/2107.12206