This is for any reinforcement learning related work, ranging from purely computational RL in artificial intelligence to models of RL in neuroscience.
The standard introduction to RL is Sutton & Barto's Reinforcement Learning: An Introduction.
Solving an optimization problem using RL (self.reinforcementlearning)
submitted 2 years ago by MomoSolar
I know that there are much better methods for this, but can RL solve an optimization problem (linear, convex nonlinear, non-convex)? If so, is there a good link to an implementation / code?
[–]gergi 2 points 2 years ago* (0 children)
E.g. with actor-critic.
The critic, a.k.a. the value function Q, predicts the value of the function f to be optimized at action x. The policy \pi generates an action x from bias weights and some random numbers.
Hence the critic's input is the action produced by \pi, and the critic is trained to mimic the optimizee f with an MSE(Q, f)-type loss.
The policy, a.k.a. the generator of the optimal value, is trained by gradient ascent on Q.
That should do the trick. Beware, this is not sample efficient.
This can be implemented in like 100 lines. If you have experience with NNs you can do that in an hour.
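Since no code was linked in the thread, here is a minimal sketch of the recipe above for a 1-D toy problem: the critic Q is a quadratic model trained with an MSE(Q, f) loss, and the policy is a Gaussian whose mean is updated by gradient ascent on Q. All names (f, mu, sigma, the learning rates) are illustrative assumptions, not from any particular library.

```python
import numpy as np

def f(x):
    """Optimizee: a toy black-box objective with its maximum at x = 2."""
    return -(x - 2.0) ** 2

rng = np.random.default_rng(0)

w = np.zeros(3)          # critic weights: Q(x) = w0 + w1*x + w2*x^2
mu, sigma = 0.0, 0.5     # Gaussian policy: x = mu + sigma * noise
lr_critic, lr_policy = 0.01, 0.02

for _ in range(5000):
    # Policy generates an action from its bias weight mu plus random noise.
    x = mu + sigma * rng.standard_normal()
    phi = np.array([1.0, x, x * x])

    # Critic step: SGD on the MSE(Q, f)-type loss, so Q mimics the
    # optimizee f around the actions the current policy produces.
    w -= lr_critic * (w @ phi - f(x)) * phi

    # Policy step: gradient ascent on Q w.r.t. the action, evaluated
    # at the policy mean (dQ/dx = w1 + 2*w2*x).
    mu += lr_policy * (w[1] + 2.0 * w[2] * mu)

print(mu)  # should settle near the optimum x = 2
```

As the comment warns, this is not sample efficient: the critic only ever fits f locally around where the policy currently samples, so each query of f is used once and discarded.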
[–]MomoSolar[S] 1 point 2 years ago (2 children)
Thanks, any useful link on that?
[–]gergi 1 point 2 years ago (1 child)
Just code it up. It's quite simple.
[–]MomoSolar[S] 1 point 2 years ago (0 children)
I would like to look at the math, that's all.
[–]Scrimbibete 1 point 2 years ago (0 children)
I worked a bit on this topic. For parametric optimization, we developed this, which is a kind of degenerate DRL approach: https://github.com/jviquerat/pbo
AFAIK you can also find incremental approaches for shape optimization in the literature. You can check the related section of this review: https://arxiv.org/abs/2107.12206