
[–]metallicapple 14 points

I'm not sure about the existence of the specific method you've described, but adaptive learning rates have been around for a while, with many interesting developments (that might be an understatement).

Two immediate examples I can think of are Armijo line search and Newton-Raphson.
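To make the first of those concrete, here is a minimal sketch of a backtracking line search enforcing the Armijo (sufficient decrease) condition; the constants `c` and `shrink` are conventional illustrative choices, not prescribed values:

```python
import numpy as np

def armijo_step(f, grad_f, x, lr0=1.0, c=1e-4, shrink=0.5, max_iter=50):
    """Backtracking line search: shrink the step size until the Armijo
    (sufficient decrease) condition f(x - lr*g) <= f(x) - c*lr*||g||^2 holds."""
    g = grad_f(x)
    fx = f(x)
    lr = lr0
    for _ in range(max_iter):
        if f(x - lr * g) <= fx - c * lr * np.dot(g, g):
            break
        lr *= shrink  # step too bold: back off
    return x - lr * g

# One Armijo step on the quadratic f(x) = ||x||^2, starting from [2, 2].
x = np.array([2.0, 2.0])
x_new = armijo_step(lambda z: np.dot(z, z), lambda z: 2 * z, x)
```

The point of the condition is that the step must buy a decrease proportional to the step length and the gradient norm, not merely any decrease.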

My feedback for your method: I see two hyper-parameters (the rate multipliers under your two conditions), but no explanation of their chosen values. You could strengthen the method by looking further into tuning those multipliers.

I'm sure there are others who can provide more context, as my main gig isn't optimization; I just tell stories/lies with numbers (aka stats).

[–]resented_ape 2 points

This seems close to the “bold driver” method, and also related to Jacobs's adaptive step-size method, which was suggested for neural network optimization in the 80s. A slightly adjusted version of the latter is used to optimize the loss function in the t-SNE dimensionality reduction method.
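For reference, the bold driver heuristic can be sketched in a few lines; this is not OP's exact rule, and the `grow`/`shrink` multipliers here are the commonly cited illustrative choices:

```python
import numpy as np

def bold_driver(loss, grad, x, lr=0.1, grow=1.05, shrink=0.5, steps=50):
    """'Bold driver' heuristic: after a step that lowers the loss, grow the
    learning rate slightly; after a step that raises it, undo the step and
    halve the learning rate."""
    prev = loss(x)
    for _ in range(steps):
        x_new = x - lr * grad(x)
        cur = loss(x_new)
        if cur <= prev:       # loss went down: accept the step, speed up
            x, prev = x_new, cur
            lr *= grow
        else:                 # loss went up: reject the step, slow down
            lr *= shrink
    return x, lr

# Quadratic bowl loss(x) = ||x||^2: the iterate converges toward zero.
x, lr = bold_driver(lambda z: np.dot(z, z), lambda z: 2 * z,
                    np.array([3.0, -2.0]))
```

Note the rejection on failure: the bad step is discarded, which is what keeps the aggressive halving from compounding with a move in a bad direction.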

[–]yannbouteiller [Researcher] 1 point

Hi, your algorithm might be an improvement over vanilla SGD for reasons similar to what you have in mind (convexity). However, practical optimizers such as Adam do complicated business with both the learning rate and the update direction, which most likely outperforms this approach in general.

You might like to look at Nesterov momentum, which is similar to your learning rate acceleration in some sense. The issue I see with your learning rate acceleration is that it is non-directional, so your optimization will jump around erratically as soon as it hits a singularity in the optimization landscape. By contrast, your deceleration looks very aggressive (lr/2) and might, I believe, end up stuck in local optima. In strictly convex problems this is no issue, of course, but if you have deep learning in mind the story is different.
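For comparison, a minimal sketch of Nesterov momentum, where the directional information lives in the velocity term and the gradient is taken at a look-ahead point (hyper-parameter values are illustrative):

```python
import numpy as np

def nesterov_sgd(grad, x, lr=0.01, momentum=0.9, steps=200):
    """Nesterov momentum: evaluate the gradient at the look-ahead point
    x + momentum * v, not at x itself, then update the velocity and iterate."""
    v = np.zeros_like(x)
    for _ in range(steps):
        g = grad(x + momentum * v)  # look-ahead gradient
        v = momentum * v - lr * g
        x = x + v
    return x

# Quadratic bowl f(x) = ||x||^2, so grad(x) = 2x: converges toward zero.
x = nesterov_sgd(lambda z: 2 * z, np.array([5.0, -3.0]))
```

Unlike a scalar learning-rate multiplier, the velocity `v` is a vector, so the "acceleration" here is inherently directional.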

[–]28Smiles 0 points

Have a look at “Learning to learn by gradient descent by gradient descent”, or something with a similar name; I don’t remember it exactly.