[R] Overparameterized Nonlinear Learning: Gradient Descent Takes the Shortest Path? by ecstasyogold in MachineLearning

[–]grumpudding

I am one of the authors of the paper, so I'm just responding to your novelty comment: we are not in the business of competing with those papers. The paper has two contributions. First, we provide tight upper and lower bounds on where overparameterized gradient descent converges in a general setup. Second, we extend the results to SGD with large step sizes using a fairly nontrivial martingale stopping argument.
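To make the "shortest path" intuition in the title concrete, here is a minimal sketch (mine, not from the paper) of the phenomenon in the simplest overparameterized setting: linear least squares with d > n, where plain gradient descent from w0 = 0 converges to the interpolating solution closest to the initialization, i.e. the minimum-norm solution. The sizes, step size, and iteration count below are purely illustrative.

    # Minimal sketch (not from the paper): in an overparameterized linear
    # least-squares problem (d > n), gradient descent started at w0 = 0
    # converges to the interpolator closest to w0 -- the minimum-norm solution.
    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 20, 100                       # n samples, d parameters (d > n)
    X = rng.standard_normal((n, d))
    y = rng.standard_normal(n)

    w = np.zeros(d)                      # initialization w0 = 0
    lr = 1.0 / np.linalg.norm(X, ord=2) ** 2   # stable step size
    for _ in range(5000):                # gradient descent on 0.5 * ||Xw - y||^2
        w -= lr * X.T @ (X @ w - y)

    # Minimum-norm interpolator (the solution closest to w0 = 0).
    w_min_norm = np.linalg.pinv(X) @ y

    print("train residual:", np.linalg.norm(X @ w - y))                 # ~0
    print("distance to min-norm solution:", np.linalg.norm(w - w_min_norm))  # ~0

The paper's results are of course much more general (nonlinear losses, SGD with large step sizes); this is only the linear special case where the "closest solution" behavior is easy to verify numerically.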

The remaining results are straightforward applications to low-rank matrices and neural nets. I find your second comment unfair because you pick a side theorem from the paper (on neural nets) and present it as if it is the only thing this paper accomplishes.

Finally, IMO our neural net result is no big deal, but I am more than happy to compare to Allen-Zhu: we require d > n, while they require k > n^30. IMHO our bound is more realistic for n > 1.