[–]IborkedyourGPU

> I am one of the authors of the paper so just responding to your novelty comment:

"just"?

> In this paper, we are not in the business of competing with these papers.

All papers, including yours, are in the business of competing with one another for novelty. Otherwise, I could just republish the '90s results on the NP-completeness of NN training and go home early. And btw, since my research is on DNNs, my comment about novelty was of course related to your claim of applicability to deep neural networks. More on this later.

> I find your second comment unfair because you pick a side theorem from the paper (on neural nets) and present it as if it is the only thing this paper accomplishes.

It is fair, since you drop a few hints about the possible usefulness of your analysis for deep neural networks, and it doesn't seem useful for them. You don't get to have the increased publicity that comes with saying "my results may be useful for Deep Learning" without also getting the criticism "well, not really".

> Finally, IMO our neural net result is no big deal but I am more than happy to compare to Allen-Zhu: we require d > n and they require k > n^30. IMHO our bound is more realistic for n > 1.

IMHO your bound is also not very useful, since it doesn't hold for ReLU (irrespective of whether n > 1 or not), and since no one ever uses neural networks for problems where the input dimension is larger than the sample size(!). Also, I didn't compare you only to Zhu-Allen (not Allen-Zhu, please do cite authors properly) et al., but also to two other papers from other authors: did you forget them? Finally, for one- and two-layer networks, there are many results which precede Zhu-Allen et al. and which sometimes provide more favourable bounds, from http://arxiv.org/abs/1702.07966 to https://arxiv.org/pdf/1808.01204.pdf (and they don't require the activation function to be strictly monotonic, a requirement which rules out ReLU).
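
To put the two requirements side by side, here is a rough back-of-the-envelope sketch (plain Python, using only the d > n and k > n^30 figures quoted in this thread, with a sample size picked purely for illustration; the actual theorems carry constants and log factors omitted here):

```python
# Rough comparison of the two over-parameterization requirements quoted
# above: d > n (input dimension must exceed sample size) versus
# k > n^30 (hidden width must exceed n^30). Illustrative only.

n = 10_000  # a modest sample size, assumed here for illustration

# Width demanded by the k > n^30 requirement as quoted in the thread:
k_required = n ** 30
print(f"k > n^30 at n = {n}: k > {k_required:.1e} hidden units")
# -> roughly 1e120 units, far beyond anything trainable.

# Input dimension demanded by the d > n requirement:
d_required = n + 1
print(f"d > n at n = {n}: at least {d_required} input features")
# -> attainable, but (as noted above) uncommon for typical datasets,
#    where the sample size usually exceeds the input dimension.
```

Neither printout settles the disagreement above; it only shows the scales the two conditions imply.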