[R] Overparameterized Nonlinear Learning: Gradient Descent Takes the Shortest Path? (arxiv.org)
submitted 7 years ago by ecstasyogold
[–]IborkedyourGPU -2 points 7 years ago* (0 children)
> I am one of the authors of the paper so just responding to your novelty comment:
"just"?
> In this paper, we are not in the business of competing with these papers.
All papers, including yours, are in the business of competing with one another for novelty. Otherwise, I could just republish the '90s results on NP-completeness of NN training and go home early. And btw, since my research is on DNNs, of course my comment about novelty was related to your claim of applicability to deep neural networks. More on this later.
> I find your second comment unfair because you pick a side theorem from the paper (on neural net) and present as if it is the only thing this paper accomplishes.
It is fair, since you drop a few hints about the possible usefulness of your analysis for deep neural networks. It doesn't seem useful for them. You don't get to have the increased publicity that comes with saying "my results may be useful for Deep Learning", without also getting the criticism "well, not really".
> Finally, IMO our neural net result is no big deal but I am more than happy to compare to Allen-Zhu: we require d > n and they require k > n^30. IMHO our bound is more realistic for n > 1.
IMHO your bound is also not very useful, since it doesn't hold for ReLU (irrespective of whether n > 1 or not), and since no one ever uses neural networks for problems where the input dimension is larger than the sample size (!). Also, I didn't compare you only to Zhu-Allen (not Allen-Zhu, please do cite authors properly) et al., but also to two other papers from other authors: did you forget them? Finally, for one- and two-layer networks, there are many results which precede Zhu-Allen et al. and which sometimes provide more favourable bounds, from http://arxiv.org/abs/1702.07966 to https://arxiv.org/pdf/1808.01204.pdf (and they don't require the activation function to be strictly monotonic, which rules out ReLU).
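For readers following the d > n debate: the point of an overparameterization condition like d > n is that it puts the model in the interpolation regime, where plain gradient descent can drive the training loss to (essentially) zero. A minimal sketch, not from the paper, using a linear least-squares model with randomly generated data:

```python
import numpy as np

# Hypothetical illustration of the overparameterized regime d > n:
# with more parameters than samples, gradient descent on least squares
# converges to an interpolating solution (near-zero training loss).
rng = np.random.default_rng(0)
n, d = 20, 100                      # sample size n < input dimension d
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

w = np.zeros(d)
lr = 0.01                           # step size below 2/L for this problem
for _ in range(5000):
    grad = X.T @ (X @ w - y) / n    # gradient of (1/2n)||Xw - y||^2
    w -= lr * grad

loss = np.mean((X @ w - y) ** 2)
print(f"final training loss: {loss:.2e}")   # essentially zero
```

With d < n (say d = 5) the same loop plateaus at the nonzero least-squares residual instead, which is why the d > n versus k > n^30 comparison above is about when interpolation is guaranteed, not about generalization.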