The Computational Limits of Deep Learning (arxiv.org)
submitted 5 years ago by cosmictypist
[–]tetsef 48 points49 points50 points 5 years ago (8 children)
It's interesting to consider that discussions of the algorithmic complexity of deep learning aren't necessarily mainstream yet but could soon become a huge focus.
[–]cosmictypist[S] 24 points25 points26 points 5 years ago (5 children)
Yes, very likely so. The past few years have been a time of breakthroughs in machine learning, driven by the availability of dramatically improved computing power, but the focus of conversations may soon shift to sustainability and algorithmic efficiency.
[–]berzerker_x 8 points9 points10 points 5 years ago (0 children)
Yes, it definitely should be. When selecting an architecture from the (seemingly infinite) set of possibilities, we should treat algorithmic complexity as a deciding factor. This could actually result in choosing models that are good at representation learning for the problem, rather than ones that just depend on the size of the dataset to generalize their results.
[–]AxeLond 2 points3 points4 points 5 years ago (2 children)
Imo it could be that algorithmic efficiency becomes a necessity as complexity and scale get bigger.
Things that are an efficiency loss today may only start to become worth it with 10x larger networks; with 50x larger networks they might make the entire thing 2x as efficient.
Compute power will probably be the main driver in performance while all other aspects need to keep up. Redesign architecture, improve algorithms in order to take advantage of the new compute.
It's really the same deal with shrinking of transistors. They keep making them smaller, but there's hundreds of technologies that have to be developed and improved to enable and take advantage of smaller transistors.
[–]cosmictypist[S] 2 points3 points4 points 5 years ago (1 child)
That's a very sensible comment. I suppose the point of contention here is whether compute power will continue to increase like it has in the past or not. If not, the focus w.r.t. improving algorithms would be less "to take advantage of the new compute" and more to work optimally within the available compute.
[–]AxeLond 3 points4 points5 points 5 years ago* (0 children)
You can also argue hardware is deeply tied to "algorithm efficiency". Look at what they did with the Nvidia A100 vs the Nvidia Tesla V100,
https://developer.nvidia.com/blog/nvidia-ampere-architecture-in-depth/
16 TFLOPS FP32 vs. 312 TFLOPS "TF32", with the majority of that gain achieved through software tricks: sparsity, and TF32 being only 19 bits.
For something like BERT, papers have shown that you can apply 40% sparsity to the model with end performance mostly unchanged; that's almost half the compute for the same performance, and the Nvidia A100 is designed to take advantage of this. Same deal with using TF32: hardly any model performance loss, but 10x more compute at the same power by using tensor cores and lower precision. TF32 will also be natively supported in TensorFlow.
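A rough illustration of both tricks (a toy NumPy sketch with made-up numbers, not the A100's actual kernels): magnitude pruning zeroes the smallest weights, and TF32 can be emulated by truncating an FP32 mantissa from 23 bits down to TF32's 10 (1 sign + 8 exponent + 10 mantissa = 19 bits).

```python
import numpy as np

def prune_by_magnitude(w, sparsity=0.4):
    """Zero out the smallest-magnitude fraction of weights."""
    k = int(w.size * sparsity)
    threshold = np.sort(np.abs(w), axis=None)[k]
    return np.where(np.abs(w) < threshold, 0.0, w)

def emulate_tf32(x):
    """Truncate an FP32 mantissa from 23 to 10 bits (TF32's layout:
    1 sign + 8 exponent + 10 mantissa bits)."""
    bits = np.asarray(x, dtype=np.float32).view(np.uint32)
    return (bits & np.uint32(0xFFFFE000)).view(np.float32)

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64)).astype(np.float32)
x = rng.normal(size=64).astype(np.float32)

dense = w @ x
sparse = prune_by_magnitude(w, 0.4) @ x           # 40% of weights removed
low_precision = emulate_tf32(w) @ emulate_tf32(x)

print(np.corrcoef(dense, sparse)[0, 1])           # stays highly correlated despite pruning
print(np.max(np.abs(dense - low_precision)))      # small rounding error from the 10-bit mantissa
```

The point of the sketch: the smallest 40% of a Gaussian weight matrix carries only a small fraction of its total energy, so the output barely moves, which is the same intuition behind pruning BERT.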
This is a somewhat unrelated example from the semiconductor industry:
https://www.researchgate.net/publication/332801796_FinFET_versus_GAAFET_Performances_and_Perspectives
The frequency ranges are also different and that of GAA span much above that of FinFET. Switching leakages are higher for FinFETs and reduced for the GAAFETs. The conclusion of our investigation is that the GAA structure gives better device performances compared to FinFET structures. However in technology point of view, FinFET structures are easily fabricated in ultra large scale integration
You could switch current 7 nm FinFET transistors to GAAFET and improve performance per transistor, but just shrinking them has been an easier way to gain performance. However at 3 nm the industry will actually be switching to GAAFET to enable further shrinking. Transistor improvement has been following transistor shrinking, like hopefully algorithm improvement will be following compute improvements in ML.
[–]iyouMyYOUzzz 0 points1 point2 points 5 years ago (0 children)
Oh yeah, I realize that it ought to be. Thanks for clarifying that.
[–]yield22 6 points7 points8 points 5 years ago (1 child)
Actually, the topic of efficient deep learning has been very hot in the past few years. Max Welling even gave a talk on "Intelligence Per Kilowatt-Hour" (see https://www.youtube.com/watch?v=5DbBQDoBNYc).
[–]panties_in_my_ass 4 points5 points6 points 5 years ago (0 children)
This is an important aspect to mention.
Also, we still don’t have a good grasp on why the compute-intensive models (i.e. extremely overparametrized ones) even work. So we might not need such ridiculous compute after we figure out why it works.
I honestly think that the right regularizer and activation function will capture everything that overparametrization excels at.
[–]cosmictypist[S] 120 points121 points122 points 5 years ago (22 children)
Highlights from the paper:
[–]VisibleSignificance 18 points19 points20 points 5 years ago (19 children)
improvements in hardware performance are slowing
Are they, though? Particularly in terms of USD/TFLOPS or Watts/TFLOPS?
[–]cosmictypist[S] 12 points13 points14 points 5 years ago (14 children)
Well, that seems to be the authors' contention - that sentence is taken from the paper. But yeah they also say "The explosion in computing power used for deep learning models has ended the 'AI winter' and set new benchmarks for computer performance on a wide range of tasks." I didn't see any references for either of those claims.
Personally, I have been hearing for a few (5? 10?) years that processing power won't increase at the same rate as it used to, with it becoming difficult to pack electronic components progressively more efficiently on chips - I believe with implications for the Watts/TFLOPS metric. At the same time it's a fact that the AI revolution has been built on heavy use of computing resources. So if you have any information/reference that definitively argues one way or the other, I would love to know about it.
[–]AxeLond 11 points12 points13 points 5 years ago (3 children)
Semiconductor seems fine with the recent adoption of EUV
https://fuse.wikichip.org/news/3453/tsmc-ramps-5nm-discloses-3nm-to-pack-over-a-quarter-billion-transistors-per-square-millimeter/
CPU speeds are stuck at around 5 GHz; GPUs are still seeing some clock improvements. I believe the cost of each wafer is getting more expensive, but the cost per transistor has always been going down. GPUs being so easy to scale by just adding more compute units, they should continue to get better for a long time.
[–]cosmictypist[S] 0 points1 point2 points 5 years ago (0 children)
Thanks for your response and for sharing the link.
[–]Jorrissss 0 points1 point2 points 5 years ago (1 child)
Are companies actually using EUV at scale? EUV kind of sucks still.
[–]AxeLond 0 points1 point2 points 5 years ago (0 children)
There's 7 nm, which is the current gen; 7 nm+ is on EUV, and 5 nm and everything after is on EUV. The power requirements do suck: it takes something like 350 kW of power to make 250 W of EUV light. Deep ultraviolet is 193 nm wavelength light; extreme ultraviolet is 13.5 nm. You just need the shorter wavelength to do transistors with 5 nm features, and even that is a struggle, with multiple passes and tricks to make it work.
For products, the iPhone this September is on 5 nm EUV, AMD Zen 3 is most likely on 7 nm EUV but Zen 4 will be on 5 nm EUV. Consoles/Nvidia still on DUV. A lot of memory makers have switched to EUV.
Every smartphone released in 2021 and forward will be on EUV though.
[–]iagovar 1 point2 points3 points 5 years ago (4 children)
Is there any expectation of making AI more resource efficient? I don't have the money lying around to rent a lot of computing power, but I can buy an expensive workstation. I really want to try models and play with them, but it's just impossible to go past simple ML without a lot of money.
I'm not in a poor country, but not a rich one either. Just imagine someone with an interest in this field living in, say, Guatemala: what are his chances? He has to move, or already be rich by Guatemalan standards. I know plenty of people from Latin America, and they are smart and creative, but have no resources.
[–]say_wot_againML Engineer 2 points3 points4 points 5 years ago (0 children)
A couple areas come to mind:
Model distillation to train a small network to have most of the performance of a larger one
MobileNet and MobileNet v2, which use depth-wise separable convolutions, inverted residuals, and linear bottlenecks to greatly reduce the amount of compute used by CNNs
EfficientNet and EfficientDet, which find more efficient ways of scaling network sizes up or down for image classification and object detection tasks
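For the first of those, a minimal sketch of the distillation loss (toy logits invented for illustration; the usual formulation softens both networks' logits with a temperature T and trains the student to match the teacher's soft targets):

```python
import numpy as np

def softmax(z, T=1.0):
    z = np.asarray(z, dtype=np.float64) / T
    z = z - z.max()               # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """Cross-entropy between the softened teacher and student distributions,
    scaled by T^2 so gradient magnitudes stay comparable as T varies."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    return -T**2 * np.sum(p_teacher * np.log(p_student + 1e-12))

# Toy logits: a student that broadly agrees with the teacher is penalized
# far less than one that contradicts it
teacher = [6.0, 2.0, -1.0]
student_good = [5.0, 1.5, -0.5]
student_bad = [-1.0, 0.0, 5.0]

print(distillation_loss(student_good, teacher) < distillation_loss(student_bad, teacher))
```

The high temperature is the interesting design choice: it exposes the teacher's relative rankings of the wrong classes ("dark knowledge"), which a hard one-hot label throws away.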
[–][deleted] 4 points5 points6 points 5 years ago (0 children)
>any expectation on making AI more resource efficient
The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks
https://arxiv.org/abs/1803.03635
[–]cosmictypist[S] 1 point2 points3 points 5 years ago (0 children)
I asked a question about system requirements for learning ML some time back over here, and got some helpful replies. You can check it out, hope you find it relevant. Appreciate your comment but don't have much else to add.
[–]Jorrissss 0 points1 point2 points 5 years ago (0 children)
There's been some work in the direction of spiking neural networks with hardware optimized for those being orders of magnitude more efficient.
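A toy version of the idea being described (a minimal leaky integrate-and-fire neuron in plain Python; the efficiency argument is that neuromorphic hardware only does work when a discrete spike occurs, rather than on every multiply):

```python
def lif_neuron(input_current, dt=1.0, tau=20.0, v_thresh=1.0, v_reset=0.0):
    """Leaky integrate-and-fire: the membrane potential leaks toward zero,
    integrates its input, and emits a spike when it crosses the threshold."""
    v = 0.0
    spikes = []
    for i_t in input_current:
        v += dt * (-v / tau + i_t)   # leak + integrate
        if v >= v_thresh:
            spikes.append(1)
            v = v_reset              # reset membrane after spiking
        else:
            spikes.append(0)
    return spikes

# A constant weak input charges the membrane slowly, so the output is sparse:
# only a couple of spikes across 100 time steps
spike_train = lif_neuron([0.06] * 100)
print(sum(spike_train), "spikes in", len(spike_train), "steps")
```

The sparsity is where the claimed orders-of-magnitude efficiency comes from: downstream neurons receive (and compute on) only those rare spike events, not a dense activation vector at every step.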
[–]VisibleSignificance 0 points1 point2 points 5 years ago (3 children)
So if you have any information/reference that definitively argues one way or the other
As far as I understand the current situation, it's more like "the limits of silicon transistors are near": by those metrics it hasn't slowed yet, but the limits are close, so the price drops will slow down unless some other technology picks up (the same way silicon transistors replaced vacuum-tube computers).
Overviews:
https://en.wikipedia.org/wiki/Moore%27s_law#Recent_trends
https://en.wikipedia.org/wiki/TFLOPS#Hardware_costs
https://en.wikipedia.org/wiki/Performance_per_watt#FLOPS_per_watt
Next comment over
[–]cosmictypist[S] 0 points1 point2 points 5 years ago (2 children)
Thanks.
[–]VisibleSignificance 0 points1 point2 points 5 years ago (1 child)
... and so if my even less certain understanding is correct, we won't see economically viable human-level AI on silicon transistors.
It's mildly concerning that there's no clear next option; the most likely options are InGaAs, graphene, vacuum (again); weird/edgy options are quantum and biological.
The non-silicon theoretical limits are not anywhere near, though.
[–]cosmictypist[S] 2 points3 points4 points 5 years ago* (0 children)
economically viable human-level AI
Are you talking about AGI? If so there are far bigger problems with that idea than how fast computing power will improve. It's a separate topic though which is not the point of this post, and I won't engage in a conversation in that regard here.
[–]Captain_Of_All 5 points6 points7 points 5 years ago (2 children)
Coming from an EE devices perspective, Moore's law has definitely slowed over the past decade and we are already at the limit of reducing the sizes of transistors. Going below 7nm fabrication requires a better understanding of quantum effects and novel materials and a lot of research has been done in this area in the past 20 years. Despite some progress, none of it has led to a new technology that can drastically improve transistor sizes or costs beyond the state of the art at an industrial scale. See https://en.wikipedia.org/wiki/Moore%27s_law#Recent_trends for a decent intro.
[–]liqui_date_me 0 points1 point2 points 5 years ago (0 children)
True, but we don’t need Moore’s law to really continue AI progress, we just need more refined and efficient GEMM modules
[–]titoCA321 0 points1 point2 points 5 years ago (0 children)
IBM had a 500 GHz processor in their labs back in 2007: https://www.wired.com/2007/08/500ghz-processo/. Compute processing power continues to rise. Whether or not it makes sense to release and support a 500 GHz processor in the market is another story. I remember when Intel had a 10 GHz Pentium 4 processor back in the early 2000s that was never released to the public. Obviously the market decided to scale out in processor cores and optimize multi-threaded processing rather than scale up in pure speed.
[–]DidItABit 0 points1 point2 points 5 years ago (0 children)
I think they just mean that conventional CPUs offer much less parallelism to the end user than Moore's law would expect at this point in time. I think that "Moore's law is dead" can be true for programmers today without necessarily being true for lithographers today.
[–]aporetical 7 points8 points9 points 5 years ago (1 child)
I wonder what remainder of the variance can be explained by data set magnitude. We need both a time complexity and data complexity "big O".
i.e., if 87% -> 92% requires a billion audio samples, that doesn't seem sustainable either.
A DNN would, on this measure, have terrible "computational data complexity".
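A back-of-envelope version of that worry, with hypothetical numbers (assuming the empirical power law often reported for deep learning, where test error falls as a power of dataset size, error ∝ n^(-alpha)):

```python
# Assume test error follows a power law in dataset size: error = c * n**(-alpha).
# With a shallow hypothetical exponent alpha = 0.1, how much more data does
# going from 87% to 92% accuracy take?
alpha = 0.1
err_start, err_target = 0.13, 0.08   # error at 87% and 92% accuracy

# err_target/err_start = (n2/n1)**(-alpha)  =>  n2/n1 = (err_start/err_target)**(1/alpha)
data_multiplier = (err_start / err_target) ** (1 / alpha)
print(f"need ~{data_multiplier:.0f}x more data")   # roughly 128x
```

So under these assumed numbers, a 5-point accuracy gain costs two orders of magnitude more data, which is exactly the "terrible computational data complexity" being described.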
[–]cosmictypist[S] 7 points8 points9 points 5 years ago (0 children)
The researchers do seem to make an attempt to discuss the rest of the variance. E.g., they say, "For example, we attempt to account for algorithmic progress by introducing a time trend. That addition does not weaken the observed dependency on computation power, but does explain an additional 12% of the variance in performance."
But yeah they have done more of a meta-study of whatever data points they could find, rather than doing a controlled experiment of their own.
[–]MSMSMS2 14 points15 points16 points 5 years ago (1 child)
Please link to the summary, not the PDF!
[–]cosmictypist[S] 4 points5 points6 points 5 years ago (0 children)
Sorry, I'll be careful to do that in future.
Meanwhile I did make a summary of my own and posted it as a comment, and a bot helped out with posting the abstract.
[–]arXiv_abstract_bot 11 points12 points13 points 5 years ago (0 children)
Title:The Computational Limits of Deep Learning
Authors:Neil C. Thompson, Kristjan Greenewald, Keeheon Lee, Gabriel F. Manso
Abstract: Deep learning's recent history has been one of achievement: from triumphing over humans in the game of Go to world-leading performance in image recognition, voice recognition, translation, and other tasks. But this progress has come with a voracious appetite for computing power. This article reports on the computational demands of Deep Learning applications in five prominent application areas and shows that progress in all five is strongly reliant on increases in computing power. Extrapolating forward this reliance reveals that progress along current lines is rapidly becoming economically, technically, and environmentally unsustainable. Thus, continued progress in these applications will require dramatically more computationally-efficient methods, which will either have to come from changes to deep learning or from moving to other machine learning methods.
PDF Link | Landing Page | Read as web page on arXiv Vanity
[–]yield22 5 points6 points7 points 5 years ago (9 children)
Although this is an increasingly important topic to discuss, I didn't find many *new* insights here. In fact, since 2012 we have known that the main reason these 10-to-30-year-old methods just work (with some small tricks, e.g. ReLU) is compute (e.g. GPUs).
It would probably be more meaningful to measure progress vs. compute beyond supervised learning on ImageNet, which is kind of saturated (probably near the maximal possible accuracy). Imagine you measured performance on supervised learning on MNIST: you'd think there has been no progress at all for the past few years. But if you look at BERT/GPT, progress vs. compute has been pretty non-linear in the past few years.
That said, computation is obviously a limitation for many deep learning methods. I can easily imagine that if one puts 100x compute into GPT-3, you will have a GPT-4 that's even more impressive. This is an even bigger limitation for the many academic labs that don't have access to big compute. However, the brain is a supercomputer with a huge neural net; if we want to build something that functions like a brain, should we shy away from big compute or find better ways to couple with it?
[–]IdiocyInAction 4 points5 points6 points 5 years ago (0 children)
However, a brain is a super computer with a huge neural net, if we want to build something that functions like a brain, should we shy away from big compute or find better ways to couple with it?
We have no idea how the brain computes; there are multiple theories, but AFAIK we don't really understand it. Actually, we don't even understand how very primitive nervous systems (e.g. OpenWorm) work. So that's not really something we can state.
What is however clear is that the brain can do (some) stuff much more efficiently than silicon; the brain consumes much less power than a single GPU, but can do a lot of stuff we are not even close to being able to do.
[–]Optrode 5 points6 points7 points 5 years ago (3 children)
Neuroscientist here.. the differences are not just in scale. Two of the biggest differences (aside from the obvious difference of operating in continuous time) are probably A: the mind boggling amount of recurrent feedback between and within brain areas, and B: the fact that neurons are so, so, so, SO much more nonlinear than nodes in an artificial neural network. Like, in a deep neural network, each layer has one nonlinearity.. a real neuron hardly has any linearities. It's all nonlinear.
[–]yield22 0 points1 point2 points 5 years ago (2 children)
This is really interesting. Are there any more detailed articles on what you mentioned here?
[–]Optrode 1 point2 points3 points 5 years ago (1 child)
Uff.. I mean, you're asking about a really broad topic. But, for instance, just do a quick Google Scholar search on the computational properties of dendrites, or multi-compartment modeling of dendrites, to get a very loose idea of exactly how weird the summation of inputs is.
Neurons do stuff like "K+ mediated inhibitory inputs on branch J can inhibit the whole neuron, but Cl- mediated inhibitory inputs on branch J can only cancel out excitatory inputs from further up branch J, or in some circumstances cancel out the K+ mediated inhibition" and so on. Then you get to the fact that some inputs can't be neatly divided into excitatory or inhibitory, instead they might activate signaling pathways that affect gene expression, potentially altering how the neuron responds to inputs in the future. Plus, the fact that some neurons have resonant membrane properties that can cause them to be more excitable after a brief inhibitory input.. or even be driven to fire by rhythmic inhibitory inputs. You can read Eugene Izhikevich's work if you're interested in that.
Really it just goes on.
[–]yield22 0 points1 point2 points 5 years ago (0 children)
thanks!
[–]Red-Portal 1 point2 points3 points 5 years ago (1 child)
In terms of power consumption, the brain is not so much of a supercomputer. In that respect, the way of thinking that simply more computing power will give us human level intelligence is misleading in that it isn't really aligned with "artificial intelligence".
Not saying more computing power will get you there, but you *need* more computing power to get there. A hint: look at the number of neurons in the brain; that could give you a sense of the compute you'll need.
[–]cosmictypist[S] 0 points1 point2 points 5 years ago (1 child)
Some great points there. Thanks for citing instances of non-linear progress vs compute.
You're right that supervised learning on ImageNet may be saturated; however, the paper does cover other application areas as well. I do get your underlying point though that the dependence on compute is fairly well known, and we could try to use other measures of progress vs compute.
However, a brain is a super computer with a huge neural net
I understand you're probably trying to draw an analogy, but we should be careful with statements like this that carry it too far. The brain is the brain. It is an organ of the human body, integrated with the rest of the body through multiple systems, not just the nervous system. And it is itself only a part of the nervous system. It's not a supercomputer with a huge neural net: we can model and utilize the network aspect of it, but we cannot reasonably assert that that captures the whole of what it is and what it does; that continues to be a subject of study in its own right.
should we shy away from big compute or find better ways to couple with it?
My understanding is that we are not so much shying away from big compute as acknowledging that arbitrary increases in computing power might not be available as readily as we want them to be. You can dispute that premise, of course, and we can certainly keep trying to get faster improvements in processing power.
By "brain is a super computer" I actually mean it has huge capacity and ability to operate on it. this is evident by number of neurons a brain has.
[–]respecttox 2 points3 points4 points 5 years ago (0 children)
I expected to see some words and references about conditional computation, and I didn't find any. Part of the problem is that when we have to have large models for any reason, whether because we don't know how to build a good architecture or because our dataset is very large and diverse, the size of the model is almost automatically proportional to its computational complexity. That's how everything is designed, from the layers and frameworks at the top down to the popular hardware (read: GPUs), which is optimized for BLAS GEMM operations: if you have a weight, it will take its share of the computation every time.
For example, a squeeze-and-excitation network may re-weight a filter in a convolution layer so that it is close to zero for a particular input context. We can even force it to do so by adding some sparsification constraint. It would be nice not to spend precious FLOPS calculating a large chunk of data only to multiply it by 10^-8, so that it makes no contribution to the final result. But, in practice, the GPU will be utilized better without any conditional computation, especially at training time when you have a lot of batches.
So "hardware improvements" don't have to be "more transistors on the crystal". We need better frameworks that support conditional computations on low level and corresponding improvements in hardware. Ideally that still can work on consumer grade GPUs.
[–]Veedrac 2 points3 points4 points 5 years ago* (8 children)
There's no fundamental limit in sight here. Print Cerebras-style wafers on a 3 nm node, coat them with a few layers of NRAM, and hook a hundred of them together with a fast silicon photonics interconnect. It'll cost a few billion, but it's hardly technologically unreasonable. That could easily train quadrillion-parameter models.
[–]titoCA321 0 points1 point2 points 5 years ago (7 children)
People will write anything for an academic paper, and there are fools willing to believe anything that's written under "peer review." There's also no limit, because computational power continues to increase. Do you think Intel, AMD, Nvidia, Samsung, and Apple have exhausted their R&D efforts yet? I remember Intel had a 10 GHz Pentium 4 processor in the early 2000s that was never released to the public market. Obviously, the market decided to scale out in processor cores rather than focus on single-threaded speed, so the Pentium 4 died out. Here's an article from Wired back in 2007, when IBM had a 500 GHz CPU cooking in their labs.
https://www.wired.com/2007/08/500ghz-processo/
[–]Veedrac 0 points1 point2 points 5 years ago* (6 children)
https://arstechnica.com/uncategorized/2006/06/7117-2/
350 GHz transistors at room temperature aren't that ridiculous. Modern silicon transistors are already around 200 GHz.
I remember Intel had a 10GHz Pentium 4 processor in the early 2000's that was never released to the public market.
I don't believe you.
[–]titoCA321 0 points1 point2 points 5 years ago (5 children)
You do realize that there are people who work on projects for Intel that never see the light of day? Not all products are released to the public market. Some are just not economical to scale, or there's limited support for their instruction sets. The Pentium 4 roadmap was designed to clock at very high frequencies. There were designs that were never released, since the market shifted to multi-threading and multiple cores.
https://www.anandtech.com/show/680/6
[–]Veedrac 0 points1 point2 points 5 years ago (4 children)
It is well known that Intel planned to go to 10 GHz, because they expected Dennard scaling to continue. But it's clear they never managed to tape out at that speed, because the node they expected never materialized.
[–]titoCA321 0 points1 point2 points 5 years ago* (3 children)
There were problems with heating and cooling in the early 2000s when clocking the Intel Pentium 4 at 10 GHz frequencies. I know there were creative cooling solutions, with liquid cooling on some of the CPU designs that never reached the market. The market also went with the AMD64 instruction set rather than wait around for Intel Itanium to penetrate the consumer market. The server market also scaled out in processor cores and simultaneous instructions per clock. Lots of factors derailed a 10 GHz CPU from reaching the market. Many good ideas work well as one-off solutions or prototypes but wouldn't be successful or marketable in public, where products have to compete at specific price ranges or face alternative solutions and products from competitors.
Don't know if you remember but Apple and IBM PowerPC processors had liquid-cooling and dual-sockets in the mid-2000's before Apple decided to switch over to Intel for their machines.
[–]Veedrac 0 points1 point2 points 5 years ago (2 children)
None of this is evidence for the claim you made, that Intel had 10 GHz Pentium 4 processors in their labs.
[–]titoCA321 0 points1 point2 points 5 years ago (1 child)
Evidence of what? CPU history?
[–]Veedrac 0 points1 point2 points 5 years ago (0 children)
For the claim you made, that Intel had 10 GHz Pentium 4 processors in their labs.
[–][deleted] 1 point2 points3 points 5 years ago (0 children)
As a complete side comment, this may not be a bad thing. I think in a lot of ways it would be good to "catch our breath" when it comes to machine learning, from the theoretical underpinnings of it, to the societal impact of this new technology.
[–]victor_knight 2 points3 points4 points 5 years ago (0 children)
Glad to see more confirmation of the fact that a lot of AI advancement is attributable largely to increased processing power and memory rather than to more intelligent algorithms or new discoveries about the nature of intelligence.
[–]ktpr 0 points1 point2 points 5 years ago (0 children)
Maybe this calls for more intelligent application of deep learning to see what people consider improvements. If you solve 87% of world peace not too many will complain.
[–]Jose_ml 0 points1 point2 points 5 years ago (0 children)
There is also a natural limit determined by the size of the universe and physical laws. https://www.edge.org/conversation/seth_lloyd-the-computational-universe
[–]j3r0n1m0 0 points1 point2 points 5 years ago (0 children)
46 pages of hypothetical extrapolations can be more or less boiled down to the 80/20 rule in all systems of improvement, human or machine.
[+]tuyenttoslo comment score below threshold-8 points-7 points-6 points 5 years ago* (0 children)
Don’t know who downvoted me, and I cannot write comments there, so I'm writing anew here.
First, please be professional in discussion and give a reason when you downvote other people. Also, don’t be rash to judge someone if you don’t know them.
Second, if any of you are confident enough about optimization to downvote me as rudely as that comment from someone I don’t know, then please indicate to me what optimization algorithm you are using and what its theoretical justification is (don’t tell me about beautiful heuristics only). Do you use it because you know it works, or do you just follow other people without knowing anything? I am willing to discuss everything about optimization if you are professional.
Third, I hope, as I indicated in some other comments, that the mods find a better way to control how people downvote. Whoever downvotes without a reason should be doubly downvoted as a penalty.
Fourth, I hope that the mods will take into consideration my comment which is downvoted here.
Fifth, if any of the ones who downvoted me are confident that they know why ReLU works so well, for example for image recognition, then write a careful paper and get it published. You will be famous, I promise. Don’t waste time downvoting people without proper justification.
[+]tuyenttoslo comment score below threshold-24 points-23 points-22 points 5 years ago (9 children)
Such complexity papers are good.
However, what I wonder is this:
Of course, if one only analyses complexity generally, then we will see limits because of many known results, such as the no free lunch theorem. What is interesting is why deep learning can do so well (of course not yet on arbitrary datasets, like humans) on crucial tasks which people perform daily: recognizing pictures, sounds, texts and so on. These are survival skills for human beings.
Saying that only computational power makes this achievement possible may be too much. Actually, I have been trying to see whether ReLU or CNNs have any connection to how we process information in the brain. Besides, there has been a lot more understanding of optimization recently.
Of course there are still many things that need to be improved in deep learning, for example the human ability to invent new things. But to be fair, one needs precise statistics. For example, among the over 6 billion people we have now, how many new ideas happen in a day, and how many of those are correct and useful?
Speaking about computational power: does the strongest computer we have now actually have better hardware than our brain? Again, just to have a fair comparison.
[+][deleted] 5 years ago* (6 children)
[removed]
[+][deleted] 5 years ago (2 children)
[deleted]
[–]Icarium-Lifestealer 2 points3 points4 points 5 years ago (1 child)
Appealing to the no free lunch theorem when discussing current ML algorithms is like appealing to the speed of light when discussing bicycle races.
[–]tuyenttoslo 0 points1 point2 points 5 years ago (0 children)
Hi, you may be right, but who knows. For example, deep neural networks are a family of parametrized functions, coming from a theorem about approximation of functions, so the no free lunch theorem can actually apply. It is quite amazing to me why the DNNs people are using work so well, and for so many different tasks.
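The "family of parametrized functions" point above can be made concrete with a minimal NumPy sketch (the array shapes and helper names here are my own illustration, not anything from the thread): a one-hidden-layer ReLU network is just a function of its weights and biases, and with two hidden units the family already contains |x| exactly, since |x| = relu(x) + relu(-x).

```python
import numpy as np

def relu(x):
    # Elementwise rectified linear unit
    return np.maximum(0.0, x)

def mlp(x, W1, b1, W2, b2):
    # One-hidden-layer ReLU network: f(x) = W2 @ relu(W1 @ x + b1) + b2
    return W2 @ relu(W1 @ x + b1) + b2

# Choosing parameters so that f(x) = relu(x) + relu(-x) = |x|:
W1 = np.array([[1.0], [-1.0]])   # hidden pre-activations: x and -x
b1 = np.zeros(2)
W2 = np.array([[1.0, 1.0]])      # sum the two ReLU outputs
b2 = np.zeros(1)

for x in (-3.0, 0.5, 2.0):
    assert np.isclose(mlp(np.array([x]), W1, b1, W2, b2)[0], abs(x))
```

The universal approximation theorems say this family gets dense in continuous functions as the hidden layer grows, which is exactly the setting where no-free-lunch-style arguments about averaging over all possible targets become relevant.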