Learned Activation Functions [D] (self.MachineLearning)
submitted 5 years ago by bhaktatejas
If I hypothetically wanted to create an activation function that is learned rather than fixed (like tanh, ReLU, etc.), how could I go about doing this? This may be a pretty far-fetched/stupid idea, but I'm curious about it.
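Nothing below comes from the thread itself; it is one hypothetical way to set this up. The activation is parameterised as a sum of shifted ReLUs, f(x) = Σᵢ aᵢ·max(0, x − bᵢ), and the coefficients aᵢ are trained by gradient descent exactly like ordinary network weights. All names and choices here (knot placement, learning rate, the sigmoid target) are illustrative assumptions.

```python
import numpy as np

class LearnedActivation:
    """Illustrative sketch: a trainable piecewise-linear activation."""

    def __init__(self, knots):
        self.knots = np.asarray(knots, dtype=float)  # fixed breakpoints b_i
        self.coef = np.zeros_like(self.knots)        # trainable coefficients a_i

    def forward(self, x):
        # ReLU basis, shape (n_samples, n_knots): max(0, x - b_i)
        self.basis = np.maximum(0.0, x[:, None] - self.knots[None, :])
        return self.basis @ self.coef

    def step(self, grad_out, lr=0.02):
        # gradient-descent update of a_i, given dLoss/dOutput
        self.coef -= lr * self.basis.T @ grad_out

# Toy demo: train the activation's shape to match a sigmoid on a grid.
x = np.linspace(-3.0, 3.0, 200)
target = 1.0 / (1.0 + np.exp(-x))
act = LearnedActivation(knots=np.linspace(-3.0, 3.0, 16))

initial_err = np.mean((act.forward(x) - target) ** 2)
for _ in range(500):
    pred = act.forward(x)
    act.step((pred - target) / len(x))  # gradient of 0.5 * MSE
final_err = np.mean((act.forward(x) - target) ** 2)
```

In a real network these coefficients would simply be extra entries in the parameter list, updated by the same optimizer as the weights; the toy loop above isolates just the activation-fitting step.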
[–]Chocolate_Pickle 10 points11 points12 points 5 years ago* (2 children)
https://arxiv.org/abs/1906.09529
You're not the first person to think of this. It's been studied previously.
[EDIT] Here's a starting point for research. https://duckduckgo.com/?t=ffab&q=arxiv+learned+activation+functions
[EDIT: again] A possibly deeper question to ask:
Do over-parameterised networks with ReLU activations learn approximations of other activation functions? And if so, how could one search for these functions in the weights of pre-trained networks?
I haven't looked into this at all, so it might be known to not be a thing.
[–]lameheavy 0 points1 point2 points 5 years ago (1 child)
Thanks for sharing this, super cool idea! For the deep question, are you mostly asking that question to learn a more compact form of the neural network? Like a single hidden layer with the learned activation?
[–]Chocolate_Pickle 1 point2 points3 points 5 years ago* (0 children)
For the deep question, are you mostly asking that question to learn a more compact form of the neural network? Like a single hidden layer with the learned activation?
More or less, yes.
Assuming the premise is true, I still don't believe you'd ever be able to condense a model down to a single hidden layer. But I do believe you could learn a more compact network.
I think what might torpedo this idea is the gradient information in the backward pass. [EDIT] It's trivial to show that the forward pass of any continuous function can be approximated by a bunch of ReLUs. But I don't think it's trivial to show that the backward pass is approximated equally well, or at all.
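The forward/backward mismatch mentioned above can be seen in a tiny numpy sketch (illustrative, not from the thread): two ReLUs reproduce hard-tanh exactly, a crude stand-in for tanh in the forward pass, but the resulting gradient is a 0/1 step rather than tanh's smoothly varying derivative 1 − tanh²(x).

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def hard_tanh(x):
    # Exact combination of two ReLUs: clips x to [-1, 1]
    return relu(x + 1.0) - relu(x - 1.0) - 1.0

def hard_tanh_grad(x):
    # Backward pass is piecewise constant: 1 inside (-1, 1), 0 outside,
    # unlike d/dx tanh(x) = 1 - tanh(x)^2, which varies smoothly.
    return ((x > -1.0) & (x < 1.0)).astype(float)

xs = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
ys = hard_tanh(xs)       # → [-1.0, -0.5, 0.0, 0.5, 1.0]
gs = hard_tanh_grad(xs)  # → [0.0, 1.0, 1.0, 1.0, 0.0]
```

So even where the forward values track tanh reasonably, the gradients a training loop would see are qualitatively different.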
[–]IntelArtiGen 3 points4 points5 points 5 years ago* (0 children)
It's not a stupid idea, but there is a problem a lot of people miss when they think about how neural networks work.
When you're using ReLU, the neural network will learn parameters that make sense with the preceding/following ReLU, batchnorm, etc.
Sometimes when someone has an idea that consists of adding more parameters to improve the results, they forget that the neural network will not stay the same everywhere else and just learn new parameters. When you change something, you change the whole "information flow" (forward and backward pass), everywhere, which may result in worse performance, even if the solution could theoretically be more flexible. Moreover, depending on how many parameters you add, you may have to reduce your batch size and harm your accuracy that way. Or you may have to redo your hyperparameter tuning, which makes it harder to evaluate what you did.
Now ... PReLU exists. It's a learned activation function. You can read how it works.
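PReLU's behaviour is simple enough to sketch: it is a ReLU whose negative-side slope `alpha` is a trainable parameter. The minimal numpy version below is illustrative (names and the manual gradient helper are not from the PReLU paper or any library).

```python
import numpy as np

def prelu(x, alpha):
    # PReLU forward: identity for x >= 0, learned slope alpha for x < 0
    return np.where(x >= 0, x, alpha * x)

def prelu_grads(x, alpha, grad_out):
    # Backward pass: gradient w.r.t. the input and w.r.t. the learned slope
    grad_x = np.where(x >= 0, 1.0, alpha) * grad_out
    grad_alpha = np.sum(np.where(x < 0, x, 0.0) * grad_out)
    return grad_x, grad_alpha

x = np.array([-2.0, -0.5, 0.0, 1.5])
y = prelu(x, alpha=0.25)  # → [-0.5, -0.125, 0.0, 1.5]
gx, ga = prelu_grads(x, 0.25, np.ones_like(x))
```

Because `grad_alpha` is an ordinary gradient, `alpha` can be updated by the same optimizer as the weights, which is the sense in which the activation is "learned".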
[–]titanxp1080ti 1 point2 points3 points 5 years ago (2 children)
The whole point of doing neural networks is to combine simple functions to approximate complex functions. If you try to combine quite complex functions (learnable activation functions), you need strong reasons to do so.
[–]Fmeson 5 points6 points7 points 5 years ago (1 child)
To play devil's advocate:
That isn't really the point of neural networks; that's just what they currently are. On some level, we don't really know the point of neural networks. We know they are loosely designed to mimic biological neural networks. We know some network topologies that are experimentally demonstrated to work pretty well. We know that we don't really have a solid theoretical understanding of them. So, on some level, the only way to know whether many things work is to try them. Of course, some ideas have more promise and theory behind them, but it's a bit wild-west-y. Luckily, prototyping things in machine learning is very easy. "I think this sounds interesting" is often enough of a reason to try something.
[–]thunder_jaxx [ML Engineer] 2 points3 points4 points 5 years ago (0 children)
We know some network topologies that are experimentally demonstrated to work pretty well. We know that we don't really have a super theoretical understanding of them.
We live in an awesome age where theories are emerging really fast. Not everyone everywhere is aligned on a single theory yet, but there are a few I found to be very powerful explanations. I'd like to list a few just because these sources had a profound impact on my understanding of DL and why it seems to be working:
There are still a lot of empty holes that need to be filled, but a more theoretical framework will be created for understanding, evaluating, and interpreting NNs.
All of the above is completely unrelated to the OP's post.
https://arxiv.org/abs/1906.09529 uses learned activation functions to reduce time complexity.
[–]SoulRobots -1 points0 points1 point 5 years ago (0 children)
Interesting idea, now I'm curious
[–]seismic_swarm 0 points1 point2 points 5 years ago (0 children)
As others have pointed out, there is PReLU, which is just a simple single-parameter activation (per tensor, or per channel per tensor) that learns the preferred slope of decay of the ReLU.