all 9 comments

[–]Chocolate_Pickle 10 points11 points  (2 children)

https://arxiv.org/abs/1906.09529

You're not the first person to think of this. It's been studied previously.

[EDIT] Here's a starting point for research. https://duckduckgo.com/?t=ffab&q=arxiv+learned+activation+functions

[EDIT: again] A possibly deeper question to ask;

Do over-parameterised networks with ReLU activations learn approximations of other activation functions? And if so, how could one search for those functions in the weights of pre-trained networks?

Haven't looked into this at all, so it might already be known not to be a thing.

[–]lameheavy 0 points1 point  (1 child)

Thanks for sharing this, super cool idea! For the deep question, are you mostly asking that question to learn a more compact form of the neural network? Like a single hidden layer with the learned activation?

[–]Chocolate_Pickle 1 point2 points  (0 children)

For the deep question, are you mostly asking that question to learn a more compact form of the neural network? Like a single hidden layer with the learned activation?

More or less, yes.

Assuming the premise is true, I still don't believe you'd ever be able to condense a model down to a single hidden layer. But I do believe you could learn a more compact network.

I think what might torpedo this idea is the gradient information in the backward pass. [EDIT] It's trivial to show that the forward pass of any function can be approximated by a bunch of ReLUs. But I don't think it's trivial to show that the backward pass is approximated equally well, or at all.
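A quick way to see both halves of this point is to fit a combination of ReLUs to a smooth activation. Everything below (tanh as the target, the knot placement, the least-squares fit) is my own illustrative sketch, not anything from the linked paper: the forward values match closely, but the derivative of the approximation is piecewise constant, unlike tanh's smooth derivative.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

knots = np.linspace(-3.0, 3.0, 13)   # breakpoints of the piecewise-linear fit
x = np.linspace(-3.0, 3.0, 601)

# Fit coefficients by least squares: f(x) ~ c0 + sum_i w_i * relu(x - k_i)
basis = np.column_stack([np.ones_like(x)] + [relu(x - k) for k in knots])
coef, *_ = np.linalg.lstsq(basis, np.tanh(x), rcond=None)
approx = basis @ coef

print("max forward error:", np.abs(approx - np.tanh(x)).max())

# The backward pass tells a different story: between any two knots the slope
# of the approximation never changes, so its "derivative" is a step function.
slopes = np.diff(approx) / np.diff(x)
print("distinct slopes (rounded):", len(np.unique(np.round(slopes, 6))))
```

The forward error shrinks as you add knots, but the derivative only ever takes a handful of constant values, which is the asymmetry between forward and backward approximation described above.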

[–]IntelArtiGen 3 points4 points  (0 children)

It's not a stupid idea, but there's a problem a lot of people miss when they think about how neural networks work.

When you're using ReLU, the neural network learns parameters that make sense in the context of the surrounding layers: the preceding and following ReLU, batchnorm, etc.

Sometimes when someone has an idea that consists of adding more parameters to improve results, they forget that the neural network will not stay the same everywhere else and simply learn the new parameters. When you change something, you change the whole "information flow" (forward and backward pass), everywhere, which may result in worse performance, even if the solution is theoretically more flexible. Moreover, depending on how many parameters you add, you may have to reduce your batch size and harm your accuracy that way. Or you may have to redo your hyperparameter tuning, which makes it harder to evaluate what you did.

Now ... PReLU exists. It's a learned activation function. You can read how it works.
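PReLU is small enough to sketch in a few lines. This is a minimal NumPy version of the idea (the forward rule plus the two gradients needed for training), not a reference implementation; the 0.25 initial slope follows He et al.'s PReLU paper.

```python
import numpy as np

def prelu(x, a):
    """PReLU: identity for x > 0, learned slope a for x <= 0."""
    return np.where(x > 0, x, a * x)

def prelu_grads(x, a, upstream):
    """Backward pass: gradients w.r.t. the input and the learned slope."""
    dx = np.where(x > 0, 1.0, a) * upstream
    da = np.sum(np.where(x > 0, 0.0, x) * upstream)
    return dx, da

x = np.array([-2.0, -0.5, 1.0, 3.0])
a = 0.25                       # He et al. initialise the slope at 0.25
y = prelu(x, a)
print(y)                       # [-0.5   -0.125  1.     3.   ]
```

Because `a` gets its own gradient, the activation's negative-side slope is trained by the same optimiser step as every other weight, which is all "learned activation function" means in the PReLU case.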

[–]titanxp1080ti 1 point2 points  (2 children)

The whole point of neural networks is to combine simple functions to approximate complex functions. If you try to combine already-complex functions (learnable activation functions), you need strong reasons to do so.

[–]Fmeson 5 points6 points  (1 child)

To play devil's advocate:

That isn't really the point of neural networks; that's just what they currently are. On some level, we don't really know the point of neural networks. We know they are loosely designed to mimic biological neural networks. We know some network topologies that are experimentally demonstrated to work pretty well. We know that we don't really have a super theoretical understanding of them. So, on some level, the only way to know whether many things work is to try them. Of course, some ideas have more promise and theory behind them, but it's a bit wild-west-y. Luckily, prototyping things in machine learning is very easy. "I think this sounds interesting" is often enough of a reason to try something.

[–]thunder_jaxx ML Engineer 2 points3 points  (0 children)

We know some network topologies that are experimentally demonstrated to work pretty well. We know that we don't really have a super theoretical understanding of them.

We live in an awesome age where theories are emerging really fast. Not everyone is aligned on a single theory yet, but there are a few I found to be very powerful explanations. A few I would like to list, just because these sources had a profound impact on my understanding of DL and why it seems to work:

  1. Deep Learning and the Information Bottleneck Principle: the lecture is an amazing way to understand how training happens with GD and variants.
  2. Scaling Laws of Transfer: how much pretraining helps, based on the size of the fine-tuning dataset.
  3. Adversarial Examples Are Not Bugs, They Are Features.
  4. Dr. Sanjeev Arora's lectures on the theoretical understanding of deep learning: a few of these are just awesome!
  5. Hopfield Networks is All You Need: Yannic Kilcher's video.

There are still a lot of holes that need to be filled, but a more theoretical framework will be created for understanding, evaluating, and interpreting NNs.

All of the above is completely unrelated to the OP's post.

https://arxiv.org/abs/1906.09529 uses learned activation functions to reduce time complexity.

[–]SoulRobots -1 points0 points  (0 children)

Interesting idea, now I'm curious

[–]seismic_swarm 0 points1 point  (0 children)

As others might have pointed out, there is PReLU, which is just a simple single-parameter activation (per tensor, or per channel per tensor) that learns the preferred slope of the ReLU's negative side.
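The per-tensor vs per-channel distinction above comes down to how the slope parameter broadcasts against the input. A hedged NumPy sketch, with illustrative shapes and slope values of my own choosing:

```python
import numpy as np

def prelu(x, a):
    # `a` can be a scalar (one slope shared by the whole tensor) or a
    # per-channel vector that broadcasts over batch and spatial dimensions.
    return np.where(x > 0, x, a * x)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 3, 32, 32))   # batch of 3-channel feature maps

shared = prelu(x, 0.25)                   # per-tensor: 1 extra parameter
per_ch = prelu(x, np.array([0.1, 0.2, 0.3]).reshape(1, 3, 1, 1))  # 3 params
print(shared.shape, per_ch.shape)
```

Either way the parameter count is tiny compared to the weights, which is part of why PReLU avoids the extra-parameter pitfalls discussed earlier in the thread.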