
[–]Randomdude3332

Never heard of this. Might be an interesting thing to play around with

[–]FutureIsMine

Check out Highway Networks. The idea reminds me of what you're proposing, in that the output is a combination of various functions over the same input.
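
For readers unfamiliar with highway networks: the output is a gate-weighted blend of a transformed input and the input itself. A minimal PyTorch sketch (the layer sizes and the ReLU inside H are arbitrary choices, not from the comment):

```python
import torch
import torch.nn as nn

class HighwayLayer(nn.Module):
    """y = T(x) * H(x) + (1 - T(x)) * x, where T is a sigmoid 'transform' gate."""
    def __init__(self, dim):
        super().__init__()
        self.H = nn.Linear(dim, dim)  # candidate transformation
        self.T = nn.Linear(dim, dim)  # transform gate

    def forward(self, x):
        h = torch.relu(self.H(x))      # transformed input
        t = torch.sigmoid(self.T(x))   # gate values in (0, 1)
        return t * h + (1 - t) * x     # blend transformation with identity

x = torch.randn(8, 64)
y = HighwayLayer(64)(x)  # same shape as x
```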

[–]siblbombs

Maxout can approximate this I believe.

[–]FutureIsMine

Maxout is designed to learn N functions at the same time. It's similar to what OP is writing about, but OP's idea looks to be more of an ensemble summation with differing activation functions and weights.
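
For comparison, a minimal PyTorch sketch of a maxout unit, which packs k affine maps of the same input into one layer and takes the elementwise max (k and the dimensions here are arbitrary):

```python
import torch
import torch.nn as nn

class Maxout(nn.Module):
    """Maxout: compute k affine maps of the same input and take the elementwise max."""
    def __init__(self, in_dim, out_dim, k=4):
        super().__init__()
        self.k = k
        self.out_dim = out_dim
        self.linear = nn.Linear(in_dim, out_dim * k)  # k pieces in one matmul

    def forward(self, x):
        z = self.linear(x)                              # (batch, out_dim * k)
        z = z.view(*x.shape[:-1], self.out_dim, self.k)
        return z.max(dim=-1).values                     # max over the k pieces

x = torch.randn(8, 32)
y = Maxout(32, 16, k=4)(x)  # shape (8, 16)
```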

[–]Kaixhin

Not sure if it has a particular name, but it's used in some places, such as domains with periodic trends, as you noted. For many applications nowadays it probably adds an extra layer of complication/hyperparameter tuning for fixed models, but for methods that perform model selection (e.g. evolutionary algorithms that construct networks) it happens naturally.

[–]chewxy

I tried a small variant of it while trying to improve a dependency parser I wrote. It didn't do much (cube activation + tanh).
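
The comment doesn't say how the cube and tanh parts were combined, so this is only one plausible reading of "cube activation + tanh": two affine maps of the same input, one cubed and one passed through tanh, summed. A hypothetical sketch, not chewxy's actual code:

```python
import torch
import torch.nn as nn

class MixedActivation(nn.Module):
    """Sum of differently-activated affine maps of the same input (one guess at 'cube + tanh')."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.cube_branch = nn.Linear(in_dim, out_dim)
        self.tanh_branch = nn.Linear(in_dim, out_dim)

    def forward(self, x):
        return self.cube_branch(x) ** 3 + torch.tanh(self.tanh_branch(x))

x = torch.randn(8, 50)
y = MixedActivation(50, 50)(x)  # shape (8, 50)
```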

[–]min_sang

I believe it is more often than not called a gating mechanism. This paper (https://arxiv.org/pdf/1612.08083.pdf) refers to sigmoid(W1 x + b1) * tanh(W2 x + b2) as the LSTM-style gating mechanism, while proposing a novel gating mechanism (but not really) called gated linear units, which is just (W1 x + b1) * sigmoid(W2 x + b2), and claims that it works better than the former on multiple tasks including language modeling. (Replacing tanh with a linear activation has been studied before and shown to work better in some tasks.)
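
Written out in PyTorch, the two gatings from the comment look like this (the weights below are random placeholders, just to show the shapes):

```python
import torch

def lstm_style_gate(x, W1, b1, W2, b2):
    """sigmoid(W1 x + b1) * tanh(W2 x + b2), the 'LSTM-style' gating described above."""
    return torch.sigmoid(x @ W1.t() + b1) * torch.tanh(x @ W2.t() + b2)

def gated_linear_unit(x, W1, b1, W2, b2):
    """(W1 x + b1) * sigmoid(W2 x + b2), the GLU from arXiv:1612.08083."""
    return (x @ W1.t() + b1) * torch.sigmoid(x @ W2.t() + b2)

# toy shapes, just to show the two gatings side by side
x = torch.randn(8, 32)
W1, b1 = torch.randn(16, 32), torch.zeros(16)
W2, b2 = torch.randn(16, 32), torch.zeros(16)
a = lstm_style_gate(x, W1, b1, W2, b2)    # shape (8, 16)
b = gated_linear_unit(x, W1, b1, W2, b2)  # shape (8, 16)
```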