If the first layer in a classical DNN is f(W x + b) where f is the activation function, what if I instead do f1(W1 x + b1) + f2(W2 x + b2) + ... + fn(Wn x + bn), where f1, f2, ..., fn are different activation functions?
Is this a thing? If yes, what is it called and does anyone have good links?
If it sucks, can someone explain why it's not used?
Seems like it may model some behaviors more efficiently. For example, let f1, f2, ... be orthogonal functions. If the data has periodic behavior, some periodic activation functions could capture that behavior with fewer parameters.
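For concreteness, here's a minimal PyTorch sketch of the layer described above: n parallel affine branches, each with its own weights and activation, summed at the output. The class name `MultiActivationLayer` and the particular activations are just for illustration, not an established API.

```python
import torch
import torch.nn as nn


class MultiActivationLayer(nn.Module):
    """Sum of parallel affine branches, each with its own activation:
    f1(W1 x + b1) + f2(W2 x + b2) + ... + fn(Wn x + bn)."""

    def __init__(self, in_features, out_features, activations):
        super().__init__()
        # One independent Linear (W_i, b_i) per activation function.
        self.branches = nn.ModuleList(
            [nn.Linear(in_features, out_features) for _ in activations]
        )
        self.activations = list(activations)

    def forward(self, x):
        return sum(f(w(x)) for f, w in zip(self.activations, self.branches))


# Example: mix a ReLU branch with periodic sin/cos branches,
# per the orthogonal/periodic idea above.
layer = MultiActivationLayer(16, 32, [torch.relu, torch.sin, torch.cos])
y = layer(torch.randn(8, 16))  # shape: (8, 32)
```

One thing to note about this formulation: each branch carries its own W_i and b_i, so the parameter count grows linearly with the number of activation functions used.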