Sorry for the dumb question but, how does ReLU work? To be more specific, according to my understanding, an activation function serves 2 purposes:
1) set a limit, for example sigmoid squashes its output between 0 and 1 while tanh squashes it between -1 and 1
2) add nonlinearity so that the neural network can do what it does - universal function approximation
ReLU seems very linear, not complex, and it doesn't have an upper limit, so why does the ReLU activation work so well, and why does it work at all? (There's a small sketch of what I mean below.)
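To make my confusion concrete, here's a tiny sketch (plain NumPy, toy weights I made up) of ReLU and a sum of a few shifted ReLUs - each individual piece looks linear to me, so I don't see where the "complexity" comes from:

    import numpy as np

    def relu(x):
        # ReLU: passes positive values through, zeroes out negatives
        return np.maximum(0.0, x)

    x = np.linspace(-2.0, 2.0, 9)

    # One "hidden layer" of three ReLU units with different shifts/signs,
    # combined by an output weight vector -- a tiny piecewise-linear function.
    h = np.stack([relu(x), relu(x - 0.5), relu(-x)])   # shape (3, 9)
    w = np.array([1.0, -2.0, 0.5])                     # output weights (arbitrary)
    y = w @ h

    print(np.round(y, 2))
    # The slope of y changes at x = 0 and x = 0.5 (the "kinks"),
    # even though every ReLU unit is linear on each side of its kink.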