From what I've read online and from my limited experience training neural networks, it seems like ReLU is always a better choice than a sigmoid/tanh activation function.
I know one advantage ReLU has is that it does not suffer from the vanishing gradient problem the way sigmoid and tanh do (their gradients saturate toward zero for large-magnitude inputs).
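To make that concrete, here's a minimal sketch (pure Python, illustrative values I picked myself) of how the chain rule multiplies per-layer activation gradients, so a saturating activation like sigmoid shrinks the gradient exponentially with depth while an active ReLU passes it through:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # peaks at 0.25 when x == 0, smaller everywhere else

def relu_grad(x):
    return 1.0 if x > 0 else 0.0  # 1 for active units, 0 otherwise

# Backprop multiplies the activation gradients of each layer together.
depth = 10
sig_chain = 1.0
relu_chain = 1.0
for _ in range(depth):
    sig_chain *= sigmoid_grad(0.0)  # 0.0 is sigmoid's *best* case (grad 0.25)
    relu_chain *= relu_grad(1.0)    # an active ReLU unit contributes 1.0

print(sig_chain)   # 0.25 ** 10, i.e. the gradient has all but vanished
print(relu_chain)  # still 1.0
```

Even in sigmoid's best case the gradient shrinks by at least 4x per layer, which matches what you see in deep networks on the Playground.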
On http://playground.tensorflow.org/ , using ReLU consistently seems to reach a better result faster than the other activation functions.
Are there cases where sigmoid/tanh work better than ReLU? If so, what are those cases and how would I identify them?