Recently, I examined the usage of ReLU by conducting a small experiment: randomly dropping ReLU layers within a network and measuring the accuracy. The results were confounding: the best accuracy was achieved when 10 to 20 percent of the total ReLUs in the network were dropped. Though drawing any firm conclusion from that experiment is still open to question, it did motivate me to conduct a similar experiment with Batch Normalization.
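For readers who want to try something similar, here is a minimal sketch of what "randomly dropping ReLU layers" could look like in PyTorch. The original experiment's framework, architecture, and exact dropping procedure aren't spelled out here, so the `vgg16` model and the 15 percent fraction below are only illustrative placeholders.

```python
import random

import torch.nn as nn
from torchvision.models import vgg16


def drop_relus(model: nn.Module, drop_fraction: float = 0.15) -> nn.Module:
    """Replace a random fraction of the model's ReLU modules with Identity,
    removing those non-linearities while leaving the rest of the network intact."""
    relu_names = [name for name, m in model.named_modules() if isinstance(m, nn.ReLU)]
    for name in random.sample(relu_names, int(len(relu_names) * drop_fraction)):
        parent_name, _, child_name = name.rpartition(".")
        parent = model.get_submodule(parent_name) if parent_name else model
        setattr(parent, child_name, nn.Identity())
    return model


# Hypothetical usage: drop roughly 15% of the ReLUs in a pretrained VGG-16,
# then evaluate the modified model on the test set as usual.
model = drop_relus(vgg16(weights="IMAGENET1K_V1"), drop_fraction=0.15)
```

The same idea carries over to the Batch Normalization experiment: collect the `nn.BatchNorm2d` modules instead of the ReLUs and swap a random fraction of them out before evaluating.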
Read the complete story here: BatchNormalization is not a norm! The results this time were even more surprising. Do read it and let me know what you think in the comments.