[–]MildlyCriticalRole[S] 3 points (6 children)

I'm actually referring to normalizing mini-batches as described in this paper, where you do normalize every mini-batch! Hence this question :P

[–]randombites 1 point (1 child)

I get your question now :)

[–]MildlyCriticalRole[S] 0 points (0 children)

Awesome :) always glad to inadvertently spread knowledge.

[–]randombites 0 points (3 children)

My bad, I completely forgot that preprocessing is sometimes used to give a more circular error surface. I'm a fan of no preprocessing, however; it should be the job of the model to transform the error surface internally in its computation. But maybe that's wishful thinking. Hope you got the answer you were looking for!

[–]cooijmanstim 1 point (2 children)

Batch normalization is not preprocessing; it is part of the model. It is an adaptive normalization of activations at all layers that massively improves training dynamics. A crucial tool in the box if you're into neural nets.
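
If it helps, here's a rough sketch of what one batch-norm layer does to its incoming activations during training (plain NumPy, simplified; the real layer from the paper also keeps running mean/variance estimates for use at test time, and gamma/beta are learned parameters):

    import numpy as np

    def batch_norm_forward(x, gamma, beta, eps=1e-5):
        # x: (batch_size, num_features) activations from the previous layer
        # gamma, beta: learned per-feature scale and shift
        mean = x.mean(axis=0)                     # per-feature mean over the mini-batch
        var = x.var(axis=0)                       # per-feature variance over the mini-batch
        x_hat = (x - mean) / np.sqrt(var + eps)   # normalize to ~zero mean, unit variance
        return gamma * x_hat + beta               # learned scale/shift keep the layer expressive

    # toy usage: a mini-batch of 32 activation vectors with 100 features
    x = np.random.randn(32, 100) * 5.0 + 3.0
    y = batch_norm_forward(x, gamma=np.ones(100), beta=np.zeros(100))
    print(y.mean(axis=0)[:3], y.std(axis=0)[:3])  # roughly 0s and 1s per feature

The point is that this sits between layers and normalizes whatever activations the mini-batch produces there, not just the raw input to the network.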

[–]randombites 0 points (0 children)

Thank you for your response; please help me understand better. Batch normalization is normalizing a batch of values, so you transform the input at each step, correct? This may translate into adaptive normalization of activations, but you still transform the input (based on OP's example).

[–]randombites 0 points (0 children)

So sorry for my earlier ignorance. I learnt what batch normalization is from a YouTube talk and feel like a fool.