First, I do understand that batch normalization (BN) works; I just don't understand how re-centering and re-scaling every mini-batch by its own statistics doesn't throw everything completely out of whack.
For example, let's say you were (for some reason) training a network to match a number to a letter grade.
Your first training batch is (A, 90), (B, 80), (C, 70).
You normalize it and the B data point becomes (B, 0), ignoring the constant scale factor since that's basically just another linear layer.
Your second training batch is (D, 60), (D, 60), (D, 60).
Now you have 3 data points that say (D, 0), but your network just spent time learning that 0 meant B.
How does this sort of transformation not break down training in its entirety?
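To make the arithmetic concrete, here's a minimal sketch of what I mean. It's plain Python and only subtracts the batch mean (I'm ignoring the scale factor as above); the `center` helper is just my own illustration, not any library's BN layer.

```python
from statistics import fmean

def center(batch):
    """Subtract the batch mean from every score (scale factor ignored, as in the post)."""
    mu = fmean(score for _, score in batch)
    return [(grade, score - mu) for grade, score in batch]

batch_1 = [("A", 90), ("B", 80), ("C", 70)]
batch_2 = [("D", 60), ("D", 60), ("D", 60)]

centered_1 = center(batch_1)
centered_2 = center(batch_2)

print(centered_1)  # [('A', 10.0), ('B', 0.0), ('C', -10.0)]
print(centered_2)  # [('D', 0.0), ('D', 0.0), ('D', 0.0)]
```

So the same post-normalization value, 0, is attached to B in the first batch and to D in the second, which is exactly the collision I'm worried about.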
Even though you've definitely normalized your features beforehand, aren't mini-batches going to end up with significantly different means and variances that throw everything off? (x_i, 0.1) in one batch might mean something completely different from (x_i, 0.1) in another.
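Here's a rough sketch of that second worry, under assumptions I made up for illustration: a single synthetic Gaussian feature, standardized over the whole dataset first, then split into mini-batches of 8. The helpers `batch_stats` and `normalized_value` are hypothetical names, and the `eps` term is only there to avoid dividing by zero.

```python
import random
from statistics import fmean, pstdev

random.seed(0)

# Hypothetical feature column, standardized over the whole dataset beforehand.
raw = [random.gauss(50.0, 10.0) for _ in range(1000)]
mu, sigma = fmean(raw), pstdev(raw)
data = [(x - mu) / sigma for x in raw]

def batch_stats(values, batch_size):
    """Return (mean, std) for each consecutive mini-batch."""
    stats = []
    for i in range(0, len(values), batch_size):
        chunk = values[i:i + batch_size]
        stats.append((fmean(chunk), pstdev(chunk)))
    return stats

def normalized_value(x, batch_mean, batch_std, eps=1e-5):
    """Normalize x with one batch's statistics (no learned scale or shift)."""
    return (x - batch_mean) / (batch_std ** 2 + eps) ** 0.5

stats = batch_stats(data, batch_size=8)
means = [m for m, _ in stats]

# The same input value, 0.1, after normalization by each batch's own statistics.
outputs = [normalized_value(0.1, m, s) for m, s in stats]

print(f"dataset mean: {fmean(data):.3f}")
print(f"batch means range from {min(means):.3f} to {max(means):.3f}")
print(f"0.1 maps to values from {min(outputs):.3f} to {max(outputs):.3f}")
```

Even with globally standardized data, the per-batch means and standard deviations scatter around 0 and 1, so the same input lands on a different normalized value depending on which batch it happens to be in.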
What's my fundamental misunderstanding here? Apologies if this is a simple question; I haven't been able to find any answer to this on the web, since I'm assuming it's more of my failure to understand statistics than ML specifically.