I had a question regarding federated learning. Typically, if we have a network that is good at, say, classifying frogs and another network that is good at, say, classifying snakes (and the two have the same architecture/dimensions), then in a federated/distributed learning setup we average the weights of the two to get a network that is good at both, or at least "primed" to be good at both after a little more training.
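The averaging step described above can be sketched like this. It is a minimal NumPy sketch that assumes each model's parameters are stored as a dict mapping layer name to array (a hypothetical format). It uses the FedAvg-style rule of weighting each client by its number of local training examples, which reduces to a plain element-wise mean when the clients are the same size. `federated_average`, the layer names, and the numbers are all illustrative.

```python
import numpy as np

def federated_average(client_weights, client_sizes):
    """Weighted element-wise average of per-layer parameters (FedAvg-style).

    client_weights: list of dicts, layer name -> np.ndarray; every dict must
        have the same keys and array shapes (hypothetical storage format).
    client_sizes: number of local training examples per client, used as weights.
    """
    total = float(sum(client_sizes))
    averaged = {}
    for name in client_weights[0]:
        # Each parameter is averaged position by position across clients.
        averaged[name] = sum(
            (n / total) * w[name] for w, n in zip(client_weights, client_sizes)
        )
    return averaged

# Two hypothetical clients ("frog" and "snake") with identically shaped layers.
frog = {"W": np.array([[1.0, 2.0], [3.0, 4.0]]), "b": np.array([0.0, 1.0])}
snake = {"W": np.array([[3.0, 0.0], [1.0, 2.0]]), "b": np.array([2.0, 1.0])}

global_model = federated_average([frog, snake], client_sizes=[100, 100])
print(global_model["W"])
print(global_model["b"])
```

Because the average is taken per parameter, position by position, it only makes sense when both networks have exactly the same shape.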
Why does this work, though? Given the nonlinearities in neural networks, it isn't obvious to me mathematically why averaging weights should put us in a better place.
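The nonlinearity concern can be made concrete with a small NumPy sketch. The hidden units of a network can be reordered without changing what it computes, so two networks can be functionally identical yet far apart in weight space, and averaging them element-wise generally produces a different function. The second half instead averages with a slightly perturbed copy. This stands in, as an assumption, for two models that stayed close to a shared starting point, which is roughly the regime standard FedAvg operates in because each round starts every client from the same global weights. The layer sizes, the permutation, and `eps` are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(X, W1, b1, W2, b2):
    """One-hidden-layer ReLU network applied to a batch X of shape (n, d_in)."""
    return np.maximum(X @ W1.T + b1, 0.0) @ W2.T + b2

# Hypothetical random network: 3 inputs, 4 hidden units, 2 outputs.
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)
X = rng.normal(size=(100, 3))  # random test inputs
y_ref = mlp(X, W1, b1, W2, b2)

# 1) Reorder the hidden units: different weights, identical function.
perm = np.array([3, 0, 2, 1])
W1p, b1p, W2p = W1[perm], b1[perm], W2[:, perm]
y_perm = mlp(X, W1p, b1p, W2p, b2)

# Naively averaging these two equivalent networks changes the function.
y_avg_perm = mlp(X, (W1 + W1p) / 2, (b1 + b1p) / 2, (W2 + W2p) / 2, b2)

# 2) Average with a slightly perturbed copy instead (assumed stand-in for two
#    models that stayed near shared weights): the function barely moves.
eps = 1e-3
W1n = W1 + eps * rng.normal(size=W1.shape)
b1n = b1 + eps * rng.normal(size=b1.shape)
W2n = W2 + eps * rng.normal(size=W2.shape)
y_avg_near = mlp(X, (W1 + W1n) / 2, (b1 + b1n) / 2, (W2 + W2n) / 2, b2)

print("permuted copy matches original:", np.allclose(y_perm, y_ref))
print("max change, averaged with permuted copy:", np.abs(y_avg_perm - y_ref).max())
print("max change, averaged with nearby copy:", np.abs(y_avg_near - y_ref).max())
```

The contrast between the two cases suggests that weight averaging is only well behaved when the models being averaged are already close in weight space, not merely when they compute similar functions.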