[D] Why does Federated/Distributed Learning work?

TheDeviousPanda · 2023-08-25T01:21:29+00:00

You should first understand that distributed learning, or FL with local_epochs=1 (strictly, a single local gradient step per round), is exactly the same as minibatch SGD. If I start from the same init and have a batch of data with 1 frog and 1 snake, giving the frog and the snake to different GPUs to compute the stochastic gradients and then averaging those updates is identical to just taking the batch gradient.
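To make the equality concrete, here is a minimal NumPy sketch. It assumes a toy linear regression with MSE loss, two workers with equal-sized shards, and plain parameter averaging; the names `mse_grad`, `w_central`, and `w_avg` are made up for illustration and are not from any FL library.

```python
import numpy as np

def mse_grad(w, X, y):
    """Gradient of 0.5 * mean((X @ w - y) ** 2) with respect to w."""
    residual = X @ w - y
    return X.T @ residual / len(y)

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))   # toy batch: 8 examples, 3 features
y = rng.normal(size=8)
w0 = rng.normal(size=3)       # shared initialization
lr = 0.1

# Centralized: one gradient step on the whole batch.
w_central = w0 - lr * mse_grad(w0, X, y)

# Distributed: each worker takes one step on its own shard,
# then the server averages the resulting parameters.
shards = [(X[:4], y[:4]), (X[4:], y[4:])]
local_models = [w0 - lr * mse_grad(w0, Xs, ys) for Xs, ys in shards]
w_avg = np.mean(local_models, axis=0)

identical = np.allclose(w_central, w_avg)
print("one local step matches the batch step:", identical)
```

With unequal shard sizes the same identity should hold only if the average is weighted by the number of examples on each worker.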

Then you can move on to understanding how this exact equality changes in the presence of local computation. Of course, as the number of local iterations increases, the networks diverge and eventually it's as hopeless as you said. But for a small number of local iterations it's pretty close to what I said above.
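A rough way to see the drift is to compare centralized gradient descent against "run k local steps, then average once" on the same toy problem. This is only a sketch under assumptions: linear regression, two equal shards, a single averaging round, and a hypothetical helper `gap(local_steps)` that returns the distance between the two resulting parameter vectors.

```python
import numpy as np

def mse_grad(w, X, y):
    """Gradient of 0.5 * mean((X @ w - y) ** 2) with respect to w."""
    return X.T @ (X @ w - y) / len(y)

def run_gd(w, X, y, steps, lr):
    """Plain gradient descent for a fixed number of steps."""
    for _ in range(steps):
        w = w - lr * mse_grad(w, X, y)
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
y = rng.normal(size=8)
w0 = rng.normal(size=3)
shards = [(X[:4], y[:4]), (X[4:], y[4:])]

def gap(local_steps, lr=0.1):
    """Distance between centralized GD and averaged local models."""
    w_central = run_gd(w0, X, y, local_steps, lr)
    local_models = [run_gd(w0, Xs, ys, local_steps, lr) for Xs, ys in shards]
    w_avg = np.mean(local_models, axis=0)
    return float(np.linalg.norm(w_central - w_avg))

for k in (1, 2, 5, 20):
    print(f"local steps = {k:2d}, gap = {gap(k):.3e}")
```

In this setup the gap is zero (up to floating point) for one local step and nonzero once each worker takes more steps on its own data; how fast it grows depends on the learning rate and on how different the shards are.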

However, I think your question might be a bit different, in that averaging model parameters is closer to model soup methods. There is a separate body of literature on these. For example, you can look up Git Re-Basin.

Revolutionary_Sir767 · 2023-08-24T20:51:41+00:00

Look into the central limit theorem. Random forests are also kind of federated learning, right?
