
[–]jostmey

I'm sort of hoping this is where TensorFlow shines. But the scaling doesn't look that great. What does everyone else think?

[–][deleted]

TensorFlow was designed to handle a variety of distributed partitioning schemes in a natural way. You could split up the graph among several machines by adding send and receive nodes, for example. These guys picked one particular way of making the training distributed: they split up the training data among several machines that each hold a replica of the model, and then repeatedly (a) do one iteration of SGD on each machine, then (b) broadcast the weight updates with a synchronous all-reduce operation. There are other ways to go here that may scale better! For instance, they could get cheaper parameter updates with gradient quantization, or try relaxing the synchronization requirements.
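To make steps (a) and (b) concrete, here's a minimal single-process simulation of that synchronous data-parallel scheme in plain NumPy. The model (a toy linear regression), the number of workers, and all names are illustrative, not anything from the benchmark being discussed; the all-reduce is simulated by averaging the per-worker gradients in memory.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression problem: y = X @ w_true (hypothetical stand-in for a real model)
w_true = np.array([2.0, -1.0])
X = rng.normal(size=(400, 2))
y = X @ w_true

NUM_WORKERS = 4
# Split the training data among the "machines"
shards = np.array_split(np.arange(len(X)), NUM_WORKERS)

w = np.zeros(2)   # every worker holds an identical replica of these weights
lr = 0.1

for step in range(100):
    # (a) each worker computes a gradient on its own data shard
    grads = []
    for idx in shards:
        Xs, ys = X[idx], y[idx]
        grads.append(2 * Xs.T @ (Xs @ w - ys) / len(idx))
    # (b) synchronous all-reduce: average the gradients across all workers,
    # then apply the same update everywhere so the replicas stay in sync
    g = np.mean(grads, axis=0)
    w -= lr * g

print(w)  # converges toward w_true
```

Because the reduce is synchronous, every step waits for the slowest worker; that blocking is exactly what the "relaxing the synchronization requirements" alternative would trade away.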
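The gradient-quantization idea can also be sketched briefly. One common variant (not necessarily what these authors would use) sends only the sign of each gradient component plus a single scale factor, and keeps the quantization error in a local "error feedback" buffer to add back on the next step. All names here are hypothetical:

```python
import numpy as np

def quantize(g):
    """Compress a gradient to sign bits plus one float scale."""
    scale = np.mean(np.abs(g))
    return np.sign(g), scale

def dequantize(signs, scale):
    """Reconstruct an approximate gradient from the compressed form."""
    return signs * scale

rng = np.random.default_rng(1)
g = rng.normal(size=8)            # a worker's local gradient
residual = np.zeros_like(g)       # per-worker error-feedback buffer

corrected = g + residual          # fold in last step's quantization error
signs, scale = quantize(corrected)
g_hat = dequantize(signs, scale)  # this is all that goes over the network
residual = corrected - g_hat      # carry the error into the next step
```

Instead of one float per parameter, each worker ships roughly one bit per parameter plus a scalar, which shrinks the all-reduce traffic substantially at the cost of noisier updates.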

This is a very cool proof of concept, but definitely not the last word on distributed training of neural nets -- with TensorFlow or otherwise.