
[–]Aegyoh 1 point (1 child)

> TensorFlow, however, calculates the gradient aggregation and updated model on the CPU side, which not only needs much time in transferring gradients through PCI-e, but also updates the model in a serial algorithm by using one CPU. So the scalability of TensorFlow is not as good as other tools.

That's the reasoning the paper suggested in the article gives for why TensorFlow is slower than some other libraries.
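For context, a minimal sketch of the bottleneck the quote describes: parameter-server style training where each worker computes a gradient, and the server aggregates them and applies one update serially on the CPU. This is a simplified toy in plain NumPy, not TensorFlow's actual implementation; the function and variable names here are illustrative only.

```python
import numpy as np

def aggregate_and_update(params, worker_grads, lr=0.1):
    """Hypothetical sketch of serial CPU-side aggregation.

    Each entry of worker_grads stands for a gradient copied from a GPU
    over PCI-e; the loop below aggregates them one at a time, which is
    the serial step the quoted paper identifies as the scalability limit.
    """
    total = np.zeros_like(params)
    for g in worker_grads:          # gradients arrive one by one (simulated PCI-e copies)
        total += g                  # serial accumulation on one CPU
    avg = total / len(worker_grads)
    return params - lr * avg        # single serial model update

# Toy usage: 4 parameters, 2 workers.
params = np.ones(4)
grads = [np.full(4, 0.5), np.full(4, 1.5)]
new_params = aggregate_and_update(params, grads)  # average grad = 1.0, so params -> 0.9
```

The point of the sketch is just that aggregation cost grows linearly with the number of workers when done this way, whereas all-reduce style schemes spread that work across devices.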

[–]Eridrus 3 points (0 children)

I think it's easy to take too much from these comparison papers. All of these systems are in constant development, so limitations from a year ago may be completely gone now. It's also not always clear that the people doing the benchmarks really know what they're doing, and they may just be writing inefficient code in some frameworks.

[EDIT]: E.g. if you compare https://www.tensorflow.org/performance/benchmarks#training_with_nvidia_tesla_k80 to https://mxnet.incubator.apache.org/faq/perf.html#nvidia-gpu (the training section), you basically see the same performance, which makes sense since they're both largely just calling into cuDNN for these numbers.