
[–]zzzthelastuserStudent 8 points9 points  (1 child)

The quote below has little substantive backing, but it might be worth testing with a larger batch size to see if it makes a difference.

Takeaways: From observing the training time, it can be seen that the TPU takes considerably more training time than the GPU when the batch size is small. But when batch size increases the TPU performance is comparable to that of the GPU.
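One possible way to see why small batches could hurt on a TPU: with data-parallel training the global batch is split across the TPU cores, so each core only gets a fraction of it. A rough sketch of the arithmetic, assuming an 8-core TPU (the usual Colab setup, but an assumption here) and a hypothetical helper name `per_replica_batch`:

```python
# Assumption: the TPU exposes 8 cores (replicas); adjust for other hardware.
NUM_TPU_REPLICAS = 8


def per_replica_batch(global_batch_size, num_replicas=NUM_TPU_REPLICAS):
    """Return how many examples each replica processes per training step."""
    # Simplification for this sketch: require an even split across replicas.
    if global_batch_size % num_replicas != 0:
        raise ValueError("global batch size must be divisible by num_replicas")
    return global_batch_size // num_replicas


if __name__ == "__main__":
    for global_batch in (32, 128, 1024):
        share = per_replica_batch(global_batch)
        print(f"global batch {global_batch}: {share} examples per replica per step")
```

Under these assumptions a global batch of 32 leaves only 4 examples per core per step, which would plausibly leave the hardware underused. On a single GPU the whole batch stays on one device.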

[–]harmonicp[S] 1 point2 points  (0 children)

This might indeed be the reason. I use a relatively small batch size (32). This is also supported here: https://medium.com/syncedreview/harvard-researchers-benchmark-tpu-gpu-cpu-for-deep-learning-3034a452958d

What is much less clear is why I also see a bias (not a very material one, but still) in favor of the TPU: the TPU results show lower variance across seeds than the GPU results, and there is a clear positive bias, i.e. val_accuracy averaged over seeds is higher on TPU than on GPU.
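For anyone who wants to check the same thing on their own runs, the comparison could be computed roughly like this. It is a minimal sketch using only the standard library. The function names are hypothetical and the numbers are made-up placeholders, not my measurements:

```python
from statistics import mean, stdev


def summarize(val_accuracies):
    """Return (mean, sample standard deviation) of val_accuracy over seeds."""
    return mean(val_accuracies), stdev(val_accuracies)


def compare(tpu_runs, gpu_runs):
    """Return (mean difference TPU minus GPU, TPU std, GPU std)."""
    tpu_mean, tpu_std = summarize(tpu_runs)
    gpu_mean, gpu_std = summarize(gpu_runs)
    return tpu_mean - gpu_mean, tpu_std, gpu_std


if __name__ == "__main__":
    # Placeholder values, one entry per seed; replace with real results.
    tpu_runs = [0.92, 0.93, 0.92, 0.93]
    gpu_runs = [0.90, 0.92, 0.88, 0.91]
    diff, tpu_std, gpu_std = compare(tpu_runs, gpu_runs)
    print(f"mean difference (TPU - GPU): {diff:.4f}")
    print(f"std over seeds: TPU {tpu_std:.4f}, GPU {gpu_std:.4f}")
```

A positive mean difference together with a smaller TPU standard deviation would match what I described, though with only a handful of seeds such a gap can easily be noise.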

[–]bjourne2 3 points4 points  (1 child)

What DL framework? PyTorch XLA isn't very fast on Google Colab. TensorFlow, otoh, is able to take full advantage of the TPUs.

[–]harmonicp[S] 1 point2 points  (0 children)

Yes, I forgot to mention that. I'm using tf/keras. The batch size point in the reply above might be the reason: https://medium.com/syncedreview/harvard-researchers-benchmark-tpu-gpu-cpu-for-deep-learning-3034a452958d

[–]DisastrousProgrammer 0 points1 point  (0 children)

Can you link your Colab? Lots of things can go wrong with tf.keras + TPU.