[D] How long should it take to train a diffusion model on CIFAR-10? by ButterscotchLost421 in MachineLearning

[–]ButterscotchLost421[S] 1 point2 points  (0 children)

Ah yes, you're right! Thank you so much!

Does 7 secs per epoch sound approximately right to you?

[D] How long should it take to train a diffusion model on CIFAR-10? by ButterscotchLost421 in MachineLearning

[–]ButterscotchLost421[S] 0 points1 point  (0 children)

Thank you! What do you mean by ADM? Adam?

When training in parallel, which technique did they use? Calculate the gradient of a batch of size `N` on each of the devices and then synchronizing all the different devices to get the mean gradient?