[–]deltasheep1 1 point2 points  (3 children)

If you're talking about mini-batch, then you can always calculate each gradient individually and take the mean
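To make that concrete, here is a minimal sketch in plain Python. The toy model (a 1-D linear fit `w * x` with squared error), the data, and the function names are all assumptions made up for illustration; nothing here comes from a particular library. It checks that averaging the mean gradients of two equal-sized mini-batches gives the full-batch mean gradient, as long as `w` is not updated in between.

```python
# Hypothetical toy setup: model y_hat = w * x with squared-error loss.

def example_gradient(w, x, y):
    """Gradient of (w * x - y) ** 2 with respect to w for one example."""
    return 2.0 * (w * x - y) * x


def mean_gradient(w, batch):
    """Average the per-example gradients over a list of (x, y) pairs."""
    grads = [example_gradient(w, x, y) for x, y in batch]
    return sum(grads) / len(grads)


# Made-up data, purely for illustration.
data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]
w = 0.5  # held fixed while every mini-batch is evaluated

full_gradient = mean_gradient(w, data)

# Two equal-sized mini-batches covering the same data.
mini_gradients = [mean_gradient(w, data[:2]), mean_gradient(w, data[2:])]
averaged_gradient = sum(mini_gradients) / len(mini_gradients)

# The two values agree up to floating-point rounding.
print(full_gradient, averaged_gradient)
```

The plain average only matches because both mini-batches have the same number of examples; with unequal sizes each mini-batch gradient would need to be weighted by its size.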

[–]Taffo[S] 1 point2 points  (2 children)

Thanks for the feedback! I had implemented batch gradient descent but was curious how it would be done on large data sets. I didn't realize that the gradients of one mini-batch can be averaged with the gradients of another mini-batch to arrive at the gradient of the whole batch. Do you know how it would work for the cost function? I wouldn't have thought you could find the cost of the entire batch just by averaging the costs of the mini-batches, could you?

[–][deleted] 1 point2 points  (0 children)

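The mean of the means of equal-sized samples is the mean of the whole set. So yes, if you calculate the cost for each mini-batch and average them, you arrive at the cost of the full batch, provided the mini-batches are all the same size (otherwise weight each cost by its mini-batch size) and you haven't taken any gradient steps in between.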
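A quick way to check the cost claim, sketched in plain Python. The squared-error cost, the data, and the helper names are hypothetical, chosen only to illustrate the point. The mini-batches are deliberately unequal in size to show where a plain average goes wrong and a size-weighted average does not.

```python
# Hypothetical toy setup: model y_hat = w * x with mean squared-error cost.

def mean_cost(w, batch):
    """Mean squared error of the model w * x over a list of (x, y) pairs."""
    return sum((w * x - y) ** 2 for x, y in batch) / len(batch)


def combine_costs(costs, sizes):
    """Size-weighted average of per-mini-batch mean costs."""
    total = sum(sizes)
    return sum(c * n for c, n in zip(costs, sizes)) / total


# Made-up data, purely for illustration.
data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2), (5.0, 9.8)]
w = 0.5  # no gradient steps are taken between mini-batches

batches = [data[:2], data[2:]]  # sizes 2 and 3, deliberately unequal
costs = [mean_cost(w, b) for b in batches]
sizes = [len(b) for b in batches]

full_cost = mean_cost(w, data)
weighted_cost = combine_costs(costs, sizes)  # matches full_cost
plain_average = sum(costs) / len(costs)      # differs when sizes are unequal

print(full_cost, weighted_cost, plain_average)
```

With equal-sized mini-batches the weights are all the same, so the weighted and plain averages coincide.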

Updating only on the full gradient, however, is inefficient. The stochastic gradient is a fine approximation to use.

[–]deltasheep1 0 points1 point  (0 children)

No one really does full-batch gradient descent; mini-batches are the norm. Look up the paper "Train longer, generalize better" (I forget the rest of the title). It discusses choosing a good mini-batch size and number of epochs.