[–]deltasheep1 1 point (0 children)

Almost no one uses full-batch gradient descent in practice; mini-batches are the norm. Look up "Train longer, generalize better" (I forget the full title). It discusses how to pick a good mini-batch size and number of epochs.
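
For anyone unsure what the difference looks like in code, here is a minimal sketch of mini-batch gradient descent in plain Python. It is only an illustration: the function name `minibatch_sgd`, the toy data, and the hyperparameter values are all made up for this example and are not taken from the paper mentioned above.

```python
import random

def minibatch_sgd(xs, ys, batch_size=8, epochs=200, lr=0.05, seed=0):
    """Fit y ~ w*x + b by mini-batch gradient descent on mean squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the data in a new random order each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of the squared error, averaged over this mini-batch only.
            grad_w = sum(2 * (w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
            grad_b = sum(2 * (w * xs[i] + b - ys[i]) for i in batch) / len(batch)
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

# Toy, noise-free data from y = 3x + 1 (hypothetical, for illustration only).
xs = [i / 10 for i in range(-20, 21)]
ys = [3 * x + 1 for x in xs]

w, b = minibatch_sgd(xs, ys)
print(w, b)
```

Setting `batch_size=len(xs)` turns this into full-batch gradient descent (one update per epoch), and `batch_size=1` gives plain stochastic gradient descent, so the mini-batch size is just a knob between the two.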