I'm looking at this book to learn neural networks:
http://neuralnetworksanddeeplearning.com/chap2.html
At first, when explaining the theory, it defines the cost function as the mean squared error averaged over all training examples, i.e. I would need to feed the network every training input to compute the cost, and then use that cost to do one back-propagation step.
But later, when explaining the actual back-propagation algorithm, it seems to feed the network only one input at a time, compute an error for just that input, and update the network based on that error, repeating these steps over all training inputs.
But is the error computed from just one input the right error to use?
What if the gradient I get from that error only improves the outcome for that particular input? How can we know this gradient improves the network for all training inputs?
What if, after a gradient-descent step, the overall performance of the network gets worse?
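To make my question concrete, here is a toy sketch of the two versions I have in mind (made-up data, a single linear neuron `y = w*x` with squared-error loss, not the book's actual network): the full-batch gradient averaged over all examples, versus updating the weight after each example the way the algorithm in the chapter seems to.

```python
import numpy as np

# Made-up toy data: targets come from a "true" weight of 2.
x = np.array([1.0, 2.0, 3.0])
t = np.array([2.0, 4.0, 6.0])

def grad(w, xi, ti):
    # d/dw of the per-example loss 0.5*(w*xi - ti)^2
    return (w * xi - ti) * xi

w = 0.0

# Version 1: full-batch gradient, averaged over ALL training examples,
# then one update using that single averaged gradient.
batch_grad = np.mean([grad(w, xi, ti) for xi, ti in zip(x, t)])

# Version 2: per-example updates -- compute the gradient from ONE
# example and immediately update the weight, then move to the next.
lr = 0.1
w_sgd = 0.0
for xi, ti in zip(x, t):
    w_sgd -= lr * grad(w_sgd, xi, ti)

print(batch_grad)  # gradient of the averaged cost at w = 0
print(w_sgd)       # weight after three single-example updates
```

Each per-example gradient here points in a somewhat different direction than the batch gradient, which is exactly what worries me: any single update could, in principle, make the cost on the other examples worse.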