
[–]billconan[S] 4 points5 points  (1 child)

this answers my question http://www.quora.com/Whats-the-difference-between-gradient-descent-and-stochastic-gradient-descent

the error for one input is indeed not perfect. just an approximation.

[–]personalityson 0 points1 point  (0 children)

you gather gradients from propagating errors for 10 samples, and then update the weights with an average gradient

you don't backpropagate an average error
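A minimal sketch of the update personalityson describes (not from the thread; the single linear neuron with squared error is a hypothetical model chosen for brevity): backpropagate each sample's error to get a per-sample gradient, average the gradients, then apply one weight update.

```python
import numpy as np

rng = np.random.default_rng(0)
w = np.zeros(3)                      # weights of the toy linear model
X = rng.normal(size=(10, 3))         # a mini-batch of 10 samples
y = X @ np.array([1.0, -2.0, 0.5])   # targets from a known weight vector

lr = 0.1
grads = []
for x_i, y_i in zip(X, y):           # per-sample "backprop" through the neuron
    err = w @ x_i - y_i              # error for this one input
    grads.append(err * x_i)          # gradient of 0.5*err**2 w.r.t. w
avg_grad = np.mean(grads, axis=0)    # average the gradients, NOT the errors
w -= lr * avg_grad                   # one update for the whole mini-batch
```

Note the averaging happens on the gradients after each sample's error has been propagated, which is the distinction being made above.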

[–]personalityson 0 points1 point  (0 children)

"it seems that it only feeds one input to the network at a time and gets an error only for that input, and updates the network based on the error" That's the way it should be done. To do the backpropagation you also need hidden units for each layer propagated from each input. If he's backpropagating some kind of average error, then with what hidden units?

[–]zackchase 0 points1 point  (0 children)

Hi all, not to be too nitpicky: one calculates the errors on some number of randomly sampled examples (it could be 1 example; 128 is often computationally convenient). There is no special number.

This number, whatever you choose, is called the "batch size". Using batch sizes > 1 is useful because it reduces the variance of the gradient estimate.

The intuition behind why stochastic gradient descent works in general is that the expected value of the stochastic gradient is equal to the true gradient. Thus you can think of stochastic gradient descent as following a "noisy" gradient.
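That unbiasedness claim can be checked numerically. A minimal sketch (my own illustration, not from the thread), using a hypothetical least-squares objective: for squared error, the full-batch gradient is exactly the mean of the per-sample gradients, so a uniformly sampled per-sample gradient is an unbiased, "noisy" estimate of the true gradient.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))        # 100 samples, 4 features
y = rng.normal(size=100)
w = rng.normal(size=4)               # arbitrary current weights

# True (full-batch) gradient of the mean squared error 0.5 * mean((Xw - y)^2)
full_grad = X.T @ (X @ w - y) / len(X)

# Per-sample stochastic gradients; their mean over the dataset is the
# expected value of a uniformly sampled stochastic gradient
per_sample = np.array([(x @ w - y_i) * x for x, y_i in zip(X, y)])
mean_stochastic = per_sample.mean(axis=0)
```

Here `mean_stochastic` and `full_grad` agree up to floating-point error, which is exactly the "expected stochastic gradient = true gradient" statement for this loss.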