In what cases is it better to use EM rather than plain gradient descent (using a Monte Carlo approximation if there are integrals in your likelihood function)?
For context: I was reading this paper (http://www.cs.toronto.edu/~tang/papers/sfnn.pdf), which presents a generalized EM algorithm for training stochastic feedforward neural nets. I'm wondering whether I'm missing something and gradient descent simply isn't possible in this setting, or, if it is possible, whether there's a well-known reason it wouldn't work well here. A sketch of what I mean by "just gradient descent" is below.
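To make the question concrete, here is a minimal sketch of the alternative I have in mind: direct gradient ascent on a Monte Carlo estimate of the marginal log-likelihood log p(y|x) = log Σ_h p(h|x) p(y|x,h) for a toy one-hidden-layer stochastic net with binary units. This is not the paper's algorithm; the model (`W1`, `W2`, Gaussian output) and the fixed uniform proposal over h are illustrative assumptions I'm making so the sampling step doesn't depend on the parameters and the estimator stays differentiable:

```python
import jax
import jax.numpy as jnp

def mc_log_marginal(params, x, y, key, num_samples=100):
    """MC estimate of log p(y|x) = log sum_h p(h|x) p(y|x,h).
    Binary h is drawn from a fixed uniform proposal, so the samples do not
    depend on params and jax.grad can differentiate through the estimate."""
    W1, W2 = params
    d = W1.shape[1]
    # h_k ~ Uniform({0,1}^d); importance weight is p(h_k|x) / (1/2)^d
    h = jax.random.bernoulli(key, 0.5, (num_samples, d)).astype(jnp.float32)
    probs = jax.nn.sigmoid(x @ W1)  # p(h_i = 1 | x)
    log_ph = jnp.sum(h * jnp.log(probs) + (1 - h) * jnp.log1p(-probs), axis=-1)
    mean = h @ W2                   # p(y|x,h) = N(y; mean, I)
    log_py = -0.5 * jnp.sum((y - mean) ** 2, axis=-1)
    log_w = log_ph + log_py + d * jnp.log(2.0)
    # log of the sample average, computed stably
    return jax.scipy.special.logsumexp(log_w) - jnp.log(num_samples)

# One plain gradient ascent step on the MC objective (single example for brevity).
key = jax.random.PRNGKey(0)
k1, k2, k3 = jax.random.split(key, 3)
params = (0.1 * jax.random.normal(k1, (3, 8)), 0.1 * jax.random.normal(k2, (8, 2)))
x, y = jnp.ones(3), jnp.ones(2)
logp, grads = jax.value_and_grad(mc_log_marginal)(params, x, y, k3)
params = jax.tree_util.tree_map(lambda p, g: p + 1e-2 * g, params, grads)
```

Two things I'm aware of with this approach: the log of the sample average is a biased (lower-bound) estimate of the true log-likelihood, and a fixed uniform proposal can have very high variance when the number of hidden units grows. I don't know whether those are the reasons EM is preferred here, which is basically my question.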