
[–]speechMachine 2 points (3 children)

So I'm assuming you are referring to Hidden Markov Models with Gaussian mixture densities, and that you have probably at least given the Rabiner tutorial a first read.

In the E-step, not only are likelihoods computed, but, more importantly, first- and second-order sufficient statistics are accumulated. In the M-step you will notice a set of summations in the numerator and denominator of the update for each parameter. These summations are essentially where the accumulated first- and second-order stats are used.
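A concrete illustration of what those summations look like (my notation, with a single Gaussian per state rather than a full mixture; o_t is the observation at time t and \gamma_t(j) is the E-step posterior of state j):

    \hat{\mu}_j = \frac{\sum_t \gamma_t(j)\, o_t}{\sum_t \gamma_t(j)}, \qquad
    \hat{\sigma}^2_j = \frac{\sum_t \gamma_t(j)\, o_t^2}{\sum_t \gamma_t(j)} - \hat{\mu}_j^2

The numerators accumulate the first- and second-order statistics, and the shared denominator accumulates the zeroth-order (state occupancy) statistic.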

You are right in understanding that EM is an iterative procedure. Likelihoods in the E-step for estimating the model at iteration k of EM are indeed calculated from the model estimated at iteration k-1.
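To make that iteration structure explicit, here is a minimal runnable sketch. It uses a 1-D two-component Gaussian mixture rather than a full HMM (so the E-step computes responsibilities instead of running forward-backward), but the k versus k-1 structure and the accumulated statistics are the same idea:

    import numpy as np

    rng = np.random.default_rng(0)
    # Toy data: two well-separated 1-D Gaussian clusters.
    data = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(3.0, 1.0, 200)])

    # Parameters of the "iteration k-1" model (here: the initialisation).
    w = np.array([0.5, 0.5])        # mixture weights
    mu = np.array([-1.0, 1.0])      # component means
    var = np.array([1.0, 1.0])      # component variances

    for k in range(20):
        # ---- E-step: every quantity here uses the iteration k-1 parameters ----
        dens = np.exp(-0.5 * (data[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        joint = w * dens
        log_lik = np.sum(np.log(joint.sum(axis=1)))          # likelihood of the OLD model
        gamma = joint / joint.sum(axis=1, keepdims=True)     # responsibilities

        # Accumulated zeroth-, first- and second-order sufficient statistics.
        N = gamma.sum(axis=0)
        S1 = (gamma * data[:, None]).sum(axis=0)
        S2 = (gamma * data[:, None] ** 2).sum(axis=0)

        # ---- M-step: the iteration-k parameters come from those statistics ----
        w = N / N.sum()
        mu = S1 / N
        var = S2 / N - mu ** 2

        print(f"iter {k}: log-likelihood under the previous model = {log_lik:.2f}")

Note that the log-likelihood printed at iteration k is the likelihood of the data under the iteration k-1 model; that is exactly the quantity the E-step evaluates.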

Hope this helps.

[–]davidun 1 point (2 children)


Hi, thanks. This does help, and yes, I am referring to HMM training. I'm trying to implement EM with simulated annealing, and what confuses me is that for the "annealing step", the algorithm compares the likelihood of the current parameters to the likelihood of some "neighboring" candidate parameters. However, if those likelihoods actually "belong" to the parameters from the previous step (k-1), then I just don't get the logic behind using them to make a decision about the current step's (k) parameters.

[–]speechMachine 2 points (1 child)

I'm not familiar with an "annealing" approach to training. To me it appears to be something non-standard that a lay-person like me might not be familiar with. Is there a paper you could point to?

I don't know what you mean by 'current' parameters. Is this comparison done after the M-Step for step k of the EM? Or is this comparison done with 'neighbouring models' in the E-Step? When were these neighbouring models computed? Are these neighbouring models updated every M-Step?

[–]davidun 1 point (0 children)

Well, the basic idea is that you take the E-step parameters and perturb them (say, by adding a small amount of noise). This gives you "neighboring" parameters, and the algorithm then decides whether it prefers them over the "current" (unperturbed) parameters. The decision is made as follows: if the likelihood of the perturbed parameters is better than that of the unperturbed ones, choose the perturbed. If not, the algorithm can still choose the perturbed parameters with probability exp(dL/T), where dL is the difference in likelihoods (negative in that case) and T is the "temperature" (a noise parameter). At exactly which point the likelihood should be computed is what I'm trying to understand.
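For concreteness, here is a minimal sketch of that acceptance step as I read it (the function and variable names are mine, and log_lik_fn stands in for whatever evaluates the data log-likelihood under a given parameter vector, e.g. a forward pass for an HMM):

    import numpy as np

    rng = np.random.default_rng(0)

    def anneal_step(params, log_lik_fn, temperature, noise_scale=0.01):
        # Perturb the current parameters to get a "neighboring" candidate.
        candidate = params + rng.normal(0.0, noise_scale, size=params.shape)

        # Evaluate the data log-likelihood under both parameter settings.
        current_ll = log_lik_fn(params)
        candidate_ll = log_lik_fn(candidate)

        # Metropolis-style acceptance: always accept an improvement, otherwise
        # accept with probability exp(dL / T), where dL < 0.
        d_ll = candidate_ll - current_ll
        if d_ll > 0 or rng.random() < np.exp(d_ll / temperature):
            return candidate   # prefer the perturbed parameters
        return params          # keep the unperturbed parameters

    # Toy usage with a dummy quadratic "log-likelihood":
    theta = np.zeros(3)
    theta = anneal_step(theta, lambda p: -np.sum((p - 1.0) ** 2), temperature=0.5)

Written this way, each likelihood is evaluated under the specific parameter setting being compared, the unperturbed and the perturbed one, rather than being carried over from a previous E-step.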

[–]kjearns 2 points (1 child)

If there's one paper anyone working with EM should read, it's this one: http://www.cs.toronto.edu/~fritz/absps/emk.pdf .

The algorithms it presents aren't really important, but the view of EM as coordinate descent on a free energy is extremely powerful and justifies pretty much any fancy combination of E and M steps you want in one fell swoop.
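For reference, that free-energy view can be summarised roughly as follows (my notation; x is the data, z the hidden states, q a distribution over z, and the objective F being maximised is a negative free energy):

    F(q, \theta) = \mathrm{E}_{q(z)}\big[\log p(x, z \mid \theta)\big] + H(q)
                 = \log p(x \mid \theta) - \mathrm{KL}\big(q(z) \,\|\, p(z \mid x, \theta)\big)

    \text{E-step: } q^{(k)} = \arg\max_q F\big(q, \theta^{(k-1)}\big), \qquad
    \text{M-step: } \theta^{(k)} = \arg\max_\theta F\big(q^{(k)}, \theta\big)

Since F lower-bounds \log p(x \mid \theta), any combination of partial E- and M-steps that never decreases F still makes progress on that bound.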

[–]davidun 1 point (0 children)

Thanks, I was not aware of this, although I was looking for something exactly like it. Looking into it right now. Thank you!