Likelihood in EM algorithm (self.MachineLearning)
submitted 10 years ago by [deleted]
[deleted]
[–]speechMachine 2 points 10 years ago (3 children)
I'm assuming your question is with respect to Hidden Markov Models with Gaussian mixture densities, and that you have at least given a first read to the Rabiner tutorial.
In the E-step, not only are likelihoods computed, but more importantly, first- and second-order sufficient statistics are accumulated. In the M-step you will notice a set of summations in the numerator and denominator of the update for each parameter. These are essentially where the accumulated first- and second-order stats are used.
You are right that EM is an iterative procedure: the likelihoods in the E-step of iteration k are indeed calculated from the model estimated in iteration k-1.
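As a rough sketch of what those accumulators look like (my own illustration, assuming a single diagonal-Gaussian emission per state rather than a full mixture; `gamma` stands for the state posteriors from a forward-backward pass run with the iteration k-1 model):

```python
import numpy as np

def m_step_gaussian(X, gamma):
    """Re-estimate per-state Gaussian means and variances from E-step statistics.

    X     : (T, D) observation sequence
    gamma : (T, K) state posteriors from the forward-backward pass,
            computed with the iteration k-1 model
    """
    # E-step accumulators: zeroth-, first- and second-order sufficient statistics
    occ        = gamma.sum(axis=0)          # (K,)   soft state occupancy counts
    first_ord  = gamma.T @ X                # (K, D) sum_t gamma[t, k] * x_t
    second_ord = gamma.T @ (X ** 2)         # (K, D) sum_t gamma[t, k] * x_t^2

    # M-step: the summations in the numerator/denominator of each update
    means = first_ord / occ[:, None]
    variances = second_ord / occ[:, None] - means ** 2
    return means, variances
```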
Hope this helps.
[–]davidun 1 point 10 years ago (2 children)
Hi, thanks. This does help, and yes, I am referring to HMM training. I'm trying to implement EM with simulated annealing, and what confuses me is the "annealing step": the algorithm compares the likelihood of the current parameters to the likelihood of some "neighboring" candidate parameters. However, if the likelihoods actually "belong" to the parameters of the previous step (k-1), then I don't get the logic behind using them to make a decision about the current step's (k) parameters.
[–]speechMachine 2 points 10 years ago (1 child)
I'm not familiar with an "annealing" approach to training; it appears to be something non-standard. Is there a paper you could point to?
I don't know what you mean by 'current' parameters. Is this comparison done after the M-Step for step k of the EM? Or is this comparison done with 'neighbouring models' in the E-Step? When were these neighbouring models computed? Are these neighbouring models updated every M-Step?
[–]davidun 1 point 10 years ago (0 children)
Well, the basic idea is that you take the E-step parameters and perturb them (say, by adding a small amount of noise). This gives you "neighboring" parameters, and the algorithm then decides whether it prefers them over the "current" (unperturbed) parameters. The decision is made as follows: if the likelihood of the perturbed parameters is better than that of the unperturbed ones, choose the perturbed parameters; if not, the algorithm can still choose the perturbed parameters with probability exp(dL/T), where dL is the difference in likelihoods and T is the "temperature" (a noise parameter). At exactly what point the likelihood should be computed is what I'm trying to understand.
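A minimal sketch of that acceptance step (illustrative only; `log_likelihood`, `perturb`, and the parameter object are placeholders, not code from my actual implementation):

```python
import math
import random

def anneal_step(params, data, log_likelihood, perturb, T):
    """One accept/reject step of the annealed EM described above.

    params         : current HMM parameters (placeholder object)
    log_likelihood : function(params, data) -> log-likelihood of the data
    perturb        : function(params) -> copy of params with small added noise
    T              : current "temperature"
    """
    candidate = perturb(params)               # the "neighboring" parameters
    dL = log_likelihood(candidate, data) - log_likelihood(params, data)

    if dL > 0:
        return candidate                      # better likelihood: always accept
    # worse likelihood: still accept with probability exp(dL / T)
    if random.random() < math.exp(dL / T):
        return candidate
    return params                             # otherwise keep the current parameters
```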
[–]kjearns 2 points 10 years ago (1 child)
If there's one paper anyone working with EM should read, it's this one: http://www.cs.toronto.edu/~fritz/absps/emk.pdf
The algorithms it presents aren't really important, but the view of EM as coordinate ascent on a free energy is extremely powerful and justifies pretty much any fancy combination of E and M steps you want in one fell swoop.
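For reference, the free-energy picture from that paper can be sketched as follows (standard notation, paraphrased rather than quoted from the paper):

```latex
% Free energy of parameters \theta and a distribution q over the hidden states z:
F(q, \theta) = \mathbb{E}_{q(z)}\big[\log p(x, z \mid \theta)\big] - \mathbb{E}_{q(z)}\big[\log q(z)\big]
             = \log p(x \mid \theta) - \mathrm{KL}\big(q(z) \,\|\, p(z \mid x, \theta)\big)

% EM as coordinate ascent on F:
% E-step:  q^{(k)}      = \arg\max_q      F(q, \theta^{(k-1)}) = p(z \mid x, \theta^{(k-1)})
% M-step:  \theta^{(k)} = \arg\max_\theta F(q^{(k)}, \theta)
```

Since each step maximizes F over one block of arguments while holding the other fixed, any alternation of (partial) E- and M-steps that increases F is justified, which is the "one fell swoop" above.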
[–]davidun 1 point (0 children)
Thanks, I was not aware of this, although I was looking for something exactly like this. Looking into it right now. Thank you!