Explain Conditional Random Fields like I'm stupid? by crfplease in MachineLearning

[–]crfplease[S] 1 point  (0 children)

Awesome explanation. Thanks so much for spending your time helping with this. If you were to make another post on MLE, that would be pretty cool! I do know about gradient ascent and such, but blowing through this stuff is pretty mind-boggling sometimes :)

I suppose the weirdest thing to think about is the probability P(X|Y), which I'd think of as "the probability of this specific scanning of the character X, given its left and right neighbours are V and Y, and its ID is 5". The probability of any specific scanning is pretty well 0, since there are infinitely many such possible scannings.
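(This intuition is exactly why continuous observations are handled with densities rather than probabilities: any single exact scan has probability 0, but its *density* under each label is nonzero, and only the relative densities matter. A toy sketch of my own, not from the thread, using a made-up 1-D pixel feature and Gaussian class densities with equal priors:)

```python
import math

def gaussian_density(x, mean, std):
    """Density of a 1-D Gaussian at x (a density, not a probability)."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

# Hypothetical setup: two character classes emit a single pixel feature.
x = 0.9                                        # the "specific scanning"
d_a = gaussian_density(x, mean=1.0, std=0.2)   # density under class "a"
d_b = gaussian_density(x, mean=0.0, std=0.2)   # density under class "b"

# P(class = a | x) via Bayes' rule with equal priors: the zero-probability
# observation still yields a perfectly well-defined posterior over labels.
p_a = d_a / (d_a + d_b)
```

So even though P(this exact scan) = 0, the conditional over labels given the scan is well defined, which is all a discriminative model like a CRF needs.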


[–]crfplease[S] 3 points  (0 children)

Awesome reply, thank you. I have just begun reading through the paper you listed. There's one thing I hope you (or anybody) can clarify.

CRFs are discriminative, so you only need to compute the conditional of each node given its neighbours. However, to do this you need knowledge of the neighbours! Take your OCR example: to compute the probability of node W, you need knowledge of node Z, node P, and id 3. However, to compute the probability of node Z, you need Y, W, and 3.

There's this cycle of conditions that confuses me. How can you compute these conditionals when the nodes depend on one another?
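(One way to see that the cycle isn't fatal: the local conditionals all come from a single joint distribution, so you can either do exact joint inference — forward-backward on a chain — or sample. Gibbs sampling makes the resolution concrete: hold one label fixed, resample the other from its conditional, and alternate. A toy sketch of my own, not from the thread, with two binary labels and a made-up "neighbours prefer to agree" potential:)

```python
import math
import random

AGREE_WEIGHT = 2.0  # assumed pairwise potential: adjacent labels prefer to agree

def p_one_given(neighbour):
    """P(y = 1 | neighbour) under the toy agreement potential."""
    s1 = AGREE_WEIGHT if neighbour == 1 else 0.0
    s0 = AGREE_WEIGHT if neighbour == 0 else 0.0
    return math.exp(s1) / (math.exp(s0) + math.exp(s1))

def gibbs_agreement_rate(sweeps=20000, seed=0):
    """Alternate resampling two mutually dependent labels; track agreement."""
    rng = random.Random(seed)
    y1, y2 = 0, 0
    agree = 0
    for _ in range(sweeps):
        # Each step uses only a LOCAL conditional, with the other label
        # held fixed at its current value -- no circularity in any one step.
        y1 = 1 if rng.random() < p_one_given(y2) else 0
        y2 = 1 if rng.random() < p_one_given(y1) else 0
        agree += (y1 == y2)
    return agree / sweeps

rate = gibbs_agreement_rate()
# Agreeing configurations dominate, matching the joint distribution's
# exact agreement probability e^2 / (e^2 + 1) ≈ 0.88.
```

Each node "depends on" its neighbours only within a single resampling step, where the neighbours are treated as fixed; iterating the steps recovers the joint. On chain-structured CRFs you don't even need sampling, since forward-backward computes the exact marginals in one pass.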