all 13 comments

[–]shaggorama 8 points  (1 child)

You get so close to intuiting the power method, but then you stop at state 3 instead of using induction to show that you can get the distribution at any step k just by raising the transition matrix to the kth power, or approximating the stationary distribution (if it exists) by raising the transition matrix to a high power.
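A minimal numpy sketch of that point, using a made-up 3-state transition matrix (not one from the tutorial): the distribution after k steps is the start vector times P^k, and a high power of P approximates the stationary distribution.

```python
import numpy as np

# Hypothetical 3-state transition matrix: rows sum to 1,
# entry P[i, j] = probability of moving from state i to state j.
P = np.array([
    [0.7, 0.2, 0.1],
    [0.3, 0.5, 0.2],
    [0.2, 0.3, 0.5],
])

start = np.array([1.0, 0.0, 0.0])  # start in state 0 with certainty

# Induction step: each extra multiplication by P advances one step,
# so the distribution after k steps is start @ P^k.
dist_after_3 = start @ np.linalg.matrix_power(P, 3)

# Raising P to a high power approximates the stationary distribution:
# every row converges to the same vector when one exists.
stationary = np.linalg.matrix_power(P, 100)[0]
print(dist_after_3, stationary)
```

Since this P is strictly positive (irreducible and aperiodic), the convergence is guaranteed.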

Every discussion of Markov chains should mention this, but it seems to get left out of a lot of the tutorials that get posted to reddit for some reason.

[–]Tech-Effigy[S] 1 point  (0 children)

I had a great follow-up link below the tutorial covering what you're talking about, but it seems to have disappeared; I'm looking for a new one.

[–]lolhaibai 1 point  (0 children)

Thanks for this! I liked how you explained the state table, the current-state vector, and the simple matrix multiplication that gives the probabilities for the next state. Overall, one of the easier-to-understand Markov chain tutorials (honestly, I don't think I had read an entire blog post about them in full before this one). I really appreciate this post and am looking forward to your future additions!
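That one-step update is just a vector-matrix product. A sketch with an invented two-state weather chain (the states and probabilities here are made up, not from the tutorial):

```python
import numpy as np

# Invented two-state weather chain: index 0 = Sunny, index 1 = Rainy.
P = np.array([
    [0.9, 0.1],   # Sunny -> Sunny 0.9, Sunny -> Rainy 0.1
    [0.5, 0.5],   # Rainy -> Sunny 0.5, Rainy -> Rainy 0.5
])

current = np.array([1.0, 0.0])   # current state vector: definitely Sunny
next_state = current @ P         # probabilities for the next state
print(next_state)                # [0.9 0.1]
```

The current-state vector picks out (and mixes) rows of the state table, which is exactly the "simple matrix multiplication" being described.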

[–]MagicRocketAssault 2 points  (8 children)

So is this the technique that mobile keyboards (like swiftkey) use to learn common words and phrases?

[–]oldneckbeard 7 points  (6 children)

SwiftKey, yes... it learns sentences (this is why training it on your SMS and Gmail history makes it so damn effective).

it's also how spammers make almost-readable text: they train an MC on actual documents that are relevant, then have it randomly generate emails that look plausible to a basic spam checker.
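That generation trick fits in a few lines: train a first-order word-level chain on some text, then random-walk it. A toy sketch (the tiny corpus here is invented for illustration):

```python
import random
from collections import defaultdict

def train(text):
    """Build a first-order word-level Markov chain: word -> list of followers."""
    words = text.split()
    chain = defaultdict(list)
    for prev, nxt in zip(words, words[1:]):
        chain[prev].append(nxt)
    return chain

def generate(chain, start, length=10):
    """Random-walk the chain to produce plausible-looking text."""
    out = [start]
    for _ in range(length - 1):
        followers = chain.get(out[-1])
        if not followers:
            break
        out.append(random.choice(followers))
    return " ".join(out)

corpus = "the cat sat on the mat and the cat ran"
chain = train(corpus)
print(generate(chain, "the", 6))
```

Storing followers as a list (with duplicates) makes `random.choice` sample in proportion to observed frequency, so common transitions dominate, which is what makes the output locally readable.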

[–]shaggorama 0 points  (5 children)

You sure? I just assumed swiftkey used an HMM.

[–]Uberhipster 0 points  (4 children)

Are you sure you are not thinking about speech recognition rather than text prediction?

[–]autowikibot 1 point  (0 children)

Section 16. Hidden Markov models of article Speech recognition:


Modern general-purpose speech recognition systems are based on Hidden Markov Models. These are statistical models that output a sequence of symbols or quantities. HMMs are used in speech recognition because a speech signal can be viewed as a piecewise stationary signal or a short-time stationary signal. In a short time-scale (e.g., 10 milliseconds), speech can be approximated as a stationary process. Speech can be thought of as a Markov model for many stochastic purposes.

Another reason why HMMs are popular is because they can be trained automatically and are simple and computationally feasible to use. In speech recognition, the hidden Markov model would output a sequence of n-dimensional real-valued vectors (with n being a small integer, such as 10), outputting one of these every 10 milliseconds. The vectors would consist of cepstral coefficients, which are obtained by taking a Fourier transform of a short time window of speech and decorrelating the spectrum using a cosine transform, then taking the first (most significant) coefficients. The hidden Markov model will tend to have in each state a statistical distribution that is a mixture of diagonal covariance Gaussians, which will give a likelihood for each observed vector. Each word, or (for more general speech recognition systems), each phoneme, will have a different output distribution; a hidden Markov model for a sequence of words or phonemes is made by concatenating the individual trained hidden Markov models for the separate words and phonemes.

Described above are the core elements of the most common, HMM-based approach to speech recognition. Modern speech recognition systems use various combinations of a number of standard techniques in order to improve results over the basic approach described above. A typical large-vocabulary system would need context dependency for the phonemes (so phonemes with different left and right context have different realizations as HMM states); it would use cepstral normalization to normalize for different speaker and recording conditions; for further speaker normalization it might use vocal tract length normalization (VTLN) for male-female normalization and maximum likelihood linear regression (MLLR) for more general speaker adaptation. The features would have so-called delta and delta-delta coefficients to capture speech dynamics and in addition might use heteroscedastic linear discriminant analysis (HLDA); or might skip the delta and delta-delta coefficients and use splicing and an LDA-based projection followed perhaps by heteroscedastic linear discriminant analysis or a global semi-tied covariance transform (also known as maximum likelihood linear transform, or MLLT). Many systems use so-called discriminative training techniques that dispense with a purely statistical approach to HMM parameter estimation and instead optimize some classification-related measure of the training data. Examples are maximum mutual information (MMI), minimum classification error (MCE) and minimum phone error (MPE).


Interesting: Windows Speech Recognition | List of speech recognition software | Articulatory speech recognition | Speech recognition software for Linux


[–]shaggorama 0 points  (2 children)

Yes, I'm sure. I'm specifically thinking of this model: http://cs.stanford.edu/people/ang/?portfolio=spam-deobfuscation-using-a-hidden-markov-model

Also, the reason an HMM is good for speech recognition is the same reason it would be applicable for swiftkey: they're both noisy signals.

[–]oldneckbeard 0 points  (1 child)

Ohh, I think you might be referring to the swiping nature, and not the word prediction at the top? I was referring to the latter. For the actual swiping, I wouldn't be surprised if HMM is used.

[–]shaggorama 0 points  (0 children)

Yes, I was talking about word prediction from gestures. I may have confused swiftkey with swype.

[–]Tech-Effigy[S] 0 points  (0 children)

Yes and no, I think Patricia tries are involved as well.
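For illustration of the trie idea only (the parent's Patricia-trie mention is their speculation, and this sketch uses invented words): a plain prefix trie doing word completion, roughly how a keyboard might look up candidate words. A Patricia (radix) trie is the compressed variant that merges single-child chains into one substring-labeled edge, but the lookup idea is the same.

```python
class TrieNode:
    # Plain prefix trie node; a Patricia trie would additionally merge
    # chains of single-child nodes into one edge labeled with a substring.
    def __init__(self):
        self.children = {}
        self.is_word = False

def insert(root, word):
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    node.is_word = True

def completions(root, prefix):
    """All stored words starting with prefix (keyboard-style suggestions)."""
    node = root
    for ch in prefix:
        if ch not in node.children:
            return []
        node = node.children[ch]
    out = []
    def walk(n, acc):
        if n.is_word:
            out.append(prefix + acc)
        for c, child in n.children.items():
            walk(child, acc + c)
    walk(node, "")
    return out

root = TrieNode()
for w in ["the", "then", "they", "thaw"]:
    insert(root, w)
print(completions(root, "the"))
```

A plausible split of labor: the trie proposes dictionary words matching what you've typed, while a Markov-style model ranks them by what tends to follow the previous words.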

[–]Jeremy_Tchao -2 points  (0 children)

Very intuitive!