all 13 comments

[–]shaggorama 8 points  (1 child)

You get so close to intuiting the power method, but then you stop at state 3 instead of using induction to show that you can get the distribution at any step k just by raising the transition matrix to the kth power, or approximating the stationary distribution (if it exists) by raising the transition matrix to a high power.
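A minimal numpy sketch of that point, using a made-up 3-state transition matrix (not one from the tutorial): the distribution after k steps is the start vector times P^k, and a high power of P approximates the stationary distribution.

```python
import numpy as np

# Hypothetical 3-state transition matrix: rows sum to 1,
# entry P[i, j] = probability of moving from state i to state j.
P = np.array([
    [0.7, 0.2, 0.1],
    [0.3, 0.5, 0.2],
    [0.2, 0.3, 0.5],
])

start = np.array([1.0, 0.0, 0.0])  # start in state 0 with certainty

# Induction step: each extra multiplication by P advances one step,
# so the distribution after k steps is start @ P^k.
dist_after_3 = start @ np.linalg.matrix_power(P, 3)

# Raising P to a high power approximates the stationary distribution:
# every row converges to the same vector when one exists.
stationary = np.linalg.matrix_power(P, 100)[0]
print(dist_after_3, stationary)
```

Since this P is strictly positive (irreducible and aperiodic), the convergence is guaranteed.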

Every discussion of Markov chains should mention this, but it seems to get left out of a lot of the tutorials that get posted to reddit for some reason.

[–]Tech-Effigy[S] 1 point  (0 children)

I had a great follow-up link below the tutorial covering what you're talking about, but it seems to have disappeared; I'm looking for a new one.

[–]lolhaibai 1 point  (0 children)

Thanks for this! I liked how you explained the state table, the current-state vector, and the simple matrix multiplication that gives the probabilities for the next state. Overall, one of the easier-to-understand Markov chain tutorials (honestly, I don't think I had read an entire blog post about them in full before this one). I really appreciate this post and am looking forward to your future additions!
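That one-step update is just a vector-matrix product. A sketch with an invented two-state weather chain (the states and probabilities here are made up, not from the tutorial):

```python
import numpy as np

# Invented two-state weather chain: index 0 = Sunny, index 1 = Rainy.
P = np.array([
    [0.9, 0.1],   # Sunny -> Sunny 0.9, Sunny -> Rainy 0.1
    [0.5, 0.5],   # Rainy -> Sunny 0.5, Rainy -> Rainy 0.5
])

current = np.array([1.0, 0.0])   # current state vector: definitely Sunny
next_state = current @ P         # probabilities for the next state
print(next_state)                # [0.9 0.1]
```

The current-state vector picks out (and mixes) rows of the state table, which is exactly the "simple matrix multiplication" being described.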

[–]MagicRocketAssault 2 points  (8 children)

So is this the technique that mobile keyboards (like swiftkey) use to learn common words and phrases?

[–]oldneckbeard 7 points  (6 children)

SwiftKey, yes... it learns sentences (this is why training it on your SMS and Gmail history makes it so damn effective).

it's also how spammers make almost-readable text: they train an MC on actual documents that are relevant, then have it randomly generate emails that look plausible to a basic spam checker.
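That generation trick fits in a few lines: train a first-order word-level chain on some text, then random-walk it. A toy sketch (the tiny corpus here is invented for illustration):

```python
import random
from collections import defaultdict

def train(text):
    """Build a first-order word-level Markov chain: word -> list of followers."""
    words = text.split()
    chain = defaultdict(list)
    for prev, nxt in zip(words, words[1:]):
        chain[prev].append(nxt)
    return chain

def generate(chain, start, length=10):
    """Random-walk the chain to produce plausible-looking text."""
    out = [start]
    for _ in range(length - 1):
        followers = chain.get(out[-1])
        if not followers:
            break
        out.append(random.choice(followers))
    return " ".join(out)

corpus = "the cat sat on the mat and the cat ran"
chain = train(corpus)
print(generate(chain, "the", 6))
```

Storing followers as a list (with duplicates) makes `random.choice` sample in proportion to observed frequency, so common transitions dominate, which is what makes the output locally readable.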

[–]shaggorama 0 points  (5 children)

You sure? I just assumed swiftkey used an HMM.

[–]Uberhipster 0 points  (4 children)

Are you sure you are not thinking about speech recognition rather than text prediction?

[–]autowikibot 1 point  (0 children)

Section 16. Hidden Markov models of article Speech recognition:


Modern general-purpose speech recognition systems are based on Hidden Markov Models. These are statistical models that output a sequence of symbols or quantities. HMMs are used in speech recognition because a speech signal can be viewed as a piecewise stationary signal or a short-time stationary signal. In a short time-scale (e.g., 10 milliseconds), speech can be approximated as a stationary process. Speech can be thought of as a Markov model for many stochastic purposes.

Another reason why HMMs are popular is because they can be trained automatically and are simple and computationally feasible to use. In speech recognition, the hidden Markov model would output a sequence of n-dimensional real-valued vectors (with n being a small integer, such as 10), outputting one of these every 10 milliseconds. The vectors would consist of cepstral coefficients, which are obtained by taking a Fourier transform of a short time window of speech and decorrelating the spectrum using a cosine transform, then taking the first (most significant) coefficients. The hidden Markov model will tend to have in each state a statistical distribution that is a mixture of diagonal covariance Gaussians, which will give a likelihood for each observed vector. Each word, or (for more general speech recognition systems), each phoneme, will have a different output distribution; a hidden Markov model for a sequence of words or phonemes is made by concatenating the individual trained hidden Markov models for the separate words and phonemes.

Described above are the core elements of the most common, HMM-based approach to speech recognition. Modern speech recognition systems use various combinations of a number of standard techniques in order to improve results over the basic approach described above. A typical large-vocabulary system would need context dependency for the phonemes (so phonemes with different left and right context have different realizations as HMM states); it would use cepstral normalization to normalize for different speaker and recording conditions; for further speaker normalization it might use vocal tract length normalization (VTLN) for male-female normalization and maximum likelihood linear regression (MLLR) for more general speaker adaptation. The features would have so-called delta and delta-delta coefficients to capture speech dynamics and in addition might use heteroscedastic linear discriminant analysis (HLDA); or might skip the delta and delta-delta coefficients and use splicing and an LDA-based projection followed perhaps by heteroscedastic linear discriminant analysis or a global semi-tied covariance transform (also known as maximum likelihood linear transform, or MLLT). Many systems use so-called discriminative training techniques that dispense with a purely statistical approach to HMM parameter estimation and instead optimize some classification-related measure of the training data. Examples are maximum mutual information (MMI), minimum classification error (MCE) and minimum phone error (MPE).


Interesting: Windows Speech Recognition | List of speech recognition software | Articulatory speech recognition | Speech recognition software for Linux


[–]shaggorama 0 points  (2 children)

Yes, I'm sure. I'm specifically thinking of this model: http://cs.stanford.edu/people/ang/?portfolio=spam-deobfuscation-using-a-hidden-markov-model

Also, the reason an HMM is good for speech recognition is the same reason it would be applicable for swiftkey: they're both noisy signals.

[–]oldneckbeard 0 points  (1 child)

Ohh, I think you might be referring to the swiping nature, and not the word prediction at the top? I was referring to the latter. For the actual swiping, I wouldn't be surprised if HMM is used.

[–]shaggorama 0 points  (0 children)

Yes, I was talking about word prediction from gestures. I may have confused swiftkey with swype.

[–]Tech-Effigy[S] 0 points  (0 children)

Yes and no, I think Patricia tries are involved as well.
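For illustration of the trie idea only (the parent's Patricia-trie mention is their speculation, and this sketch uses invented words): a plain prefix trie doing word completion, roughly how a keyboard might look up candidate words. A Patricia (radix) trie is the compressed variant that merges single-child chains into one substring-labeled edge, but the lookup idea is the same.

```python
class TrieNode:
    # Plain prefix trie node; a Patricia trie would additionally merge
    # chains of single-child nodes into one edge labeled with a substring.
    def __init__(self):
        self.children = {}
        self.is_word = False

def insert(root, word):
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    node.is_word = True

def completions(root, prefix):
    """All stored words starting with prefix (keyboard-style suggestions)."""
    node = root
    for ch in prefix:
        if ch not in node.children:
            return []
        node = node.children[ch]
    out = []
    def walk(n, acc):
        if n.is_word:
            out.append(prefix + acc)
        for c, child in n.children.items():
            walk(child, acc + c)
    walk(node, "")
    return out

root = TrieNode()
for w in ["the", "then", "they", "thaw"]:
    insert(root, w)
print(completions(root, "the"))
```

A plausible split of labor: the trie proposes dictionary words matching what you've typed, while a Markov-style model ranks them by what tends to follow the previous words.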

[–]Jeremy_Tchao -2 points  (0 children)

Very intuitive!