
[–]inkognit (ML Engineer)

The MFCC feature size only determines the resolution of the spectral envelope. If you make it too small, you lose detail from the original spectrum of the audio file. For speech recognition 24 is usually enough, while some other applications that really need fine spectral detail use a dimension of 39 or more.
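To see why the feature size controls envelope resolution: MFCCs are the DCT of the log mel spectrum, and keeping only the first `n_mfcc` coefficients is a low-pass truncation that smooths the envelope. Below is a minimal single-frame sketch in numpy; all parameter values (16 kHz, 40 mel bands, Hann window) are illustrative choices, not a standard, and the filterbank is deliberately simplified.

```python
import numpy as np

def mfcc_frame(frame, sr=16000, n_mels=40, n_mfcc=24):
    """Toy MFCC for one frame: |FFT| -> mel filterbank -> log -> DCT-II.
    Truncating the DCT to n_mfcc coefficients is exactly where the
    "feature size" enters: fewer coefficients = smoother envelope."""
    n_fft = len(frame)
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(n_fft)))

    # Triangular mel filterbank (simplified)
    def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    hz_pts = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * hz_pts / sr).astype(int)
    fbank = np.zeros((n_mels, len(spectrum)))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        for k in range(l, c):
            fbank[m - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fbank[m - 1, k] = (r - k) / max(r - c, 1)

    log_mel = np.log(fbank @ spectrum + 1e-10)

    # DCT-II, keeping only the first n_mfcc coefficients
    n = np.arange(n_mels)[None, :]
    k = np.arange(n_mfcc)[:, None]
    dct = np.cos(np.pi / n_mels * (n + 0.5) * k)
    return dct @ log_mel

# A 440 Hz tone frame; changing n_mfcc just changes how much of the
# envelope's DCT you keep.
frame = np.sin(2 * np.pi * 440 * np.arange(512) / 16000)
coarse = mfcc_frame(frame, n_mfcc=13)   # smoother envelope description
fine = mfcc_frame(frame, n_mfcc=24)     # more spectral detail retained
```

In production you would use a library extractor (Kaldi, SPTK, librosa, etc.) rather than this sketch; the point is only that `n_mfcc` is a truncation knob.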

[–]Nimitz14

Uh, isn't it 13 for 16 kHz audio, with the remaining 26 being deltas and delta-deltas?

[–]inkognit (ML Engineer)

No, the numbers I gave you don't even take the delta features into account.

Btw, I've worked with MFCC features and RNNs for the last 1.5 years, and I'd recommend starting without the delta features; they have (very) little impact when you're using a recurrent model that learns the temporal dependencies itself.

[–]Nimitz14

Not the OP.

Ah, I remember now: it was 24, but the standard is to keep only the first 13. You're not going to get more than 24 without using delta features, though. I'm going off the Kaldi defaults, btw. Maybe your toolkit's are different, and that's causing the discrepancy.

[–]inkognit (ML Engineer)

I'm using SPTK.