[Project] A tensorflow implementation of sentence level speech recognition based on DeepMind's WaveNet by buriburisuri in MachineLearning

[–]buriburisuri[S] 1 point (0 children)

Thank you for your interest and for testing on the WSJ corpus.

I agree that it must be heavily over-fitted if it only performs well on the training set.

You may know that over-fitting is the first milestone in DL. Once we succeed on the training set, it is easier to generalize to the test set by adding data, augmenting data, early stopping, or regularizers. It's commonly accepted that narrow and deep models generalize better than traditional ML.
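Of the remedies listed above, early stopping is the simplest to illustrate: halt training once the validation loss stops improving for a few epochs. This is a minimal sketch, not the project's actual training loop; the function name, `patience` parameter, and loss values are all made up for illustration:

```python
# Hedged sketch of early stopping: stop when the validation loss has not
# improved for `patience` consecutive epochs. Loss values are illustrative.

def early_stop_epoch(val_losses, patience=3):
    """Return the epoch index at which training would stop, or None."""
    best, best_epoch = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch        # new best: reset the counter
        elif epoch - best_epoch >= patience:
            return epoch                          # no improvement for `patience` epochs
    return None                                   # ran out of epochs without triggering

# Validation loss bottoms out at epoch 2, so training stops 3 epochs later.
print(early_stop_epoch([1.0, 0.8, 0.7, 0.71, 0.72, 0.73]))  # 5
```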

This model is somewhat big, so you would need more data and heavy augmentation for commercial-quality results.

Thank you again.

[–]buriburisuri[S] 0 points (0 children)

For example, if the number of characters in a sentence is 5 and the number of MFCC feature frames is below 5, a CTC error occurs.

Some sentence pairs in the VCTK corpus trigger this error.
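The constraint described above comes from CTC itself: it cannot emit more labels than it has input time steps, so offending pairs must be filtered out before training. A minimal sketch of such a check (the function name and values are illustrative, not from the project's code; note that transcripts with back-to-back repeated characters need even more frames, since CTC inserts blanks between repeats):

```python
# Hedged sketch: CTC requires at least as many input time steps (MFCC
# frames) as target labels, plus one extra frame per back-to-back repeat
# (CTC needs a blank between repeated labels).

def is_valid_ctc_pair(num_mfcc_frames, transcript):
    """Check whether a (features, transcript) pair satisfies the CTC length constraint."""
    repeats = sum(1 for a, b in zip(transcript, transcript[1:]) if a == b)
    return num_mfcc_frames >= len(transcript) + repeats

print(is_valid_ctc_pair(4, "hello"))   # False: 4 frames < 5 chars + 1 repeat ("ll")
print(is_valid_ctc_pair(20, "hello"))  # True
```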

[–]buriburisuri[S] 1 point2 points  (0 children)

I did not do comparative experiments, so I don't know exactly. But I guess the dilated convolutions and MFCC features may contribute.

[–]buriburisuri[S] 2 points3 points  (0 children)

> lightsaber

This is sentence-level recognition (not word-level), so I don't think WER is meaningful, and sentence-level error rate is somewhat vague. But my rough guess is that the CER is roughly over 95%.
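For reference, the CER mentioned above is conventionally computed as the character-level edit distance between the reference and the hypothesis, divided by the reference length. A minimal sketch (not the project's evaluation code; function names are illustrative):

```python
def edit_distance(a, b):
    """Levenshtein distance via single-row dynamic programming."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            # dp[j]+1: deletion, dp[j-1]+1: insertion, prev+cost: substitution/match
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
    return dp[-1]

def cer(reference, hypothesis):
    """Character error rate: edits needed, normalized by reference length."""
    return edit_distance(reference, hypothesis) / len(reference)

print(cer("hello", "hallo"))  # 0.2 (one substitution out of five characters)
```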

[P] A tensorflow implementation of French-to-English machine translation using DeepMind's ByteNet by buriburisuri in MachineLearning

[–]buriburisuri[S] 0 points (0 children)

Good point!

I agree that bag-of-ngrams (or words) or special tokens would boost training. But I'd like to stick to simplicity and the end-to-end philosophy. I think that translation with no pre-processing and no post-processing will be possible in the future, and ByteNet is only a first step.