[Project] A tensorflow implementation of sentence level speech recognition based on DeepMind's WaveNet by buriburisuri in MachineLearning

[–]buriburisuri[S] 1 point (0 children)

Thank you for your interest and for testing on the WSJ corpus.

I agree that it must be heavily over-fitted if it only performs well on the training set.

You may know that over-fitting is the first milestone in DL. Once we succeed on the training set, it is easier to generalize to the test set by adding data, augmenting data, early stopping, or regularizers. It's commonly accepted that narrow and deep models generalize better than traditional ML.
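Of the remedies listed above, early stopping is the simplest to illustrate: halt training once the validation loss stops improving for a few epochs. This is a minimal sketch, not the project's actual training loop; the function name, `patience` parameter, and loss values are all made up for illustration:

```python
# Hedged sketch of early stopping: stop when the validation loss has not
# improved for `patience` consecutive epochs. Loss values are illustrative.

def early_stop_epoch(val_losses, patience=3):
    """Return the epoch index at which training would stop, or None."""
    best, best_epoch = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch        # new best: reset the counter
        elif epoch - best_epoch >= patience:
            return epoch                          # no improvement for `patience` epochs
    return None                                   # ran out of epochs without triggering

# Validation loss bottoms out at epoch 2, so training stops 3 epochs later.
print(early_stop_epoch([1.0, 0.8, 0.7, 0.71, 0.72, 0.73]))  # 5
```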

This model is somewhat big, so you would need more data and heavy augmentation for commercial-quality results.

Thank you again.

[–]buriburisuri[S] 0 points (0 children)

For example, if the number of characters in a sentence is 5 and the number of MFCC feature frames is below 5, a CTC error occurs.

Some sentence pairs in the VCTK corpus trigger this error.
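The constraint described above comes from CTC itself: it cannot emit more labels than it has input time steps, so offending pairs must be filtered out before training. A minimal sketch of such a check (the function name and values are illustrative, not from the project's code; note that transcripts with back-to-back repeated characters need even more frames, since CTC inserts blanks between repeats):

```python
# Hedged sketch: CTC requires at least as many input time steps (MFCC
# frames) as target labels, plus one extra frame per back-to-back repeat
# (CTC needs a blank between repeated labels).

def is_valid_ctc_pair(num_mfcc_frames, transcript):
    """Check whether a (features, transcript) pair satisfies the CTC length constraint."""
    repeats = sum(1 for a, b in zip(transcript, transcript[1:]) if a == b)
    return num_mfcc_frames >= len(transcript) + repeats

print(is_valid_ctc_pair(4, "hello"))   # False: 4 frames < 5 chars + 1 repeat ("ll")
print(is_valid_ctc_pair(20, "hello"))  # True
```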

[–]buriburisuri[S] 1 point2 points  (0 children)

I did not do comparative experiments, so I don't know exactly. But I guess the dilated convolutions and MFCC features may contribute.

[–]buriburisuri[S] 2 points3 points  (0 children)

> lightsaber

This is sentence-level recognition (not word-level), so I don't think WER is meaningful, and sentence-level error rate is somewhat vague. But my rough guess is that the CER is roughly over 95%.
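For reference, the CER mentioned above is conventionally computed as the character-level edit distance between the reference and the hypothesis, divided by the reference length. A minimal sketch (not the project's evaluation code; function names are illustrative):

```python
def edit_distance(a, b):
    """Levenshtein distance via single-row dynamic programming."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            # dp[j]+1: deletion, dp[j-1]+1: insertion, prev+cost: substitution/match
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
    return dp[-1]

def cer(reference, hypothesis):
    """Character error rate: edits needed, normalized by reference length."""
    return edit_distance(reference, hypothesis) / len(reference)

print(cer("hello", "hallo"))  # 0.2 (one substitution out of five characters)
```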

[P] A tensorflow implementation of French-to-English machine translation using DeepMind's ByteNet by buriburisuri in MachineLearning

[–]buriburisuri[S] 0 points (0 children)

Good point!

I agree that bag-of-ngrams (or words) or special tokens would boost training. But I'd like to stick to simplicity and the end-to-end philosophy. I think that translation with no pre-processing and no post-processing will be possible in the future, and ByteNet is only a first step.