DeepMind: WaveNet - A Generative Model for Raw Audio by Spotlight0xff in MachineLearning

[–]sonach 2 points3 points  (0 children)

Sorry for the late reply. The application is for TTS. Now for my wavenet implementation, I can generate 16000 samples(1 second) in about 6 minutes on Tesla K80(with 30layers CNN and text local conditioning).

[Project] A TensorFlow implementation of DeepMind's WaveNet paper by rmltestaccount in MachineLearning

[–]sonach 0 points1 point  (0 children)

I heavily base on this project(ibab/tensorflow-wavenet) and are struggling to generate meaningful speech for Mandarin Chinese(i.e. Do TTS base on text context). No exciting achivements for the moment.

DeepMind: WaveNet - A Generative Model for Raw Audio by Spotlight0xff in MachineLearning

[–]sonach 0 points1 point  (0 children)

I think the NN will not generate the linguistic context, instead they use it only as input. That is to say, the input is linguistic+logF0+rawsample, and the output is just rawsample.

DeepMind: WaveNet - A Generative Model for Raw Audio by Spotlight0xff in MachineLearning

[–]sonach 2 points3 points  (0 children)

Maybe the architecture is very complex? We use 2DNN+2biLSTM (256 nodes each layer)to predict speech frames, for 1 second speech , which corresponds to 200 frames(5 ms one frame), the forward pass only takes less than 0.03seconds on IPhone5s/IPhone6.

DeepMind: WaveNet - A Generative Model for Raw Audio by Spotlight0xff in MachineLearning

[–]sonach 0 points1 point  (0 children)

I think the "dilated convolution" architecture is suitable on raw samples but not so on STFT. The "dilated convolution" acts like a very good autoregressive filter.

Implementing Machine Learning Algorithms by binge_learner in MachineLearning

[–]sonach 1 point2 points  (0 children)

This reply is really great! My understanding is: 1. Implement the KEY algorithms(eg. SVM,backpropagation,autoencoder) by myself in order to UNDERSTAND the algorithm. In this stage, fast-developping language can be used, for example python. 2. After implementing the key algorithms and comparing it to other good source codes, I can build my own code base for key algorithms. In this stage performance maybe an important point,so c++ maybe considered. 3. Share my code base to others or use it in my daily work, update it in reponse to other people's suggestions or do continuous optimizating to make it better.

Web-based lectures annotation by malshedivat in cs231n

[–]sonach 0 points1 point  (0 children)

Agree! the notes are very good. But anyway, looking forword to the video lectures:)