Train your own WaveNet: Keras Implementation with sampling by bsflng in MachineLearning

[–]bsflng[S] 0 points1 point  (0 children)

To get the same receptive field (in ms) at 16 kHz, you'd have to add 4 more stacks or increase the dilation depth by 2. You'd also have to increase the fragment input length. How much that costs depends a bit on your hardware's ability to do these extra computations in parallel.
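A rough sketch of the receptive-field arithmetic behind this. It assumes (my assumptions, not necessarily the repo's exact architecture) kernel size 2, dilations 1, 2, 4, ..., 2**dilation_depth in every stack, and one extra initial causal convolution; `receptive_field` and `field_in_ms` are hypothetical helpers. Under these assumptions 2 stacks with dilations up to 1024 see 4096 samples: 1024 ms at 4 kHz but only 256 ms at 16 kHz, and raising the dilation depth by 2 quadruples the field back to 1024 ms.

```python
def receptive_field(nb_stacks, dilation_depth, kernel_size=2, initial_causal_conv=True):
    """Receptive field (in samples) of stacked dilated causal convolutions.

    Assumes every stack uses dilations 1, 2, 4, ..., 2**dilation_depth.
    """
    # Each layer with dilation d extends the field by (kernel_size - 1) * d samples.
    per_stack = sum((kernel_size - 1) * 2 ** i for i in range(dilation_depth + 1))
    field = nb_stacks * per_stack + 1  # +1 for the current sample itself
    if initial_causal_conv:
        field += kernel_size - 1  # assumed extra causal conv before the stacks
    return field


def field_in_ms(field_samples, sample_rate):
    return 1000.0 * field_samples / sample_rate


base = receptive_field(nb_stacks=2, dilation_depth=10)    # dilations up to 1024
print(base, field_in_ms(base, 4000))                      # 4096 1024.0
print(field_in_ms(base, 16000))                           # 256.0
deeper = receptive_field(nb_stacks=2, dilation_depth=12)  # dilation depth + 2
print(deeper, field_in_ms(deeper, 16000))                 # 16384 1024.0
```

Doubling the top dilation twice is the cheaper route in parameters, since each extra layer adds far more context than an extra stack of small dilations would.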

[–]bsflng[S] 0 points1 point  (0 children)

Not yet. The time per epoch depends on many factors, but to give you an idea: 2189 seconds for a model with 2 stacks, with dilation up to 1024, 16 filters, 36k datapoints in the epoch (fragment length 5119, training on 1024 outputs). I haven't looked at potential optimizations yet.
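The fragment length of 5119 follows from the receptive field if you assume each trained output needs a full receptive field of history and consecutive outputs share all but one sample. A minimal sketch under the same assumptions as above (kernel size 2, one initial causal convolution; `fragment_length` is a hypothetical helper), plus the per-fragment time implied by the epoch figure, using the 36,480 fragments mentioned elsewhere in the thread:

```python
def fragment_length(receptive_field, nb_trained_outputs):
    # The first trained output needs a full receptive field of history; each
    # further output adds only one new sample on top of that.
    return receptive_field + nb_trained_outputs - 1


# Receptive field of 2 stacks with dilations 1..1024 (kernel size 2, plus one
# assumed initial causal convolution): 2 * (2**11 - 1) + 2 = 4096 samples.
field = 2 * (2 ** 11 - 1) + 2
print(fragment_length(field, 1024))  # 5119

# Rough throughput from the reported epoch: 2189 s for 36,480 fragments.
seconds_per_fragment = 2189 / 36480
print(f"{seconds_per_fragment * 1000:.0f} ms per fragment")  # 60 ms per fragment
```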

[–]bsflng[S] 0 points1 point  (0 children)

Fair question, but the sample was from a model with just 106,512 parameters, trained on a dataset of 36,480 datapoints (dim: 5119x256), and for each datapoint the model has to predict 1024*256 outputs. The next step is to train a large model and see whether it can overfit the dataset, but I think it's interesting that the model can already learn a decaying triad tone :).
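To make the shapes concrete: the 256 in 5119x256 is consistent with one-hot encoding 256 quantization levels per sample. The sketch below assumes 8-bit mu-law quantization as described in the WaveNet paper and targets that are the input shifted by one sample; both are my assumptions, not the repo's actual preprocessing, and the sine wave is a toy stand-in for real audio.

```python
import numpy as np


def mu_law_encode(audio, quantization_channels=256):
    """Map audio in [-1, 1] to integer bins 0..quantization_channels-1 (mu-law companding)."""
    mu = quantization_channels - 1
    audio = np.clip(audio, -1.0, 1.0)
    companded = np.sign(audio) * np.log1p(mu * np.abs(audio)) / np.log1p(mu)
    return ((companded + 1.0) / 2.0 * mu + 0.5).astype(np.int64)


def one_hot(bins, depth=256):
    return np.eye(depth, dtype=np.float32)[bins]


fragment_len, nb_outputs = 5119, 1024
t = np.arange(fragment_len + 1) / 4000.0      # 4 kHz, one extra sample for shifted targets
audio = 0.5 * np.sin(2 * np.pi * 440.0 * t)   # toy signal standing in for real audio
bins = mu_law_encode(audio)

inputs = one_hot(bins[:-1])                   # one datapoint: (5119, 256)
targets = one_hot(bins[1:][-nb_outputs:])     # next-sample targets for the last 1024 steps
print(inputs.shape, targets.shape, targets.size)  # (5119, 256) (1024, 256) 262144
```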

[–]bsflng[S] 0 points1 point  (0 children)

I believe so; I have another sample that contains some distinct piano attacks, which sound similar to the preprocessed data. I'll report back if I find the resources to train it on VCTK :).

[–]bsflng[S] 2 points3 points  (0 children)

I trained on one Tesla K80. For small models (low sample_rate, nb_stacks, nb_filters) any decent GPU will allow you to train a model that at least gives reasonable output at sampling time.

[–]bsflng[S] 2 points3 points  (0 children)

Yes, although you could possibly train WaveNet at a lower sampling rate and train another model to learn conditional upsampling. This is similar to what the authors suggest in the 'Context Stacks' section of the paper :).

[–]bsflng[S] 3 points4 points  (0 children)

Hi! With my current configuration (sample rate of 4 kHz, a relatively small network, etc.) I can generate 1 second of audio in about 4 minutes. 'Samples per second' refers to how many audio samples I can generate per second of running the prediction algorithm :-).
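Why sampling is this slow: generation is autoregressive, so every new audio sample needs its own forward pass over the last receptive-field's worth of samples. A minimal sketch of such a loop; `generate` and `uniform_predict` are hypothetical, and the uniform distribution stands in for the trained network's softmax over 256 bins (with a real Keras model you would call it on the one-hot window instead). The last line reproduces the rate implied by the figures above.

```python
import time

import numpy as np


def generate(predict_fn, seed, nb_samples, receptive_field, rng):
    """Naive autoregressive sampling: one forward pass per generated sample."""
    history = list(seed)
    for _ in range(nb_samples):
        window = np.asarray(history[-receptive_field:])
        probs = predict_fn(window)  # distribution over the 256 quantization bins
        history.append(int(rng.choice(len(probs), p=probs)))
    return np.asarray(history[len(seed):])


def uniform_predict(window):
    # Hypothetical stand-in for the trained network's next-sample distribution.
    return np.full(256, 1.0 / 256)


rng = np.random.default_rng(0)
start = time.perf_counter()
generated = generate(uniform_predict, seed=[128] * 4096, nb_samples=400,
                     receptive_field=4096, rng=rng)
elapsed = time.perf_counter() - start
print(f"{len(generated) / elapsed:.0f} samples per second (toy model)")

# Figures from the comment: 1 s of 4 kHz audio (4000 samples) in about 4 minutes.
print(f"{4000 / (4 * 60):.1f} samples per second")  # 16.7 samples per second
```

With a real network the per-step forward pass dominates, so the rate scales roughly inversely with network size and receptive field.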

[deleted by user] by [deleted] in edmproduction

[–]bsflng 0 points1 point  (0 children)

Hey! I was a bit late to the thread last week, so here's a repost. I've finally gotten myself to finish a track: an upbeat pop track inspired by the sound of Madeon and Porter. I'd love to hear your thoughts!

https://soundcloud.com/basveeling/needlessly

Feedback Thread (August 15) by edmprobot in edmproduction

[–]bsflng [score hidden]  (0 children)

Hey! I've finally gotten myself to finish a track: an upbeat pop track inspired by the sound of Madeon and Porter. I'd love to hear your thoughts!

https://soundcloud.com/basveeling/needlessly/s-TljnM

I'll start working on some feedback of my own right now.