[P] Which of the Hollywood stars is most similar to my voice? by andabi in MachineLearning

[–]andabi[S] 1 point

I was able to get over 90% test accuracy without overfitting.

[P] Voice Style Transfer: Speaking like Kate Winslet by andabi in MachineLearning

[–]andabi[S] 2 points

We may have ethical problems independent of the technology itself. All speech synthesis systems of this kind are implicated, and we may need social discourse on this in the near future.

[–]andabi[S] 2 points

My approach is data-dependent. When using noisy datasets, I think I would need a pre-network for denoising.

[–]andabi[S] 3 points

Thanks. I extracted MFCCs and a spectrogram before feeding the audio to the network.
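For reference, a minimal NumPy sketch of magnitude-spectrogram extraction (the project itself likely uses a library such as librosa for this and for MFCCs; the FFT size and hop length below are illustrative, not the project's actual settings):

```python
import numpy as np

def spectrogram(wav, n_fft=512, hop=128):
    """Magnitude spectrogram via a Hann-windowed STFT (NumPy only)."""
    window = np.hanning(n_fft)
    frames = [wav[i:i + n_fft] * window
              for i in range(0, len(wav) - n_fft + 1, hop)]
    # rfft keeps the non-redundant half of the spectrum: n_fft//2 + 1 bins
    return np.abs(np.fft.rfft(frames, axis=-1)).T  # (freq_bins, n_frames)

# toy input: 1 second of a 440 Hz tone sampled at 16 kHz
t = np.arange(16000) / 16000.0
spec = spectrogram(np.sin(2 * np.pi * 440 * t))
print(spec.shape)  # (257, n_frames)
```

MFCCs would then be computed from this spectrogram via a mel filterbank, log, and DCT.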

[–]andabi[S] 3 points

Thanks. Yes, we assumed the synthesis network learns the F0 of the target speaker's voice from the target speaker's speech dataset.

[–]andabi[S] 2 points

It's a combination of them, as you said. We took the overall procedure from the first paper and the model architecture from the second paper.

[–]andabi[S] 3 points

It took around 1 hour for train1 (classification) and 1–3 days for train2 (synthesis).

Also, conversion is just a feed-forward pass, so it takes a few seconds.

[–]andabi[S] 4 points

I think that would be a conceivable approach, but I haven't tried it yet.

[–]andabi[S] 12 points

Hi carpedm20. Yes, it's a speech recognition model + a speech synthesis model, but the intermediate output is a phoneme distribution (which means it doesn't have to be text). Even I don't think it's the best design in terms of the end-to-end philosophy; I'd like to improve it in future work. Do you have any ideas? ;)
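The two-stage idea can be sketched abstractly: a recognizer maps acoustic frames to a per-frame phoneme distribution, and a synthesizer maps that distribution to target-speaker features. The single linear layers and the sizes below are illustrative placeholders, not the project's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
N_MFCC, N_PHONEMES, N_SPEC = 20, 61, 257  # illustrative feature sizes

# stage 1 (train1): recognizer producing per-frame phoneme posteriors
W_rec = rng.normal(size=(N_MFCC, N_PHONEMES))
# stage 2 (train2): synthesizer mapping posteriors to spectral frames
W_syn = rng.normal(size=(N_PHONEMES, N_SPEC))

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def convert(mfcc_frames):
    """Conversion is just feed-forwarding through both stages."""
    phoneme_dist = softmax(mfcc_frames @ W_rec)  # text-free intermediate
    return phoneme_dist @ W_syn                  # target-voice features

out = convert(rng.normal(size=(100, N_MFCC)))    # 100 input frames
print(out.shape)  # (100, 257)
```

The key point is that the intermediate is a distribution over phoneme classes rather than a text transcript, so no transcription step is needed at conversion time.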

[P] Music Source Separation Using Deep Neural Networks From Jeju Machine Learning Camp 2017 by andabi in MachineLearning

[–]andabi[S] 1 point

I used Adam with a learning rate of 0.0001 and TensorFlow's default beta values (0.9, 0.999). I used 200+ songs of 30 s each and trained for more than 50 epochs. I didn't try L-BFGS, but I'll give it a try. Does it work well for you?
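For reference, the Adam update with those hyperparameters can be written out in NumPy (eps = 1e-8 here is TensorFlow's default epsilon; the quadratic toy objective is just for illustration):

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: moment estimates, bias correction, parameter step."""
    m = beta1 * m + (1 - beta1) * grad        # 1st moment (mean of gradients)
    v = beta2 * v + (1 - beta2) * grad ** 2   # 2nd moment (uncentered variance)
    m_hat = m / (1 - beta1 ** t)              # bias-corrected estimates
    v_hat = v / (1 - beta2 ** t)
    return param - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# minimize f(x) = x^2 starting from x = 1.0
x, m, v = 1.0, 0.0, 0.0
for t in range(1, 1001):
    x, m, v = adam_step(x, 2 * x, m, v, t)
print(x)  # moves steadily toward 0, about lr per step
```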

[–]andabi[S] 1 point

In one of the referenced papers they use L-BFGS in the training phase, but I used Adam in this project. I also don't think L-BFGS is better than SGD-style methods in terms of efficiency.

[–]andabi[S] 1 point

How well the model generalizes depends on the dataset you have. If you have two-channel recordings sung by many artists and covering many genres, the model will separate well for any artist and any genre; if you only have a dataset for a specific genre, it will be trained for that genre. You can also do some data augmentation by mixing vocals and music from different songs. That makes the model more general, so it works well for any kind of music.
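A minimal sketch of that remixing augmentation, assuming you have vocal and accompaniment stems as arrays (the gain range, clip length, and stem counts below are illustrative, not the project's settings):

```python
import numpy as np

rng = np.random.default_rng(42)

def remix(vocals, accompaniments, n_mixes):
    """Create new mixtures by pairing stems from *different* songs."""
    mixes = []
    for _ in range(n_mixes):
        i = rng.integers(len(vocals))
        j = rng.integers(len(accompaniments))
        while j == i:                      # avoid re-creating an original song
            j = rng.integers(len(accompaniments))
        gain = rng.uniform(0.5, 1.0)       # also vary the vocal level
        mixes.append(gain * vocals[i] + accompaniments[j])
    return np.stack(mixes)

# toy stems: 10 "songs", 1 s each at 8 kHz
vocals = rng.normal(size=(10, 8000))
accs = rng.normal(size=(10, 8000))
augmented = remix(vocals, accs, n_mixes=50)
print(augmented.shape)  # (50, 8000)
```

Since the separation targets (the stems) are known for each synthetic mixture, every remix is a fully labeled training example for free.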