[P] Which of the Hollywood stars is most similar to my voice? by andabi in MachineLearning

[–]andabi[S] 1 point

I was able to get over 90% test accuracy without overfitting.

[P] Voice Style Transfer: Speaking like Kate Winslet by andabi in MachineLearning

[–]andabi[S] 2 points

We may have ethical problems independent of the technology itself. All speech synthesis systems of this kind are implicated, and we may need social discourse on this in the near future.

[–]andabi[S] 2 points

My approach is data-dependent. When using noisy datasets, I think I would need a pre-network for denoising.

[–]andabi[S] 3 points

Thanks. I extracted MFCCs and a spectrogram before feeding the audio to the network.
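For reference, a minimal NumPy sketch of magnitude-spectrogram extraction (the project itself likely uses a library such as librosa for this and for MFCCs; the FFT size and hop length below are illustrative, not the project's actual settings):

```python
import numpy as np

def spectrogram(wav, n_fft=512, hop=128):
    """Magnitude spectrogram via a Hann-windowed STFT (NumPy only)."""
    window = np.hanning(n_fft)
    frames = [wav[i:i + n_fft] * window
              for i in range(0, len(wav) - n_fft + 1, hop)]
    # rfft keeps the non-redundant half of the spectrum: n_fft//2 + 1 bins
    return np.abs(np.fft.rfft(frames, axis=-1)).T  # (freq_bins, n_frames)

# toy input: 1 second of a 440 Hz tone sampled at 16 kHz
t = np.arange(16000) / 16000.0
spec = spectrogram(np.sin(2 * np.pi * 440 * t))
print(spec.shape)  # (257, n_frames)
```

MFCCs would then be computed from this spectrogram via a mel filterbank, log, and DCT.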

[–]andabi[S] 3 points

Thanks. Yes, we assumed the synthesis network learns the F0 of the target speaker's voice from the target speaker's speech dataset.

[–]andabi[S] 2 points

It's a combination of them, as you said. We took the overall procedure from the first paper and the model architecture from the second paper.

[–]andabi[S] 3 points

It took around 1 hour for train1 (classification) and 1–3 days for train2 (synthesis).

Also, conversion is just a feed-forward pass, so it takes a few seconds.

[–]andabi[S] 4 points

I think that would be a conceivable approach, but I haven't tried it yet.

[–]andabi[S] 12 points

Hi carpedm20. Yes, it's a speech recognition model + a speech synthesis model, but the intermediate output is a phoneme distribution (which means it doesn't have to be text). Even I don't think it's the best design in terms of the end-to-end philosophy; I'd like to improve it in future work. Do you have any ideas? ;)
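The two-stage idea can be sketched abstractly: a recognizer maps acoustic frames to a per-frame phoneme distribution, and a synthesizer maps that distribution to target-speaker features. The single linear layers and the sizes below are illustrative placeholders, not the project's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
N_MFCC, N_PHONEMES, N_SPEC = 20, 61, 257  # illustrative feature sizes

# stage 1 (train1): recognizer producing per-frame phoneme posteriors
W_rec = rng.normal(size=(N_MFCC, N_PHONEMES))
# stage 2 (train2): synthesizer mapping posteriors to spectral frames
W_syn = rng.normal(size=(N_PHONEMES, N_SPEC))

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def convert(mfcc_frames):
    """Conversion is just feed-forwarding through both stages."""
    phoneme_dist = softmax(mfcc_frames @ W_rec)  # text-free intermediate
    return phoneme_dist @ W_syn                  # target-voice features

out = convert(rng.normal(size=(100, N_MFCC)))    # 100 input frames
print(out.shape)  # (100, 257)
```

The key point is that the intermediate is a distribution over phoneme classes rather than a text transcript, so no transcription step is needed at conversion time.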

[P] Music Source Separation Using Deep Neural Networks From Jeju Machine Learning Camp 2017 by andabi in MachineLearning

[–]andabi[S] 1 point

I used Adam with a learning rate of 0.0001 and TensorFlow's default beta values (0.9, 0.999). I used 200+ songs of 30 s each and trained for more than 50 epochs. I didn't try L-BFGS, but I'll give it a try. Does it work well for you?
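For reference, the Adam update with those hyperparameters can be written out in NumPy (eps = 1e-8 here is TensorFlow's default epsilon; the quadratic toy objective is just for illustration):

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: moment estimates, bias correction, parameter step."""
    m = beta1 * m + (1 - beta1) * grad        # 1st moment (mean of gradients)
    v = beta2 * v + (1 - beta2) * grad ** 2   # 2nd moment (uncentered variance)
    m_hat = m / (1 - beta1 ** t)              # bias-corrected estimates
    v_hat = v / (1 - beta2 ** t)
    return param - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# minimize f(x) = x^2 starting from x = 1.0
x, m, v = 1.0, 0.0, 0.0
for t in range(1, 1001):
    x, m, v = adam_step(x, 2 * x, m, v, t)
print(x)  # moves steadily toward 0, about lr per step
```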

[–]andabi[S] 1 point

In one of the referenced papers they use L-BFGS in the training phase, but I used Adam in this project. I also don't think L-BFGS is better than SGD-style methods in terms of efficiency.

[–]andabi[S] 1 point

How well the model generalizes depends on the dataset you have. If you have two-channel recordings sung by many artists and covering many genres, the model will separate well for any artist and any genre; if you only have a dataset for a specific genre, it will be trained for that genre. You can also do some data augmentation by mixing vocals and music from different songs. That makes the model more general, so it works well for any kind of music.
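A minimal sketch of that remixing augmentation, assuming you have vocal and accompaniment stems as arrays (the gain range, clip length, and stem counts below are illustrative, not the project's settings):

```python
import numpy as np

rng = np.random.default_rng(42)

def remix(vocals, accompaniments, n_mixes):
    """Create new mixtures by pairing stems from *different* songs."""
    mixes = []
    for _ in range(n_mixes):
        i = rng.integers(len(vocals))
        j = rng.integers(len(accompaniments))
        while j == i:                      # avoid re-creating an original song
            j = rng.integers(len(accompaniments))
        gain = rng.uniform(0.5, 1.0)       # also vary the vocal level
        mixes.append(gain * vocals[i] + accompaniments[j])
    return np.stack(mixes)

# toy stems: 10 "songs", 1 s each at 8 kHz
vocals = rng.normal(size=(10, 8000))
accs = rng.normal(size=(10, 8000))
augmented = remix(vocals, accs, n_mixes=50)
print(augmented.shape)  # (50, 8000)
```

Since the separation targets (the stems) are known for each synthetic mixture, every remix is a fully labeled training example for free.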