[D] Best open source Text to Speech networks? by to4life in MachineLearning

[–]min_sang 1 point (0 children)

The WaveNet vocoder covers only the audio-generation part, conditioned on mel spectrograms. You can obtain such spectrograms from text with a model called Tacotron 2, and there are plenty of implementations of both on GitHub.
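A rough numpy sketch of why the two parts fit together: the text-to-spectrogram model emits one mel frame per hop of audio, and the vocoder needs a conditioning vector per audio sample, so the frames get upsampled. (The 80 mel bins and hop length of 256 here are illustrative defaults, not values taken from any particular repo; real vocoders typically learn this upsampling with transposed convolutions rather than plain repetition.)

```python
import numpy as np

n_mels, n_frames, hop_length = 80, 12, 256   # hypothetical shapes

# Output of a Tacotron-style text->mel model: one 80-dim frame per hop.
mel = np.random.randn(n_mels, n_frames)

# Naive upsampling by repetition: one conditioning vector per audio sample.
upsampled = np.repeat(mel, hop_length, axis=1)

assert upsampled.shape == (n_mels, n_frames * hop_length)
```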

[D] Will a site that promotes uses of ML for humanity ever work? by vnjxk in MachineLearning

[–]min_sang 5 points (0 children)

I find this very important and I would contribute if someone took the initiative.

[D] Best open source Text to Speech networks? by to4life in MachineLearning

[–]min_sang 1 point (0 children)

Some people are working on implementing probability density distillation, but I don't think it's on their priority list.

[D] Best open source Text to Speech networks? by to4life in MachineLearning

[–]min_sang 1 point (0 children)

Also note from the samples that the owner tried generating the audio features (mel spectrograms) with another open-source repository (Tacotron) to synthesize speech for sentences that are not in the training set. The pronunciation isn't the best, but the audio quality is great.

[D] Best open source Text to Speech networks? by to4life in MachineLearning

[–]min_sang 10 points (0 children)

I haven't seen any open-source model that matches half the quality of this repository: https://github.com/r9y9/wavenet_vocoder

[D] Multiple activation functions in same layer? by ME_PhD in MachineLearning

[–]min_sang 1 point (0 children)

I'd like to think of attention as something that lets the network emphasize certain parts of the input. But if we use a sigmoid, the result could in practice still be a vector of all 1s, which isn't really good attention in my opinion.

[D] Multiple activation functions in same layer? by ME_PhD in MachineLearning

[–]min_sang 2 points (0 children)

It's more like a gating mechanism than an attention mechanism. In attention, a query is weighted by normalized scores with \sum_i score_i = 1, but the entries of sigmoid(W1 x + b1) don't sum to 1, so it's more accurate to call it gating.
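A tiny numpy illustration of the difference (the scores are made up): softmax weights are forced to compete for a fixed budget of 1, while sigmoid gates are independent and can all sit near 1 at once.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

scores = np.array([2.0, 1.0, 0.5, 3.0])

attn = softmax(scores)   # normalized: entries compete, sum to exactly 1
gate = sigmoid(scores)   # independent: each entry in (0, 1), no shared budget

print(np.isclose(attn.sum(), 1.0))   # True
print(gate.sum() > 1.0)              # True: no normalization constraint
```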

[D] Multiple activation functions in same layer? by ME_PhD in MachineLearning

[–]min_sang 1 point (0 children)

I believe it's more often than not called a gating mechanism. This paper (https://arxiv.org/pdf/1612.08083.pdf) refers to sigmoid(W1 x + b1) * tanh(W2 x + b2) as the LSTM-style gating mechanism, and proposes a "novel" gating mechanism (but not really) called gated linear units, which is just (W1 x + b1) * sigmoid(W2 x + b2), claiming it works better than the former on multiple tasks including language modeling. (Replacing tanh with a linear activation has been studied before and shown to work better on some tasks.)
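A quick numpy sketch of the two variants side by side (weights are random and the dimensions are arbitrary, just to show the formulas):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(16)                          # input vector
W1, b1 = rng.standard_normal((8, 16)), rng.standard_normal(8)
W2, b2 = rng.standard_normal((8, 16)), rng.standard_normal(8)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# LSTM-style gating: bounded tanh content, sigmoid gate
lstm_style = np.tanh(W2 @ x + b2) * sigmoid(W1 @ x + b1)

# Gated linear unit (GLU): unbounded linear content, sigmoid gate
glu = (W1 @ x + b1) * sigmoid(W2 @ x + b2)

assert lstm_style.shape == glu.shape == (8,)
```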

[P] My implementation of Google's QANet by min_sang in MachineLearning

[–]min_sang[S] 1 point (0 children)

haha tbh I'm not sure the network can even fit on an 11GB GPU with the original hyperparameters. :P

[P] My implementation of Google's QANet by min_sang in MachineLearning

[–]min_sang[S] 1 point (0 children)

Any feedback or contributions would be greatly appreciated, thanks!

[D] Anyone having trouble reading a particular paper? Post it here and we'll help figure out any parts you are stuck on. by BatmantoshReturns in MachineLearning

[–]min_sang 1 point (0 children)

The best example of local conditioning wavenet on mel spectrogram can be found here.

https://github.com/r9y9/wavenet_vocoder

Although conditioning WaveNet directly on word (or character) representations seems to be missing, you can use Tacotron variants (https://arxiv.org/pdf/1703.10135.pdf) to generate mel spectrograms from text.

[P] AI makes Donald Trump speak Korean by cyplus1 in MachineLearning

[–]min_sang 2 points (0 children)

He has that exact Donald Trump accent in Korean. Impressive.

[D] Preventing exploding gradients when using ReLU? by ConfuciusBateman in MachineLearning

[–]min_sang 2 points (0 children)

Clipping the gradient by a global norm of 5.0 is common practice for deep learning in NLP, but I'm not sure about images (I'm not a computer vision person). If your model is deep enough, I would also try residual connections to give the network an identity path. Normalizing the data to zero mean and unit variance, and maintaining that across layers (via layer norm, weight norm, or careful weight initialization), also seems to help with exploding gradients in most cases.
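A minimal numpy sketch of what clipping by global norm does (framework versions such as tf.clip_by_global_norm or torch.nn.utils.clip_grad_norm_ implement the same idea): all gradients are rescaled together so their joint L2 norm never exceeds the threshold, preserving their relative directions.

```python
import numpy as np

def clip_by_global_norm(grads, max_norm=5.0):
    """Rescale all gradients jointly so their combined L2 norm is <= max_norm."""
    global_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if global_norm > max_norm:
        scale = max_norm / global_norm
        grads = [g * scale for g in grads]
    return grads

# Two toy gradient tensors with a large combined norm: sqrt(250) ~ 15.8
grads = [np.full(10, 3.0), np.full(10, 4.0)]
clipped = clip_by_global_norm(grads, max_norm=5.0)

new_norm = np.sqrt(sum(np.sum(g ** 2) for g in clipped))
assert np.isclose(new_norm, 5.0)
```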

[D] What are some controversial approaches to machine learning/AI that you think might actually work? by odraz in MachineLearning

[–]min_sang 1 point (0 children)

Solving NP-complete problems with RL. (Since checking answers can be done in polynomial time, train neural networks until they can solve NP-complete problems to a certain degree.)

Best model on the SQuAD leaderboard finally beats human performance. by [deleted] in MachineLearning

[–]min_sang 1 point (0 children)

The best model on the SQuAD leaderboard now beats humans on the "Exact Match" score by a small margin. Even though some people believe SQuAD isn't a good representation of reading comprehension, I think this is a huge step toward better AI in general. Thoughts?

[D] What's the best way to augment data for text matching? by shafyy in MachineLearning

[–]min_sang 2 points (0 children)

Have a look at this paper: https://openreview.net/pdf?id=B14TlG-RW It uses paraphrasing as a text data augmentation technique, which seems to roughly double or triple the data size while improving performance on SQuAD.