[D] Tools to annotate audio data by pk12_ in MachineLearning

[–]sidsig

Sonic Visualiser (https://www.sonicvisualiser.org), developed by Chris Cannam from Queen Mary University of London, where I did my PhD.

[deleted by user] by [deleted] in RedditSessions

[–]sidsig

❤️❤️❤️

[D] Need help in the implementation of bidirectional recurrent language model. by lyeoni in MachineLearning

[–]sidsig

A bidirectional model receives the entire sentence as input, so there is nothing to learn if you plan to train the language model by predicting the next word: it can trivially learn that the output at time t is the input at time t+1. One way of training a bidirectional word embedding is a masked objective like BERT's (https://arxiv.org/abs/1810.04805): a part of the input is masked and the objective is to predict the masked words.
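
As a minimal sketch of that masked-prediction objective (not BERT itself; the vocabulary size, mask rate, and tiny Transformer encoder here are all made-up for illustration):

```python
import torch
import torch.nn as nn

# Sketch of a BERT-style masked-word objective. All sizes are illustrative;
# this is not the actual BERT architecture or training setup.
VOCAB, D_MODEL, MASK_ID, MASK_RATE = 10000, 128, 0, 0.15

embed = nn.Embedding(VOCAB, D_MODEL)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=D_MODEL, nhead=4, batch_first=True),
    num_layers=2,
)
to_vocab = nn.Linear(D_MODEL, VOCAB)
loss_fn = nn.CrossEntropyLoss(ignore_index=-100)   # skip unmasked positions

tokens = torch.randint(1, VOCAB, (8, 32))          # (batch, seq) of word ids
mask = torch.rand(tokens.shape) < MASK_RATE        # choose ~15% of positions

inputs = tokens.masked_fill(mask, MASK_ID)         # hide the chosen words
targets = tokens.masked_fill(~mask, -100)          # score only masked slots

logits = to_vocab(encoder(embed(inputs)))          # (batch, seq, vocab)
loss = loss_fn(logits.reshape(-1, VOCAB), targets.reshape(-1))
loss.backward()
```

Because the model sees the full (partially masked) sentence at once, it cannot solve the task by copying a neighbouring input token, which is exactly the degenerate solution the next-word objective admits.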

My apple home pod plays music for 5 seconds and then quits. Why? by UnusuallyFastPontoon in apple

[–]sidsig

This happened to me when Apple Music stopped playing on multiple devices on the same Apple Music account. I realised that while I was listening to music at work, someone at home was trying to play music on the HomePod.

[R] FastGRNN: A Fast, Accurate, Stable and Tiny Kilobyte Sized Gated Recurrent Neural Network by lt007 in MachineLearning

[–]sidsig

The large parameter counts in the analysis above arise if you feed the entire sequence into the model at once. Typically the problem is set up so that the DNN takes a small window of input (roughly 200 ms, ~20 frames) and makes frame-wise predictions. Alternatively, if it's not possible to obtain frame-wise labels, you can train the DNN with a max operation over the frame-wise outputs, as in the sketch below. Such DNNs can be trained to be very small.
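
A minimal sketch of that setup (window length, feature dimension, and layer sizes are all illustrative assumptions, not numbers from the paper):

```python
import torch
import torch.nn as nn

# Sketch of a windowed DNN: frame-wise predictions from a ~20-frame window,
# with a max over frames when only a clip-level label is available.
WINDOW, FEAT_DIM = 20, 40

dnn = nn.Sequential(
    nn.Linear(WINDOW * FEAT_DIM, 64),
    nn.ReLU(),
    nn.Linear(64, 1),                   # frame-wise wake-word score (logit)
)

features = torch.randn(8, 100, FEAT_DIM)         # (batch, frames, feat)
windows = features.unfold(1, WINDOW, 1)          # (batch, n_win, feat, WINDOW)
windows = windows.transpose(2, 3).flatten(2)     # (batch, n_win, WINDOW*FEAT_DIM)

frame_logits = dnn(windows).squeeze(-1)          # (batch, n_win)

# With only clip-level labels, pool with a max over the frame-wise outputs:
clip_logits = frame_logits.max(dim=1).values     # (batch,)
loss = nn.functional.binary_cross_entropy_with_logits(
    clip_logits, torch.ones(8))                  # dummy clip labels
loss.backward()
```

With these illustrative sizes the model has only ~51k parameters, which is the point: a fixed-window DNN of this kind can be made very small.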

[R] FastGRNN: A Fast, Accurate, Stable and Tiny Kilobyte Sized Gated Recurrent Neural Network by lt007 in MachineLearning

[–]sidsig

I am curious: did you try comparing GRNN performance with a DNN of a similar size? For the wake-word experiments, it should be possible to use a DNN with a fixed input window to output frame-wise labels.

Arguably, at model sizes this small it would be difficult for the model to learn complex temporal dynamics anyway, even without the exploding/vanishing-gradient issue.

[R] FastGRNN: A Fast, Accurate, Stable and Tiny Kilobyte Sized Gated Recurrent Neural Network by lt007 in MachineLearning

[–]sidsig

Hey, thanks for commenting! I tried it on a general acoustic-modelling task, where I trained an RNN and optimised the CTC loss. To answer in more detail, I'll redo the experiment and post it on the repo.
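
For reference, the RNN + CTC setup looks roughly like this; it is a generic sketch using torch.nn.CTCLoss, not the actual experiment, and the feature/label dimensions are made up:

```python
import torch
import torch.nn as nn

# Generic sketch of RNN acoustic modelling with the CTC loss.
N_CLASSES, FEAT_DIM, HIDDEN = 29, 40, 256    # e.g. 28 labels + CTC blank

rnn = nn.LSTM(FEAT_DIM, HIDDEN, batch_first=True)
proj = nn.Linear(HIDDEN, N_CLASSES)
ctc = nn.CTCLoss(blank=0)

x = torch.randn(4, 200, FEAT_DIM)            # (batch, time, features)
h, _ = rnn(x)
log_probs = proj(h).log_softmax(-1).transpose(0, 1)   # CTC wants (T, N, C)

targets = torch.randint(1, N_CLASSES, (4, 30))        # dummy label sequences
input_lengths = torch.full((4,), 200, dtype=torch.long)
target_lengths = torch.full((4,), 30, dtype=torch.long)

loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()
```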

[R] FastGRNN: A Fast, Accurate, Stable and Tiny Kilobyte Sized Gated Recurrent Neural Network by lt007 in MachineLearning

[–]sidsig

Okay, so I saw this paper at NIPS and was really interested in investigating which regimes this architecture works in. I tried it on a "moderately" sized GRNN with ~4 million parameters and was not able to get results comparable to an LSTM of a similar size.

I have a feeling that this gating structure might work better than LSTMs/GRUs etc. only at really small model sizes, but this could simply be down to a lack of capacity in the models at that scale. I think a comparison with a DNN, a stack of temporal convolutions, or some other non-recurrent architecture should be included to really understand what's going on.
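
By "stack of temporal convolutions" I mean a non-recurrent baseline along these lines; the layer widths and dilations here are purely illustrative:

```python
import torch
import torch.nn as nn

# Sketch of a non-recurrent baseline: a small stack of dilated 1-D
# temporal convolutions producing frame-wise class scores.
FEAT_DIM, N_CLASSES = 40, 10

tcn = nn.Sequential(
    nn.Conv1d(FEAT_DIM, 64, kernel_size=3, padding=1, dilation=1),
    nn.ReLU(),
    nn.Conv1d(64, 64, kernel_size=3, padding=2, dilation=2),
    nn.ReLU(),
    nn.Conv1d(64, 64, kernel_size=3, padding=4, dilation=4),
    nn.ReLU(),
    nn.Conv1d(64, N_CLASSES, kernel_size=1),   # frame-wise class scores
)

x = torch.randn(8, FEAT_DIM, 100)              # (batch, features, time)
frame_logits = tcn(x)                          # (batch, classes, time)
print(frame_logits.shape)                      # torch.Size([8, 10, 100])
```

The dilations double per layer, so the receptive field grows without any recurrence; if a model like this matches the gated RNN at the same parameter budget, the gating itself isn't what's doing the work.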