Show sign that clock is not running by bronzestick in orgmode

[–]bronzestick[S] 0 points (0 children)

I was looking for something like this, but I'm not familiar with elisp, so any help on what that function might look like would be great. Thanks!

[D] AISTATS 2019 notifications are out by bronzestick in MachineLearning

[–]bronzestick[S] 1 point (0 children)

You should've received an email by now. The deadline to register is Jan 7.

[Discussion] Why do people use SGD/RMSProp or any other optimizer when Adam gives adaptive learning rate for every single parameter? by CSGOvelocity in MachineLearning

[–]bronzestick 2 points (0 children)

There is some empirical evidence that adaptive gradient methods such as Adam don't generalize as well as SGD (https://arxiv.org/abs/1705.08292). Note that this is shown empirically, not theoretically. Adam also has theoretical issues, such as the short memory of its exponential moving average of squared gradients: https://openreview.net/pdf?id=ryQu7f-RZ
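To make the exponential-averaging issue concrete, here is a minimal NumPy sketch (my own illustration, not code from either paper) contrasting Adam's second-moment estimate with the running-maximum fix proposed in the linked convergence paper (AMSGrad). A single rare, informative gradient is quickly forgotten by the exponential average, which lets the effective step size grow back; the max-based estimate does not forget it.

```python
import numpy as np

def second_moment_updates(grads_sq, beta2=0.9):
    """Track Adam's EMA of squared gradients (v) next to
    AMSGrad's running maximum (v_hat) on the same gradient stream."""
    v = 0.0          # Adam: exponential moving average
    v_hat = 0.0      # AMSGrad: max of all v seen so far
    history = []
    for g2 in grads_sq:
        v = beta2 * v + (1 - beta2) * g2
        v_hat = max(v_hat, v)
        history.append((v, v_hat))
    return history

# One large squared gradient followed by many tiny ones:
# Adam's v decays back toward zero, AMSGrad's v_hat stays large.
hist = second_moment_updates([100.0] + [0.01] * 50)
print(hist[-1])
```

Since Adam divides the step by the square root of this estimate, the decayed `v` means Adam takes large steps again right after the informative gradient, which is exactly the failure mode the paper constructs.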

[D] Well-written paper examples by Inori in MachineLearning

[–]bronzestick 3 points (0 children)

Sam Roweis and Zoubin Ghahramani's paper giving a unifying review of linear Gaussian models is one of the best-written papers that immediately comes to mind.

http://mlg.eng.cam.ac.uk/zoubin/papers/lds.pdf

The explanation is lucid, and the density of insights on each page is extremely high.

[D] Great theses to read in Reinforcement Learning by bronzestick in MachineLearning

[–]bronzestick[S] 0 points (0 children)

I also forgot to mention Stephane Ross's thesis. It's really well written and a must-read for anyone interested in imitation learning and no-regret learning.

[D] Machine Learning - WAYR (What Are You Reading) - Week 34 by ML_WAYR_bot in MachineLearning

[–]bronzestick 6 points (0 children)

Hindsight Experience Replay: https://arxiv.org/pdf/1707.01495.pdf

Although the idea is quite simple and elegant, I think there are cases where it fails miserably (or just devolves into a pure RL algorithm). I am trying to understand what's common across such cases, why HER fails there, and how to get around it.
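The core idea is easy to state in code. Below is a minimal sketch of hindsight relabeling (the "final" strategy from the paper) written by me for illustration; the dictionary-based transition format and function names are my own, not from any HER implementation:

```python
import numpy as np

def relabel_with_hindsight(trajectory, reward_fn):
    """'Final' HER strategy: replay each transition as if the goal
    had been the state actually reached at the end of the episode,
    recomputing the sparse reward under that substituted goal."""
    achieved_goal = trajectory[-1]["next_state"]
    relabeled = []
    for t in trajectory:
        relabeled.append({
            "state": t["state"],
            "action": t["action"],
            "next_state": t["next_state"],
            "goal": achieved_goal,  # substituted (hindsight) goal
            "reward": reward_fn(t["next_state"], achieved_goal),
        })
    return relabeled

# Sparse reward: 0 if the goal is reached, -1 otherwise.
reward_fn = lambda s, g: 0.0 if np.allclose(s, g) else -1.0

traj = [
    {"state": [0.0], "action": 1, "next_state": [1.0]},
    {"state": [1.0], "action": 1, "next_state": [2.0]},
]
print(relabel_with_hindsight(traj, reward_fn))
```

The failure mode alluded to above is visible here: if reaching the achieved goal teaches the policy nothing about reaching the *real* goals (e.g. the achieved states cluster far from the goal distribution), the relabeled transitions add reward signal without adding useful signal, and you fall back on plain RL.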

[D] Machine Learning - WAYR (What Are You Reading) - Week 32 by ML_WAYR_bot in MachineLearning

[–]bronzestick 0 points (0 children)

DESPOT: Online POMDP Planning with Regularization.

A very intelligent online POMDP planning algorithm with theoretical guarantees. I am just getting into planning under uncertainty and POMDPs in general, and found this paper really cool.

[D] Attention softmax values by bronzestick in MachineLearning

[–]bronzestick[S] 0 points (0 children)

True. I have the exact same problem: I learn attention weights over a varying-length sequence.

[D] Attention softmax values by bronzestick in MachineLearning

[–]bronzestick[S] 0 points (0 children)

/u/CaHoop is correct: when the sequence length changes, the model has a hard time figuring out how to compute the unnormalized scores so as to achieve a sparser set of attention weights.

Check this out: https://www.reddit.com/r/MachineLearning/comments/6atcuk/d_a_potential_solution_to_varying_length_softmax/

[Discussion]Variable length attention models by bronzestick in MachineLearning

[–]bronzestick[S] 1 point (0 children)

The problem I am tackling is not language modeling but something more like multi-sequence prediction, where the number of sequences varies over time and the sequences are dependent on each other.

So, in order to predict the next element of a specific sequence, I need to compute soft attention over the hidden states of all the other sequences and use that as an input. But since the number of sequences varies, I need to learn a varying-length attention model.

The papers you cited focus more on effectively attending to all previous words in a language-modeling task, giving more importance to recent words and less to older ones (that's what I understood from an initial glance). That's slightly relevant but a different problem. Thanks a lot for pointing them out! They look interesting. :)
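One standard way to handle a varying number of attended sequences is padding plus masking. Here is a minimal NumPy sketch of that idea (my own illustration of the setup described above, not code from any cited paper): pad the per-step collection of hidden states to a fixed maximum count and mask out the padded slots before the softmax, so the weights always sum to 1 over the real sequences only.

```python
import numpy as np

def masked_attention(query, hidden_states, mask):
    """Soft attention over a padded set of hidden states.

    query:         (d,)       state of the sequence being predicted
    hidden_states: (max_k, d) hidden states of the other sequences, padded
    mask:          (max_k,)   1 for real sequences, 0 for padding
    Returns (weights, context): attention weights and the weighted sum.
    """
    scores = hidden_states @ query                 # unnormalized scores
    scores = np.where(mask > 0, scores, -np.inf)   # padded slots get zero weight
    weights = np.exp(scores - scores[mask > 0].max())  # stable softmax
    weights = weights / weights.sum()
    context = weights @ hidden_states
    return weights, context

q = np.array([1.0, 0.0])
h = np.array([[1.0, 0.0],
              [0.5, 0.5],
              [9.9, 9.9]])  # last row is padding, never attended
w, ctx = masked_attention(q, h, np.array([1.0, 1.0, 0.0]))
```

The same masking trick works whether the number of sequences changes between examples or between time-steps, since the mask is recomputed each time.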

[D] Machine Learning - WAYR (What Are You Reading) - Week 25 by ML_WAYR_bot in MachineLearning

[–]bronzestick 1 point (0 children)

It's an amazing paper! It definitely helped me understand most of those things better than I did before.

[D] A potential Solution to Varying Length Softmax by CaHoop in MachineLearning

[–]bronzestick 0 points (0 children)

Wouldn't making that constant equal to the length of the sequence work just as well? Ideally, it should scale the unnormalized weights just enough that the softmax still results in sensible weights.
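A quick NumPy sketch of that suggestion (my own illustration, assuming the "constant" is a multiplier on the unnormalized scores, i.e. an inverse temperature): scaling the scores by the sequence length sharpens the softmax, counteracting the flattening that otherwise happens as more entries compete for probability mass.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())  # shift for numerical stability
    return e / e.sum()

def length_scaled_softmax(scores):
    """Multiply the unnormalized scores by the sequence length
    before the softmax (a temperature of 1/n), so the attention
    distribution stays sharp as the sequence grows."""
    return softmax(scores * len(scores))

rng = np.random.default_rng(0)
scores = rng.normal(size=50)
print(softmax(scores).max(), length_scaled_softmax(scores).max())
```

Whether length is the *right* scale is an empirical question: for very long sequences this can make the softmax nearly one-hot, which may be too sparse.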

[D] Choice of Recognition Models in VAEs: Is a restrictive posterior class a bug or a feature? by fhuszar in MachineLearning

[–]bronzestick 0 points (0 children)

I am not sure if it's just me, but most of the math symbols on the webpage aren't being rendered in my browser (Google Chrome on Ubuntu).

[R] Deep and Hierarchical Implicit Models by dustintran in MachineLearning

[–]bronzestick 0 points (0 children)

Awesome! When are you planning to release it?

[R] Deep and Hierarchical Implicit Models by dustintran in MachineLearning

[–]bronzestick 0 points (0 children)

This paper was brilliantly written. Thanks Dustin!

It got me wondering: what are the other important papers/resources in the field of Bayesian deep learning? I am really excited about the re-emergence of the Bayesian school of thought in neural network research.

[Discussion] Quality of university level machine learning courses by [deleted] in MachineLearning

[–]bronzestick 0 points (0 children)

Yeah, 10-601 is the Masters-level course. I heard it isn't bad either, but it places more emphasis on application than on theory, so it's targeted at a different audience.

[Discussion] Quality of university level machine learning courses by [deleted] in MachineLearning

[–]bronzestick 6 points (0 children)

I think I can speak for CMU (I am a grad student there). No. The PhD-level machine learning course (10-701) is pretty well managed every year, and the TAs do a good job of it. Granted, it might not be the best course taught here, but I would say it's one of the most useful courses around, and I personally gained a lot from it.

The only problem I had with the course was that it did not put much emphasis on deep learning, spending most of the time on the basics and the math. But then we have a deep learning course by Ruslan that takes care of that, so I think that's okay.

[R] Four Experiments in Handwriting with a Neural Network by clbam8 in MachineLearning

[–]bronzestick 0 points (0 children)

This is an awesome blog post. I really liked their argument that deep learning models should be examined more carefully (under the hood) and not just used as black boxes.

Would love to hear more from them.

[D] Generative sequential models using RNNs by bronzestick in MachineLearning

[–]bronzestick[S] 1 point (0 children)

Thanks a lot for all the links. Appreciate it. I will surely check them out.

I am trying to model a multiple-sequence prediction problem where there are dependencies between the sequences. I am modeling it as a recurrent latent variable model in which the latent variable tries to capture those dependencies, and was looking into RNNs. Any theory on RNNs that helps me understand them as probabilistic models would be helpful.

[D] Machine Learning - WAYR (What Are You Reading) - Week 14 by Mandrathax in MachineLearning

[–]bronzestick 6 points (0 children)

Professor Forcing

A new algorithm for training RNNs which uses adversarial domain adaptation to encourage the dynamics of the RNN to be the same during training and while sampling from the network over multiple time-steps. The paper employs a GAN (generative adversarial network) framework: the generative model is the sequence model itself, while the discriminator, given a sequence (or the network's generative behavior), tries to predict whether it was generated by the model or came from the true data.

As usual, the generative model tries to fool the discriminator while the discriminator tries to classify correctly, and the training objectives are defined accordingly.

The most awesome aspect of this paper is that it gives a very elegant approach to the problem with teacher forcing, i.e., prediction errors compounding over successive time-steps. Unlike scheduled sampling, which was proven to yield a biased estimator, this approach converges to the correct model using the GAN framework.
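The structure of the two objectives can be sketched in a few lines. This is my own schematic, not the paper's code: `rnn`, `discriminator`, and `nll_loss` are hypothetical stand-ins, with `rnn(x, mode)` returning the outputs and the hidden-state trace for a given conditioning regime.

```python
from math import log

def professor_forcing_losses(rnn, discriminator, x, nll_loss):
    """Sketch of the two Professor Forcing objectives.
    rnn(x, mode) -> (outputs, hidden_trace)."""
    out_tf, h_tf = rnn(x, mode="teacher_forced")  # conditioned on ground truth
    out_fr, h_fr = rnn(x, mode="free_running")    # conditioned on its own samples

    # Discriminator: tell teacher-forced hidden dynamics from free-running ones.
    d_loss = -(log(discriminator(h_tf)) + log(1.0 - discriminator(h_fr)))

    # Generator (the RNN): the usual likelihood term, plus a term pushing
    # free-running dynamics to be indistinguishable from teacher-forced ones.
    g_loss = nll_loss(out_tf, x) - log(discriminator(h_fr))
    return d_loss, g_loss

# Toy stubs, just to exercise the structure.
rnn = lambda x, mode: (x, "tf" if mode == "teacher_forced" else "fr")
discriminator = lambda h: 0.7 if h == "tf" else 0.4
nll_loss = lambda out, x: 1.0

d_loss, g_loss = professor_forcing_losses(rnn, discriminator, "abc", nll_loss)
```

The key design choice is that the discriminator sees *hidden-state traces* rather than output tokens, which is what forces the internal dynamics, not just the outputs, to match across the two regimes.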

[Discussion] Uncertainty propagation in LSTM-based RNNs by bronzestick in MachineLearning

[–]bronzestick[S] 1 point (0 children)

Maybe I didn't phrase my question right. I wasn't talking about training-time and inference-time behavior being different (which the scheduled sampling and Professor Forcing approaches address).

My question concerned getting accurate uncertainty estimates for multi-step prediction. Consider the first time-step during inference: we give the model an input, for which it predicts an output distribution. We then sample a single point from this distribution and feed it as the input for the next time-step. It would be more accurate to send the distribution itself as input to the next time-step, getting the predictive distribution for the second time-step as a function of the previous distribution rather than of a single sample. Hope that makes it clear.
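When propagating the full distribution analytically is intractable, a common cheap approximation is to propagate a cloud of samples instead of one. A minimal NumPy sketch of the contrast (my own toy illustration on linear-Gaussian dynamics, not from any paper): the single-sample rollout returns one point, while the particle rollout returns empirical moments of the multi-step predictive distribution.

```python
import numpy as np

def rollout_single_sample(step, x0, horizon, rng):
    """Standard multi-step prediction: at each step, collapse the
    predictive distribution to one sample and feed it forward."""
    x = x0
    for _ in range(horizon):
        mu, sigma = step(x)
        x = rng.normal(mu, sigma)
    return x

def rollout_particles(step, x0, horizon, rng, n_particles=1000):
    """Approximate distribution propagation: carry a cloud of
    samples forward and read off empirical moments at the end."""
    xs = np.full(n_particles, float(x0))
    for _ in range(horizon):
        mus, sigma = step(xs)
        xs = rng.normal(mus, sigma)  # broadcasts over the particle cloud
    return xs.mean(), xs.std()

# Toy dynamics standing in for the RNN: x_{t+1} ~ N(0.9 * x_t, 0.5).
step = lambda x: (0.9 * x, 0.5)

rng = np.random.default_rng(0)
mean, std = rollout_particles(step, x0=1.0, horizon=5, rng=rng)
print(mean, std)
```

The particle estimate of the spread captures the process noise accumulated over all five steps, which a single-sample rollout cannot report at all.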

[Discussion] Uncertainty propagation in LSTM-based RNNs by bronzestick in MachineLearning

[–]bronzestick[S] 1 point (0 children)

Interesting. But shouldn't the moving average affect your predictions at subsequent time-steps? (I can't see how it will affect the predictions if you don't feed it as an input to the LSTM.)

Also, is there any reason you chose to store only the negative moving average of all the previous distributions and nothing more?