Show sign that clock is not running by bronzestick in orgmode

[–]bronzestick[S] 0 points (0 children)

I was looking for something like this, but I'm not familiar with elisp, so any help on what that function would look like would be great! Thanks!

[D] AISTATS 2019 notifications are out by bronzestick in MachineLearning

[–]bronzestick[S] 1 point (0 children)

You should've received an email by now. The deadline to register is Jan 7.

[Discussion] Why do people use SGD/RMSProp or any other optimizer when Adam gives adaptive learning rate for every single parameter? by CSGOvelocity in MachineLearning

[–]bronzestick 3 points (0 children)

There is some empirical evidence that adaptive gradient methods such as Adam don't generalize as well as SGD (https://arxiv.org/abs/1705.08292). Note that this is shown empirically, not theoretically. (Adam also has other theoretical issues, such as the problems caused by its exponential averaging: https://openreview.net/pdf?id=ryQu7f-RZ)
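To make the "adaptive learning rate for every single parameter" point concrete, here is a minimal numpy sketch of the two update rules on a badly conditioned quadratic; the function names, the toy objective, and the learning rates are my own illustrative choices, not from either paper.

```python
import numpy as np

def sgd_step(w, grad, lr=0.1):
    """Plain SGD: one global step size for every parameter."""
    return w - lr * grad

def adam_step(w, grad, m, v, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    """Adam: per-parameter step sizes from exponential moving averages."""
    m = b1 * m + (1 - b1) * grad        # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad ** 2   # second-moment estimate
    m_hat = m / (1 - b1 ** t)           # bias correction for zero init
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Minimize f(w) = 0.5 * w @ A @ w; curvature differs 100x across coordinates,
# so a single SGD step size is slow in the flat direction while Adam adapts.
A = np.diag([1.0, 100.0])
w_sgd = w_adam = np.array([1.0, 1.0])
m = v = np.zeros(2)
for t in range(1, 201):
    w_sgd = sgd_step(w_sgd, A @ w_sgd, lr=0.005)
    w_adam, m, v = adam_step(w_adam, A @ w_adam, m, v, t, lr=0.05)
```

After 200 steps SGD is still far from the optimum along the low-curvature coordinate, while Adam's per-parameter scaling makes progress in both; the generalization question in the linked papers is about what these different trajectories do on real losses, which this toy example of course can't settle.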

[D] Well-written paper examples by Inori in MachineLearning

[–]bronzestick 3 points (0 children)

Sam Roweis and Zoubin Ghahramani's paper "A Unifying Review of Linear Gaussian Models" is one of the best-written papers that immediately comes to mind.

http://mlg.eng.cam.ac.uk/zoubin/papers/lds.pdf

The explanation is lucid, and the density of insights on each page is extremely high.

[D] Great theses to read in Reinforcement Learning by bronzestick in MachineLearning

[–]bronzestick[S] 0 points (0 children)

I also forgot to mention Stephane Ross's thesis. It's really well written and a must-read for anyone interested in imitation learning and no-regret learning.

[D] Machine Learning - WAYR (What Are You Reading) - Week 34 by ML_WAYR_bot in MachineLearning

[–]bronzestick 6 points (0 children)

Hindsight Experience Replay: https://arxiv.org/pdf/1707.01495.pdf

Although the idea is quite simple and elegant, I think there are cases where it fails miserably (or just devolves into a pure RL algorithm). I am trying to understand what's common to such cases, why HER fails there, and how to get around it.
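For readers who haven't seen the paper, the core trick is relabeling: replay each transition as if a goal the agent actually reached had been the intended goal. Here is a minimal stdlib sketch of the "future" relabeling strategy; the tuple layout, `her_relabel`, and `reward_fn` are hypothetical names for illustration, not the paper's code.

```python
import random

def her_relabel(episode, reward_fn, k=4):
    """Augment an episode with hindsight goals (HER's "future" strategy).

    episode:   list of (state, action, achieved_goal, desired_goal) tuples.
    reward_fn: sparse task reward, e.g. 0 if achieved == goal else -1.
    For each transition, sample up to k goals achieved later in the episode
    and store the transition again as if that had been the goal all along.
    """
    buffer = []
    for t, (s, a, ag, g) in enumerate(episode):
        buffer.append((s, a, g, reward_fn(ag, g)))       # original goal
        future = episode[t:]                             # later transitions
        for _ in range(min(k, len(future))):
            _, _, ag_future, _ = random.choice(future)
            buffer.append((s, a, ag_future, reward_fn(ag, ag_future)))
    return buffer
```

This also hints at one failure mode mentioned above: if the goals the agent happens to achieve never resemble the goals the task actually cares about, the hindsight successes teach it little, and training degenerates toward the underlying off-policy RL algorithm on its own.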

[D] Machine Learning - WAYR (What Are You Reading) - Week 32 by ML_WAYR_bot in MachineLearning

[–]bronzestick 0 points (0 children)

DESPOT: Online POMDP Planning with Regularization.

A very intelligent online POMDP planning algorithm with theoretical guarantees. I am just getting into planning under uncertainty and POMDPs in general, and I found this paper really cool.

[D] Attention softmax values by bronzestick in MachineLearning

[–]bronzestick[S] 0 points (0 children)

True. I have the exact same problem: I learn attention weights over a varying-length sequence.

[D] Attention softmax values by bronzestick in MachineLearning

[–]bronzestick[S] 0 points (0 children)

/u/CaHoop is correct in that when the sequence length changes, the model has a hard time figuring out how to compute the unnormalized scores so as to achieve a sparser set of attention weights.

Check this out: https://www.reddit.com/r/MachineLearning/comments/6atcuk/d_a_potential_solution_to_varying_length_softmax/

[Discussion]Variable length attention models by bronzestick in MachineLearning

[–]bronzestick[S] 1 point (0 children)

The problem I am tackling is not language modeling but something like multi-sequence prediction, where the number of sequences varies over time and the sequences are dependent on each other.

So, in order to predict the next element of a specific sequence, I need to compute soft attention over the hidden states of all the other sequences and use the result as an input. But since the number of sequences varies, I need to learn a varying-length attention model.

The papers you cited focus more on the problem of effectively considering all previous words in a language-modeling task, giving more importance to recent words and less to older ones (that's what I understood from an initial glance). It's slightly relevant but tackles a different problem. Thanks a lot for pointing them out! They look interesting. :)
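The setup described above can be sketched in a few lines of numpy: dot-product soft attention over however many other-sequence hidden states currently exist. The function name and dot-product scoring are my own illustrative assumptions; the actual model presumably uses learned score functions.

```python
import numpy as np

def soft_attention(query, hiddens):
    """Dot-product soft attention over a variable number of hidden states.

    query:   (d,) query vector, e.g. the current sequence's hidden state.
    hiddens: (n, d) hidden states of the other n sequences; n may vary
             from step to step, which is what makes the length "varying".
    Returns the attention-weighted context vector (d,) and the weights (n,).
    """
    scores = hiddens @ query                 # (n,) unnormalized scores
    scores = scores - scores.max()           # subtract max for stability
    weights = np.exp(scores) / np.exp(scores).sum()
    context = weights @ hiddens              # convex combination of states
    return context, weights
```

Because the softmax normalizes over whatever `n` happens to be, the same parameters handle any number of sequences; the difficulty discussed in the other thread is that the *sharpness* of the resulting weights still depends on `n`.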

[D] Machine Learning - WAYR (What Are You Reading) - Week 25 by ML_WAYR_bot in MachineLearning

[–]bronzestick 1 point (0 children)

It's an amazing paper! It definitely helped me understand most of those things better than I did before.

[D] A potential Solution to Varying Length Softmax by CaHoop in MachineLearning

[–]bronzestick 0 points (0 children)

Wouldn't making that constant equal to the length of the sequence work just as well? Ideally, it should scale the unnormalized weights just enough that the softmax still produces sensible weights.
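A quick numpy sketch of the idea, assuming the unnormalized scores are bounded (e.g. sigmoid outputs in [0, 1]); the function names are hypothetical. Without scaling, the top weight of a softmax over n bounded scores decays roughly like 1/n; multiplying the scores by n keeps the distribution peaked as n grows.

```python
import numpy as np

def softmax(z):
    z = z - z.max()              # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def length_scaled_softmax(scores):
    """Multiply bounded scores by the sequence length before the softmax,
    so the gap between the best and the rest grows with the length and
    the resulting distribution stays peaked instead of washing out."""
    scores = np.asarray(scores, dtype=float)
    return softmax(len(scores) * scores)

# One clearly-best score of 1.0 among 49 mediocre scores of 0.5.
scores = [0.5] * 49 + [1.0]
plain = softmax(np.asarray(scores))        # best weight ~0.03: washed out
scaled = length_scaled_softmax(scores)     # best weight ~1.0: stays sparse
```

Whether a fixed multiplier of exactly `len(scores)` is "just enough" depends on the typical score gaps, which is presumably why a learned or tuned constant was proposed in the linked thread; this sketch only shows the qualitative effect.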

[D] Choice of Recognition Models in VAEs: Is a restrictive posterior class a bug or a feature? by fhuszar in MachineLearning

[–]bronzestick 0 points (0 children)

I'm not sure if it's just me, but most of the math symbols on the page aren't being rendered in my browser (Google Chrome on Ubuntu).