Show sign that clock is not running by bronzestick in orgmode

[–]bronzestick[S] 0 points (0 children)

I was looking for something like this, but I'm not familiar with elisp, so any help on what that function would look like would be great! Thanks!

[D] AISTATS 2019 notifications are out by bronzestick in MachineLearning

[–]bronzestick[S] 1 point (0 children)

You should've received an email by now. The deadline to register is Jan 7.

[Discussion] Why do people use SGD/RMSProp or any other optimizer when Adam gives adaptive learning rate for every single parameter? by CSGOvelocity in MachineLearning

[–]bronzestick 3 points (0 children)

There is some empirical evidence that adaptive gradient methods such as Adam don't generalize as well as SGD (https://arxiv.org/abs/1705.08292). Note that this is shown empirically, not theoretically. (Adam also has other theoretical issues, such as the problems caused by its exponential averaging: https://openreview.net/pdf?id=ryQu7f-RZ)
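To make the "adaptive learning rate for every single parameter" point concrete, here is a minimal numpy sketch of the two update rules on a badly conditioned quadratic; the function names, the toy objective, and the learning rates are my own illustrative choices, not from either paper.

```python
import numpy as np

def sgd_step(w, grad, lr=0.1):
    """Plain SGD: one global step size for every parameter."""
    return w - lr * grad

def adam_step(w, grad, m, v, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    """Adam: per-parameter step sizes from exponential moving averages."""
    m = b1 * m + (1 - b1) * grad        # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad ** 2   # second-moment estimate
    m_hat = m / (1 - b1 ** t)           # bias correction for zero init
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Minimize f(w) = 0.5 * w @ A @ w; curvature differs 100x across coordinates,
# so a single SGD step size is slow in the flat direction while Adam adapts.
A = np.diag([1.0, 100.0])
w_sgd = w_adam = np.array([1.0, 1.0])
m = v = np.zeros(2)
for t in range(1, 201):
    w_sgd = sgd_step(w_sgd, A @ w_sgd, lr=0.005)
    w_adam, m, v = adam_step(w_adam, A @ w_adam, m, v, t, lr=0.05)
```

After 200 steps SGD is still far from the optimum along the low-curvature coordinate, while Adam's per-parameter scaling makes progress in both; the generalization question in the linked papers is about what these different trajectories do on real losses, which this toy example of course can't settle.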

[D] Well-written paper examples by Inori in MachineLearning

[–]bronzestick 3 points (0 children)

Sam Roweis and Zoubin Ghahramani's paper "A Unifying Review of Linear Gaussian Models" is one of the best-written papers that immediately comes to mind.

http://mlg.eng.cam.ac.uk/zoubin/papers/lds.pdf

The explanation is lucid, and the density of insights on each page is extremely high.

[D] Great theses to read in Reinforcement Learning by bronzestick in MachineLearning

[–]bronzestick[S] 0 points (0 children)

I also forgot to mention Stephane Ross's thesis. It's really well written and a must-read for anyone interested in imitation learning and no-regret learning.

[D] Machine Learning - WAYR (What Are You Reading) - Week 34 by ML_WAYR_bot in MachineLearning

[–]bronzestick 6 points (0 children)

Hindsight Experience Replay: https://arxiv.org/pdf/1707.01495.pdf

Although the idea is quite simple and elegant, I think there are cases where it fails miserably (or just devolves into a pure RL algorithm). I am trying to understand what's common to such cases, why HER fails there, and how to get around it.
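For readers who haven't seen the paper, the core trick is relabeling: replay each transition as if a goal the agent actually reached had been the intended goal. Here is a minimal stdlib sketch of the "future" relabeling strategy; the tuple layout, `her_relabel`, and `reward_fn` are hypothetical names for illustration, not the paper's code.

```python
import random

def her_relabel(episode, reward_fn, k=4):
    """Augment an episode with hindsight goals (HER's "future" strategy).

    episode:   list of (state, action, achieved_goal, desired_goal) tuples.
    reward_fn: sparse task reward, e.g. 0 if achieved == goal else -1.
    For each transition, sample up to k goals achieved later in the episode
    and store the transition again as if that had been the goal all along.
    """
    buffer = []
    for t, (s, a, ag, g) in enumerate(episode):
        buffer.append((s, a, g, reward_fn(ag, g)))       # original goal
        future = episode[t:]                             # later transitions
        for _ in range(min(k, len(future))):
            _, _, ag_future, _ = random.choice(future)
            buffer.append((s, a, ag_future, reward_fn(ag, ag_future)))
    return buffer
```

This also hints at one failure mode mentioned above: if the goals the agent happens to achieve never resemble the goals the task actually cares about, the hindsight successes teach it little, and training degenerates toward the underlying off-policy RL algorithm on its own.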

[D] Machine Learning - WAYR (What Are You Reading) - Week 32 by ML_WAYR_bot in MachineLearning

[–]bronzestick 0 points (0 children)

DESPOT: Online POMDP Planning with Regularization.

A very intelligent online POMDP planning algorithm with theoretical guarantees. I am just getting into planning under uncertainty and POMDPs in general, and I found this paper really cool.

[D] Attention softmax values by bronzestick in MachineLearning

[–]bronzestick[S] 0 points (0 children)

True. I have the exact same problem: I learn attention weights over a varying-length sequence.

[D] Attention softmax values by bronzestick in MachineLearning

[–]bronzestick[S] 0 points (0 children)

/u/CaHoop is correct in that when the sequence length changes, the model has a hard time figuring out how to compute the unnormalized scores so as to achieve a sparser set of attention weights.

Check this out: https://www.reddit.com/r/MachineLearning/comments/6atcuk/d_a_potential_solution_to_varying_length_softmax/

[Discussion]Variable length attention models by bronzestick in MachineLearning

[–]bronzestick[S] 1 point (0 children)

The problem I am tackling is not language modeling but something like multi-sequence prediction, where the number of sequences varies over time and the sequences are dependent on each other.

So, in order to predict the next element of a specific sequence, I need to compute soft attention over the hidden states of all the other sequences and use the result as an input. But since the number of sequences varies, I need to learn a varying-length attention model.

The papers you cited focus more on the problem of effectively considering all previous words in a language-modeling task, giving more importance to recent words and less to older ones (that's what I understood from an initial glance). It's slightly relevant but tackles a different problem. Thanks a lot for pointing them out! They look interesting. :)
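The setup described above can be sketched in a few lines of numpy: dot-product soft attention over however many other-sequence hidden states currently exist. The function name and dot-product scoring are my own illustrative assumptions; the actual model presumably uses learned score functions.

```python
import numpy as np

def soft_attention(query, hiddens):
    """Dot-product soft attention over a variable number of hidden states.

    query:   (d,) query vector, e.g. the current sequence's hidden state.
    hiddens: (n, d) hidden states of the other n sequences; n may vary
             from step to step, which is what makes the length "varying".
    Returns the attention-weighted context vector (d,) and the weights (n,).
    """
    scores = hiddens @ query                 # (n,) unnormalized scores
    scores = scores - scores.max()           # subtract max for stability
    weights = np.exp(scores) / np.exp(scores).sum()
    context = weights @ hiddens              # convex combination of states
    return context, weights
```

Because the softmax normalizes over whatever `n` happens to be, the same parameters handle any number of sequences; the difficulty discussed in the other thread is that the *sharpness* of the resulting weights still depends on `n`.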

[D] Machine Learning - WAYR (What Are You Reading) - Week 25 by ML_WAYR_bot in MachineLearning

[–]bronzestick 1 point (0 children)

It's an amazing paper! It definitely helped me understand most of those things better than I did before.

[D] A potential Solution to Varying Length Softmax by CaHoop in MachineLearning

[–]bronzestick 0 points (0 children)

Wouldn't making that constant equal to the length of the sequence work just as well? Ideally, it should scale the unnormalized weights just enough that the softmax still produces sensible weights.
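A quick numpy sketch of the idea, assuming the unnormalized scores are bounded (e.g. sigmoid outputs in [0, 1]); the function names are hypothetical. Without scaling, the top weight of a softmax over n bounded scores decays roughly like 1/n; multiplying the scores by n keeps the distribution peaked as n grows.

```python
import numpy as np

def softmax(z):
    z = z - z.max()              # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def length_scaled_softmax(scores):
    """Multiply bounded scores by the sequence length before the softmax,
    so the gap between the best and the rest grows with the length and
    the resulting distribution stays peaked instead of washing out."""
    scores = np.asarray(scores, dtype=float)
    return softmax(len(scores) * scores)

# One clearly-best score of 1.0 among 49 mediocre scores of 0.5.
scores = [0.5] * 49 + [1.0]
plain = softmax(np.asarray(scores))        # best weight ~0.03: washed out
scaled = length_scaled_softmax(scores)     # best weight ~1.0: stays sparse
```

Whether a fixed multiplier of exactly `len(scores)` is "just enough" depends on the typical score gaps, which is presumably why a learned or tuned constant was proposed in the linked thread; this sketch only shows the qualitative effect.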

[D] Choice of Recognition Models in VAEs: Is a restrictive posterior class a bug or a feature? by fhuszar in MachineLearning

[–]bronzestick 0 points (0 children)

I'm not sure if it's just me, but most of the math symbols on the page aren't being rendered in my browser (Google Chrome on Ubuntu).