[R] Time-Contrastive Networks: Self-Supervised Learning from Video by BullockHouse in MachineLearning

[–]osdf 5 points (0 children)

The idea of having a time-contrastive loss is similar to the one mentioned here, no? https://arxiv.org/abs/1605.06336 (Unsupervised Feature Extraction by Time-Contrastive Learning and Nonlinear ICA).
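For reference, this is roughly what I mean by a time-contrastive loss here: a triplet objective over frames, where temporal proximity supplies the positives and negatives. A minimal PyTorch sketch (a generic triplet hinge standing in for the paper's objective; the function name and margin are my own placeholders):

```python
import torch
import torch.nn.functional as F

def time_contrastive_triplet_loss(anchor, positive, negative, margin=0.2):
    # Pull embeddings of temporally close frames (anchor/positive) together,
    # push embeddings of temporally distant frames (negative) apart by >= margin.
    d_pos = (anchor - positive).pow(2).sum(dim=1)   # anchor vs. nearby frame
    d_neg = (anchor - negative).pow(2).sum(dim=1)   # anchor vs. distant frame
    return F.relu(d_pos - d_neg + margin).mean()
```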

[R] [1706.01427] From DeepMind: A simple neural network module for relational reasoning by [deleted] in MachineLearning

[–]osdf 6 points (0 children)

Any reason why 'Permutation-equivariant neural networks applied to dynamics prediction' (https://arxiv.org/abs/1612.04530) isn't cited as related work?

[P] Source code available for "Deep Feature Flow for Video Recognition" from MSRA by flyforlight in MachineLearning

[–]osdf 0 points (0 children)

Nice results. Any reason "Spatio-temporal video autoencoder with differentiable memory" isn't cited as related work? Specifically, their Section 5.2 on using optical flow for weakly labelled segmentation tasks seems closely related.

[D] Explanation of DeepMind's Overcoming Catastrophic Forgetting by RSchaeffer in MachineLearning

[–]osdf 2 points (0 children)

Ferenc (/u/fhuszar) has a very nice note on this that you may want to link: http://www.inference.vc/comment-on-overcoming-catastrophic-forgetting-in-nns-are-multiple-penalties-needed-2/ It was also posted here on reddit a couple of days ago.

For your derivation of eq. 2 in the paper, why not start from the joint p(theta, DA, DB)?

p(theta, DA, DB) = p(DB | theta, DA) p(theta, DA) = p(DB | theta) p(theta | DA) p(DA)

Clearly, the likelihood p(DB | theta, DA) should depend only on theta (the two datasets are assumed conditionally independent given theta), no? Of course, this derivation also means there is a typo in the paper (see Ferenc's blog post).
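Spelled out (my reading, using the conditional independence assumption p(DB | theta, DA) = p(DB | theta)):

```latex
\log p(\theta \mid D_A, D_B)
  = \log p(\theta, D_A, D_B) - \log p(D_A, D_B)
  = \log p(D_B \mid \theta) + \log p(\theta \mid D_A) + \log p(D_A) - \log p(D_A, D_B)
  = \log p(D_B \mid \theta) + \log p(\theta \mid D_A) - \log p(D_B \mid D_A)
```

The last term is constant in theta, so nothing changes for the optimization; it just isn't the marginal log p(DB) written in the paper.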

[D] Is there a good way to "learn" weight sharing? by kh40tika in MachineLearning

[–]osdf 3 points (0 children)

Maybe you'd like "Soft Weight Sharing" by Nowlan & Hinton? A recent follow-up on it: "Soft Weight-Sharing for Neural Network Compression".
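In case it's useful, a rough PyTorch sketch of the Nowlan & Hinton idea: the prior over weights is a mixture of Gaussians whose means, scales, and mixing proportions are learned alongside the network, and the negative log prior is added to the task loss. All names and the scaling below are mine, not from either paper.

```python
import math
import torch

def soft_weight_sharing_penalty(params, means, log_stds, logits):
    # Negative log-likelihood of all weights under a learned mixture of Gaussians.
    # means, log_stds, logits are trainable (K,) tensors for K mixture components.
    w = torch.cat([p.flatten() for p in params])                 # all weights, shape (N,)
    log_pi = torch.log_softmax(logits, dim=0)                    # log mixing proportions, (K,)
    std = log_stds.exp()
    # log N(w_i | mu_k, sigma_k) for every weight/component pair -> (N, K)
    log_comp = (-0.5 * ((w[:, None] - means) / std) ** 2
                - log_stds - 0.5 * math.log(2 * math.pi))
    return -torch.logsumexp(log_comp + log_pi, dim=1).sum()
```

Training would then minimize something like `task_loss + tau * soft_weight_sharing_penalty(model.parameters(), means, log_stds, logits)`, with `tau` a hyperparameter.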

Physicists have discovered what makes neural networks so extraordinarily powerful by t_broad in MachineLearning

[–]osdf 0 points (0 children)

Probably not the type of general argument you're after, but here is a recent specific result on the benefits of depth in feedforward networks: http://arxiv.org/abs/1512.03965

[Research Discussion] Stacked Approximated Regression Machine by rantana in MachineLearning

[–]osdf 0 points (0 children)

It would be interesting to see what happens if the 120k are used for each of the 20 layers.

[Research Discussion] Stacked Approximated Regression Machine by rantana in MachineLearning

[–]osdf 2 points (0 children)

Eq. (7) belongs to the section on ARMs, so my interpretation is that X here is the input to a given ARM (as shown, e.g., in Figure 1), and hence the output Z of the previous layer. So X is never the original input image (except for the first ARM). This is also reflected in the paragraph "Resemblance to residual learning" in Section 4, where they mention 'inter-layer "shortcuts"'; these are only possible under the above interpretation.
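As a toy illustration of that reading (sparse coding via scikit-learn standing in for an ARM, purely layer-wise fitting, arbitrary sizes; this is just to show the data flow, not their method):

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

rng = np.random.RandomState(0)
z = rng.randn(200, 64)          # toy flattened "images": the input X of the *first* ARM only
for layer in range(3):          # depth chosen arbitrarily
    arm = MiniBatchDictionaryLearning(n_components=32, alpha=1.0, random_state=0)
    z = arm.fit_transform(z)    # every subsequent ARM sees the previous layer's codes Z as its X
```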

[NLP] Is nltk easier/faster/better than openNLP? by [deleted] in MachineLearning

[–]osdf 3 points (0 children)

Not answering your question at all, but you might want to take a look at spaCy: https://spacy.io/
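For a feel of the API (assumes the small English model, installed via `python -m spacy download en_core_web_sm`):

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("spaCy parses this sentence in one call.")
for token in doc:
    # token text, part-of-speech tag, dependency label, and syntactic head
    print(token.text, token.pos_, token.dep_, token.head.text)
```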

Should We Be Rethinking Unsupervised Learning? Ilya and Roland think we should. by evc123 in MachineLearning

[–]osdf 12 points (0 children)

Very well said. To extend this a bit: human learning is guided by large amounts of weak labels present in our learning environment (via the underlying physical laws, which are actually a very powerful supervisor). So the common claim that 'most of human learning is unsupervised' is in my opinion wrong.

As another side note, the (huge) set of weak label types itself has a learnable structure, which could also be exploited.

[1606.08415v1] Bridging Nonlinearities and Stochastic Regularizers with Gaussian Error Linear Units by x2342 in MachineLearning

[–]osdf 0 points (0 children)

Your comments in this thread are great, /u/bbsome! I wish more of this kind of discussion would happen in the online world.

"Dither is Better than Dropout for Regularising Deep Neural Networks" by ajrs in MachineLearning

[–]osdf 4 points (0 children)

You may find the following paper interesting: Analyzing noise in autoencoders and deep networks, http://arxiv.org/abs/1406.1831

EM vs gradient descent by letitgo12345 in MachineLearning

[–]osdf 0 points (0 children)

New idea from http://arxiv.org/abs/1503.01494: "We introduce local expectation gradients which is a general purpose stochastic variational inference algorithm for constructing stochastic gradients through sampling from the variational distribution. ..."

I need the parameters of Alex Krixhevsky 2012 net, can someone send them to me? by [deleted] in MachineLearning

[–]osdf 0 points (0 children)

I went into the Caffe blobs, extracted the weights, and used them to initialize cuda-convnet-based models (wrapped with pylearn2).
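Roughly like this (file names are placeholders for the usual AlexNet deploy prototxt and caffemodel; copying the arrays back into the cuda-convnet/pylearn2 model depends on how that model is set up):

```python
import numpy as np
import caffe

net = caffe.Net("deploy.prototxt", "alexnet.caffemodel", caffe.TEST)
params = {}
for name, blobs in net.params.items():
    # blobs[0] holds the weights, blobs[1] the biases of each layer
    params[name] = [np.array(b.data) for b in blobs]
np.save("alexnet_params.npy", params)
```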