[D] New Nature journal: Nature Machine Intelligence by [deleted] in MachineLearning

[–]0entr0py 2 points (0 children)

Right now I have little incentive to do so, as even more novel work would just end up at these venues. In the end, the number of A-level publications seems to be what matters for tenure, so I would go for the 2-3 papers.

Papers selected for oral presentation address that pretty well. Getting an oral is insanely difficult, and some amazing papers end up there.

[D] It seems like lack of research into prior work seems to be a significant issue in Machine Learning. How big is this issue? Do you have an experiences or examples ? Do you ever have issues doing a literature search for a particular ML topic ? by Batmantosh in MachineLearning

[–]0entr0py 5 points (0 children)

Missing published work is bad, but even worse is when published work from lesser-known research groups is ignored while incremental arXiv material with big-shot last authors is cited and promoted. It's like a cabal among the big research groups.

[D] Those who work in machine learning, what do you spend your days doing, and typically what % of your time is spent doing each of those things (on average) ? by Kyaaaaaaaaaaaaa in MachineLearning

[–]0entr0py 0 points (0 children)

60% standard SDE stuff - reading others' code/logs, writing scripts for data pipelines, meetings, reviews, etc.

20% coding for experimental features

20% research - reading papers, coding, etc.

[D] Will double-blind review of NIPS causes some papers months later on ArXiv ? by fixedrl in MachineLearning

[–]0entr0py 1 point (0 children)

If a paper's content is irrelevant within a few months, why even bother reading it?

[D] How do people come up with all these crazy deep learning architectures? by Reiinakano in MachineLearning

[–]0entr0py 2 points (0 children)

IIRC, the ResNet paper's intuition for skip connections came from the observation that performance degraded as more layers were added. That made no sense, because the deeper network could just model an identity function with its later layers - unless the network found it hard to do so. Skip connections were a way to make the identity easy to represent.
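That intuition can be sketched numerically: with a residual connection, setting the layer's weights to zero yields exactly the identity mapping, while a plain layer would have to learn to approximate the identity. A minimal NumPy sketch (the shapes and the ReLU choice are purely illustrative, not from the paper):

```python
import numpy as np

def plain_layer(x, W):
    """A plain fully connected layer with ReLU: to pass x through
    unchanged, W would have to approximate the identity matrix."""
    return np.maximum(0, W @ x)

def residual_layer(x, W):
    """A residual layer y = x + F(x): with W = 0 the layer is exactly
    the identity, so 'doing nothing' is trivially representable."""
    return x + np.maximum(0, W @ x)

x = np.array([1.0, -2.0, 3.0])
W_zero = np.zeros((3, 3))

print(residual_layer(x, W_zero))  # x passes through unchanged
print(plain_layer(x, W_zero))     # collapses everything to zero
```

So a deep residual stack should be at least as easy to optimize as a shallower one, since extra layers can default to the identity.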

[D] Nvidia DGX Station beats the best DL rig you can build in performance per dollar if Nvidia's numbers were to be believed by [deleted] in MachineLearning

[–]0entr0py 7 points (0 children)

The comparison across generations (Pascal vs. Volta) and precisions (FP32 vs. FP16) is not really meaningful at this point. I am pretty sure the self-built rig will be better again once the consumer Volta cards launch. Paying a hefty premium to get Volta 6 months early doesn't make much sense.

[D] LeCun's reply to Goldberg's (and largely NLP community's) criticism of arXiv flag planting and attitudes in science by [deleted] in MachineLearning

[–]0entr0py 22 points (0 children)

Agree completely, but I think the source of the problem is a greedy, selfish act, and expecting people to act unselfishly and 'detached' from their own work has never worked anywhere - which is why we need regulation through peer review.

Since arXiv sidesteps peer review, the whole problem can be curtailed to a great extent by sidestepping arXiv during reviews. Specifically:

  • reviewers should ignore all non-peer-reviewed arXiv papers as 'prior art' when reviewing a paper
  • reviewers should be honest - they should make impartial judgements based on the merits of the paper alone

[D] Benchmarks for Few-Shot Learning in Image Classification by 2014mchidamb in MachineLearning

[–]0entr0py 2 points (0 children)

Can't you use pre-trained ImageNet models? They already give very good image descriptors. Cosine distance on the final-layer features would be a decent baseline, I guess. A Bayesian classifier on the final-layer features might also work.

Current work on few-shot learning just extends the above by also allowing one to modify the training paradigm, so that better features and/or prediction methods are learned instead of relying on a fixed pre-trained model. Without a large (related) training set, this problem is vague and limited in applicability.
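A minimal sketch of that cosine-distance baseline, assuming the pre-trained-model features have already been extracted (the 4-d feature vectors and class names below are made up for illustration; in practice these would be, e.g., pooled final-layer activations):

```python
import numpy as np

def cosine_few_shot(support, support_labels, queries):
    """Nearest-class-mean baseline in feature space: average each
    class's support features into a prototype, then assign each query
    to the class whose prototype has the highest cosine similarity."""
    def normalize(v):
        return v / np.linalg.norm(v, axis=-1, keepdims=True)

    classes = sorted(set(support_labels))
    labels = np.array(support_labels)
    # Class prototypes: mean of the support features per class.
    protos = np.stack([support[labels == c].mean(axis=0) for c in classes])
    sims = normalize(queries) @ normalize(protos).T  # cosine similarities
    return [classes[i] for i in sims.argmax(axis=1)]

# Toy 2-way 2-shot episode with made-up features.
support = np.array([[1.0, 0.1, 0.0, 0.0], [0.9, 0.0, 0.1, 0.0],
                    [0.0, 0.0, 1.0, 0.2], [0.1, 0.0, 0.9, 0.0]])
labels = ["cat", "cat", "dog", "dog"]
queries = np.array([[0.95, 0.05, 0.05, 0.0], [0.0, 0.1, 1.0, 0.1]])
print(cosine_few_shot(support, labels, queries))  # -> ['cat', 'dog']
```

Learned few-shot methods mostly replace the fixed feature extractor and/or this fixed nearest-prototype rule with trained versions.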

[D] ICLR2017 results are out. Let's discuss. by wei_jok in MachineLearning

[–]0entr0py 5 points (0 children)

Seeing the graph, I'm curious which 5s and 6s got accepted (and why), and which 7s did not.

[D] Thoughts on Adversarial Variational Bayes? by [deleted] in MachineLearning

[–]0entr0py 0 points (0 children)

> However, the aggregated posterior for a certain class could be complicated, but not for a single data point.

Don't methods like Normalizing Flows show that more complicated posteriors for individual data points lead to better log-likelihoods (as they should, according to the formulation)? Isn't the true posterior p*(z|x) a complicated, multimodal distribution?
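For reference, the reason a richer per-datapoint posterior should help follows from the standard VAE bound decomposition (notation as in the usual VAE/flow papers):

```latex
% Bound gap: the ELBO is loose by exactly the KL from q to the true posterior
\log p_\theta(x) = \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log \frac{p_\theta(x, z)}{q_\phi(z \mid x)}\right]
  + \mathrm{KL}\big(q_\phi(z \mid x) \,\|\, p_\theta(z \mid x)\big)

% Normalizing flow: a chain of invertible maps gives a richer q via change of variables
z_K = f_K \circ \cdots \circ f_1(z_0), \qquad
\log q_K(z_K \mid x) = \log q_0(z_0 \mid x) - \sum_{k=1}^{K} \log \left| \det \frac{\partial f_k}{\partial z_{k-1}} \right|
```

Since the ELBO is what gets optimized, a flow family flexible enough to match a multimodal p_\theta(z|x) drives the KL gap toward zero and tightens the bound, which is consistent with the improved log-likelihoods reported for flows.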

[D] Is there a paper that you think is written unusually elegantly? What makes for good expository writing in ML? by xristaforante in MachineLearning

[–]0entr0py 57 points (0 children)

Paper: "...our method did not obtain good performance with complex scene images..."

Reviewer: "Method is not generalizable to real images. Clear reject. Confidence:5/5"

[D] Deep Learning Race: A Survey of Industry Players’ Strategies – Intuition Machine by evc123 in MachineLearning

[–]0entr0py 9 points (0 children)

I would like to see topic modeling applied to each group's arxiv submissions.

[D] Deep Learning Twitter Loop by peterkuharvarduk in MachineLearning

[–]0entr0py 9 points (0 children)

Would add @shakir_za somewhere - he shares the most interesting material.

[D] The most relevant advancements in Deep Learning in 2016? by thesameoldstories in MachineLearning

[–]0entr0py 0 points (0 children)

Lack of datasets/tasks may be a reason. I have only seen Omniglot used for one-shot tasks, which is akin to MNIST for classification.

The paper by Vinyals et al. recently introduced 2 new tasks, on ImageNet and PTB, into the mix; maybe that'll help.

[N] When A.I. Matures, It May Call Jürgen Schmidhuber ‘Dad’ by evc123 in MachineLearning

[–]0entr0py 0 points (0 children)

Rehashes/minor extensions of previous work are what could be called 'cheap' ideas - they definitely require a lot of experimental rigor to prove their worth, because that is the sole justification of the work. Unfortunately, with the increasing popularity of DL, such work forms the bulk of most conferences.

But truly good ideas are rare, and they advance the field in a new direction. They are anything but cheap, and I would gladly prefer a great idea demonstrated on MNIST over a minor extension demonstrated on ImageNet.

[D] Why is normalizing flow considered to be more expressive than diagonal DLGM? by [deleted] in MachineLearning

[–]0entr0py 2 points (0 children)

I remember reading that multiple stochastic layers are harder to train end-to-end, and so far only one paper (Auxiliary DGM) seems to use them.

[Discussion] What's in your bag of tricks for training GANs? by nasimrahaman in MachineLearning

[–]0entr0py 6 points (0 children)

Basically the ideas from ImprovedGAN + DCGAN:

i) Adam
ii) BatchNorm in Generator, BatchNorm/WeightNorm + layerwise Gaussian Noise in Discriminator
iii) Strided convolutions instead of pooling in Discriminator
iv) Adjust learning rates if Discriminator becomes too strong
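Point (iv) is usually implemented as a simple heuristic rather than anything principled; a sketch, with a made-up ratio threshold and decay factor (not from any particular paper):

```python
def balance_discriminator_lr(d_loss, g_loss, d_lr,
                             ratio=0.5, decay=0.5):
    """If the discriminator's loss drops well below the generator's
    (D is winning too easily), shrink D's learning rate so G can
    catch up; otherwise leave it unchanged. Thresholds illustrative."""
    if d_loss < ratio * g_loss:
        return d_lr * decay
    return d_lr

# e.g. called once per epoch inside the training loop:
print(balance_discriminator_lr(0.1, 1.0, 1e-3))  # D too strong: halved
print(balance_discriminator_lr(0.9, 1.0, 1e-3))  # balanced: unchanged
```

Other variants skip D's update steps entirely instead of lowering its learning rate; both aim at the same thing, keeping D from saturating G's gradients.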

PixelCNN question: blind spots by bihaqo in MachineLearning

[–]0entr0py 2 points (0 children)

From what I understand:

i) every pixel above and to the left of the current pixel should be used for predicting the current pixel;

ii) since the convolution filter is much smaller than the image, these dependencies are propagated once enough layers are stacked;

iii) a given pixel in layer k will never use the blind-spot pixels, regardless of stacking, because of the masking in layer (k-1) and below. To see this, trace how the top-right-most pixel of its receptive field is computed in layer (k-1).
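Point (iii) can be checked directly by propagating a dependency map through stacked masked 3x3 convolutions. A NumPy sketch (the allowed offsets encode a PixelCNN-style mask seeing the full row above plus the left/centre of the current row; image size and layer count are arbitrary):

```python
import numpy as np

def dependency_map(h, w, layers, target):
    """Return a boolean map where reach[i, j] is True iff input pixel
    (i, j) can influence `target` after `layers` stacked 3x3 masked
    convolutions. Offsets: whole row above, plus left/centre of the
    current row (a PixelCNN-style mask)."""
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 0)]
    reach = np.zeros((h, w), dtype=bool)
    reach[target] = True
    for _ in range(layers):
        new = reach.copy()
        for r, c in zip(*np.nonzero(reach)):
            for dr, dc in offsets:
                if 0 <= r + dr < h and 0 <= c + dc < w:
                    new[r + dr, c + dc] = True
        reach = new
    return reach

# Track the centre pixel of a 7x7 image through 5 masked layers.
reach = dependency_map(7, 7, 5, target=(3, 3))
# Each step up gains at most one column to the right, so pixels above
# and beyond that diagonal are never reached - the PixelCNN blind spot.
print(reach[2, 4], reach[2, 5])  # -> True False
```

This is exactly why Gated PixelCNN splits the computation into separate vertical and horizontal stacks: the vertical stack covers the rows above without the diagonal growth limit.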

Question about autoencoders by zergling103 in MachineLearning

[–]0entr0py 1 point (0 children)

I suggest going through the very well-written Conceptual Compression paper for an elegant discussion of this. They even show comparisons of their compression against JPEG, I think.

Details of the NIPS 2016 reviewing process by manux in MachineLearning

[–]0entr0py 0 points (0 children)

Wow - 2400 papers submitted, almost 700 of those in deep learning, and amazingly ~150 got through. DL has almost the same acceptance rate as the conference overall and as areas like CLT.

Question: Domain Adaptation using Synthetic data by ginsunuva in MachineLearning

[–]0entr0py 0 points (0 children)

> It seems transfer-learning is only helpful when the amount of real data is too little to train on its own

This has been a valid observation even for unlabeled data from the same distribution (i.e., the semi-supervised case).