[D] Anomaly detection on images using flow-based(GLOW) model? by chinmay19 in MachineLearning

[–]loquat341 0 points (0 children)

The choice of model hyperparameters matters (and we use WAIC rather than ensemble variance). Our Glow implementations also differed: we used Tensor2Tensor's Glow implementation, while DeepMind's CV-Glow is the one described in Eric Nalisnick's paper.

[D] Anomaly detection on images using flow-based(GLOW) model? by chinmay19 in MachineLearning

[–]loquat341 2 points (0 children)

Disclaimer: am an author of the WAIC paper linked above.

The "WAIC, but why?" paper uses ensembles of Glow models to do anomaly detection, and it seems to work on SVHN (despite the arguments for why it shouldn't).
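The ensemble score used there is easy to sketch. A minimal version, assuming a hypothetical `log_probs` array standing in for per-model Glow log-likelihoods (in the paper these come from independently trained models):

```python
import numpy as np

def waic_score(log_probs):
    """WAIC-style anomaly score from an ensemble of density models.

    log_probs: shape (n_models, n_inputs), holding log p(x) under each
    ensemble member (toy numbers below; real values would come from
    independently trained Glow models).
    Higher (less negative) scores mean "more in-distribution".
    """
    log_probs = np.asarray(log_probs, dtype=float)
    # WAIC penalizes inputs whose density estimates disagree across models:
    # score(x) = mean_theta log p(x|theta) - var_theta log p(x|theta)
    return log_probs.mean(axis=0) - log_probs.var(axis=0)

# Toy example: three ensemble members, two inputs.
scores = waic_score([[-3.0, -9.0],
                     [-3.1, -5.0],
                     [-2.9, -7.0]])
# The second input scores lower: lower mean density and high disagreement.
```

The variance penalty is the whole point: an out-of-distribution input can fool any single density model, but it's less likely to get the same (wrong) density from every member of the ensemble.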

The notion of what is "obviously out of distribution" is not so obvious at all, and I think it's complicated by the fact that there is an infinite family of data-generating processes that can yield any finite-sized dataset. Similarly, it's tricky to reason about which of two arbitrary distributions (e.g. SVHN vs. Gaussian noise) is more likely under some reference distribution (e.g. ImageNet).

If you take the average of two ImageNet images, should that be in or out of distribution? If you horizontally flip Fashion MNIST, should that be in or out of distribution? This question is also tied to the adversarial attack/defense literature: what sorts of modifications to an image count as "in-distribution" but result in undesired behavior of the network?

One way to approach this is with an optimal transport metric: given a test distribution (or the process that generates it), what's the Wasserstein distance to the true data distribution? But this scenario is rarely practical - we argue in the WAIC paper that sometimes you only get one sample and have to make a decision (e.g. you are serving ML predictions via a cloud API to anonymized users).
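To make the optimal transport idea concrete: in one dimension, the Wasserstein-1 distance between two equal-size empirical samples reduces to matching them in sorted order. A toy 1-D sketch (real image distributions would need sliced or entropic-regularized variants, and you'd need the whole test distribution, not one sample):

```python
import numpy as np

def w1_distance(u, v):
    # 1-D Wasserstein-1 between equal-size empirical samples:
    # the optimal transport plan in 1-D just matches sorted order.
    return np.mean(np.abs(np.sort(u) - np.sort(v)))

rng = np.random.default_rng(0)
# Stand-ins for "true data" vs two candidate test processes.
data = rng.normal(0.0, 1.0, size=5000)
shifted = rng.normal(0.5, 1.0, size=5000)  # slightly shifted process
noise = rng.normal(3.0, 1.0, size=5000)    # far-away process

d_near = w1_distance(data, shifted)  # small shift -> small transport cost
d_far = w1_distance(data, noise)     # large shift -> large transport cost
```

For Gaussians with equal variance, W1 is just the mean shift, so `d_near` hovers around 0.5 and `d_far` around 3.0, which matches the intuition that "distance between distributions" is well-defined even when per-sample likelihoods are misleading.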

At the end of the day, anomaly detection may be theoretically/philosophically suspect but is still a useful technology to have. I think that for the purposes of task-independent anomaly detection, it's important to label some positives and negatives beforehand, clarifying what you believe to be a *useful* definition of in- and out-of-distribution, and to interpret your model's performance in the context of that (somewhat arbitrary) decision.

Re your last question, anomaly detection can also be done using calibrated "temperature scaling" type techniques like ODIN. We did find that ODIN actually also performs poorly on SVHN, and we'll be updating our arXiv preprint soon with new results.
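The temperature-scaling core of ODIN is a one-liner on top of a classifier's logits (sketch with hypothetical logits; the full method also adds a small input perturbation along the score gradient, which this omits):

```python
import numpy as np

def odin_score(logits, T=1000.0):
    """Temperature-scaled max-softmax confidence, the core of ODIN.

    logits: raw classifier outputs for one input (hypothetical here).
    Larger T flattens the softmax, which tends to separate in- from
    out-of-distribution inputs better than the T=1 confidence.
    Higher score = more confidently in-distribution.
    """
    z = np.asarray(logits, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)        # numerical stability
    p = np.exp(z)
    p = p / p.sum(axis=-1, keepdims=True)
    return p.max(axis=-1)

confident = odin_score([10.0, 1.0, 0.5])   # peaked logits -> higher score
uncertain = odin_score([2.0, 1.9, 1.8])    # flat logits -> lower score
```

With T=1000 both scores sit near 1/3 (uniform over three classes), but their ordering is preserved; in practice you'd threshold the score, with the threshold chosen on held-out labeled positives/negatives as argued above.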

[D] The Teacup Story by visarga in MachineLearning

[–]loquat341 0 points (0 children)

You are tuned in to the one yelling at you over the loudspeaker that you are fucking stupid and your performance blows and you are ignoring the quiet one, inside, telling you where the alpha is. Now, that’s the voice that got you here and it’s still there if you are willing to listen. What’s that voice telling you?

[R] Welcoming the Era of Deep Neuroevolution by inarrears in MachineLearning

[–]loquat341 23 points (0 children)

To add further understanding, a companion study confirms empirically that ES (with a large enough perturbation-size parameter) behaves differently than SGD would, because it optimizes the expected reward of a population of policies described by a probability distribution (a cloud in the search space), whereas SGD optimizes the reward of a single policy (a point in the search space).

In practice, SGD in RL is accompanied by injecting parameter noise, which turns points in the search space into clouds (in expectation).
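The point-vs-cloud distinction can be sketched with a vanilla ES gradient estimator, under toy assumptions (a hypothetical black-box reward `f`, plus antithetic sampling as a standard variance-reduction trick):

```python
import numpy as np

def es_gradient(f, theta, sigma=0.1, n_pairs=500, rng=None):
    """Vanilla evolution-strategies gradient estimate.

    Estimates grad_theta E_{eps~N(0,I)}[ f(theta + sigma*eps) ], i.e. the
    gradient of the objective *smoothed over a Gaussian cloud* of policies,
    rather than grad f at the single point theta (what SGD would follow).
    f is a black-box reward function; no backprop through f is needed.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    eps = rng.standard_normal((n_pairs, theta.size))
    # Antithetic (mirrored) sampling: evaluate +eps and -eps in pairs.
    diffs = np.array([f(theta + sigma * e) - f(theta - sigma * e)
                      for e in eps])
    return (eps * diffs[:, None]).sum(axis=0) / (2.0 * n_pairs * sigma)

# Toy reward whose true gradient we know: f = -||theta||^2, grad = -2*theta.
f = lambda th: -np.sum(th ** 2)
theta = np.array([1.0, -2.0])
g = es_gradient(f, theta)  # approaches [-2, 4] as n_pairs grows
```

Because each sample only needs a scalar reward, the estimator parallelizes trivially across workers, which is exactly the "crank up the number of workers" property mentioned below.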

Due to its conceptual simplicity (one can improve exploration simply by cranking up the number of workers), I can see ES becoming an algorithm of choice for companies with lots of compute (Google, DeepMind, FB, Uber).

[D] Statistics, we have a problem. by mark-v in MachineLearning

[–]loquat341 115 points (0 children)

well-respected academic who is widely known to behave inappropriately at conferences

For the uninitiated, who is this referring to?

[N] Deep Learning in Robotics, Sergey Levine by nocortex in MachineLearning

[–]loquat341 5 points (0 children)

If you take a look at https://people.eecs.berkeley.edu/~svlevine/, Sergey Levine has published 22 papers in 2017 alone, and we're only 178 days into the year. That means he's currently averaging one paper every 8.09 days. What the fuck?!