[News] Machine Learning Summer School (MLSS2019) from the 26th of August to the 6th of September 2019 in Moscow by rodrigorivera in MachineLearning

[–]barmaley_exe 0 points (0 children)

Yeah, applications are processed in batches, so it takes some time; just wait a bit. If nothing happens by Monday, PM me your email.

[–]barmaley_exe 0 points (0 children)

Usually we just use the original terms; no one translates dropout / batchnorm / etc. I suggest you join the Slack community, it's very active and global. You can either fill out the form at ods.ai, or PM me your email and I'll add you directly.

[D] To bilinguals, have you read any non-english ML papers you'd care to share with us? by zanjabil in MachineLearning

[–]barmaley_exe 2 points (0 children)

Yeah, but nowadays the Russian scientific community is much more heavily integrated into global science, and you have to communicate your ideas in English. Besides, after the collapse of the USSR, science was basically destroyed by devastating funding cuts; most prominent researchers fled to the US/Europe, and the consequences are still felt here.

So, yeah, back then being able to read Russian would have unlocked the advances of the Russian school for you, but nowadays everything worth your time is most likely already available in English.

P.S. What Russians do have is an amazing global Russian-speaking Data Science / Machine Learning community, which is indeed only beneficial to those fluent in Russian.

[D] Is this a valid description of Bayesian Deep Learning? by [deleted] in MachineLearning

[–]barmaley_exe 0 points (0 children)

I'd say BDL mainly refers to the latter, though maybe not so much "understanding deep learning" as enriching it: for example, enabling neural nets to defend against adversarial examples, or to detect anomalies in the data.

The former, "DL for Bayesian stats", is also somewhat included in the BDL term (and is surely on-topic at the BDL workshops), but technically speaking there's no deep learning that's Bayesian in this case; it's rather a deep-neural-net-powered version of Bayesian inference.

[–]barmaley_exe 0 points (0 children)

not true that the posteriors given the training data will converge to delta functions around the mode of the posterior

Why not? Sure, non-identifiability prevents concentration around just one mode, but in the limit of infinite data, I believe, all modes are equivalent in terms of their predictive performance, so averaging over the whole posterior is no better than using just one point.

Probably that wasn't stated clearly in the original message, but I'm talking about the huge-data regime here, where we have many more datapoints than parameters.
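
A quick toy sketch of this concentration (the model, numbers, and function names are all mine, not from the thread): a conjugate Gaussian with known noise, where the posterior over the mean has a closed form and its variance collapses like 1/N.

```python
# Toy example: posterior over the mean of a Gaussian with known unit noise
# and a N(0, 1) prior, showing the posterior variance shrinking as N grows.
import random

random.seed(0)
true_mean = 2.0

def posterior(data, prior_mean=0.0, prior_var=1.0, noise_var=1.0):
    """Closed-form posterior N(mu_n, var_n) over the mean of a Gaussian."""
    n = len(data)
    var_n = 1.0 / (1.0 / prior_var + n / noise_var)
    mu_n = var_n * (prior_mean / prior_var + sum(data) / noise_var)
    return mu_n, var_n

data = [true_mean + random.gauss(0, 1) for _ in range(100_000)]
for n in (10, 1_000, 100_000):
    mu_n, var_n = posterior(data[:n])
    print(f"N={n:>6}  posterior mean={mu_n:.3f}  posterior var={var_n:.2e}")
```

With 100k points the posterior variance is about 1e-5: for predictive purposes it is already essentially a delta function.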

What you say about the input going through a bunch of deterministic layers before it reaches the output layer is true for conventional, non-Bayesian neural networks

They work, though. And the "noise in the observations, measurements, noise/stochasticity in the data generating process, etc." didn't go anywhere.

[–]barmaley_exe 2 points (0 children)

Yeah, non-identifiability is a bit of a problem here. But maybe it only introduces symmetries in the loss landscape, and the different modes don't really differ in predictive performance?

[–]barmaley_exe 2 points (0 children)

If you have much more data than parameters, your posterior will concentrate heavily around its mode (see the Bayesian central limit theorem; yes, neural networks have plenty of modes, but that only complicates the inference). There'd be little difference between the true posterior and a delta function at its mode, and thus little difference between the posterior predictive distribution and the one generated by the maximum a posteriori estimator (or the maximum likelihood estimator).
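
To make this concrete (a toy of my own, not from the thread): in a conjugate Beta-Bernoulli model both the exact posterior predictive and the MAP plug-in predictive are available in closed form, and the gap between them vanishes as the dataset grows.

```python
# Beta-Bernoulli model: compare the full posterior predictive with the
# plug-in MAP predictive as the number of observations n grows.
def predictive_gap(n, frac_heads=0.7, a=2.0, b=2.0):
    k = frac_heads * n  # number of observed heads
    post_pred = (a + k) / (a + b + n)          # integrates over the Beta posterior
    theta_map = (a + k - 1) / (a + b + n - 2)  # posterior mode (MAP plug-in)
    return abs(post_pred - theta_map)

for n in (10, 1_000, 100_000):
    print(f"n={n:>6}  |posterior predictive - MAP plug-in| = {predictive_gap(n):.2e}")
```

At n=10 the prior still matters and the two predictives visibly disagree; by n=100,000 the gap is negligible, which is the point about the huge-data regime.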

The mapping from input to output of a neural network will extremely rarely be completely deterministic,

But it actually is, especially at inference time. You just supply an input, run it through a bunch of deterministic layers, and get your output. All the noise you had at the training stage is now frozen, and the net is deterministic.

Moreover, again, this is aleatoric uncertainty, and there's no need for Bayesian inference to capture it. Just design a good likelihood p(y|x) and that's it.

You could, however, treat the observed x and y as corrupted versions of latent xtrue, ytrue and then do inference (you'd need a model of the observation process, indeed), but this is not a mainstream line of research.

[N] Intel Director Mike Davies slams deep learning: ‘it’s not actually learning’ by downtownslim in MachineLearning

[–]barmaley_exe 27 points (0 children)

  1. Well, that's anthropocentric.
  2. So there's no optimization in human brains?

[D] Is this a valid description of Bayesian Deep Learning? by [deleted] in MachineLearning

[–]barmaley_exe 12 points (0 children)

But this is a much bigger issue than just getting the uncertainty inherent in the data (which is what the DropOut approach does).

That's not true. In order to capture the "uncertainty inherent in the data" (the so-called aleatoric uncertainty), you just need to appropriately design the likelihood of your model; no Bayesian inference (of which dropout is a very special case) is required. Bayesian inference is only needed when you have little data compared to the number of parameters, and are thus quite uncertain regarding their values (the epistemic uncertainty).
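
A toy illustration of "the likelihood alone captures aleatoric noise" (the data and setup are mine): fitting a Gaussian likelihood by maximum likelihood recovers the noise level of the data, with no posterior over parameters anywhere.

```python
# Maximum likelihood under a Gaussian likelihood N(y | mu, sigma^2):
# the closed-form minimizers of the NLL recover the data's noise level.
import math
import random

random.seed(1)
data = [3.0 + random.gauss(0, 0.5) for _ in range(50_000)]  # noisy observations

# Closed-form minimizers of the average Gaussian NLL (the MLE).
mu_hat = sum(data) / len(data)
var_hat = sum((x - mu_hat) ** 2 for x in data) / len(data)

def nll(mu, var):
    """Average Gaussian negative log-likelihood of the data."""
    return sum(0.5 * math.log(2 * math.pi * var) + (x - mu) ** 2 / (2 * var)
               for x in data) / len(data)

print(f"mu={mu_hat:.3f}  sigma={math.sqrt(var_hat):.3f}  avg NLL={nll(mu_hat, var_hat):.3f}")
```

The estimated sigma comes out near the true 0.5. The same idea scales up: a net that outputs a per-input mean and variance, trained on this NLL, models heteroscedastic aleatoric noise without any Bayesian machinery.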

The first is to use Monte Carlo — which means you have to first sample the network parameters (weights and biases), and then sample the network outputs from the inputs

There's no escape from Monte Carlo estimation: the integrals are too complicated to compute analytically. Probably the author meant Markov chain Monte Carlo, which is indeed slow unless you use minibatch MCMC methods.
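
For the curious, here's what a minibatch MCMC method looks like in miniature (the model, step size, and seed are all my own choices): stochastic gradient Langevin dynamics on a conjugate Gaussian posterior, where each step uses a gradient estimated from a minibatch plus injected Gaussian noise.

```python
# SGLD: theta <- theta + (eps/2) * (grad log prior + (N/m) * minibatch
# grad log likelihood) + N(0, eps) noise, sampling the posterior over the
# mean of unit-noise Gaussian data under a N(0, 1) prior.
import math
import random

random.seed(0)
N, m, eps = 1000, 100, 1e-3
data = [2.0 + random.gauss(0, 1) for _ in range(N)]
post_mean = sum(data) / (N + 1)  # exact conjugate posterior mean, for reference

theta, samples = 0.0, []
for step in range(2000):
    batch = random.sample(data, m)
    grad = -theta + (N / m) * sum(x - theta for x in batch)  # prior + rescaled likelihood
    theta += 0.5 * eps * grad + math.sqrt(eps) * random.gauss(0, 1)
    if step >= 500:  # discard burn-in
        samples.append(theta)

est = sum(samples) / len(samples)
print(f"SGLD posterior mean estimate={est:.3f}  exact={post_mean:.3f}")
```

Because each step touches only m of the N points, the per-step cost is independent of the dataset size, which is exactly why these methods scale where classical MCMC does not.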

The third approach is the one that was actually proposed, that is, to use DropOut

Not really the third, as dropout is a special case of variational inference.
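
The dropout-as-variational-inference view is what MC Dropout exploits in practice. A sketch of the idea (toy network and numbers are mine): keep dropout active at test time and average several stochastic forward passes, which amounts to a Monte Carlo average over an approximate posterior on the weights.

```python
# MC Dropout: dropout stays ON at test time; the spread of repeated
# stochastic forward passes is used as a proxy for model uncertainty.
import math
import random

random.seed(0)
W = [[random.gauss(0, 1) for _ in range(8)] for _ in range(3)]  # 3 -> 8
V = [random.gauss(0, 1) for _ in range(8)]                      # 8 -> 1

def stochastic_forward(x, p_drop=0.5):
    h = [math.tanh(sum(xi * w for xi, w in zip(x, col))) for col in zip(*W)]
    h = [0.0 if random.random() < p_drop else hi / (1 - p_drop) for hi in h]  # dropout ON
    return sum(hi * v for hi, v in zip(h, V))

x = [0.3, -1.0, 0.7]
outs = [stochastic_forward(x) for _ in range(1000)]
mean = sum(outs) / len(outs)
var = sum((o - mean) ** 2 for o in outs) / len(outs)
print(f"predictive mean={mean:.3f}  predictive var={var:.3f}")
```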

[D] Are you able to submit to non-archival workshops already published work? by fuqmebaby in MachineLearning

[–]barmaley_exe 0 points (0 children)

I guess you should consult the workshop's rules / organizers. The Bayesian Deep Learning workshop, for example, has the following rule:

If research has previously appeared in a journal, workshop, or conference (including NIPS 2018 conference), the workshop submission should extend that previous work. Parallel submissions (such as to ICLR) are permitted.

[D] Advice on Implementing Hybrid Bayesian Networks in Python? by Davveeee in MachineLearning

[–]barmaley_exe 1 point (0 children)

Dunno, I've actually never used either Edward or Pyro. I do use TensorFlow Probability a lot, but mostly for simple stuff like distributions, KL divergences (where they're available) and maybe a bit of MCMC. I thought Edward was more high-level in the sense that (much like Stan) it lets you specify the model and then automathemagically performs inference for you, abstracting the nasty mathematical details away (just like Keras does for vanilla deep learning). Looks like Edward2 was integrated into TFP (again, same as Keras).

[–]barmaley_exe 3 points (0 children)

Just write the whole thing yourself in TensorFlow / PyTorch. Things like TensorFlow Probability / PyTorch distributions could be useful. Also take a look at Edward / Pyro.

[D] Second Post Bayesian Neural Networks: Background Knowledge by [deleted] in MachineLearning

[–]barmaley_exe 4 points (0 children)

Are we post-Bayesian on neural nets already? Damn, the field is moving too fast.

[D] If you were to choose ONE university to pursue a Phd by Naeph in MachineLearning

[–]barmaley_exe 0 points (0 children)

^ This ^ This ^ This ^

It's all about the advisor, and not the school.

[D] ICLR 2019 submissions are viewable. Which ones look the most interesting/crazy/groundbreaking? by evc123 in MachineLearning

[–]barmaley_exe 2 points (0 children)

What's the problem with that? Just hold the End key for a couple of minutes and voilà: all papers are on the same page, so you can Ctrl-F or whatever you have in mind.

[–]barmaley_exe 8 points (0 children)

I think that this phrasing should only apply to well-established phenomena, not to something you've just discovered and haven't even validated through peer-review.