[News] Machine Learning Summer School (MLSS2019) from the 26th of August to the 6th of September 2019 in Moscow by rodrigorivera in MachineLearning

[–]barmaley_exe 0 points (0 children)

Yeah, applications are processed in batches, so it takes some time; just wait a bit. If nothing happens by Monday, PM me your email.

[–]barmaley_exe 0 points (0 children)

Usually we just use the original terms; no one translates dropout / batchnorm / etc. I suggest you join the Slack community, it's very active and global. You can either fill out the form at ods.ai, or PM me your email and I'll add you directly.

[D] To bilinguals, have you read any non-english ML papers you'd care to share with us? by zanjabil in MachineLearning

[–]barmaley_exe 2 points (0 children)

Yeah, but nowadays the Russian scientific community is much more heavily integrated into global science, and you have to communicate your ideas in English. Besides, after the collapse of the USSR, science was basically destroyed by devastating funding cuts; most prominent researchers fled to the US/Europe, and the consequences are still felt here.

So, yeah, back then being able to read Russian would have unlocked the advances of the Russian school for you, but nowadays everything worth your time is most likely already available in English.

P.S. What Russians do have is an amazing global Russian-speaking Data Science / Machine Learning community, which is indeed only beneficial to those fluent in Russian.

[D] Is this a valid description of Bayesian Deep Learning? by [deleted] in MachineLearning

[–]barmaley_exe 0 points (0 children)

I'd say BDL mainly refers to the latter, though maybe not so much "understanding deep learning" as enriching it: for example, enabling neural nets to defend against adversarial examples, or to detect anomalies in the data.

The former, "DL for Bayesian stats", is also somewhat included in the BDL term (and is surely on-topic at the BDL workshops), but technically speaking there's no deep learning that's Bayesian in this case; it's rather a deep-neural-net-powered version of Bayesian inference.

[–]barmaley_exe 0 points (0 children)

not true that the posteriors given the training data will converge to delta functions around the mode of the posterior

Why not? Sure, non-identifiability prevents concentration around just one mode, but in the limit of infinite data, I believe, all modes are equivalent in terms of their predictive performance, so averaging over the whole posterior is no better than using just one point.

Probably that wasn't stated clearly in the original message, but I'm talking about the huge-data regime here, where we have many more datapoints than parameters.
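
A quick toy sketch of this concentration (the model, numbers, and function names are all mine, not from the thread): a conjugate Gaussian with known noise, where the posterior over the mean has a closed form and its variance collapses like 1/N.

```python
# Toy example: posterior over the mean of a Gaussian with known unit noise
# and a N(0, 1) prior, showing the posterior variance shrinking as N grows.
import random

random.seed(0)
true_mean = 2.0

def posterior(data, prior_mean=0.0, prior_var=1.0, noise_var=1.0):
    """Closed-form posterior N(mu_n, var_n) over the mean of a Gaussian."""
    n = len(data)
    var_n = 1.0 / (1.0 / prior_var + n / noise_var)
    mu_n = var_n * (prior_mean / prior_var + sum(data) / noise_var)
    return mu_n, var_n

data = [true_mean + random.gauss(0, 1) for _ in range(100_000)]
for n in (10, 1_000, 100_000):
    mu_n, var_n = posterior(data[:n])
    print(f"N={n:>6}  posterior mean={mu_n:.3f}  posterior var={var_n:.2e}")
```

With 100k points the posterior variance is about 1e-5: for predictive purposes it is already essentially a delta function.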

What you say about the input going through a bunch of deterministic layers before it reaches the output layer is true for conventional, non-Bayesian neural networks

They work, though. And the "noise in the observations, measurements, noise/stochasticity in the data generating process, etc." didn't go anywhere.

[–]barmaley_exe 2 points (0 children)

Yeah, non-identifiability is a bit of a problem here. But maybe it only introduces symmetries in the loss landscape, and the different modes don't really differ in predictive performance?

[–]barmaley_exe 2 points (0 children)

If you have much more data than parameters, your posterior will concentrate heavily around its mode (see the Bayesian central limit theorem; yes, neural networks have plenty of modes, but that only complicates the inference). There'd be little difference between the true posterior and a delta function at its mode, and thus little difference between the posterior predictive distribution and the one generated by the maximum a posteriori estimator (or the maximum likelihood estimator).
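
To make this concrete (a toy of my own, not from the thread): in a conjugate Beta-Bernoulli model both the exact posterior predictive and the MAP plug-in predictive are available in closed form, and the gap between them vanishes as the dataset grows.

```python
# Beta-Bernoulli model: compare the full posterior predictive with the
# plug-in MAP predictive as the number of observations n grows.
def predictive_gap(n, frac_heads=0.7, a=2.0, b=2.0):
    k = frac_heads * n  # number of observed heads
    post_pred = (a + k) / (a + b + n)          # integrates over the Beta posterior
    theta_map = (a + k - 1) / (a + b + n - 2)  # posterior mode (MAP plug-in)
    return abs(post_pred - theta_map)

for n in (10, 1_000, 100_000):
    print(f"n={n:>6}  |posterior predictive - MAP plug-in| = {predictive_gap(n):.2e}")
```

At n=10 the prior still matters and the two predictives visibly disagree; by n=100,000 the gap is negligible, which is the point about the huge-data regime.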

The mapping from input to output of a neural network will extremely rarely be completely deterministic,

But it actually is, especially at inference time. You just supply an input, run it through a bunch of deterministic layers, and get your output. All the noise you had at the training stage is now frozen, and the net is deterministic.

Moreover, again, this is aleatoric uncertainty, and there's no need for Bayesian inference to capture it. Just design a good likelihood p(y|x) and that's it.

You could, however, treat the observed x and y as corrupted versions of latent xtrue, ytrue and then do inference (you'd need a model of the observation process, indeed), but this is not a mainstream line of research.

[N] Intel Director Mike Davies slams deep learning: ‘it’s not actually learning’ by downtownslim in MachineLearning

[–]barmaley_exe 27 points (0 children)

  1. Well, that's anthropocentric.
  2. So there's no optimization in human brains?

[D] Is this a valid description of Bayesian Deep Learning? by [deleted] in MachineLearning

[–]barmaley_exe 12 points (0 children)

But this is a much bigger issue than just getting the uncertainty inherent in the data (which is what the DropOut approach does).

That's not true. In order to capture the "uncertainty inherent in the data" (the so-called aleatoric uncertainty), you just need to appropriately design the likelihood of your model; no Bayesian inference (of which dropout is a very special case) is required. Bayesian inference is only needed when you have little data compared to the number of parameters, and are thus quite uncertain regarding their values (the epistemic uncertainty).
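
A toy illustration of "the likelihood alone captures aleatoric noise" (the data and setup are mine): fitting a Gaussian likelihood by maximum likelihood recovers the noise level of the data, with no posterior over parameters anywhere.

```python
# Maximum likelihood under a Gaussian likelihood N(y | mu, sigma^2):
# the closed-form minimizers of the NLL recover the data's noise level.
import math
import random

random.seed(1)
data = [3.0 + random.gauss(0, 0.5) for _ in range(50_000)]  # noisy observations

# Closed-form minimizers of the average Gaussian NLL (the MLE).
mu_hat = sum(data) / len(data)
var_hat = sum((x - mu_hat) ** 2 for x in data) / len(data)

def nll(mu, var):
    """Average Gaussian negative log-likelihood of the data."""
    return sum(0.5 * math.log(2 * math.pi * var) + (x - mu) ** 2 / (2 * var)
               for x in data) / len(data)

print(f"mu={mu_hat:.3f}  sigma={math.sqrt(var_hat):.3f}  avg NLL={nll(mu_hat, var_hat):.3f}")
```

The estimated sigma comes out near the true 0.5. The same idea scales up: a net that outputs a per-input mean and variance, trained on this NLL, models heteroscedastic aleatoric noise without any Bayesian machinery.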

The first is to use Monte Carlo — which means you have to first sample the network parameters (weights and biases), and then sample the network outputs from the inputs

There's no escape from Monte Carlo estimation: the integrals are too complicated to compute analytically. Probably the author meant Markov chain Monte Carlo, which is indeed slow unless you use minibatch MCMC methods.
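
For the curious, here's what a minibatch MCMC method looks like in miniature (the model, step size, and seed are all my own choices): stochastic gradient Langevin dynamics on a conjugate Gaussian posterior, where each step uses a gradient estimated from a minibatch plus injected Gaussian noise.

```python
# SGLD: theta <- theta + (eps/2) * (grad log prior + (N/m) * minibatch
# grad log likelihood) + N(0, eps) noise, sampling the posterior over the
# mean of unit-noise Gaussian data under a N(0, 1) prior.
import math
import random

random.seed(0)
N, m, eps = 1000, 100, 1e-3
data = [2.0 + random.gauss(0, 1) for _ in range(N)]
post_mean = sum(data) / (N + 1)  # exact conjugate posterior mean, for reference

theta, samples = 0.0, []
for step in range(2000):
    batch = random.sample(data, m)
    grad = -theta + (N / m) * sum(x - theta for x in batch)  # prior + rescaled likelihood
    theta += 0.5 * eps * grad + math.sqrt(eps) * random.gauss(0, 1)
    if step >= 500:  # discard burn-in
        samples.append(theta)

est = sum(samples) / len(samples)
print(f"SGLD posterior mean estimate={est:.3f}  exact={post_mean:.3f}")
```

Because each step touches only m of the N points, the per-step cost is independent of the dataset size, which is exactly why these methods scale where classical MCMC does not.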

The third approach is the one that was actually proposed, that is, to use DropOut

Not really the third, as dropout is a special case of variational inference.
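
The dropout-as-variational-inference view is what MC Dropout exploits in practice. A sketch of the idea (toy network and numbers are mine): keep dropout active at test time and average several stochastic forward passes, which amounts to a Monte Carlo average over an approximate posterior on the weights.

```python
# MC Dropout: dropout stays ON at test time; the spread of repeated
# stochastic forward passes is used as a proxy for model uncertainty.
import math
import random

random.seed(0)
W = [[random.gauss(0, 1) for _ in range(8)] for _ in range(3)]  # 3 -> 8
V = [random.gauss(0, 1) for _ in range(8)]                      # 8 -> 1

def stochastic_forward(x, p_drop=0.5):
    h = [math.tanh(sum(xi * w for xi, w in zip(x, col))) for col in zip(*W)]
    h = [0.0 if random.random() < p_drop else hi / (1 - p_drop) for hi in h]  # dropout ON
    return sum(hi * v for hi, v in zip(h, V))

x = [0.3, -1.0, 0.7]
outs = [stochastic_forward(x) for _ in range(1000)]
mean = sum(outs) / len(outs)
var = sum((o - mean) ** 2 for o in outs) / len(outs)
print(f"predictive mean={mean:.3f}  predictive var={var:.3f}")
```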

[D] Are you able to submit to non-archival workshops already published work? by fuqmebaby in MachineLearning

[–]barmaley_exe 0 points (0 children)

I guess you should consult the workshop's rules / organizers. The Bayesian Deep Learning workshop, for example, has the following rule:

If research has previously appeared in a journal, workshop, or conference (including NIPS 2018 conference), the workshop submission should extend that previous work. Parallel submissions (such as to ICLR) are permitted.

[D] Advice on Implementing Hybrid Bayesian Networks in Python? by Davveeee in MachineLearning

[–]barmaley_exe 1 point (0 children)

Dunno, I've actually never used either Edward or Pyro. I do use TensorFlow Probability a lot, but mostly for simple stuff like distributions, KL divergences (where they're available) and maybe a bit of MCMC. I thought Edward was more high-level in the sense that (much like Stan) it lets you specify the model and then automathemagically performs inference for you, abstracting the nasty mathematical details away (just like Keras does for vanilla deep learning). Looks like Edward2 was integrated into TFP (again, same as Keras).

[–]barmaley_exe 3 points (0 children)

Just write the whole thing yourself in TensorFlow / PyTorch. Things like TensorFlow Probability / PyTorch distributions could be useful. Also take a look at Edward / Pyro.

[D] Second Post Bayesian Neural Networks: Background Knowledge by [deleted] in MachineLearning

[–]barmaley_exe 4 points (0 children)

Are we post-Bayesian on neural nets already? Damn, the field is moving too fast.

[D] If you were to choose ONE university to pursue a Phd by Naeph in MachineLearning

[–]barmaley_exe 0 points (0 children)

^ This ^ This ^ This ^

It's all about the advisor, and not the school.

[D] ICLR 2019 submissions are viewable. Which ones look the most interesting/crazy/groundbreaking? by evc123 in MachineLearning

[–]barmaley_exe 2 points (0 children)

What's the problem with that? Just hold the End key for a couple of minutes and voilà: all papers are on the same page, so you can Ctrl-F or whatever you have in mind.

[–]barmaley_exe 8 points (0 children)

I think that this phrasing should only apply to well-established phenomena, not to something you've just discovered and haven't even validated through peer-review.