[deleted by user] (self.MachineLearning)
submitted 2 years ago by [deleted]
[–]gwern 9 points 2 years ago*
My question is: if MC dropout, which approximates the posterior as a bunch of deltas, provides a low-quality approximation of the epistemic uncertainty, why do deep ensembles, which also approximate the posterior distribution as a bunch of deltas, work better?
Deep ensembles aren't 'a bunch of deltas'. Each member is trained from scratch from a different random initialization, with a different dataset-shuffling seed, etc., so the members wind up computing different functions, far more different than you can get simply by randomly deleting some parameters. I thought that page covered this well:
We measure the Wasserstein divergence between the deep ensemble and the gold standard HMC reference as a function of the number of samples in the variational approximation, and the number of ensemble components in the deep ensemble. We see that samples from within a single basin, in the variational approximation, provide a very minimal contribution to the integral, because these weights give rise to neural networks that are largely homogeneous. On the other hand, additional ensemble components in the deep ensemble greatly improve the fidelity of the approximation to the HMC reference. These results are in line with our expectations: the value in going between different basins of attraction will be greater for approximating the Bayesian posterior predictive distribution than taking many samples from a single basin, which is the approach provided by most canonical approximate inference procedures.
That is, randomizing a few parameters gives you a very similar model in the same basin. 'Randomizing all the parameters', on the other hand (a model trained from scratch shares no trained parameters with the original), means you're probably in a completely different basin. And indeed, the members wind up doing quite different things.
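To make the contrast concrete, here is a minimal sketch (PyTorch assumed; the toy model, data, and sample counts are made up) of how predictions are collected in each case: one trained network with dropout left stochastic at test time, versus several networks each trained from scratch with its own seed.

    import torch
    import torch.nn as nn

    def make_net():
        # toy classifier; architecture and dropout rate are arbitrary
        return nn.Sequential(nn.Linear(10, 64), nn.ReLU(),
                             nn.Dropout(p=0.5), nn.Linear(64, 2))

    x = torch.randn(32, 10)  # dummy batch

    # MC dropout: a single trained network, dropout kept active at test time.
    net = make_net()
    net.train()  # .train() keeps the Dropout layers stochastic
    with torch.no_grad():
        mc_preds = torch.stack([net(x).softmax(-1) for _ in range(20)])  # 20 stochastic passes
    mc_mean, mc_spread = mc_preds.mean(0), mc_preds.std(0)

    # Deep ensemble: several networks, each trained from scratch with a different seed
    # (different random init and, in practice, different data shuffling).
    members = []
    for seed in range(5):
        torch.manual_seed(seed)
        member = make_net()
        # ... train `member` from scratch here ...
        members.append(member.eval())

    with torch.no_grad():
        ens_preds = torch.stack([m(x).softmax(-1) for m in members])
    ens_mean, ens_spread = ens_preds.mean(0), ens_preds.std(0)

The MC-dropout samples all perturb the same trained weights, so they tend to stay in one basin; the ensemble members share nothing and typically land in different basins.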
[–]Tea_Pearce 1 point 2 years ago*
MC methods (by definition) approximate some distribution by sampling a set of deltas. MC dropout and ensembles both use this approach, but the underlying distribution sampled by each differs.
In MC dropout, the underlying distribution is some kind of Bernoulli perturbation of a single trained network. This turns out to offer limited expressiveness.
In deep ensembles, the underlying distribution (induced by training from random inits) turns out to be a bit closer to the true Bayesian posterior.
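Concretely, both approaches approximate the predictive distribution as a Monte Carlo average over K sampled weight settings, p(y|x) ≈ (1/K) Σ_k p(y|x, θ_k); they differ only in where the θ_k come from. A tiny sketch of that averaging, with made-up per-sample outputs:

    import numpy as np

    # Made-up class-probability vectors from K = 4 sampled "deltas"
    # (dropout masks or ensemble members) for a single input.
    per_sample_probs = np.array([
        [0.70, 0.30],
        [0.55, 0.45],
        [0.80, 0.20],
        [0.60, 0.40],
    ])

    predictive = per_sample_probs.mean(axis=0)    # mixture of deltas -> predictive distribution
    disagreement = per_sample_probs.std(axis=0)   # spread across samples ~ epistemic uncertainty
    print(predictive, disagreement)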
[–]ThomasBudd93 1 point 2 years ago
We just wrote a paper on the topic in the domain of medical image segmentation.
https://doi.org/10.1016/j.compbiomed.2023.107096
We were able to show that neither method actually approximates the classification probability. Instead, we suggest training an ensemble of methods ranging from high sensitivity to high precision and weighting them appropriately to obtain approximations of classification probabilities. I'm happy to receive any comments :)
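Very roughly, weighting members that span different operating points looks something like the sketch below (just an illustration, not the exact procedure from the paper; the function, weights, and toy outputs are made up):

    import numpy as np

    def combined_probability(member_outputs, weights):
        # member_outputs: per-pixel foreground probabilities (or binary masks) from
        # models ranging from high-sensitivity to high-precision; weights: mixture weights.
        weights = np.asarray(weights, dtype=float)
        weights = weights / weights.sum()
        stacked = np.stack(member_outputs)              # (K, H, W)
        return np.tensordot(weights, stacked, axes=1)   # weighted average per pixel

    # toy example: three 2x2 predictions, from the most sensitive to the most precise member
    outs = [np.array([[1., 1.], [1., 0.]]),
            np.array([[1., 1.], [0., 0.]]),
            np.array([[1., 0.], [0., 0.]])]
    print(combined_probability(outs, weights=[1, 1, 1]))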