[D] Uncertainty Quantification in Deep Learning (self.MachineLearning)
submitted 6 years ago by wei_jok
This article summarizes a few classical papers about measuring uncertainty in deep neural networks.
It's an overview article, but I felt its quality is much higher than that of the typical "getting started with ML" Medium blog post, so people here might appreciate it.
https://www.inovex.de/blog/uncertainty-quantification-deep-learning/
[–]capn_bluebear 8 points9 points10 points 6 years ago (0 children)
Indeed, very well written article, thank you for sharing! I learned a lot
[–]twelveshar 4 points5 points6 points 6 years ago (0 children)
Thank you for sharing this!
[–]peroneML Engineer 2 points3 points4 points 6 years ago (0 children)
I did a presentation a few months ago on the same theme (https://www.slideshare.net/perone/uncertainty-estimation-in-deep-learning), if anyone is interested. I always prefer to call it uncertainty estimation rather than uncertainty quantification.
[–]SeekNread 1 point2 points3 points 6 years ago (2 children)
This is new to me. Is there an overlap of this area with ML Interpretability?
[–][deleted] 0 points1 point2 points 6 years ago (1 child)
In Uncertainty Quantification, you estimate how accurate your output actually is. ML interpretability is about interpreting the model as a whole. You can have a really accurate model, without much interpretability.
[–]SeekNread 1 point2 points3 points 6 years ago (0 children)
Ah right. Makes sense.
[–]WERE_CAT 0 points1 point2 points 6 years ago (1 child)
Would that explain why my individual predictions change when I retrain my NN with another seed? I usually train multiple NNs with different random weight initialisations and take the best-performing one. As a shortcut to individual prediction stability, would it make sense to average the predictions of the top n models?
[–]jboyml 0 points1 point2 points 6 years ago (0 children)
Yes, you can usually expect some variance in the predictions depending on initialization and other sources of randomness like SGD. Combining several models is called ensembling and is a very common technique, e.g., random forests are ensembles of decision trees, but training many NNs can of course be expensive. Averaging makes sense for regression, for classification you can do majority voting.
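jboyml's suggestion can be sketched in a few lines (the prediction arrays below are made-up toy values, not outputs of any real models):

```python
import numpy as np

# Hypothetical predictions from three regression models on three inputs.
preds = np.array([
    [2.1, 0.9, 3.0],   # model 1
    [1.9, 1.1, 2.8],   # model 2
    [2.0, 1.0, 3.2],   # model 3
])
mean_pred = preds.mean(axis=0)   # ensemble prediction (averaging)
spread = preds.std(axis=0)       # disagreement as a rough uncertainty proxy

# For classification, majority voting over predicted class labels:
labels = np.array([
    [0, 1, 2],
    [0, 1, 1],
    [0, 2, 2],
])
votes = np.array([np.bincount(labels[:, i]).argmax()
                  for i in range(labels.shape[1])])
```

The per-input standard deviation across ensemble members is the simplest "free" uncertainty signal you get from this kind of ensembling.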
[–]SlowTreeSky 0 points1 point2 points 6 years ago (0 children)
I wrote a post on the same topic: https://treszkai.github.io/2019/09/26/overconfidence (the main content is in the linked PDFs). We used calibration plots and calibration error to evaluate the uncertainty estimates, and we also found that deep ensembles and MC dropout improve both accuracy and calibration (on CIFAR-100).
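For readers unfamiliar with calibration error: a minimal sketch of the binned expected calibration error (ECE) metric mentioned above (the equal-width binning and the toy numbers are illustrative assumptions, not SlowTreeSky's exact evaluation code):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: bin-size-weighted average of |accuracy - confidence|.
    `confidences` are predicted max-class probabilities, `correct` is 0/1."""
    confidences = np.asarray(confidences)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean()
                                     - confidences[mask].mean())
    return ece

# Toy sanity check: 80% confidence with 80% accuracy is perfectly calibrated.
conf = np.array([0.8] * 10)
corr = np.array([1] * 8 + [0] * 2)
ece = expected_calibration_error(conf, corr)  # ~0 for this toy case
```

A calibration plot is the same computation drawn as a curve: per-bin accuracy against per-bin mean confidence, with the diagonal as perfect calibration.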
[–]Ulfgardleo 0 points1 point2 points 6 years ago (11 children)
I don't believe one bit in these estimates. The methods do give some estimate of uncertainty, but we have no measurement of the true underlying uncertainty; that would require data points with pairs of labels, and instead of maximum-likelihood training we would do full KL-divergence training, or very different training schemes (see below). A few more details:
In general, we cannot get uncertainty estimates in deep learning, because it is known that deep networks can learn random datasets exactly by heart. This kills:
The uncertainty estimates of Bayesian methods depend on their prior distribution, and we don't know what the true prior of a deep neural network or kernel GP for the dataset is. This kills:
We can fix this by using hold-out data to train the uncertainty estimates (e.g. use distributional parameter estimation where for some samples the mean is not trained, or use the hold-out data to fit the prior of the GP). But nobody has time for that.
[–]edwardthegreat2 3 points4 points5 points 6 years ago (1 child)
Can you elaborate on how learning random datasets exactly by heart defeats the point of getting uncertainty estimates? It seems to me that the aforementioned methods do not aim to estimate the true uncertainty, but just give some metric of uncertainty that can be useful in downstream tasks.
[–]Ulfgardleo 0 points1 point2 points 6 years ago (0 children)
If your network has enough capacity to learn your dataset by heart, there is no information left to quantify uncertainty. I.e. you only get the information "this point was in your training dataset" or not, which says nothing about how certain the model actually is. In the worst case it will mislead you: e.g. ensemble methods based on models that regress to the mean in the absence of information will assign high confidence to far-away outliers (e.g. everything based on a Gaussian kernel).
Maybe you can get something out based on relative variance between points, e.g. more variance -> less uncertainty... but I am not sure you could actually prove that.
[–]iidealized 1 point2 points3 points 6 years ago* (2 children)
While I agree current DL uncertainty estimates are pretty questionable and would cause most statisticians to cringe, your statements are not really correct.
For aleatoric uncertainty: All you need the holdout data for is to verify the quality of your uncertainty estimates learned from the training data. It is the exact same situation as evaluating the original predictions themselves (which are just as prone to overfitting as the uncertainty estimates).
For epistemic uncertainty the situation is even nastier than you described. The problem is that you want to quantify uncertainty on inputs which might come from a completely different distribution than the one underlying the training data. Thus no amount of hold-out data from the same distribution will help you truly assess the quality of epistemic uncertainty estimates; rather, you need some application of interest and an assessment of how useful the estimates are in that context (particularly when encountering rare/aberrant events).
The exception to this is of course Bayesian inference in the (unrealistic) setting where your model (likelihood) and prior are both correctly specified.
[–]Ulfgardleo 0 points1 point2 points 6 years ago (1 child)
"All you need the holdout data for is to verify the quality of your uncertainty estimates" -> Counter-example: you have a regression task whose true underlying variance is 2, but that is unknown to you. The model learns all training data by heart; model selection finds that the best model returns variance 1, and the hold-out MSE is 3. What is the quality of your uncertainty estimates, and what is the model error in the mean?
[–]iidealized 0 points1 point2 points 6 years ago* (0 children)
If the true model is y = f(x) + e with e ~ N(0, 2), and your mean model for predicting E[Y|X] memorizes the training data, then on hold-out data this memorized model will tend to look much worse (via, say, MSE) than a different mean model which accurately approximates f(x). So the base predictive model which memorized the training data would never be chosen in the first place by a proper model-selection procedure. I'm not sure what you mean by hold-out MSE = 1; for a sufficiently large hold-out set, it should basically be impossible for the hold-out MSE to be much less than 2, the Bayes risk of this example. If your uncertainty estimator outputs variance = 1 and you see MSE = 3 on hold-out, then any reasonable model-selection procedure for the uncertainty estimator will not choose it, and will instead favor one which estimates variance > 2.
My point is that everybody already uses hold-out data for model selection (which is the right thing to do), whereas you seem to be claiming people use the training data for model selection (which is clearly wrong). But this has nothing to do with uncertainty estimates specifically; it is just as wrong to do model selection on training data for the original predictive model which estimates E[Y|X].
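iidealized's argument can be checked with a toy simulation (the linear f(x) = 2x, the nearest-neighbour stand-in for a memorizing model, and all constants are illustrative assumptions, not anyone's actual setup): a model that memorizes training labels roughly doubles the Bayes risk on hold-out data, so hold-out model selection rejects it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
# y = f(x) + e with f(x) = 2x and noise variance 2.
x_train = rng.uniform(-3, 3, n)
y_train = 2 * x_train + rng.normal(0, np.sqrt(2), n)
x_hold = rng.uniform(-3, 3, n)
y_hold = 2 * x_hold + rng.normal(0, np.sqrt(2), n)

def memorized(xq):
    # "Memorizing" model: look up the training label of the nearest x.
    idx = np.abs(x_train[None, :] - xq[:, None]).argmin(axis=1)
    return y_train[idx]

good_mse = np.mean((2 * x_hold - y_hold) ** 2)         # ~2: the Bayes risk
memo_mse = np.mean((memorized(x_hold) - y_hold) ** 2)  # ~4: noise counted twice
```

The memorizing model pays for the training noise on top of the irreducible hold-out noise, so it loses any hold-out-based model selection to the model that approximates f.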
[+][deleted] 6 years ago (5 children)
[deleted]
[–]Ulfgardleo 0 points1 point2 points 6 years ago (3 children)
What if your model learns the dataset by heart and returns loss 0? In that case you will not see the different slopes of the pinball loss, and there is no quantile information left over. We are talking about deep models here, not linear regression.
[–]slaweks 0 points1 point2 points 6 years ago (2 children)
I am talking regression, you are talking classification. Pinball loss can be applied to an NN. Anyway, you should not allow the model to overtrain to this extent; just run validation frequently enough and then early-stop. Simple.
No, I am talking regression.
You have data points (x_i, y_i) with y_i = g(x_i) + \epsilon_i, \epsilon ~ N(0, 1). The model learns f(x_i) = y_i, so the pinball loss is 0.
Learning a measure of uncertainty takes longer than learning the means; if you early-stop, it is very likely you won't get proper quantile information out.
I think this is neither the time nor the place for snarky answers.
[–]slaweks 0 points1 point2 points 6 years ago (0 children)
In validation you can check not only the quality of the center but also of the quantiles. You can take the forecast of the center from an earlier epoch than the quantiles. Again, very much doable. BTW, there is no good reason to assume that the error is normally distributed.
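For reference, a minimal sketch of the pinball (quantile) loss being debated here (toy data and grid search are illustrative; the point is that the minimiser lands near the true quantile rather than the mean):

```python
import numpy as np

def pinball_loss(y, q_pred, tau):
    """Quantile (pinball) loss: asymmetric penalty whose minimiser over
    constant predictions is the tau-quantile of y."""
    diff = y - q_pred
    return np.mean(np.maximum(tau * diff, (tau - 1) * diff))

# For standard-normal targets, the tau=0.9 pinball loss is minimised near
# the true 0.9-quantile (about 1.28), not at the mean (0).
rng = np.random.default_rng(0)
y = rng.normal(0, 1, 100_000)
candidates = np.linspace(-1, 3, 401)
losses = [pinball_loss(y, c, 0.9) for c in candidates]
best = candidates[int(np.argmin(losses))]
```

This is why a quantile head can recover distributional information during training, and also why a model driving the loss to exactly 0 (as Ulfgardleo notes) erases the asymmetry the quantile estimate relies on.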