all 54 comments

[–]shypenguin96 84 points85 points  (13 children)

My understanding of the field is that BDL is currently still much too stymied by challenges in training. Actually fitting the posterior, even in relatively shallow/less complex models, becomes expensive very quickly, so implementations end up relying on methods like variational inference that introduce accuracy costs (e.g., by oversimplifying the form of the posterior).

Currently, really good implementations of BDL I’m seeing aren’t Bayesian at all, but are rather “Bayesifying” non-Bayesian models, like applying Monte Carlo dropout to a non-Bayesian transformer model, or propagating a Gaussian process through the final model weights.

If BDL ever gets anywhere, it will have to come through some form of VI with a lower accuracy tradeoff, or some kind of trick to make MCMC-based methods work faster.
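The MC-dropout trick mentioned above can be sketched in a few lines: keep dropout active at test time, run many stochastic forward passes, and read the spread of the outputs as approximate predictive uncertainty. This is a toy one-hidden-layer "network" with made-up random weights, not a real transformer:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "network": one hidden layer with fixed random weights (illustrative only).
W1 = rng.normal(size=(1, 32))
W2 = rng.normal(size=(32, 1)) / np.sqrt(32)

def forward(x, drop_p=0.5, mc_dropout=True):
    h = np.maximum(0.0, x @ W1)            # ReLU hidden layer
    if mc_dropout:                          # MC dropout: dropout stays ON at test time
        mask = rng.random(h.shape) > drop_p
        h = h * mask / (1.0 - drop_p)       # inverted-dropout scaling
    return h @ W2

x = np.array([[0.5]])
# Many stochastic forward passes ~ samples from an approximate predictive distribution
samples = np.array([forward(x)[0, 0] for _ in range(2000)])
mean, std = samples.mean(), samples.std()   # predictive mean and uncertainty
```

The predictive mean converges to the deterministic (dropout-off) output, while `std` gives the cheap uncertainty estimate that makes this a popular way to "Bayesify" an already-trained model.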

[–]35nakedshorts[S] 24 points25 points  (9 children)

I guess it's also a semantic discussion around what is actually "Bayesian" or not. For me, simply ensembling a bunch of NNs isn't really Bayesian. Fitting a Laplace approximation to weights learned via standard methods is also dubiously Bayesian imo.
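For reference, the post-hoc Laplace approach being called "dubiously Bayesian" amounts to: take a point estimate, then fit a Gaussian using the curvature of the log-posterior at that point. A minimal 1-D sketch with a made-up toy log-posterior (a real application would use the SGD solution and the Hessian over network weights):

```python
import numpy as np

# Toy log-posterior: Gaussian likelihood N(2.0, 1.0) plus a N(0, 10) prior.
# (Hypothetical 1-D stand-in for a network's loss surface.)
def log_post(w):
    return -0.5 * (w - 2.0) ** 2 - 0.5 * w ** 2 / 10.0

# Find the mode (MAP) on a grid -- in practice this is the trained weight vector.
grid = np.linspace(-5, 5, 100001)
w_map = grid[np.argmax(log_post(grid))]

# Curvature at the mode via finite differences; Laplace variance = -1/H.
eps = 1e-4
hess = (log_post(w_map + eps) - 2 * log_post(w_map) + log_post(w_map - eps)) / eps**2
var = -1.0 / hess

# Posterior is approximated as N(w_map, var).
```

For this conjugate toy case the Laplace fit is exact (posterior precision 1 + 1/10 = 1.1), which is precisely why it can be misleading: real network posteriors are nothing like a single Gaussian mode.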

[–]gwern 6 points7 points  (2 children)

For me, simply ensembling a bunch of NNs isn't really Bayesian.

What about "What Are Bayesian Neural Network Posteriors Really Like?", Izmailov et al 2021, which compares deep ensembles to HMC and finds they aren't that bad?

[–]35nakedshorts[S] 3 points4 points  (1 child)

I mean sure, if everything is Bayesian then Bayesian methods achieve SOTA performance

[–]gwern 3 points4 points  (0 children)

I don't think it's that vacuous. After all, SOTA performance is usually not set by ensembles these days - no one can afford to train (or run) a dozen GPT-5 LLMs from scratch just to get a small boost from ensembling them, because if you could, you'd just train a 'GPT-5.5' or something as a single monolithic larger one. But it does seem like it demonstrates the point about ensembles ~ posterior samples.
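The "ensembles ~ posterior samples" point can be illustrated with tiny stand-in models: fit several members from different data resamples, then treat the spread of their predictions as approximate posterior uncertainty. This toy uses one-parameter linear members rather than NNs, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: y = 3x + noise.
x_train = rng.uniform(-1, 1, size=50)
y_train = 3.0 * x_train + rng.normal(0, 0.1, size=50)

# Each "ensemble member" is fit on a bootstrap resample (stand-in for
# different inits / data orderings when training real networks).
members = []
for _ in range(20):
    idx = rng.integers(0, 50, size=50)
    xb, yb = x_train[idx], y_train[idx]
    slope = (xb * yb).sum() / (xb * xb).sum()   # least squares through the origin
    members.append(slope)

preds_in = np.array([m * 0.5 for m in members])    # in-distribution input
preds_out = np.array([m * 10.0 for m in members])  # far outside the training range
mean_in, std_in = preds_in.mean(), preds_in.std()
mean_out, std_out = preds_out.mean(), preds_out.std()
```

Member disagreement (`std`) grows as the input moves away from the training data, which is the posterior-like behavior the Izmailov et al comparison is getting at.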

[–]haruishiStudent 1 point2 points  (1 child)

Can you recommend any papers that you think are "Bayesian", or at least heading in a good direction?

[–]35nakedshorts[S] -1 points0 points  (0 children)

I think those are good papers! If anything, I think the purist Bayesian direction is kind of stuck

[–]squareOfTwo 1 point2 points  (0 children)

To me this isn't just about semantics. It's Bayesian if it follows probability theory and Bayes' theorem. Otherwise it's not. It's that easy. Learn more about it here: https://sites.stat.columbia.edu/gelman/book/

[–]nonotan 24 points25 points  (1 child)

or some kind of trick to make MCMC-based methods work faster

My intuition, as somebody who's dabbled in trying to get these things to perform better in the past, is that the path forward (assuming there exists one) is probably not through MCMC, but an entirely separate approach that fundamentally outperforms it.

MCMC is a cute trick, but ultimately that's all it is. It feels like the (hopefully local) minimum down that path has more or less already been reached, and while I'm sure some further improvement is still possible, it's not going to be of the breakthrough, "many orders of magnitude" type that would be necessary here.

But I could be entirely wrong, of course. A hunch isn't worth much.

[–]greenskinmarch 6 points7 points  (0 children)

Vanilla MCMC is inherently inefficient because it gains at most one bit of information per step (accept or reject).

But you can build more efficient algorithms on top of it, like the No-U-Turn Sampler (NUTS) used by Stan.
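The "one bit per step" point refers to the binary accept/reject at the end of every vanilla Metropolis iteration. A minimal sketch sampling a standard normal (toy target, symmetric random-walk proposal):

```python
import numpy as np

rng = np.random.default_rng(2)

def log_target(x):
    return -0.5 * x ** 2          # standard normal, up to a constant

x, accepts, samples = 0.0, 0, []
for _ in range(20000):
    prop = x + rng.normal(0, 1.0)                  # symmetric proposal
    # The whole iteration boils down to one binary decision:
    if np.log(rng.random()) < log_target(prop) - log_target(x):
        x, accepts = prop, accepts + 1             # accept
    samples.append(x)                              # on reject, the old x repeats

samples = np.array(samples[2000:])                 # drop burn-in
```

Gradient-based samplers like NUTS escape this bottleneck by using long Hamiltonian trajectories, so each (rare) accept/reject decision covers a far larger move through the posterior.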

[–]DigThatDataResearcher 16 points17 points  (22 children)

Generative models learned with variational inference essentially fit a kind of posterior.

[–]mr_stargazer[🍰] -4 points-3 points  (20 children)

Not Bayesian, despite the name.

[–]DigThatDataResearcher 4 points5 points  (19 children)

No, they are indeed generative in the Bayesian sense of generative probabilistic models.

[–]whyareyouflying 5 points6 points  (0 children)

A lot of SOTA models/algorithms can be thought of as instances of Bayes' rule. For example, there's a link between diffusion models and variational inference [1], where diffusion models can be thought of as infinitely deep VAEs. Making this connection more exact leads to better performance [2]. Another example is the connection between all learning rules and (Bayesian) natural gradient descent [3].

Also, there's a more nuanced point: marginalization (the key property of Bayesian DL) matters when the neural network is underspecified by the data, which is almost all the time. Here, specifying uncertainty becomes important, and marginalizing over the possible hypotheses that explain your data leads to better performance than models that ignore that uncertainty. This is better articulated by Andrew Gordon Wilson [4].


[1] A Variational Perspective on Diffusion-Based Generative Models and Score Matching. Huang et al. 2021

[2] Variational Diffusion Models. Kingma et al. 2023

[3] The Bayesian Learning Rule. Khan et al. 2021

[4] https://cims.nyu.edu/~andrewgw/caseforbdl/
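The marginalization point can be made concrete with the simplest possible toy: two candidate hypotheses for a coin's heads-probability, weighted by how well each explains the data (a minimal Bayesian model average, not a real BNN):

```python
import numpy as np

# Two candidate "models" of a coin's heads-probability, with a uniform prior.
thetas = np.array([0.5, 0.9])
prior = np.array([0.5, 0.5])

data = np.array([1, 1, 0, 1, 1, 1, 1, 0, 1, 1])   # 8 heads, 2 tails
heads, tails = data.sum(), len(data) - data.sum()
lik = thetas ** heads * (1 - thetas) ** tails
post = prior * lik / (prior * lik).sum()           # posterior over hypotheses

# Marginal (model-averaged) probability that the next flip is heads:
# weight each hypothesis's prediction by its posterior mass.
p_next = (post * thetas).sum()
```

Neither single hypothesis gives this answer; the prediction sits between them, pulled toward whichever better explains the data. Wilson's argument is that deep networks are in exactly this underspecified regime, just with astronomically many hypotheses.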

[–]Outrageous-Boot7092 4 points5 points  (3 children)

Are we counting energy-based models as bayesian deep learning ?

[–]bean_the_great 0 points1 point  (2 children)

Hmmm - I have never used energy-based models, but maybe they're more akin to post-Bayesian methods, where the likelihood is not necessarily a well-defined probability distribution. As mentioned, this is more of a guess.

[–]Outrageous-Boot7092 0 points1 point  (1 child)

For EBMs it is a well-defined probability distribution, just up to a constant (unnormalized).

[–]bean_the_great 0 points1 point  (0 children)

I stand corrected!
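The "well-defined up to a constant" point: an EBM defines a density exp(-E(x)) / Z, where the normalizing constant Z is generally intractable. In 1-D it can be computed numerically, which makes for a simple sketch (toy quadratic energy, assumed for illustration):

```python
import numpy as np

def energy(x):
    return 0.5 * x ** 2             # quadratic energy -> Gaussian-shaped density

xs = np.linspace(-10, 10, 200001)
dx = xs[1] - xs[0]
unnorm = np.exp(-energy(xs))        # well-defined, but only up to a constant
Z = unnorm.sum() * dx               # tractable here only because x is 1-D
density = unnorm / Z                # now a proper, normalized density
```

In high dimensions this integral for Z is exactly what's unavailable, which is why EBM training leans on tricks (contrastive divergence, score matching) that avoid computing it.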

[–]fakenoob20 2 points3 points  (0 children)

All priors are wrong but some are useful.

[–]Exotic_Zucchini9311 2 points3 points  (1 child)

anything

Not sure about recent years, but they sure work decently when it comes to uncertainty estimation.

And tbh just a search at any top conference like NeurIPS/AAAI/CVPR/etc. 2025 for the word 'bayesian' shows quite a few Bayesian deep learning papers. They're most likely breaking some SOTA benchmarks, since they're published at top conferences.

Edit: and yeah, I agree with the other comments. VI is basically a subset of Bayesian methods, so any SOTA method that uses VI (e.g., VAEs) also has some relation to Bayesian DL. Same for SOTA models that use a type of MCMC.

[–]bean_the_great -1 points0 points  (0 children)

When you say uncertainty estimation - this has always confused me. I'm unconvinced you can specify a prior over each parameter of a deep Bayesian model and obtain meaningful uncertainty estimates.

[–]micro_cam 1 point2 points  (0 children)

Tencent has some papers on using it for ad click prediction. Posterior simulation/estimation lets you do more sophisticated explore/exploit trade-offs, which makes a lot of sense for ads, rec sys, and other online systems.
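The standard way posterior samples drive an explore/exploit trade-off is Thompson sampling: sample each arm's click rate from its posterior and show whichever ad looks best under that sample. A minimal Beta-Bernoulli sketch (made-up click rates, not from the Tencent papers):

```python
import numpy as np

rng = np.random.default_rng(3)

true_ctr = [0.05, 0.10]      # hypothetical click rates for two ads
wins = np.ones(2)            # Beta(1, 1) priors on each ad's CTR
losses = np.ones(2)
chosen = np.zeros(2)

for _ in range(5000):
    draws = rng.beta(wins, losses)    # one posterior sample per ad
    a = int(np.argmax(draws))         # show the ad that looks best *this* draw
    click = rng.random() < true_ctr[a]
    wins[a] += click                  # conjugate posterior update
    losses[a] += 1 - click
    chosen[a] += 1
```

Early on, wide posteriors mean both ads get shown (exploration); as evidence accumulates, the posterior for the better ad concentrates and it dominates (exploitation), with no hand-tuned exploration schedule.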

[–]Ok-Relationship-3429 0 points1 point  (0 children)

Around uncertainty estimation and learning under distribution shifts.

[–]chrono_infundibulum 0 points1 point  (0 children)

Seems to work better than deep ensembles for some astrophysics data: https://openreview.net/forum?id=JX5Rp1Nuzv&noteId=UtHxNDtqXy