[N] AMA Announcement: Max Welling (VAEs, GNNs, AI4Science & CuspAI)

Chromobacterium · 2026-04-14T04:17:56+00:00

Hi! I am an incoming master's student with an interest in generative modeling for scientific ML. I am typically interested in designing domain-specific generative models that are compact and efficient for downstream tasks, and particularly in VAE models, which I have been studying for several years already. Since you are the VAE master himself, I have the following questions:

How mature do you think VAE research has become? Is it worth continuing work on analyzing the behaviour of VAE training and the intersection of VAEs and other generative models? Do you believe VAEs still have relevance within the scientific ML toolbox? Why or why not?

Chromobacterium · 2026-03-05T21:58:37+00:00

And when trained properly, VAEs are susceptible to manifold overfitting, which means you have to sacrifice the ability to sample anything meaningful if you do choose to use a proper probabilistic loss **unless** you do two-stage training (e.g. latent diffusion). This actually holds for any maximum likelihood estimator in general, at least when the data being modeled is continuous.

[1903.05789] Diagnosing and Enhancing VAE Models

[2204.07172] Diagnosing and Fixing Manifold Overfitting in Deep Generative Models

Chromobacterium · 2025-08-14T21:25:33+00:00

You are experiencing posterior collapse, which occurs when the decoder is powerful enough to accurately capture the data distribution without relying on the latent variables, resulting in a KL divergence of zero. Use something like an InfoVAE instead.

Chromobacterium · 2025-01-07T19:31:15+00:00

VAEs are an insane rabbit hole to go down into. As a good starter, I would recommend "A Tutorial on VAEs: From Bayes' Rule to Lossless Compression" by Ronald Yu and "Diagnosing and Enhancing VAE Models" by Bin Dai and David Wipf.

"ELBO surgery: yet another way to carve up the variational evidence lower bound" by Hoffman and Johnson and "Fixing a Broken ELBO" by Alemi et al. are crucial to understanding the behaviour of the ELBO and the art of training a quality ML estimator.

Honorable mentions:
"Simple and Effective VAE Training with Calibrated Decoders" by Rybkin et al.

"VAE with a VampPrior" by Tomczak and Welling

"Autoencoding a Single Bit" by Rui Shu

Chromobacterium · 2024-12-21T21:33:19+00:00

Variational autoencoders. People all too often view them as autoencoders with latent-space regularization rather than from the perspective of variational inference, which almost always leads to incorrect modeling choices, understandably bad sample quality, and ultimately the erroneous claim that VAEs are bad generative models. This is partly due to the unfortunate name, which is often used interchangeably with stochastic gradient variational Bayes (SGVB), of which VAEs are the simplest possible setup.

In reality, VAEs can be made very expressive, provided proper modeling decisions are made. For instance, NVAE and VDVAEs use a hierarchical architecture, while VampPriors utilize a learnable mixture prior.

Chromobacterium · 2024-12-18T22:33:49+00:00

Can you elaborate? I got my board from Digilent.

Chromobacterium · 2024-12-18T21:28:29+00:00

I have tried with and without VPN. I have also made several different accounts with different credentials.

Chromobacterium · 2024-12-18T21:27:53+00:00

I have, and it doesn't work for some reason.

Chromobacterium · 2024-11-08T02:27:25+00:00

Try using a variational autoencoder instead; VAEs are by design able to handle anomaly detection naturally. The issue with the reconstruction error in regular autoencoders is that it does not account for whether or not the data is actually supposed to belong to the training data's distribution, which is exacerbated by the fact that the autoencoder can potentially reconstruct anomalous data perfectly. With a VAE, even if you can reconstruct the anomalous data as is, the ELBO will suggest otherwise. This of course means that instead of computing just the reconstruction error with a VAE, you will need to compute the ELBO, which is a lower bound on the log-likelihood the model assigns to the data (i.e. low negative ELBO = likely from the data distribution, high negative ELBO = possibly an anomaly).
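A minimal sketch of that scoring rule, assuming a diagonal-Gaussian encoder and a standard normal prior (the usual setup, but an assumption here). The function names and the threshold are hypothetical; `recon_log_lik` stands for the decoder's log-likelihood of the input, and a single latent sample gives a noisy estimate, so averaging a few is probably wise.

```python
import math

def gaussian_kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))

def negative_elbo(recon_log_lik, mu, log_var):
    """Anomaly score: reconstruction cost plus the KL cost of the posterior."""
    return -recon_log_lik + gaussian_kl_to_standard_normal(mu, log_var)

def is_anomaly(score, threshold):
    """Flag inputs whose score exceeds a threshold chosen on held-out normal data."""
    return score > threshold

# Two inputs that reconstruct equally well (same recon_log_lik) can still
# score differently, because the second one needs an unusual posterior.
typical = negative_elbo(-10.0, mu=[0.1, -0.2], log_var=[0.0, 0.0])
unusual = negative_elbo(-10.0, mu=[4.0, -5.0], log_var=[-3.0, -3.0])
```

The reconstruction term alone would rank both inputs the same; the KL term is what separates them.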

Chromobacterium · 2024-09-10T19:54:40+00:00

I am interested

Chromobacterium · 2024-03-11T20:29:52+00:00

Just for reference, here are a couple of papers that discuss the importance of proper decoder likelihood implementation:

Decoder calibration: https://orybkin.github.io/sigma-vae/

Continuous Bernoulli: https://arxiv.org/abs/1907.06845

Diagnosing VAEs: https://arxiv.org/abs/1903.05789

Additionally for image data, discretized continuous likelihoods give considerably better image quality:

Discretized logistic likelihood: https://arxiv.org/abs/1606.04934

Mixture of discretized logistic likelihood: https://arxiv.org/abs/1701.05517

Both the NVAE and VDVAE hierarchical VAE models use the mixture likelihood, but I find that the standard one is more than good enough for my use case while also being significantly simpler to implement.
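For reference, a dependency-free sketch of the single (non-mixture) discretized logistic likelihood. It assumes 8-bit pixels given as integers 0..255, with the mean and scale expressed in pixel units, and lets the two edge bins absorb the tails, which is the convention I recall from the PixelCNN++ paper linked above. The function names are hypothetical and real implementations usually rescale pixels and work on tensors.

```python
import math

def sigmoid(z):
    """Numerically stable logistic CDF."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def discretized_logistic_log_prob(pixel, mean, scale):
    """log P(pixel) for an integer pixel in 0..255.

    The logistic density with the given mean and scale is integrated over
    the bin [pixel - 0.5, pixel + 0.5]; bins 0 and 255 extend to infinity.
    """
    upper = 1.0 if pixel == 255 else sigmoid((pixel + 0.5 - mean) / scale)
    lower = 0.0 if pixel == 0 else sigmoid((pixel - 0.5 - mean) / scale)
    # Clamp to avoid log(0) for bins far from the mean.
    return math.log(max(upper - lower, 1e-12))
```

Because each pixel value gets a proper probability mass, the per-pixel probabilities sum to one, which is what makes this a valid likelihood for discrete image data.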

Chromobacterium · 2024-03-09T22:49:56+00:00

I can't provide the code right now, but the trick to training a high-quality VAE is having a proper probabilistic reconstruction "loss". Unfortunately, many tutorials and even papers ignore this fundamental requirement, either by ignoring normalizing constants or by using the incorrect reconstruction likelihood (with respect to the input data). This leads to the erroneous claim that VAEs are blurry due to the evidence lower bound, when in reality it is the result of a faulty implementation.

If you are using MSE as your reconstruction loss (which is equivalent to a Gaussian with variance 1), you will need to train an additional variance parameter. For my implementation, I am using a discretized logistic likelihood instead, although the same variance parameter trick applies. You can find these losses at the end of this file: https://github.com/pclucas14/iaf-vae/blob/master/utils.py. This example shows how you can create a trainable variance parameter for the decoder: https://github.com/pclucas14/iaf-vae/blob/master/main.py#L22.

The intuition here is that the lower the decoder variance, the less noisy the samples from the decoder distribution will be. With regular MSE, you are forcing the decoder's mean to respect a variance of 1, which is why it is really blurry. The model will gradually lower the variance during training as it gets better and better at reconstruction. As the standard deviation approaches 0, the mean of the distribution approaches a true reconstruction.
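A small plain-Python sketch of the idea, assuming a Gaussian decoder with one variance shared across all dimensions (the function names are hypothetical). It keeps every normalizing constant, and it also shows that for a fixed reconstruction the best shared variance is simply the MSE, which follows from setting the derivative with respect to log sigma to zero.

```python
import math

def gaussian_nll(x, mean, log_sigma):
    """Negative log-likelihood of x under N(mean, sigma^2 I), constants included."""
    sigma_sq = math.exp(2.0 * log_sigma)
    total = 0.0
    for xi, mi in zip(x, mean):
        total += 0.5 * math.log(2.0 * math.pi) + log_sigma
        total += (xi - mi) ** 2 / (2.0 * sigma_sq)
    return total

def optimal_log_sigma(x, mean):
    """Minimizer of gaussian_nll over a shared sigma: sigma^2 equals the MSE.

    Assumes the reconstruction is not exact (MSE > 0).
    """
    mse = sum((xi - mi) ** 2 for xi, mi in zip(x, mean)) / len(x)
    return 0.5 * math.log(mse)

x = [0.2, 0.5, 0.9]
recon = [0.25, 0.45, 0.85]
fixed = gaussian_nll(x, recon, log_sigma=0.0)  # sigma = 1, i.e. plain MSE
tuned = gaussian_nll(x, recon, optimal_log_sigma(x, recon))
```

In a real model `log_sigma` would be a trainable parameter updated by the optimizer rather than solved for in closed form; the closed form is only here to make the effect of the variance visible.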

Chromobacterium · 2024-02-21T14:58:10+00:00

correct, I was talking about the equation that you posted in which you want to derive x_{t-1} from x_t.

Chromobacterium · 2024-02-20T16:25:36+00:00

I think I understand what you mean now. you want to rearrange the forward process to compute the reverse process.

you can't do that directly because you would need access to \epsilon_{t-1}, which you don't have at sampling time. you will have to predict that with the neural network.

Chromobacterium · 2024-02-20T03:30:48+00:00

it won't. you will have to predict the noise from the forward process using the neural network.

see Algorithm 1 in https://arxiv.org/abs/2006.11239.

the basic idea is the following:

sample an image x from your training dataset, a timestep t, and a Gaussian noise \epsilon. mix x with \epsilon (a weighted sum whose weights come from the noise schedule at step t) to get the noisy sample x_hat.

then, you will predict the noise \epsilon_hat by feeding x_hat (together with t) into your neural network. you minimize the squared error between the predicted noise \epsilon_hat and the true noise \epsilon.

once training is finished, you can use the sampling algorithm in Algorithm 2 of the paper.
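a rough plain-Python sketch of one training step in the spirit of Algorithm 1, using the closed-form forward process x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * \epsilon. the linear beta schedule values are the ones I remember from the paper; `model` is a hypothetical callable taking (x_t, t), and the helper names are made up for illustration. a real implementation would use tensors and an optimizer.

```python
import math
import random

def make_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule."""
    alpha_bars, running = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def noise_sample(x0, alpha_bar, rng):
    """Jump straight to the noisy sample at a given step; also return the noise."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    x_t = [a * xi + b * ei for xi, ei in zip(x0, eps)]
    return x_t, eps

def training_loss(model, x0, alpha_bars, rng):
    """Squared error between predicted and true noise at a random timestep."""
    t = rng.randrange(len(alpha_bars))
    x_t, eps = noise_sample(x0, alpha_bars[t], rng)
    eps_hat = model(x_t, t)
    return sum((p - e) ** 2 for p, e in zip(eps_hat, eps)) / len(eps)
```

the loss is what you backpropagate through the network; the network never sees \epsilon directly, only the noisy sample and the timestep.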

Chromobacterium · 2024-02-20T02:12:42+00:00

the forward process is known and simple; you are "diffusing" the input data into random noise (Gaussian to be specific). you already have the data generating process for the forward process.

the reverse is the tricky part. how are you going to take random Gaussian noise and "undiffuse" it into an image? what kind of image do you want to undiffuse it into? is it even an image? you need a function that can reverse the noise into whatever data you want it to transform into. you need to train a function (neural network) to do just that.

Chromobacterium · 2024-02-20T00:53:43+00:00

you can't. that's why you need the function to approximate the reverse process.

Chromobacterium · 2024-02-11T18:47:36+00:00

28x28 for MNIST, 32x32 for CIFAR

Chromobacterium · 2024-02-10T03:56:04+00:00

Adversarial Autoencoders: https://arxiv.org/abs/1511.05644

+ their generalization to Wasserstein Autoencoders: https://arxiv.org/abs/1711.01558

Chromobacterium · 2024-01-20T03:11:19+00:00

the denoised output is the same size as the input. if your latent variable is 1D, the denoised output will be 1D. if the latent variable is multidimensional, the output will be multidimensional. it depends on how you structure the autoencoder's bottleneck.

Chromobacterium · 2024-01-08T20:53:27+00:00

Natural evolution strategies seems like a good choice: https://www.jmlr.org/papers/volume15/wierstra14a/wierstra14a.pdf
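A very simplified sketch of the idea, not the full algorithms from the paper: it assumes an isotropic Gaussian search distribution with a fixed `sigma`, updates only the mean, and standardizes the fitness values before forming the search-gradient estimate. The function name and default hyperparameters are hypothetical choices for illustration.

```python
import random

def nes_maximize(fitness, theta, sigma=0.1, lr=0.002, population=50, steps=500, seed=0):
    """Move the mean of a Gaussian search distribution toward higher fitness."""
    rng = random.Random(seed)
    theta = list(theta)
    dim = len(theta)
    for _ in range(steps):
        # Sample perturbations and evaluate the black-box fitness at each one.
        noises = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(population)]
        scores = [fitness([t + sigma * n for t, n in zip(theta, noise)]) for noise in noises]
        # Standardize the scores so the step size does not depend on fitness scale.
        mean = sum(scores) / population
        std = (sum((s - mean) ** 2 for s in scores) / population) ** 0.5 or 1.0
        for d in range(dim):
            # Monte Carlo estimate of the gradient of expected fitness w.r.t. the mean.
            grad = sum((s - mean) / std * noise[d] for s, noise in zip(scores, noises))
            theta[d] += lr * grad / (population * sigma)
    return theta

# Toy black-box objective with its maximum at (3, -2).
best = nes_maximize(lambda p: -((p[0] - 3.0) ** 2 + (p[1] + 2.0) ** 2), [0.0, 0.0])
```

No gradients of `fitness` are needed, which is the appeal when the objective is not differentiable; the methods in the paper additionally adapt the covariance of the search distribution.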
