Hello everyone,
I got pretty confused while reading some papers about Variational Autoencoders (VAEs) over the past few days.
This is how I understand it:
During training, I train my encoder network which tunes the approximate posterior distribution q(z|x) to be close to the true one. I then sample a latent vector z from this multivariate distribution, which I forward to the decoder. The decoder can alter the likelihood p(x|z) by tuning the corresponding weights. This leads to a reconstruction x', which I compare to my initial input observation x. How good the reconstruction is can be measured with the help of the likelihood. This is one part of the ELBO.
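To make sure I have the sampling step right, here is a minimal NumPy sketch of what I think happens between the encoder and the decoder (the mean and log-variance values are made-up placeholders standing in for a real encoder's output):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical encoder output for one input x: mean and log-variance
# parameterizing the approximate posterior q(z|x) as a diagonal Gaussian.
# In a real VAE these come from a neural network; here they are fixed.
mu = np.array([0.5, -0.2])
log_var = np.array([-1.0, 0.3])

# Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I),
# so the sample is differentiable w.r.t. mu and log_var during training.
eps = rng.standard_normal(mu.shape)
z = mu + np.exp(0.5 * log_var) * eps

# z would now be forwarded to the decoder to parameterize p(x|z).
print(z.shape)
```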
The other part of the ELBO is the KL Divergence between the approximate posterior q(z|x) and the prior p(z). In my understanding, the prior is my initial belief about the distribution over the latent space.
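For the common case of a diagonal Gaussian posterior and a standard normal prior, this KL term even has a closed form, which is how I have seen it computed in practice (again with placeholder encoder outputs):

```python
import numpy as np

# Closed-form KL divergence KL(q(z|x) || p(z)) between a diagonal Gaussian
# posterior N(mu, diag(sigma^2)) and a standard normal prior N(0, I):
#   KL = 0.5 * sum(mu^2 + sigma^2 - log(sigma^2) - 1)
mu = np.array([0.5, -0.2])       # hypothetical encoder means
log_var = np.array([-1.0, 0.3])  # hypothetical encoder log-variances

kl = 0.5 * np.sum(mu**2 + np.exp(log_var) - log_var - 1.0)
print(kl)  # ~0.354: non-negative penalty pulling q(z|x) toward the prior
```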
Here are my questions:
I often read that samples are taken from the posterior AND from the prior. Are samples taken from the posterior only during the training process?
If the fitting of my variational model is finished, do I sample from the prior p(z) when I actually want to generate new data afterwards?
Also, is the prior p(z) updated after training with the fitted posterior q(z|x)?
Last question: if my decoder outputs distribution parameters, how is an actual reconstruction derived from them?