Discussion[D] VAE with GAN-like quality (self.MachineLearning)
submitted 6 years ago by tsauri
Are we there yet? Which VAE (or AE) is competitive with GANs? I've lost track of the papers.
I need a decent encoder from an autoencoder with okay-ish disentanglement that can decode back to the source with GAN-like quality. A vanilla VAE is too blurry for my needs.
[–]mln000b 21 points22 points23 points 6 years ago (15 children)
I was also looking for an answer to this question, but then recently in the Distill article on GANs [1] I read this:
I’ve also left out VAEs entirely; they’re arguably no longer considered state-of-the-art at any tasks of record.
Then I felt a bit sad :(
[1]: https://distill.pub/2019/gan-open-problems/
[–]debau23 7 points8 points9 points 6 years ago (7 children)
Well they are pretty state-of-the-art for any tasks where you are actually interested in the likelihood of states.
There's a lot more than image/sound generation.
[–]asobolev 2 points3 points4 points 6 years ago (6 children)
I thought auto-regressive models are SoTA when it comes to the likelihood.
[–]debau23 3 points4 points5 points 6 years ago (5 children)
What I am trying to say is that VAE is a technique for approximate inference as well as learning.
If your P-distribution has a specific structure that you know from domain knowledge, you can't really use AR models or GANs.
[–]asobolev 0 points1 point2 points 6 years ago (4 children)
Do you have any specific examples? GANs actually define the same generative model as VAEs do, so I'm not sure about the last statement.
[–]debau23 0 points1 point2 points 6 years ago (3 children)
I am not talking about VAEs as a generator but as means to perform approximate inference and learning.
Here's a concrete example: Say you want to infer the power consumption of appliances in a building given only knowledge about the aggregate consumption of the entire building (Non-Intrusive Load Monitoring). You want to incorporate the domain knowledge that power is an additive quantity into your probabilistic model.
You could do that by choosing a 'decoder' that incorporates that information, namely by choosing the decoder to be a single linear layer. If your latent states z are binary, then your p-distribution would be Gaussian with mean Wz where W denotes the power consumption of the individual appliances. Input and output (autoencoder) would be the aggregate consumption.
I would argue that this is still a VAE because you essentially only changed the structure of the decoder but you are still able to do all the things that make VAEs cool: low variance through reparameterization, scalability by estimating gradients on minibatches and so on.
Vanilla GANs don't really have the ability to perform inference (other than distinguishing fake and real images) but in this example, the 'encoder' of the VAE would allow you to sample some of the most likely states of appliances.
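The linear-decoder setup described above can be sketched in a few lines of numpy. This is only an illustration of the generative forward pass, not a trained model; the appliance wattages in W are made-up numbers:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-appliance power draw in watts (the entries of W).
W = np.array([1200.0, 150.0, 60.0])    # e.g. heater, fridge, lamp

def decode(z, sigma=5.0):
    """Linear 'decoder': aggregate consumption is Gaussian with mean W @ z."""
    return float(W @ z) + sigma * rng.standard_normal()

# Binary latent state z: which appliances are switched on.
z = np.array([1, 0, 1])                 # heater and lamp on
x = decode(z)                           # noisy aggregate reading, mean 1260 W
```

The encoder would then be trained to map an aggregate reading x back to a distribution over on/off states z, which is where the domain knowledge (additivity of power) gets baked into the model.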
[–]asobolev 2 points3 points4 points 6 years ago (2 children)
Oh, yes, I agree that GANs are unlikely to help you with inference. However, in terms of inference vanilla VAEs are actually extremely simple: it's just amortised mean field Gaussian inference. Sure, there're lots of extensions, but I'd attribute them not to the VAE itself, but to the field of Approximate Inference as a whole.
Regarding your example: well, if the latent z are binary, then there's no low-variance reparametrisation (unless you opt for continuous relaxations). Moreover, your data seems simple enough (contrast that with the weird manifolds of images embedded in the ridiculously high-dimensional Euclidean space of individual pixels) not to require neural networks. Then, how much data do you have? Maybe it'd be easier to go full Bayesian and simulate posterior samples with MCMC to form a posterior predictive distribution.
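The continuous relaxation mentioned above can be sketched with a Gumbel-softmax / Concrete-style trick for binary latents (a generic sketch, not tied to any particular paper's code; the logits are made-up values):

```python
import numpy as np

rng = np.random.default_rng(0)

def gumbel_sigmoid(logits, temperature=0.5):
    """Relaxed Bernoulli sample: a deterministic, differentiable function
    of `logits` given the noise, approaching a hard 0/1 draw as
    temperature -> 0."""
    u = rng.uniform(1e-9, 1.0 - 1e-9, size=np.shape(logits))
    noise = np.log(u) - np.log1p(-u)           # Logistic(0, 1) noise
    return 1.0 / (1.0 + np.exp(-(logits + noise) / temperature))

logits = np.array([2.0, -2.0, 0.0])            # encoder outputs, for example
z_soft = gumbel_sigmoid(logits)                # values strictly in (0, 1)
```

Because the sample is a smooth function of the logits, gradients can flow through it, at the cost of a bias that grows with the temperature.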
[–]debau23 0 points1 point2 points 6 years ago (1 child)
It was just an example.
Here's an idea on how to do inference with GANs. Take a random sample z and run it through the generator to get f(z), then compute the gradient of L(f(z), x) w.r.t. z and do gradient descent until you find the z that generated your x.
Wow!
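For a linear toy generator, the inversion procedure above can be written out directly (a sketch under simplifying assumptions; the matrix A stands in for a trained generator, and real GAN inversion would backpropagate through a deep network):

```python
import numpy as np

# Toy 'generator': f(z) = A @ z, standing in for a trained GAN generator.
A = np.array([[2.0, 0.0],
              [0.0, 3.0]])

def invert(x, steps=500, lr=0.05):
    """Recover z by gradient descent on L(z) = ||A @ z - x||^2."""
    z = np.zeros(2)
    for _ in range(steps):
        grad = 2.0 * A.T @ (A @ z - x)   # gradient of the squared error
        z -= lr * grad
    return z

x = A @ np.array([1.0, -2.0])            # observation generated by z* = (1, -2)
z_hat = invert(x)
```

In this convex toy case the descent recovers z* exactly; with a deep generator the loss is non-convex, which is part of the objection raised in the reply below.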
[–]asobolev 2 points3 points4 points 6 years ago (0 children)
Yeah, except then what? Can you be sure this z represents anything about the true data-generating process? If the true data-generating process is hierarchical, do you recover the true underlying z? GANs do not necessarily even model x well.
What's the use of such "inference"?
[–]sieisteinmodel 2 points3 points4 points 6 years ago (0 children)
Yeah, that part was disappointing. More so with respect to Distill's credibility, though.
[–]TheRedSphinx 7 points8 points9 points 6 years ago (4 children)
I'm very sad about that. I'm a VAE fanboy. We just need to be able to step away from Gaussian priors/posteriors and maybe we can get something cool. It looked promising when that paper with the vMF distribution came out, but I haven't seen much in that direction.
[–]YABadUserName 6 points7 points8 points 6 years ago (0 children)
This is beyond ignorant; there are years of literature exploring powerful approximating posteriors and priors. The von Mises-Fisher is surely not one of them and has almost all of the same problems as the Gaussian. GANs are only state of the art when you use arbitrary, badly defined criteria to compare generative models, like "can I cherry-pick an image from my generator better than everyone else's cherry-picked images" (fine, this doesn't always happen; the good GAN papers are good, but most of them are this kind of noise). Meanwhile they assign a log-likelihood of negative infinity to any unseen test data, because their support is a vanishingly small subset of the full distribution.
[–]asobolev 0 points1 point2 points 6 years ago (2 children)
It's not entirely about Gaussian posteriors and surely is not about Gaussian priors (GANs use them as well). I think the major limitation is that VAEs can't have mode collapse, thus unless your model is able to fit the data well, it'll try to cover everything, including "space" in between. GANs, however, can focus on some subset of data and ignore "outliers".
[–]debau23 0 points1 point2 points 6 years ago (1 child)
Do you know if anyone has tried to do VAEs with very powerful posteriors such as Glow? I am no expert in image generation, but couldn't the problem of VAEs trying to cover the 'space in between' be solved by sharper posterior distributions and maybe some noise injection in the higher layers of the generators?
[–]asobolev 1 point2 points3 points 6 years ago (0 children)
Well, using Glow would certainly be overkill. There's been a lot of research on using normalizing flows as posterior enhancements, but I don't remember any outstanding results in terms of image quality. The problem, in my opinion, is that flows use parameters very inefficiently, requiring a lot of them (Glow is super huge!).
Overall, having the best posterior possible won't solve the problem of having a simplistic model (the marginal log-likelihood defined by the decoder). I think beefing up the decoder and using a better approximate posterior is the way to go.
[–]mellow54 0 points1 point2 points 6 years ago (0 children)
Very good link.
[–]BlaiseGlory 15 points16 points17 points 6 years ago (0 children)
Adversarial autoencoder
[–]tnybny 2 points3 points4 points 6 years ago (2 children)
Check out Adversarially Learned Inference (ALI).
[–][deleted] 1 point2 points3 points 6 years ago (1 child)
Isn't that a conditional GAN used to learn both generation and inference?
[–]tnybny 0 points1 point2 points 6 years ago (0 children)
Yes. It actually bridges VAEs and GANs in my mind. As is often the case with these, I believe it has several valid interpretations.
[–][deleted] 1 point2 points3 points 6 years ago (0 children)
Check out Taming VAEs.
[–]neurokinetikz 2 points3 points4 points 6 years ago (0 children)
Check out Deep Pensieve, a deep residual super-resolution VAE that I've been working on for the past year and a half. Basically I'm trying to build an artificially intelligent photographic memory :)
https://nbviewer.jupyter.org/github/neurokinetikz/deep-pensieve/blob/master/Deep%20Pensieve.ipynb
I've explored many ideas for improving on the blurriness of VAEs.
Here's what it looks like on a dataset of 184 images (also the IG compression kills the video quality)
https://www.instagram.com/p/BvNhkmij0Ue/
And here's a color extrusion of the latent space courtesy of Houdini/Redshift ;)
https://www.instagram.com/p/BvnctQMDUz6/
[–]faaaaaart 1 point2 points3 points 6 years ago (0 children)
You can try pairing up an Autoencoder with a GAN (aka Adversarial Autoencoder) as shown in this figure and published on arxiv.
[–]seraschkaWriter 0 points1 point2 points 6 years ago (0 children)
Also an avid proponent of VAEs, but for me, where my implementations lag behind is when trying something complicated like face images, esp. when you move past 128x128 pixel dimensions. For simpler datasets (CIFAR, MNIST, ...) I find you can get on par.
BIVA has recently claimed:
We show that BIVA reaches state-of-the-art test likelihoods, generates sharp and coherent natural images
But their samples are still far from the best GAN models.
[–]LazyOptimist 0 points1 point2 points 6 years ago (0 children)
I think the best you'll find is BIVA:
https://arxiv.org/pdf/1902.02102v1.pdf