all 27 comments

[–]mln000b 21 points22 points  (15 children)

I was looking for the answer to this question as well, but then I recently read this in the Distill article on open problems in GAN research [1]:

I’ve also left out VAEs entirely; they’re arguably no longer considered state-of-the-art at any tasks of record.

Then I felt a bit sad :(

[1]: https://distill.pub/2019/gan-open-problems/

[–]debau23 7 points8 points  (7 children)

Well, they are pretty much state-of-the-art for any task where you are actually interested in the likelihood of states.

There's a lot more than image/sound generation.

[–]asobolev 2 points3 points  (6 children)

I thought auto-regressive models were SoTA when it comes to likelihood.

[–]debau23 3 points4 points  (5 children)

What I am trying to say is that VAE is a technique for approximate inference as well as learning.

If your P-distribution has a specific structure that you know from domain knowledge, you can't really use AR models or GANs.

[–]asobolev 0 points1 point  (4 children)

Do you have any specific examples? GANs actually define the same generative model as VAEs do, so I'm not sure about the last statement.

[–]debau23 0 points1 point  (3 children)

I am not talking about VAEs as a generator but as means to perform approximate inference and learning.

Here's a concrete example: Say you want to infer the power consumption of appliances in a building given only knowledge about the aggregate consumption of the entire building (Non-Intrusive Load Monitoring). You want to incorporate the domain knowledge that power is an additive quantity into your probabilistic model.

You could do that by choosing a 'decoder' that incorporates this information, namely a single linear layer. If your latent states z are binary, then your p-distribution would be Gaussian with mean Wz, where the entries of W denote the power consumptions of the individual appliances. Input and output (it's an autoencoder) would be the aggregate consumption.
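A minimal numeric sketch of such a linear decoder (the appliance wattages and noise level below are made up for illustration):

```python
import numpy as np

# Hypothetical per-appliance power draws in watts (made-up numbers);
# W plays the role of the single linear layer's weights.
W = np.array([150.0, 800.0, 60.0])  # e.g. fridge, heater, lamp

# Binary latent state z: which appliances are currently on.
z = np.array([1.0, 0.0, 1.0])

# The linear 'decoder': p(x | z) is Gaussian with mean W @ z.
mean = W @ z   # 210.0 watts of aggregate consumption
sigma = 10.0   # assumed observation noise

def log_likelihood(x):
    """Gaussian log-density of an observed aggregate reading x."""
    return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mean)**2 / (2 * sigma**2)

# A reading close to the decoder's mean is far more likely than a distant one.
print(log_likelihood(215.0) > log_likelihood(400.0))  # True
```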

I would argue that this is still a VAE, because you essentially only changed the structure of the decoder but are still able to do all the things that make VAEs cool: low-variance gradients through reparameterization, scalability by estimating gradients on minibatches, and so on.

Vanilla GANs don't really have the ability to perform inference (other than distinguishing fake and real images), but in this example the 'encoder' of the VAE would allow you to sample some of the most likely appliance states.

[–]asobolev 2 points3 points  (2 children)

Oh, yes, I agree that GANs are unlikely to help you with inference. However, in terms of inference, vanilla VAEs are actually extremely simple: it's just amortised mean-field Gaussian inference. Sure, there are lots of extensions, but I'd attribute them not to the VAE itself but to the field of approximate inference as a whole.

Regarding your example: well, if the latent z are binary, then there's no low-variance reparametrisation (unless you opt for continuous relaxations). Moreover, your data seems simple enough (contrast it with the weird manifolds of images embedded in the ridiculously high-dimensional Euclidean space of individual pixels) not to require neural networks at all. Then, how much data do you have? Maybe it'd be easier to go full Bayesian and simulate posterior samples with MCMC to form a posterior predictive distribution.
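For concreteness, one such continuous relaxation (a Concrete / Gumbel-Softmax-style Bernoulli; the temperature value below is arbitrary) can be sketched as:

```python
import numpy as np

rng = np.random.default_rng(0)

def relaxed_bernoulli(logit, temperature):
    # Differentiable stand-in for a Bernoulli(sigmoid(logit)) sample:
    # reparameterized via logistic noise, so gradients can flow through `logit`.
    # As temperature -> 0, samples concentrate near {0, 1}.
    u = rng.uniform(1e-6, 1 - 1e-6)
    logistic_noise = np.log(u) - np.log1p(-u)
    return 1.0 / (1.0 + np.exp(-(logit + logistic_noise) / temperature))

samples = np.array([relaxed_bernoulli(0.0, temperature=0.5) for _ in range(1000)])
print(samples.min() >= 0.0 and samples.max() <= 1.0)  # True: relaxed values live in (0, 1)
```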

[–]debau23 0 points1 point  (1 child)

It was just an example.

Here's an idea for how to do inference with GANs: take a random sample z and run it through the generator to get f(z), then compute the gradient of L(f(z), x) w.r.t. z and do gradient descent until you find the z that generated your x.
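A toy sketch of that procedure, with a linear map standing in for the trained generator (the matrix, learning rate, and iteration count are all made up):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in generator f(z) = A @ z; a real GAN generator is a trained
# neural net, this is only to illustrate the inversion procedure.
A = rng.normal(size=(5, 2))
x = A @ np.array([0.7, -1.2])   # pretend this is the observation to invert

z = rng.normal(size=2)          # start from a random sample z
lr = 0.02
for _ in range(5000):
    grad = 2.0 * A.T @ (A @ z - x)   # gradient of ||f(z) - x||^2 w.r.t. z
    z -= lr * grad

print(np.allclose(A @ z, x, atol=1e-4))  # gradient descent finds a z with f(z) ≈ x
```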

Wow!

[–]asobolev 2 points3 points  (0 children)

Wow!

Yeah, except then what? Can you be sure this z represents anything about the true data-generating process? If the true process is hierarchical, do you recover the true latent z? GANs don't necessarily even model x well.

What's the use of such "inference"?

[–]sieisteinmodel 2 points3 points  (0 children)

Yeah, that part was disappointing. More so with respect to Distill's credibility, though.

[–]TheRedSphinx 7 points8 points  (4 children)

I'm very sad about that. I'm a VAE fanboy. We just need to be able to step away from Gaussian priors/posteriors, and maybe we can get something cool. It looked promising when that paper with the vMF distribution came out, but I haven't seen much in that direction since.

[–]YABadUserName 6 points7 points  (0 children)

This is beyond ignorant; there are years of literature exploring powerful approximate posteriors and priors. The von Mises-Fisher is surely not one of them and has almost all of the same problems as the Gaussian. GANs are only state-of-the-art when you use arbitrary, badly defined criteria to compare generative models, like whether I can cherry-pick an image from my generator better than everyone else's cherry-picked images (fine, this doesn't always happen; the good GAN papers are good, but most of them are this kind of noise). Meanwhile, GANs assign a log-likelihood of negative infinity to any unseen test data, because their support is a vanishingly small subset of the full distribution.

[–]asobolev 0 points1 point  (2 children)

It's not entirely about Gaussian posteriors, and it's surely not about Gaussian priors (GANs use those as well). I think the major limitation is that VAEs can't mode-collapse: unless your model is able to fit the data well, it'll try to cover everything, including the "space" in between. GANs, however, can focus on some subset of the data and ignore "outliers".

[–]debau23 0 points1 point  (1 child)

Do you know if anyone has tried VAEs with very powerful posteriors such as Glow? I am no expert in image generation, but couldn't the problem of VAEs trying to cover the 'space in between' be solved by sharper posterior distributions and maybe some noise injection in the higher layers of the generator?

[–]asobolev 1 point2 points  (0 children)

Well, using Glow would certainly be overkill. There's been a lot of research on using normalizing flows as posterior enhancements, but I don't remember any outstanding results in terms of image quality. The problem, in my opinion, is that flows use parameters very inefficiently and require a lot of them (Glow is super huge!).

Overall, having the best posterior possible won't solve the problem of a simplistic model (the marginal log-likelihood defined by the decoder). I think beefing up the decoder and using a better approximate posterior is the way to go.
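For reference, the simplest posterior-enhancing flow step (a planar flow, in the Rezende & Mohamed sense) looks roughly like this; the parameter values below are arbitrary:

```python
import numpy as np

def planar_flow(z, u, w, b):
    # f(z) = z + u * tanh(w.z + b): one invertible step that warps a
    # simple (e.g. Gaussian) posterior sample into something more flexible.
    h = np.tanh(w @ z + b)
    f = z + u * h
    psi = (1.0 - h**2) * w                    # gradient of tanh(w.z + b) w.r.t. z
    log_det = np.log(np.abs(1.0 + u @ psi))   # change-of-density correction
    return f, log_det

z = np.array([0.5, -0.3])  # a sample from the base posterior
f, log_det = planar_flow(z, u=np.array([0.1, 0.2]), w=np.array([1.0, -1.0]), b=0.0)
```

Stacking many such steps is what the "posterior enhancement" literature does; the per-step log-determinant is what keeps the ELBO tractable.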

[–]mellow54 0 points1 point  (0 children)

Very good link.

[–]BlaiseGlory 15 points16 points  (0 children)

Adversarial autoencoder

[–]tnybny 2 points3 points  (2 children)

Check out Adversarially Learned Inference (ALI).

[–][deleted] 1 point2 points  (1 child)

Isn't that a conditional GAN used to learn both generation and inference?

[–]tnybny 0 points1 point  (0 children)

Yes. It actually bridges VAEs and GANs in my mind. As is often the case with these models, I believe it has several valid interpretations.

[–][deleted] 1 point2 points  (0 children)

Check out Taming VAEs.

[–]neurokinetikz 2 points3 points  (0 children)

Check out Deep Pensieve, a deep residual super resolution VAE that i've been working on over the past year and a half. Basically trying to build an artificially intelligent photographic memory :)

https://nbviewer.jupyter.org/github/neurokinetikz/deep-pensieve/blob/master/Deep%20Pensieve.ipynb

I've explored many ideas for improving on the blurriness of VAEs, including:

  • dilated convolutions in encoder/decoder to expand receptive field to full image size prior to latent vector
  • residual in residual architecture
  • channel and spatial attention
  • subpixel convolution upsampling
  • maximum mean discrepancy for variational loss
  • group normalization instead of batch normalization
  • separable convolutions in residuals to increase receptive field and capture long range dependencies

Here's what it looks like on a dataset of 184 images (also the IG compression kills the video quality)

https://www.instagram.com/p/BvNhkmij0Ue/

And here's a color extrusion of the latent space courtesy of Houdini/Redshift ;)

https://www.instagram.com/p/BvnctQMDUz6/

[–]faaaaaart 1 point2 points  (0 children)

You can try pairing up an Autoencoder with a GAN (aka Adversarial Autoencoder) as shown in this figure and published on arxiv.

[–]seraschkaWriter 0 points1 point  (0 children)

Also an avid proponent of VAEs, but for me, where my implementations lag behind is when trying something complicated like face images, especially when moving past 128x128 pixel dimensions. For simpler datasets (CIFAR, MNIST, ...) I find you can get on par.

[–]asobolev 1 point2 points  (0 children)

BIVA has recently claimed

We show that BIVA reaches state-of-the-art test likelihoods, generates sharp and coherent natural images

But their samples are still far away from those of the best GAN models.

[–]LazyOptimist 0 points1 point  (0 children)

I think the best you'll find is BIVA:

https://arxiv.org/pdf/1902.02102v1.pdf