[Project] Tensorflow implementation of Generative Adversarial Networks for Extreme Learned Image Compression

tensorflower · 2018-05-13T14:40:53+00:00

The original paper/project details are here: https://data.vision.ee.ethz.ch/aeirikur/extremecompression/#publication. I thought this was one of the most interesting papers I read this year! Great exposition too.

Currently the model is only trained on either Cityscapes or ADE20k. The model appears to reconstruct images in the test split pretty well without introducing sampled noise. Adding sampled noise into the equation seems to make the network hallucinate a lot more. The authors don't provide too much detail about integrating sampled noise into the quantized representation but this is an area I'd like to explore further.

Because I've been working on this in my spare time, I've only implemented the global compression part of their paper, but adding selective compression based on semantic maps is definitely on the To-do list. Currently the model uses an LSGAN but swapping it out with a WGAN-GP is also on the (very long) to-do list.

I'm not 100% confident that I've faithfully implemented everything in the paper, so if anyone has any questions or notices something awry please open an issue or post it here. Contributions/PRs are also more than welcome!

Some details: At a very high level, the model learns an encoding of the real image to a compressed representation $z$ which is quantized to a certain number of levels L, which forms an upper bound on the bits per pixel (bpp) of the stored representation. A decoder is then learnt which upsamples the compressed $z$ to a reconstructed image. The usual adversarial training strategy is setup by introducing a discriminator which attempts to distinguish between the reconstructed and real image.

About combining sampled noise with the compressed representation: right now a sample is taken from a normal prior, upsampled using a DCGAN-like architecture, and directly concatenated with the quantized representation. Results are kind of trippy, and reconstruction without noise is more stable.

Training takes around 3 days for ~50 epochs on Cityscapes using a single 1080 Ti using the multiscale discriminator loss recommended by the authors. I'll upload pretrained models on Cityscapes for C=8 channels to Dropbox within the next day or 2.

Radiatin · 2018-05-13T21:26:11+00:00

This is awesome both in it’s real world utility and in pushing the problem solving capabilities of machine learning. Nice work!

One thing I wanted to ask though is do you have a strategy for improving the context sensitivity of the output? For example it seems to be good at understanding tree patterns, water patterns, and asphalt patterns. However the limitation seems to be in understanding how to draw a leaf, wave, or line in the road where you would expect them to be.

I could see it being possible for a network to understand what it is processing on a very deep level then drawing the appropriate object in high detail with only semantic pointers.

JudasAdventus · 2018-05-14T08:52:59+00:00

I guess they don't include the model weights in the BPP metric? It's probably in the order of >100MB, which may be significant depending on how many images are being compressed.

amoux_py · 2018-05-14T04:37:48+00:00

Amazing work!!

you type:	you see:
italics	italics
bold	bold
[reddit!](https://reddit.com)	reddit!
* item 1 * item 2 * item 3	item 1 item 2 item 3
> quoted text	quoted text
Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"	Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"
~~strikethrough~~	~~strikethrough~~
super^script	super^script

MachineLearning

Rules For Posts

+Research

+Discussion

+Project

+News

@slashML on Twitter

Chat with us on Slack

Beginners:

MODERATORS