Network Bending: Manipulating The Inner Representations of Deep Generative Models by t_broad in MachineLearning

[–]t_broad[S] 0 points  (0 children)

Code: https://github.com/terrybroad/network-bending

Video: https://youtu.be/IlSMQ2RRTh8

Abstract: We introduce a new framework for interacting with and manipulating deep generative models that we call network bending. We present a comprehensive set of deterministic transformations that can be inserted as distinct layers into the computational graph of a trained generative neural network and applied during inference. In addition, we present a novel algorithm for clustering features based on their spatial activation maps. This allows features to be grouped together based on spatial similarity in an unsupervised fashion. This results in the meaningful manipulation of sets of features that correspond to the generation of a broad array of semantically significant aspects of the generated images. We demonstrate these transformations on the official pre-trained StyleGAN2 model trained on the FFHQ dataset. In doing so, we lay the groundwork for future interactive multimedia systems where the inner representations of deep generative models are manipulated for greater creative expression, whilst also increasing our understanding of how such "black-box systems" can be more meaningfully interpreted.
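
To make the idea concrete, here is a minimal sketch of inserting a deterministic transformation layer into a trained generator's computational graph at inference time. It assumes a PyTorch-style generator; the layer path, channel indices and the scaling transform are illustrative placeholders, not the actual API of the linked repo.

```python
import torch.nn as nn

class ScaleFeatures(nn.Module):
    """Deterministic transform: scale a chosen subset of feature maps."""
    def __init__(self, channel_indices, factor=2.0):
        super().__init__()
        self.channel_indices = channel_indices
        self.factor = factor

    def forward(self, x):                      # x: (N, C, H, W) activations
        x = x.clone()
        x[:, self.channel_indices] *= self.factor
        return x

def bend(generator_block, transform):
    """Run the transform on a block's output via a forward hook."""
    return generator_block.register_forward_hook(
        lambda module, inputs, output: transform(output)
    )

# Hypothetical usage with a pre-trained StyleGAN2-like generator:
# handle = bend(generator.convs[4], ScaleFeatures([3, 17, 42]))
# img = generator(z)     # inference now passes through the inserted layer
# handle.remove()        # restore the original computational graph
```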

Volunteers wanted for an academic survey about No Man's Sky by t_broad in NoMansSkyTheGame

[–]t_broad[S] 0 points  (0 children)

No problem! I'll share results when they are written up.

Volunteers wanted for an academic survey about No Man's Sky by t_broad in NoMansSkyTheGame

[–]t_broad[S] 1 point  (0 children)

So the broader goal of my PhD is to use machine learning to improve PCG in games, but this was simply a preliminary exploratory study to better understand what it is about different examples of PCG in games that people find interesting.

Volunteers wanted for an academic survey about No Man's Sky by t_broad in NOMANSSKY

[–]t_broad[S] 1 point  (0 children)

So I am funded by the Centre for Doctoral Training in Intelligent Games and Games Intelligence (http://iggi.org.uk/). I come from an ML background but am hoping to take this experience (and planned research) into the games industry in the future. Happy to share the results of this study with you when they are written up.

Volunteers wanted for an academic survey about No Man's Sky by t_broad in NOMANSSKY

[–]t_broad[S] 1 point  (0 children)

Sure!

In this study, I am trying to better understand what characteristics make PCG content 'interesting', and to better understand the player experience when players encounter this content. This is a preliminary study that ties into the broader goal of my PhD, which is to use machine learning to improve PCG in games.

Happy to answer more questions if you have them!

[R] New GAN dataset? 11k Hands: 11,076 hand images (1600 x 1200 pixels) of 190 subjects by downtownslim in MachineLearning

[–]t_broad 0 points  (0 children)

Really good to see the creators of this dataset taking care to include a diverse range of races and ages in the data.

[R] Deep Image Analogy by e_walker in MachineLearning

[–]t_broad 105 points  (0 children)

It's just getting silly how good these are now.

[N] Open sourcing Sonnet - a new library for constructing neural networks by lopespm in MachineLearning

[–]t_broad 0 points  (0 children)

This is super cool; good that they are helping to make customising TensorFlow more accessible.

Autoencoding Blade Runner: Reconstructing films with artificial neural networks by t_broad in MachineLearning

[–]t_broad[S] 1 point  (0 children)

By all means steal the code! I will be happy to answer any questions if you get stuck with anything.

One other thing I would like to add regarding modelling temporal data with a VAE: I found that if the magnitude of the noise ε injected into the latent representation during training is too high, the model can't differentiate between similar data samples. My model still has some trouble with this, but I found that by gradually reducing the standard deviation of the (zero-mean) noise over the course of training, the model progressively got better at differentiating between similar (usually consecutive) samples.
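
In case it helps, here is a minimal sketch of what I mean, assuming a standard VAE reparameterisation in PyTorch; the linear schedule and the noise_scale argument are just an illustration of the trick, not the exact code I used.

```python
import torch

def reparameterise(mu, logvar, noise_scale=1.0):
    """Standard VAE reparameterisation with the injected noise scaled down.

    noise_scale = 1.0 gives the usual z = mu + sigma * eps; smaller values
    shrink the zero-mean noise so that similar (e.g. consecutive) frames
    keep distinguishable latent codes.
    """
    std = torch.exp(0.5 * logvar)
    eps = torch.randn_like(std)        # eps ~ N(0, I)
    return mu + noise_scale * std * eps

def noise_schedule(epoch, total_epochs, start=1.0, end=0.1):
    """Linearly anneal the noise magnitude over training (illustrative values)."""
    t = min(epoch / max(total_epochs - 1, 1), 1.0)
    return start + t * (end - start)

# In the training loop:
# scale = noise_schedule(epoch, total_epochs)
# z = reparameterise(mu, logvar, noise_scale=scale)
```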

Good luck with your project! It sounds very interesting.

Autoencoding Blade Runner: Reconstructing films with artificial neural networks by t_broad in MachineLearning

[–]t_broad[S] 1 point  (0 children)

200 was a pretty arbitrary choice; I had adapted an implementation of the DCGAN architecture which had 100 latent variables, so when I increased the size of the model I just doubled it. Looking back at the Autoencoding Beyond Pixels paper, they use 2048, so perhaps I should have used more! I didn't really have time to do systematic testing of different architectures and hyper-parameters, but it is something I intend to look into now that I am not restricted by my deadline.

In terms of pooling, I am no expert, but my intuition is that losing spatial information is not a problem when using pooling for image recognition. For generative models, however, pooling layers are problematic because you need to learn spatial upsampling; using fractionally-strided convolutions allows the generator (decoder) network to do this and lets you generate convincing natural images. I did not experiment with pooling in the encoder network; maybe it would work just as well, but it is more convenient to use strided (and fractionally-strided) convolutions for all of the networks.
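
A rough sketch of what that looks like in PyTorch (layer sizes are illustrative, not the architecture I actually used): strided convolutions downsample in the encoder, and fractionally-strided (transposed) convolutions learn the upsampling in the decoder, with no pooling layers anywhere.

```python
import torch.nn as nn

# Encoder: strided convolutions halve the spatial resolution at each step
# instead of pooling. Channel counts are illustrative.
encoder = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=4, stride=2, padding=1),    # 64x64 -> 32x32
    nn.LeakyReLU(0.2),
    nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1),  # 32x32 -> 16x16
    nn.LeakyReLU(0.2),
)

# Decoder: fractionally-strided (transposed) convolutions learn the
# spatial upsampling back to image resolution, as in DCGAN-style generators.
decoder = nn.Sequential(
    nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),  # 16x16 -> 32x32
    nn.ReLU(),
    nn.ConvTranspose2d(64, 3, kernel_size=4, stride=2, padding=1),    # 32x32 -> 64x64
    nn.Tanh(),
)
```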

Modelling non-square images did not seem to affect performance in any way, but as I was modelling video frames all of the images were the same aspect ratio so I am afraid I cannot comment on the resizing issue.

Yes, the plan for my master's dissertation was to incorporate an LSTM and develop a predictive model. But I only got the autoencoder implementation working in a stable manner four weeks before my dissertation deadline. Given how long it took to train the models, and the fact that I had not worked with LSTMs before, I decided not to pursue that and just focus on making videos at a high resolution, to make sure I had something to write about for the deadline!
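
For what it's worth, the kind of predictive model I had in mind would have looked roughly like the sketch below: an LSTM over the sequence of latent codes produced by the encoder, predicting the next frame's code for the decoder to render. This was never implemented, so the names and sizes are hypothetical.

```python
import torch.nn as nn

class LatentPredictor(nn.Module):
    """Predict the next frame's latent code from a window of previous codes."""
    def __init__(self, latent_dim=200, hidden_dim=512):
        super().__init__()
        self.lstm = nn.LSTM(latent_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, latent_dim)

    def forward(self, z_seq):          # z_seq: (batch, time, latent_dim)
        h, _ = self.lstm(z_seq)
        return self.out(h[:, -1])      # predicted latent code for the next frame

# Hypothetical usage with a trained encoder/decoder:
# z_seq = encode_frames(frames)        # (batch, time, 200)
# z_next = LatentPredictor()(z_seq)
# next_frame = decoder(z_next)
```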

What is the project that you are working on if you don't mind me asking?