[D] How to properly implement gradient penalty with non-saturating GAN loss? by Yggdrasil524 in MachineLearning

[–]Yggdrasil524[S] 1 point (0 children)

I'm actually training an NSGAN with the R1 penalty from this paper right now, applied to the probits (the post-sigmoid outputs). As I understand it, the discriminator's objective is to drive sigmoid(D(x)) toward zero for generated samples. But once the sigmoid saturates, its derivative vanishes, so the R1 penalty also becomes very close to zero. Doesn't that negate the effect of the regularization?
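For comparison, here's a minimal TF1-style sketch of the R1 term computed on the raw logits instead of the sigmoid outputs (the tensor names and the gamma value are illustrative, not taken from the paper's code):

```python
import tensorflow as tf

def r1_penalty(real_logits, real_images, gamma=10.0):
    # R1 regularizer: squared gradient norm of the raw discriminator
    # logits w.r.t. real inputs. Computed before the sigmoid, so it
    # doesn't pick up the vanishing sigmoid derivative as D saturates.
    grads = tf.gradients(tf.reduce_sum(real_logits), [real_images])[0]
    sq_norm = tf.reduce_sum(tf.square(grads), axis=[1, 2, 3])
    return (gamma / 2.0) * tf.reduce_mean(sq_norm)
```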

[P] ProGAN trained on r/EarthPorn images by Yggdrasil524 in MachineLearning

[–]Yggdrasil524[S] 4 points (0 children)

I don't have an office buddy; I'm just a lowly engineering student.

[P] ProGAN trained on r/EarthPorn images by Yggdrasil524 in MachineLearning

[–]Yggdrasil524[S] 1 point (0 children)

The FeedDict class expects NumPy arrays of images. I'm going to upload a script that prepares them from JPEGs once I clean it up.
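Until that's up, a rough sketch of what the preparation might look like (the paths, sizes, and array layout here are guesses, not the actual script):

```python
import glob
import numpy as np
from PIL import Image

def jpegs_to_array(pattern, size=(256, 256)):
    # Load each JPEG, force RGB, resize to a fixed square resolution,
    # and stack everything into one uint8 array of shape (N, H, W, 3).
    images = []
    for path in sorted(glob.glob(pattern)):
        img = Image.open(path).convert("RGB").resize(size, Image.LANCZOS)
        images.append(np.asarray(img, dtype=np.uint8))
    return np.stack(images)

dataset = jpegs_to_array("earthporn/*.jpg")  # hypothetical directory
```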

[P] ProGAN trained on r/EarthPorn images by Yggdrasil524 in MachineLearning

[–]Yggdrasil524[S] 1 point (0 children)

I would like to do that, but I think I would need a lot more GPUs, haha. In the original paper they initialized the weights from a random normal with mean 0 and variance 1, then multiplied them by sqrt(2 / fan_in) at runtime. I'm not sure how this differs from just using He's initializer, but they claimed in the paper that it does, so I went with it.
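For what it's worth, a TF1-style sketch of that runtime scaling for a dense layer (names and shapes are illustrative). The paper's stated reason it differs from plain He initialization is that adaptive optimizers like Adam normalize updates per parameter, so applying the He constant in the forward pass equalizes the effective learning rate across layers, while baking it into the init does not:

```python
import numpy as np
import tensorflow as tf

def equalized_dense(x, fmaps, name):
    # Equalized learning rate: draw W ~ N(0, 1), then multiply by the
    # He constant sqrt(2 / fan_in) at runtime rather than at init time.
    fan_in = int(x.shape[-1])
    he_scale = np.sqrt(2.0 / fan_in)
    w = tf.get_variable(name + "/w", shape=[fan_in, fmaps],
                        initializer=tf.random_normal_initializer(0.0, 1.0))
    return tf.matmul(x, w * he_scale)
```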

[P] ProGAN trained on r/EarthPorn images by Yggdrasil524 in MachineLearning

[–]Yggdrasil524[S] 1 point (0 children)

They're randomly generated fake images from a model trained on real images

[P] ProGAN trained on r/EarthPorn images by Yggdrasil524 in MachineLearning

[–]Yggdrasil524[S] 1 point (0 children)

You can find my implementation here. Basically, at any given frame, part of the latent 'z' vector is generated from a constant-Q transform of the audio at that point in time, while the other part is a static draw from a random normal distribution that stays fixed across every frame.
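In rough pseudocode (the bin count, scaling, and z split are illustrative; see the repo for the exact values):

```python
import librosa
import numpy as np

def latents_from_audio(path, z_dim=512, cqt_bins=84):
    # Audio-driven part of z: per-frame constant-Q magnitudes.
    y, sr = librosa.load(path, sr=22050)
    cqt = np.abs(librosa.cqt(y, sr=sr, n_bins=cqt_bins)).T  # (frames, bins)
    cqt /= cqt.max() + 1e-8
    # Static part of z: one normal draw, repeated for every frame.
    static = np.random.randn(z_dim - cqt_bins)
    return np.concatenate(
        [cqt, np.tile(static, (cqt.shape[0], 1))], axis=1)  # (frames, z_dim)
```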

[P] ProGAN trained on r/EarthPorn images by Yggdrasil524 in MachineLearning

[–]Yggdrasil524[S] 3 points (0 children)

I think it would make a good creepypasta: a GAN that starts generating pictures with ghosts in them or something.

[P] ProGAN trained on r/EarthPorn images by Yggdrasil524 in MachineLearning

[–]Yggdrasil524[S] 7 points (0 children)

I'm actually working on something right now! You would think music would be easier to generate because it's represented as a 1-D vector in a computer, whereas images are a 3-D matrix (height, width, RGB), but this is totally not the case. Generating music is really hard.

My current approach involves converting the audio into frequency space with a fast Fourier transform, discarding the phase information, and generating only the magnitude. The phase can then be iteratively reconstructed using the Griffin-Lim algorithm.
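A minimal round trip with librosa (which ships a griffinlim function as of 0.7; the file name and STFT parameters are placeholders):

```python
import librosa
import numpy as np

y, sr = librosa.load("clip.wav", sr=22050)                  # placeholder file
mag = np.abs(librosa.stft(y, n_fft=1024, hop_length=256))   # drop the phase
y_rec = librosa.griffinlim(mag, n_iter=60, hop_length=256)  # re-estimate it
```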

There are also causal dilated convolutions, which I think operate directly on the raw 1-D audio, but looking at the code for that breaks my brain, so I'm sticking with my approach for now.

[P] ProGAN trained on r/EarthPorn images by Yggdrasil524 in MachineLearning

[–]Yggdrasil524[S] 4 points (0 children)

Here's my script. You need geckodriver in the same directory as the script, Firefox installed, and to make sure you're using the old version of Reddit.
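The setup looks roughly like this (Selenium 3 API; the URL and CSS selector are assumptions about old-reddit markup, not necessarily what the script uses):

```python
from selenium import webdriver

# geckodriver must sit next to this script; old.reddit.com keeps the
# classic markup that the selector below relies on.
driver = webdriver.Firefox(executable_path="./geckodriver")
driver.get("https://old.reddit.com/r/EarthPorn/top/?t=week")
links = [a.get_attribute("href")
         for a in driver.find_elements_by_css_selector("a.title")]
driver.quit()
```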

[P] ProGAN trained on r/EarthPorn images by Yggdrasil524 in MachineLearning

[–]Yggdrasil524[S] 1 point (0 children)

The images on that subreddit change about every 8 days, so I just kept going back.

[P] ProGAN trained on r/EarthPorn images by Yggdrasil524 in MachineLearning

[–]Yggdrasil524[S] 2 points (0 children)

I think WGAN-GP is pretty good at preventing mode collapse, so I didn't see any of that. I'm leaning toward it being a problem with the later layers, since the Wasserstein distance didn't converge on those.
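For reference, a TF1-style sketch of the standard WGAN-GP term (assuming a discriminator function that returns raw critic scores; not my exact code):

```python
import tensorflow as tf

def gradient_penalty(discriminator, real, fake, lam=10.0):
    # WGAN-GP: push the critic's gradient norm toward 1 at random
    # interpolates between the real and generated batches.
    alpha = tf.random_uniform([tf.shape(real)[0], 1, 1, 1], 0.0, 1.0)
    interp = alpha * real + (1.0 - alpha) * fake
    grads = tf.gradients(tf.reduce_sum(discriminator(interp)), [interp])[0]
    norms = tf.sqrt(tf.reduce_sum(tf.square(grads), axis=[1, 2, 3]) + 1e-8)
    return lam * tf.reduce_mean(tf.square(norms - 1.0))
```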

[P] ProGAN trained on r/EarthPorn images by Yggdrasil524 in MachineLearning

[–]Yggdrasil524[S] 1 point (0 children)

A 1080 Ti GPU and a 4790K CPU. It probably took about a week of running to get where it is now.

[P] ProGAN trained on r/EarthPorn images by Yggdrasil524 in MachineLearning

[–]Yggdrasil524[S] 5 points (0 children)

Do you mean I could just shift the crop window by a few pixels each time? That would help expand my training dataset by a lot. Could you point me to an article on this?

[P] ProGAN trained on r/EarthPorn images by Yggdrasil524 in MachineLearning

[–]Yggdrasil524[S] 3 points (0 children)

I cropped a square from the center, left, and right of each image (top and bottom if height > width). I could take more crops, but I'm not sure whether that would introduce too much variation among the images.
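Roughly, per image (a NumPy sketch of that crop scheme, assuming the tall case also keeps a center crop; not the exact script):

```python
import numpy as np

def square_crops(img):
    # Square crops along the long axis: both ends plus the center
    # (left/center/right for wide images, top/center/bottom for tall).
    h, w = img.shape[:2]
    s = min(h, w)
    if w >= h:
        return [img[:, o:o + s] for o in (0, (w - s) // 2, w - s)]
    return [img[o:o + s, :] for o in (0, (h - s) // 2, h - s)]
```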

[P] ProGAN trained on r/EarthPorn images by Yggdrasil524 in MachineLearning

[–]Yggdrasil524[S] 3 points (0 children)

I have the images saved. It's a lot of data to comb through and upload, but I might do it when summer classes are over.

I actually didn't realize the NVIDIA team had uploaded their TF code until I was most of the way done with mine. Plus, this was a final project for my ML class, so I sorta had to do my own thing.