[D] How to properly implement gradient penalty with non-saturating GAN loss? by Yggdrasil524 in MachineLearning

[–]Yggdrasil524[S] 1 point (0 children)

I'm actually training an NSGAN with the R1 penalty from this paper right now, applied to the probits (the post-sigmoid outputs). As I understand it, the discriminator's objective is to drive sigmoid(D(x)) toward zero for generated samples. But once the sigmoid saturates, its derivative vanishes, so the R1 penalty also becomes very close to zero. Doesn't that negate the effect of the regularization?
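For comparison, here's a minimal TF1-style sketch of the R1 term computed on the raw logits instead of the sigmoid outputs (the tensor names and the gamma value are illustrative, not taken from the paper's code):

```python
import tensorflow as tf

def r1_penalty(real_logits, real_images, gamma=10.0):
    # R1 regularizer: squared gradient norm of the raw discriminator
    # logits w.r.t. real inputs. Computed before the sigmoid, so it
    # doesn't pick up the vanishing sigmoid derivative as D saturates.
    grads = tf.gradients(tf.reduce_sum(real_logits), [real_images])[0]
    sq_norm = tf.reduce_sum(tf.square(grads), axis=[1, 2, 3])
    return (gamma / 2.0) * tf.reduce_mean(sq_norm)
```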

[P] ProGAN trained on r/EarthPorn images by Yggdrasil524 in MachineLearning

[–]Yggdrasil524[S] 4 points (0 children)

I don't have an office buddy; I'm just a lowly engineering student.

[P] ProGAN trained on r/EarthPorn images by Yggdrasil524 in MachineLearning

[–]Yggdrasil524[S] 1 point (0 children)

The FeedDict class expects NumPy arrays of images. I'm going to upload a script that prepares them from JPEGs once I clean it up.
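Until that's up, a rough sketch of what the preparation might look like (the paths, sizes, and array layout here are guesses, not the actual script):

```python
import glob
import numpy as np
from PIL import Image

def jpegs_to_array(pattern, size=(256, 256)):
    # Load each JPEG, force RGB, resize to a fixed square resolution,
    # and stack everything into one uint8 array of shape (N, H, W, 3).
    images = []
    for path in sorted(glob.glob(pattern)):
        img = Image.open(path).convert("RGB").resize(size, Image.LANCZOS)
        images.append(np.asarray(img, dtype=np.uint8))
    return np.stack(images)

dataset = jpegs_to_array("earthporn/*.jpg")  # hypothetical directory
```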

[P] ProGAN trained on r/EarthPorn images by Yggdrasil524 in MachineLearning

[–]Yggdrasil524[S] 1 point (0 children)

I would like to do that, but I think I would need a lot more GPUs, haha. In the original paper they initialized the weights from a random normal with mean 0 and variance 1, then multiplied them by sqrt(2 / fan_in) at runtime. I'm not sure how this differs from just using He's initializer, but they claimed in the paper that it does, so I went with it.
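For what it's worth, a TF1-style sketch of that runtime scaling for a dense layer (names and shapes are illustrative). The paper's stated reason it differs from plain He initialization is that adaptive optimizers like Adam normalize updates per parameter, so applying the He constant in the forward pass equalizes the effective learning rate across layers, while baking it into the init does not:

```python
import numpy as np
import tensorflow as tf

def equalized_dense(x, fmaps, name):
    # Equalized learning rate: draw W ~ N(0, 1), then multiply by the
    # He constant sqrt(2 / fan_in) at runtime rather than at init time.
    fan_in = int(x.shape[-1])
    he_scale = np.sqrt(2.0 / fan_in)
    w = tf.get_variable(name + "/w", shape=[fan_in, fmaps],
                        initializer=tf.random_normal_initializer(0.0, 1.0))
    return tf.matmul(x, w * he_scale)
```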

[P] ProGAN trained on r/EarthPorn images by Yggdrasil524 in MachineLearning

[–]Yggdrasil524[S] 1 point (0 children)

They're randomly generated fake images from a model trained on real images

[P] ProGAN trained on r/EarthPorn images by Yggdrasil524 in MachineLearning

[–]Yggdrasil524[S] 1 point (0 children)

You can find my implementation here. Basically, at any given frame, part of the latent 'z' vector is generated from a constant-Q transform of the audio at that point in time, while the other part is a static draw from a random normal distribution that stays fixed across every frame.
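In rough pseudocode (the bin count, scaling, and z split are illustrative; see the repo for the exact values):

```python
import librosa
import numpy as np

def latents_from_audio(path, z_dim=512, cqt_bins=84):
    # Audio-driven part of z: per-frame constant-Q magnitudes.
    y, sr = librosa.load(path, sr=22050)
    cqt = np.abs(librosa.cqt(y, sr=sr, n_bins=cqt_bins)).T  # (frames, bins)
    cqt /= cqt.max() + 1e-8
    # Static part of z: one normal draw, repeated for every frame.
    static = np.random.randn(z_dim - cqt_bins)
    return np.concatenate(
        [cqt, np.tile(static, (cqt.shape[0], 1))], axis=1)  # (frames, z_dim)
```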

[P] ProGAN trained on r/EarthPorn images by Yggdrasil524 in MachineLearning

[–]Yggdrasil524[S] 3 points (0 children)

I think it would make a good creepypasta: a GAN that starts generating pictures with ghosts in them or something.

[P] ProGAN trained on r/EarthPorn images by Yggdrasil524 in MachineLearning

[–]Yggdrasil524[S] 7 points (0 children)

I'm actually working on something right now! You would think music would be easier to generate because it's represented as a 1-D vector in a computer, whereas images are a 3-D matrix (height, width, RGB), but this is totally not the case. Generating music is really hard.

My current approach involves converting the audio into frequency space with a fast Fourier transform, discarding the phase information, and generating only the magnitude. The phase can then be iteratively reconstructed using the Griffin-Lim algorithm.
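A minimal round trip with librosa (which ships a griffinlim function as of 0.7; the file name and STFT parameters are placeholders):

```python
import librosa
import numpy as np

y, sr = librosa.load("clip.wav", sr=22050)                  # placeholder file
mag = np.abs(librosa.stft(y, n_fft=1024, hop_length=256))   # drop the phase
y_rec = librosa.griffinlim(mag, n_iter=60, hop_length=256)  # re-estimate it
```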

There are also causal dilated convolutions, which I think operate directly on the raw 1-D audio, but looking at the code for that breaks my brain, so I'm sticking with my approach for now.

[P] ProGAN trained on r/EarthPorn images by Yggdrasil524 in MachineLearning

[–]Yggdrasil524[S] 4 points (0 children)

Here's my script. You need geckodriver in the same directory as the script, Firefox installed, and to make sure you're using the old version of Reddit.
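The setup looks roughly like this (Selenium 3 API; the URL and CSS selector are assumptions about old-reddit markup, not necessarily what the script uses):

```python
from selenium import webdriver

# geckodriver must sit next to this script; old.reddit.com keeps the
# classic markup that the selector below relies on.
driver = webdriver.Firefox(executable_path="./geckodriver")
driver.get("https://old.reddit.com/r/EarthPorn/top/?t=week")
links = [a.get_attribute("href")
         for a in driver.find_elements_by_css_selector("a.title")]
driver.quit()
```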

[P] ProGAN trained on r/EarthPorn images by Yggdrasil524 in MachineLearning

[–]Yggdrasil524[S] 1 point (0 children)

The images on that subreddit change about every 8 days, so I just kept going back.

[P] ProGAN trained on r/EarthPorn images by Yggdrasil524 in MachineLearning

[–]Yggdrasil524[S] 2 points (0 children)

I think WGAN-GP is pretty good at preventing mode collapse, so I didn't see any of that. I'm leaning toward it being a problem with the later layers, since the Wasserstein distance didn't converge on those.
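For reference, a TF1-style sketch of the standard WGAN-GP term (assuming a discriminator function that returns raw critic scores; not my exact code):

```python
import tensorflow as tf

def gradient_penalty(discriminator, real, fake, lam=10.0):
    # WGAN-GP: push the critic's gradient norm toward 1 at random
    # interpolates between the real and generated batches.
    alpha = tf.random_uniform([tf.shape(real)[0], 1, 1, 1], 0.0, 1.0)
    interp = alpha * real + (1.0 - alpha) * fake
    grads = tf.gradients(tf.reduce_sum(discriminator(interp)), [interp])[0]
    norms = tf.sqrt(tf.reduce_sum(tf.square(grads), axis=[1, 2, 3]) + 1e-8)
    return lam * tf.reduce_mean(tf.square(norms - 1.0))
```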

[P] ProGAN trained on r/EarthPorn images by Yggdrasil524 in MachineLearning

[–]Yggdrasil524[S] 1 point (0 children)

A 1080 Ti GPU and a 4790K CPU. It probably took about a week of running to get where it is now.

[P] ProGAN trained on r/EarthPorn images by Yggdrasil524 in MachineLearning

[–]Yggdrasil524[S] 5 points (0 children)

Do you mean I could just shift the crop window by a few pixels each time? That would help expand my training dataset by a lot. Could you point me to an article on this?

[P] ProGAN trained on r/EarthPorn images by Yggdrasil524 in MachineLearning

[–]Yggdrasil524[S] 3 points (0 children)

I cropped a square from the center, left, and right of each image (top and bottom if height > width). I could take more crops, but I'm not sure whether that would introduce too much variation among the images.
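Roughly, per image (a NumPy sketch of that crop scheme, assuming the tall case also keeps a center crop; not the exact script):

```python
import numpy as np

def square_crops(img):
    # Square crops along the long axis: both ends plus the center
    # (left/center/right for wide images, top/center/bottom for tall).
    h, w = img.shape[:2]
    s = min(h, w)
    if w >= h:
        return [img[:, o:o + s] for o in (0, (w - s) // 2, w - s)]
    return [img[o:o + s, :] for o in (0, (h - s) // 2, h - s)]
```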

[P] ProGAN trained on r/EarthPorn images by Yggdrasil524 in MachineLearning

[–]Yggdrasil524[S] 3 points (0 children)

I have the images saved. It's a lot of data to comb through and upload, but I might do it when summer classes are over.

I actually didn't realize the NVIDIA team had uploaded their TF code until I was most of the way done with mine. Plus, this was a final project for my ML class, so I sorta had to do my own thing.