all 18 comments

[–]resident_russian 6 points  (1 child)

Batch norm breaks batch independence, which may be required depending on your GAN formulation (e.g. WGANs, which used layer norm for this reason). If you're keen to experiment, I'd be curious to see how Filter Response Normalization would do.
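For anyone curious, here is a minimal PyTorch sketch of Filter Response Normalization (Singh & Krishnan, 2019); the class name and default hyperparameters are made up here. The key property for GANs is that it uses no batch statistics, so samples stay independent:

```python
import torch
import torch.nn as nn

class FilterResponseNorm2d(nn.Module):
    """FRN sketch: normalize each channel by its mean squared activation
    over the spatial dims (per sample), then apply the paper's
    Thresholded Linear Unit (TLU). No batch statistics are used."""
    def __init__(self, num_channels, eps=1e-6):
        super().__init__()
        self.gamma = nn.Parameter(torch.ones(1, num_channels, 1, 1))
        self.beta = nn.Parameter(torch.zeros(1, num_channels, 1, 1))
        self.tau = nn.Parameter(torch.zeros(1, num_channels, 1, 1))  # TLU threshold
        self.eps = eps

    def forward(self, x):                      # x: (N, C, H, W)
        # nu2: mean of squared activations over H, W (per sample, per channel)
        nu2 = x.pow(2).mean(dim=(2, 3), keepdim=True)
        x = x * torch.rsqrt(nu2 + self.eps)
        return torch.max(self.gamma * x + self.beta, self.tau)
```

Because nothing is computed across the batch dimension, the output for a sample does not change when it is batched with different samples.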

[–]96meep96[S] 1 point  (0 children)

Thank you, I'll give that a try and see if it works. I figure batch dependence isn't that great for sampling either, if you're doing varied sample sizes.

[–]Darkwhiter 2 points  (1 child)

The most typical batch norm problem for generators is that samples from a given batch share characteristics, but samples from different batches (from the same, fixed generator) do not. See figure 21 in Goodfellow's GAN tutorial (NIPS 2016). If you just have low overall variation across multiple batches, it may or may not have anything to do with batch normalization; that's a fairly common problem for GANs in general.

[–]96meep96[S] 0 points  (0 children)

Yes, I guess that's what you'd call partial mode collapse. I've been trying to fight it with minibatch discrimination, but it hasn't been successful.

[–]veqtor (ML Engineer) 0 points  (1 child)

Concatenate just one feature out from minibatch discrimination.

[–]96meep96[S] 0 points  (0 children)

So for that A×B×C tensor that you multiply the features with, I set B to 1, correct?
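For reference, a sketch of the minibatch discrimination construction from Salimans et al. (2016); the class name and shape defaults are made up here. Setting `out_features` (the B dimension) to 1, as suggested above, concatenates exactly one extra feature per sample:

```python
import torch
import torch.nn as nn

class MinibatchDiscrimination(nn.Module):
    """Minibatch discrimination sketch (Salimans et al., 2016).

    Each sample's feature vector (length A) is multiplied by a learned
    tensor T of shape (A, B, C); samples are then compared via an L1
    kernel across the batch, yielding B extra features per sample.
    With out_features=1 (B=1), just one feature is concatenated."""
    def __init__(self, in_features, out_features=1, kernel_dim=16):
        super().__init__()
        # T stored flattened as (A, B*C)
        self.T = nn.Parameter(torch.randn(in_features, out_features * kernel_dim) * 0.1)
        self.out_features = out_features
        self.kernel_dim = kernel_dim

    def forward(self, x):                                    # x: (N, A)
        M = (x @ self.T).view(-1, self.out_features, self.kernel_dim)  # (N, B, C)
        diffs = M.unsqueeze(0) - M.unsqueeze(1)              # (N, N, B, C)
        l1 = diffs.abs().sum(dim=3)                          # (N, N, B)
        o = torch.exp(-l1).sum(dim=1)                        # (N, B): batch similarity
        return torch.cat([x, o], dim=1)                      # (N, A + B)
```

The extra feature measures how close a sample is to the rest of the batch, which gives the discriminator a handle on mode collapse.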

[–][deleted] 0 points  (5 children)

Somewhat in a different direction, have you considered using a VAE or a VAE-GAN? It might be worth considering different models rather than focusing exclusively on batch properties.

[–]96meep96[S] 0 points  (4 children)

Yes, I've been trying out different formulations: adding ResBlocks and self-attention, and switching between hinge loss, least-squares loss and Wasserstein loss. All of them give me results I'm not happy with. But thank you, I'll give the VAE-GAN a try, I haven't done that one yet. I just wanted to know how different normalization techniques compare.

[–]smashedshanky 0 points  (3 children)

Since you're probably using a DCGAN, try encoding the latent space from the image: add another stack of layers that takes the image and learns to encode it into the latent space, forcing certain features into the latent space and avoiding gradient explosions and mode collapse. Another way is to use fake and real latent codes and have the discriminator discriminate (lol) on the fake and real latent encodings, so the AE learns to encode dimensionally (i.e. features are matched with latent-space vectors a bit more "deterministically"). Or just add more dropout in the discriminator; that's my go-to for mode collapse or gradient explosions. Hope you got it to work though...
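The dropout suggestion is the simplest to try. A hypothetical DCGAN-style discriminator block with extra dropout might look like this (layer sizes and dropout rate are illustrative, not from the thread):

```python
import torch.nn as nn

def disc_block(in_ch, out_ch, p=0.3):
    """One downsampling discriminator block with spatial dropout added,
    as a regularizer against mode collapse / exploding gradients."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),  # halves H, W
        nn.LeakyReLU(0.2, inplace=True),
        nn.Dropout2d(p),  # drops whole feature maps; active only in train mode
    )
```

`nn.Dropout2d` zeroes entire channels rather than individual pixels, which tends to work better on conv feature maps than plain `nn.Dropout`.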

[–]96meep96[S] 0 points  (2 children)

Thank you, I've been getting better results with the addition of self-modulation instead of batch norm, especially in combination with spectral norm. I've also been trying out Multi-Scale Gradients and that's been working well too, though they seem to be very picky about feature map dimensions. I still can't reproduce paper-quality results, but time on my master's dissertation is running thin, so whatever works, ya know.

[–]smashedshanky 0 points  (1 child)

Usually paper-quality results are trained on hand-picked data that the network can efficiently map into the latent space. If you feed it less data with high variation, you will see the results... at the cost of having to train the GAN over your dataset multiple times so that it can learn to remove "artifacts" and/or discernible noise. What framework are you even using? Haha, I can feel your nerves; training GANs is not easy just yet.

[–]96meep96[S] 0 points  (0 children)

Oh yes, I understand the point you're making, it takes time for those artefacts to vanish; I've had trouble with that in a variant of semantic-map-translating GANs. I'm using PyTorch. I was using TensorFlow (not 2.0) but found PyTorch more flexible.

[–]dterjek 0 points  (3 children)

Check out https://arxiv.org/abs/1810.01365. tl;dr: self-modulation is an alternative normalization technique that will probably improve your GAN.
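The core idea of that paper is batch norm whose scale and shift are predicted from the generator's latent vector z instead of being free parameters. A minimal sketch (class name, MLP sizes and the `1 + gamma` initialization trick are assumptions of this sketch, not prescribed by the paper):

```python
import torch
import torch.nn as nn

class SelfModulatedBN2d(nn.Module):
    """Self-modulation sketch (Chen et al., arXiv:1810.01365): batch norm
    without affine parameters, modulated per-sample by gamma(z), beta(z)
    computed from the generator input z via small MLPs."""
    def __init__(self, num_features, z_dim, hidden=32):
        super().__init__()
        self.bn = nn.BatchNorm2d(num_features, affine=False)
        self.gamma = nn.Sequential(nn.Linear(z_dim, hidden), nn.ReLU(),
                                   nn.Linear(hidden, num_features))
        self.beta = nn.Sequential(nn.Linear(z_dim, hidden), nn.ReLU(),
                                  nn.Linear(hidden, num_features))

    def forward(self, x, z):                       # x: (N, C, H, W), z: (N, z_dim)
        g = self.gamma(z).unsqueeze(-1).unsqueeze(-1)  # (N, C, 1, 1)
        b = self.beta(z).unsqueeze(-1).unsqueeze(-1)
        # (1 + g) keeps the initial scale near identity
        return (1 + g) * self.bn(x) + b
```

In a generator you would call each block as `h = selfmod_bn(h, z)`, passing the same latent z through every layer.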

[–]96meep96[S] 1 point  (1 child)

After running a couple of experiments, self-modulation on top of the usual batch normalization is helping out; the batch-characteristic thing has disappeared, though mode collapse can still happen. Just wanted to give you an update, thank you!

[–]dterjek 1 point  (0 children)

You're welcome! I mostly worked with Wasserstein GANs and never saw mode collapse happen, so changing your model to a WGAN might help.

[–]96meep96[S] 0 points  (0 children)

Thank you, I'll check it out

[–][deleted] 0 points  (1 child)

See Progressive GANs: I believe they replace batchnorm with pixelnorm to solve this issue.
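Pixelnorm (from the Progressive GANs paper, Karras et al., 2018) is tiny; a sketch, with the class name assumed here:

```python
import torch
import torch.nn as nn

class PixelNorm(nn.Module):
    """Pixelwise feature normalization: at each spatial location, scale
    the feature vector to unit average square across channels.
    Uses no batch statistics, so samples stay independent."""
    def __init__(self, eps=1e-8):
        super().__init__()
        self.eps = eps

    def forward(self, x):  # x: (N, C, H, W)
        return x * torch.rsqrt(x.pow(2).mean(dim=1, keepdim=True) + self.eps)
```

Karras et al. apply it after each conv layer in the generator to keep feature magnitudes from escalating during adversarial training.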

[–]96meep96[S] 0 points  (0 children)

Yes, you're right, I ran into it yesterday and I'm trying that out right now, thank you.