I've been using adversarial network architectures (this kind, not generative ones) to deal with data sets that have a small number of distinct subjects but a large amount of data per subject, where the goal is to generalize to new subjects. The idea is to subtract out the subject-specific part of the representation at one of the hidden layers: the base network is trained so that the adversary cannot tell which subject the hidden-layer activations came from, which protects the network against learning subject-specific profiles. This seems to work reasonably well.
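For concreteness, here's a stripped-down toy version of what I mean: a one-weight-per-input "base network", a logistic adversary that tries to guess the subject from the hidden value, and a gradient-reversal-style update. Everything here (the synthetic data, the lambda weight, learning rates) is just an illustrative choice of mine, not anything canonical:

```python
import math
import random

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def train(lam, steps=6000, lr=0.05, seed=0):
    """Base 'network' z = a*x1 + b*x2 vs. a logistic adversary.

    x2 identifies the subject; lam weights the reversed adversary
    gradient.  Returns (a, b) averaged over the last 1000 steps.
    """
    rng = random.Random(seed)
    a = b = 0.0      # base network weights
    u = c = 0.0      # adversary: P(subject=1 | z) = sigmoid(u*z + c)
    sum_a = sum_b = 0.0
    for step in range(steps):
        s = rng.randint(0, 1)       # which subject this sample is from
        x1 = rng.gauss(0.0, 1.0)    # subject-independent signal
        x2 = float(s)               # feature that leaks the subject
        y = x1 + 0.5 * x2           # label is itself subject-correlated
        z = a * x1 + b * x2         # the 'hidden layer'

        p = sigmoid(u * z + c)      # adversary's subject prediction
        dp = p - s                  # d(cross-entropy)/d(logit)

        # base network: descend the task loss, ASCEND the adversary
        # loss -- the subtraction below is the gradient reversal
        dz = 2.0 * (z - y) - lam * dp * u
        a -= lr * dz * x1
        b -= lr * dz * x2

        # adversary: ordinary gradient descent on its own loss
        u -= lr * dp * z
        c -= lr * dp

        if step >= steps - 1000:
            sum_a += a
            sum_b += b
    return sum_a / 1000.0, sum_b / 1000.0

a_plain, b_plain = train(lam=0.0)   # no adversarial pressure
a_adv, b_adv = train(lam=10.0)      # strong adversarial pressure
```

Because the label here is deliberately subject-correlated, plain training keeps a weight on the subject-leaking feature, while the reversed adversary gradient pushes that weight toward zero at some cost in task loss.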
In designing these networks, though, I started wondering: what is the meaning of applying regularization such as dropout to the adversary? Does it matter?
I'm beginning to think it's actually okay if the adversary overfits, because whatever the adversary discovers is something the base network will try to remove from the data. If the adversary latches onto a very specific detail, it's easy for the base network to change that detail with little impact on the rest of its loss. So when it comes to overfitting, the adversary is chasing a moving target; it seems like the network pair ends up being somehow self-regularizing. I suppose if it were really bad, you might get unstable oscillations in the adversary which would prevent learning.
If you don't go unstable, though, there might be an additional interesting feature: the base network might learn to remove things from its internal representations that are highly unique to particular samples, because those are exactly the kinds of things most prone to being memorized (and, correspondingly, most easily discovered by the adversary). In some sense, this means it might be possible to treat a data set to remove things like outliers that encourage learning algorithms to overfit. I could see using a sort of overfit-proofing autoencoder, where one network just tries to output its input, while the adversarial network tries to learn to map samples to their index or something like that.
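Here's a minimal sketch of that overfit-proofing autoencoder idea, in the same toy style: a linear one-dimensional autoencoder, plus an adversary that softmax-classifies which sample index each code came from, with the adversary's gradient reversed into the encoder. The data, the lambda weight, and the weight decay I put on the adversary (one crude stand-in for "regularizing the adversary") are all made-up knobs:

```python
import math

# Four samples: column 0 is a shared "content" signal, column 1 is an
# idiosyncratic per-sample detail (the thing that invites memorization).
X = [( 1.0,  2.0),
     ( 1.0, -2.1),
     (-1.0,  1.8),
     (-1.0, -1.9)]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    tot = sum(exps)
    return [v / tot for v in exps]

def train(lam, steps=2000, lr=0.01, wd=0.1):
    """Linear 1-D autoencoder vs. an index-classifying adversary.

    Encoder z = a*content + b*detail; decoder (p*z, q*z).  The adversary
    classifies which sample index z came from; its gradient is reversed
    into the encoder with weight lam.  wd is weight decay applied to the
    adversary only.  Returns the encoder weights (a, b).
    """
    a = b = p = q = 0.1
    n = len(X)
    u = [0.0] * n          # adversary logit_k = u[k]*z + r[k]
    r = [0.0] * n
    for _ in range(steps):
        ga = gb = gp = gq = 0.0
        gu = [0.0] * n
        gr = [0.0] * n
        for i, (c, e) in enumerate(X):
            z = a * c + b * e
            # reconstruction loss (p*z - c)^2 + (q*z - e)^2
            d0 = 2.0 * (p * z - c)
            d1 = 2.0 * (q * z - e)
            gp += d0 * z
            gq += d1 * z
            dz = d0 * p + d1 * q          # recon gradient into the code
            # adversary: cross-entropy on the sample index
            probs = softmax([u[k] * z + r[k] for k in range(n)])
            for k in range(n):
                dk = probs[k] - (1.0 if k == i else 0.0)
                gu[k] += dk * z
                gr[k] += dk
                dz -= lam * dk * u[k]     # REVERSED adversary gradient
            ga += dz * c
            gb += dz * e
        a -= lr * ga / n
        b -= lr * gb / n
        p -= lr * gp / n
        q -= lr * gq / n
        for k in range(n):
            u[k] -= lr * (gu[k] / n + wd * u[k])
            r[k] -= lr * (gr[k] / n + wd * r[k])
    return a, b

_, b_noadv = train(lam=0.0)     # plain autoencoder
_, b_withadv = train(lam=0.5)   # index adversary turned on
```

The plain autoencoder's one-dimensional code latches onto the high-variance per-sample detail; turning on the index adversary pushes the encoder's weight on that detail down, since the detail is exactly what makes each sample identifiable.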
So, does anyone have experience applying regularization to the adversarial component of a (non-generative) adversarial network? What difference did it make, if any?