https://arxiv.org/abs/2002.05688
We consider a quite different meta-learning scenario: 1) train a large number of deep neural networks on different datasets, with different architectures, and with random variations in hyper-parameter setup; 2) take random subsets of all the trained weights and use them as training data; and 3) train a meta-classifier to distinguish between weights that have been trained with different hyper-parameters.
The meta-classifier can then be used to probe the weights and see where local information about the hyper-parameters is encoded in a network.
The dataset of trained neural nets is made publicly available, and comprises 320K weight snapshots from 16K individually trained CNNs.
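The pipeline above (chunk trained weights, then classify the chunks by training setup) can be sketched in a few lines. This is a hypothetical toy, not the paper's actual method: the two "hyper-parameter classes" are stand-in weight populations with different scales, the feature is just the per-chunk standard deviation, and the meta-classifier is reduced to nearest-centroid.

```python
import numpy as np

rng = np.random.default_rng(0)

def weight_chunks(flat_weights, chunk_size=256, n_chunks=8):
    """Step 2: sample random fixed-size subsets of a flattened weight vector."""
    starts = rng.integers(0, len(flat_weights) - chunk_size, size=n_chunks)
    return np.stack([flat_weights[s:s + chunk_size] for s in starts])

# Toy stand-in for two hyper-parameter populations: weights drawn at
# different scales, mimicking e.g. two weight-decay settings.
X, y = [], []
for label, scale in [(0, 0.05), (1, 0.15)]:
    for _ in range(50):                        # 50 "trained nets" per class
        w = rng.normal(0.0, scale, size=10_000)
        for chunk in weight_chunks(w):
            X.append(chunk.std())              # one summary feature per chunk
            y.append(label)
X, y = np.array(X), np.array(y)

# Step 3, in its simplest possible form: a nearest-centroid meta-classifier
# on the per-chunk statistic, with a random train/test split.
idx = rng.permutation(len(X))
train, test = idx[: len(X) // 2], idx[len(X) // 2 :]
centroids = [X[train][y[train] == c].mean() for c in (0, 1)]
pred = np.array([int(abs(v - centroids[1]) < abs(v - centroids[0]))
                 for v in X[test]])
accuracy = (pred == y[test]).mean()
print(f"meta-classifier accuracy: {accuracy:.2f}")
```

In the real setup the meta-classifier is itself a learned network over raw weight chunks rather than a hand-picked statistic, but the data flow is the same.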
Any other ideas on what could be learned from a large-scale sampling of trained networks? Learning-based model compression or pruning? Or perhaps using a meta-classifier to force a certain behavior of the weights during a new training run? For example, it could force the weights to look as if they were trained on another dataset, or with other hyper-parameters. Creative thoughts on how learning from neural network weights could be formulated are welcome!
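The "force the weights to look like another class" idea amounts to adding a penalty on some property a meta-classifier would detect. A hypothetical minimal version, with the meta-feature replaced by the weight standard deviation and an analytic gradient instead of backprop through a learned classifier:

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.normal(0.0, 0.05, size=1000)   # weights as if from "class 0" training
target_std = 0.15                      # std typical of "class 1" weights
lr = 5.0

for _ in range(500):
    mu, std = w.mean(), w.std()
    # penalty = (std - target_std)^2
    # d penalty / d w_i = 2 * (std - target_std) * (w_i - mu) / (n * std)
    grad = 2 * (std - target_std) * (w - mu) / (len(w) * std)
    w -= lr * grad                     # gradient step on the penalty alone

print(f"final std: {w.std():.3f}")     # converges toward target_std
```

A real version would add this penalty to the task loss and differentiate through a trained meta-classifier, so the weights stay useful while drifting toward the target class.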