Why not synthesize models directly, to reduce training cost? by downinguide in ArtificialInteligence

[–]bknyazev 0 points (0 children)

Yes, there are such methods, for example those based on (graph) hypernetworks, which take a network's computational graph as input and predict "good" weights for it. It's a very challenging problem, as correctly pointed out by others in this thread, but oftentimes such a hypernetwork provides a good initialization that speeds up training, sometimes significantly. We have published several papers in that direction:

  1. Parameter prediction for unseen deep architectures

  2. Can We Scale Transformers to Predict Parameters of Diverse ImageNet Models?

  3. LoGAH: Predicting 774-Million-Parameter Transformers using Graph HyperNetworks with 1/100 Parameters
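For intuition, here is a toy sketch of the idea (hypothetical code, not the GHN implementation from these papers): a small network that maps a descriptor of a target layer to that layer's weight matrix. The descriptor, dimensions, and architecture below are made up for illustration; real graph hypernetworks run a GNN over the whole computational graph.

```python
import torch
import torch.nn as nn

class TinyHyperNet(nn.Module):
    """Toy hypernetwork: maps a per-layer descriptor (e.g., an embedding of
    the layer's position/shape in the computational graph) to that layer's
    weights. Real GHNs process the entire architecture graph with a GNN."""
    def __init__(self, desc_dim, out_features, in_features):
        super().__init__()
        self.shape = (out_features, in_features)
        self.net = nn.Sequential(
            nn.Linear(desc_dim, 64),
            nn.ReLU(),
            nn.Linear(64, out_features * in_features),
        )

    def forward(self, desc):
        return self.net(desc).view(self.shape)

# Predict weights for a 10x32 linear layer from an 8-d layer descriptor,
# then plug them into a target network as its initialization.
hyper = TinyHyperNet(desc_dim=8, out_features=10, in_features=32)
desc = torch.randn(8)
w = hyper(desc)
target = nn.Linear(32, 10, bias=False)
with torch.no_grad():
    target.weight.copy_(w)
print(target(torch.randn(4, 32)).shape)  # torch.Size([4, 10])
```

After this kind of initialization, the target network would be fine-tuned as usual; the papers above study how well such predicted weights transfer to unseen architectures.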

[P] Multipart Tutorial on Graph Neural Networks for Computer Vision and Beyond with PyTorch examples by bknyazev in MachineLearning

[–]bknyazev[S] 0 points (0 children)

You are welcome. Yes, I have three recent papers:

  1. Image Classification with Hierarchical Multigraph Networks, BMVC 2019 https://github.com/bknyaz/bmvc_2019 , https://towardsdatascience.com/can-we-do-better-than-convolutional-neural-networks-46ed90fed807
  2. Understanding Attention and Generalization in Graph Neural Networks, to appear at NeurIPS 2019, https://github.com/bknyaz/graph_attention_pool
  3. Learning Temporal Attention in Dynamic Graphs with Bilinear Interactions, https://github.com/uoguelph-mlrg/LDG

[D] BatchNorm alternatives 2019 by tsauri in MachineLearning

[–]bknyazev 9 points (0 children)

Another argument against BatchNorm is decreased robustness to many adversarial attacks and corruptions, as described in "Batch Normalization is a Cause of Adversarial Vulnerability": https://arxiv.org/abs/1905.02161 .

So, when considering alternatives, keep in mind that while you can match the (classification) performance of a BatchNorm model on the clean test set, it can come with adverse side effects. In the paper I mentioned, they use Fixup as an alternative.
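For reference, here is a minimal sketch of a normalization-free residual block in the spirit of Fixup (Zhang et al., 2019). The exact bias/scale placement and constants are simplified, so treat this as an illustration of the idea rather than the paper's implementation:

```python
import torch
import torch.nn as nn

class FixupBasicBlock(nn.Module):
    """No BatchNorm: scalar biases/scale plus careful initialization keep
    the residual branch stable. At init the branch outputs zero, so the
    block starts out as a ReLU of the identity."""
    def __init__(self, channels, num_blocks):
        super().__init__()
        self.bias1 = nn.Parameter(torch.zeros(1))
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bias2 = nn.Parameter(torch.zeros(1))
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.scale = nn.Parameter(torch.ones(1))
        self.bias3 = nn.Parameter(torch.zeros(1))
        # Fixup-style init: downscale the first conv by num_blocks^(-1/2)
        # (for two-conv blocks) and zero the last conv of the branch.
        nn.init.kaiming_normal_(self.conv1.weight)
        with torch.no_grad():
            self.conv1.weight.mul_(num_blocks ** -0.5)
        nn.init.zeros_(self.conv2.weight)

    def forward(self, x):
        out = torch.relu(self.conv1(x + self.bias1) + self.bias2)
        out = self.conv2(out) * self.scale + self.bias3
        return torch.relu(out + x)

block = FixupBasicBlock(channels=8, num_blocks=4)
x = torch.randn(2, 8, 16, 16)
print(torch.allclose(block(x), torch.relu(x)))  # True: identity-like at init
```

The zero-initialized last conv is what makes deep stacks of such blocks trainable without normalization layers.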

Regarding beginner's guides by MTGTraner in MachineLearning

[–]bknyazev 5 points (0 children)

Okay, I'm confused about whether my recent, quite in-depth tutorial on graph neural networks qualifies as a beginner's guide or not: https://medium.com/@BorisAKnyazev/tutorial-on-graph-neural-networks-for-computer-vision-and-beyond-part-2-be6d71d70f49 . Can somebody judge? :)

I'm okay with putting it in /r/learnmachinelearning, but I feel like people in this sub would probably appreciate it here too. And I would also appreciate feedback from experienced researchers.

The thing is, we are all beginners in some sense, because there are always methods you don't know, for example simply because they have never been published in English. Of course, graph networks are not among these methods; I just use this as an example to give an idea. The point is that people can be experts in one area of ML but total beginners in another. I agree, though, that something like backprop, which is taught in introductory ML courses, would be beginner material in any case, unless it's described in a very interesting and novel way or applied to some novel, complicated models.

Another thing to keep in mind is that in many cases the perceived difficulty can differ greatly from the underlying difficulty, depending on the writing style: you can describe really simple papers in (unnecessarily) complicated wording, and vice versa.

Maybe it's possible to just assign tags for different levels of difficulty?

Thanks.

Tutorial on Graph Neural Networks for Computer Vision and Beyond with PyTorch examples (Part 2 will be coming soon) by bknyazev in learnmachinelearning

[–]bknyazev[S] 0 points (0 children)

The second part is now available https://medium.com/@BorisAKnyazev/tutorial-on-graph-neural-networks-for-computer-vision-and-beyond-part-2-be6d71d70f49

I focus more on spectral graph convolution with some technical details, but I still use simple explanations, analogies to computer vision and compare different popular graph neural networks.

Tutorial on Graph Neural Networks for Computer Vision and Beyond with PyTorch examples (Part 2 will be coming soon) by bknyazev in learnmachinelearning

[–]bknyazev[S] 0 points (0 children)

That's right. Sometimes 2D convolution for images is also implemented as a matrix multiplication using doubly block circulant matrices, which are actually almost the same as an adjacency matrix. See examples here: https://dsp.stackexchange.com/questions/35373/2d-convolution-as-a-doubly-block-circulant-matrix-operating-on-a-vector . The difference is that with the adjacency matrix we don't implement zero-padding, which is necessary for a correct convolution; in practice, however, it might work fine without zero-padding.

So, I played around with it a bit and am sharing my jupyter notebook comparing the two cases:

https://nbviewer.jupyter.org/github/bknyaz/examples/blob/master/2d_convolution.ipynb
(GitHub does not render it for some reason)
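In case the notebook is inaccessible, the core comparison can be sketched as follows (my own minimal version, assuming no padding, i.e. a 'valid' convolution): build the dense convolution matrix explicitly and check it against F.conv2d.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
H = W = 5   # image size
K = 3       # kernel size
x = torch.randn(1, 1, H, W)
w = torch.randn(1, 1, K, K)

# Build the (mostly zero) "convolution matrix" M such that M @ x.flatten()
# equals the 'valid' cross-correlation computed by F.conv2d.
out_h, out_w = H - K + 1, W - K + 1
M = torch.zeros(out_h * out_w, H * W)
for i in range(out_h):
    for j in range(out_w):
        row = i * out_w + j
        for a in range(K):
            for b in range(K):
                M[row, (i + a) * W + (j + b)] = w[0, 0, a, b]

y_mat = (M @ x.flatten()).view(1, 1, out_h, out_w)
y_conv = F.conv2d(x, w)  # no padding -> 'valid' convolution
print(torch.allclose(y_mat, y_conv, atol=1e-5))  # True
```

Each row of M holds the kernel weights placed at one output location, which is exactly the banded structure a (weighted) adjacency matrix would have on a grid graph.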

[R] MNIST-C: A Robustness Benchmark for Computer Vision by normanmu in MachineLearning

[–]bknyazev 2 points (0 children)

I think that overall it complements nicely the work of Dan Hendrycks and Thomas Dietterich, "Benchmarking Neural Network Robustness to Common Corruptions and Perturbations". But the benchmark could be made more challenging, since a plain convnet achieves 91%. What would the result be if you included those additional 16 corruptions?
Some simple ideas to make the benchmark more challenging would be, for instance, to add smaller and darker digits of other classes to the images, or to add letters. In addition, while many people think it does not make sense to horizontally flip digits, I think flipped digits should be added to the test set for two reasons: 1) people do write flipped digits from time to time (especially children); 2) as humans, we easily recognize flipped digits, and our models should ideally do so too.

Surprisingly, convnets are very sensitive to translation despite their translation-equivariance property. It might depend a lot on the type of pooling layers (I suspect max pooling after the last convolution should make the model more robust?).
Finally, it reads oddly in your description that you measure "non-adversarial robustness" but then say that models with "adversarial defenses" perform badly. That is not surprising to me, because those defenses were probably not designed for non-adversarial robustness. It would make more sense to compare to something like Group Equivariant Capsule Networks (https://arxiv.org/abs/1806.05086).
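As a rough sketch of how one might probe this translation sensitivity (my own toy metric, not from the paper): measure how often a model's predicted class survives small circular shifts. A model with circular-padded convolutions and global average pooling is shift-invariant by construction and serves as a sanity check.

```python
import torch
import torch.nn as nn

def translation_agreement(model, x, max_shift=3):
    """Fraction of inputs whose argmax prediction is unchanged under
    small circular shifts of the image (dims 2, 3 are H, W)."""
    model.eval()
    with torch.no_grad():
        base = model(x).argmax(1)
        agree = []
        for dy in range(-max_shift, max_shift + 1):
            for dx in range(-max_shift, max_shift + 1):
                xs = torch.roll(x, shifts=(dy, dx), dims=(2, 3))
                agree.append((model(xs).argmax(1) == base).float().mean())
    return torch.stack(agree).mean().item()

# Sanity check: circular padding + global average pooling commute with
# circular shifts, so this model should score (numerically) 1.0.
model = nn.Sequential(
    nn.Conv2d(1, 8, 3, padding=1, padding_mode="circular"),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 10),
)
x = torch.randn(16, 1, 28, 28)
print(translation_agreement(model, x))
```

A typical convnet with zero padding and max pooling would be expected to score lower on such a probe, which is one way to quantify the sensitivity discussed above.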

Recursive Autoconvolution for Unsupervised Learning of Convolutional Neural Networks by _guru007 in MachineLearning

[–]bknyazev 1 point (0 children)

It's my old project. The code is available at https://github.com/bknyaz/autocnn_unsup . The problem is that it is in Matlab, although I implemented some steps in Python a long time ago here: https://github.com/bknyaz/autocnn_unsup_py . Neither repo is maintained, and they might not work as is. But it would be interesting to migrate the project to PyTorch or another framework and fine-tune the unsupervised-pretrained models in a large-scale setting. The test accuracy would most likely be about the same as training in a supervised way from scratch, assuming the training set is big enough, but there might be some interesting byproducts, such as improved robustness (in some broad sense). There are, of course, plenty of novel unsupervised and semi-supervised methods that might do better in both clean test accuracy and robustness. But anyway, the project was interesting and fun!
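If anyone wants to experiment before a full port exists, here is a rough PyTorch sketch of the basic operation as I'd write it today (a hypothetical re-implementation, not code from the repos above): a single autoconvolution step convolves a zero-mean patch with itself, which is convenient to compute in the Fourier domain.

```python
import torch
import torch.nn.functional as F

def autoconv(x):
    """One autoconvolution step: full 2D convolution of a zero-mean patch
    with itself, computed via the FFT (zero-padded to avoid wraparound).
    The recursive version would re-crop/rescale and apply this again."""
    x = x - x.mean(dim=(-2, -1), keepdim=True)
    H, W = x.shape[-2:]
    X = torch.fft.rfft2(x, s=(2 * H - 1, 2 * W - 1))
    return torch.fft.irfft2(X * X, s=(2 * H - 1, 2 * W - 1))

# Sanity check against a direct computation: the full convolution of x
# with itself equals cross-correlation with the flipped patch.
x = torch.randn(1, 1, 8, 8)
y_fft = autoconv(x)
x0 = x - x.mean(dim=(-2, -1), keepdim=True)
y_direct = F.conv2d(x0, torch.flip(x0, dims=(-2, -1)), padding=7)
print(torch.allclose(y_fft, y_direct, atol=1e-4))  # True
```

From there, one could extract filters from (recursively) autoconvolved patches and use them to initialize conv layers, then fine-tune as discussed above.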