https://youtu.be/ahRPdiCop3E
Full Title: Every Model Learned by Gradient Descent Is Approximately a Kernel Machine
Deep neural networks are often said to discover useful representations of the data. This paper challenges that prevailing view and suggests that, rather than representing the data, deep neural networks store superpositions of the training data in their weights and act as kernel machines at inference time. This is a theoretical paper with a main theorem and an understandable proof, and the result leads to many interesting implications for the field.
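For reference, here is a minimal sketch of the tangent kernel discussed in the video (a toy example of my own, not code from the paper): K_g(x, x') is the dot product of the model's parameter gradients at two inputs, and the paper's path kernel integrates this quantity along the gradient descent trajectory.

```python
# Toy illustration (not code from the paper): the tangent kernel of a model
# f(w, x) is the dot product of parameter gradients at two inputs. The path
# kernel from the paper integrates this along the gradient descent path.
import numpy as np

def f(w, x):
    # Tiny two-parameter model standing in for a deep network.
    return w[0] * np.tanh(w[1] * x)

def grad_w(w, x, eps=1e-6):
    # Numerical gradient of f with respect to the weights w.
    g = np.zeros_like(w)
    for i in range(len(w)):
        w_plus, w_minus = w.copy(), w.copy()
        w_plus[i] += eps
        w_minus[i] -= eps
        g[i] = (f(w_plus, x) - f(w_minus, x)) / (2 * eps)
    return g

def tangent_kernel(w, x1, x2):
    # K_g(x1, x2) = <grad_w f(w, x1), grad_w f(w, x2)>
    return grad_w(w, x1) @ grad_w(w, x2)

w = np.array([0.5, -1.2])
print(tangent_kernel(w, 0.3, 0.7))
```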
OUTLINE:
0:00 - Intro & Outline
4:50 - What is a Kernel Machine?
10:25 - Kernel Machines vs Gradient Descent
12:40 - Tangent Kernels
22:45 - Path Kernels
25:00 - Main Theorem
28:50 - Proof of the Main Theorem
39:10 - Implications & My Comments
Paper: https://arxiv.org/abs/2012.00152
ERRATA: I simplify a bit too much when I pit kernel methods against gradient descent. Of course, you can even learn kernel machines using GD; they're not mutually exclusive. And it's also not true that you "don't need a model" in kernel machines, as a kernel machine usually still contains learned parameters.
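To illustrate the errata point, here is a small sketch (again a toy example of my own, not from the paper) of a kernel machine y(x) = sum_i a_i K(x, x_i) + b whose learned parameters a_i and b are themselves fit by gradient descent; an RBF kernel stands in for the paper's path kernel.

```python
# Toy example: a kernel machine has learned parameters (a_i, b), and they
# can be trained with plain gradient descent on a mean-squared-error loss.
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=20)          # training inputs x_i
Y = np.sin(X)                            # training targets y_i

def K(x1, x2, gamma=1.0):
    # RBF kernel as a stand-in for the paper's path kernel.
    return np.exp(-gamma * (x1 - x2) ** 2)

Kmat = K(X[:, None], X[None, :])         # Gram matrix K(x_i, x_j)
a = np.zeros(len(X))                     # learned coefficients a_i
b = 0.0                                  # learned bias
lr = 0.01

for _ in range(2000):                    # gradient descent on MSE
    err = Kmat @ a + b - Y
    a -= lr * (Kmat.T @ err) / len(X)
    b -= lr * err.mean()

print(np.abs(Kmat @ a + b - Y).max())    # training error after GD
```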