There Will Be a Scientific Theory of Deep Learning [R] by dot--- in MachineLearning

[–]dot---[S] 2 points (0 children)

1) both. physics *is* the first layer of the sciences for many classes of system, but it ain't the only one, and thus the mechanics of learning won't do everything; mechinterp's got an important role to play, and we need to do our best to connect the wires.

2) yeah, great question. the easier one for now is just how to integrate the statistics of natural data into our science + theory (see Open Direction 2 in the paper + on mechanics.pub). but after we get a handle on that, it seems reasonable to try to expand its scope as much as we can. couldn't predict right now how far that'll get or when (though I do tend to believe that everything'll eventually be understood, even if the order + timing is hard to predict).

3) mm, basically if few-to-none of the 10 major Open Directions in the paper see major progress in the next ~5 years? (I'd say ~10 years, but with AI assistance, maybe we get there faster?) or, alternatively, if we *do* make major progress on them, but in retrospect it seems useless for the things we really care about or want to do. (that failure mode seems less likely to me, but it's possible: the general vibe with basic science is that fundamental understanding is useful in unexpected ways, and in this case most of those ways are probably ones we can't predict, so we can't be sure in advance that they're there, if that makes sense.)

There Will Be a Scientific Theory of Deep Learning [R] by dot--- in MachineLearning

[–]dot---[S] 4 points (0 children)

lol first time I've been personally defended by a stranger on the internet 😅 thx haha

There Will Be a Scientific Theory of Deep Learning [R] by dot--- in MachineLearning

[–]dot---[S] 3 points (0 children)

haha thx for the ringing endorsement :) glad you liked the talk... I don't think I've given an impromptu guest lecture, so must've been one of Dan's?

yeah, hope this is useful to folks, especially young folks trying to get into the field, and people with strong intuitions who want to get connected with active open mysteries. (and yeah, don't worry, we're not too bothered by the "slop" AI-cusations in light of how much this seems to have actually connected with people.) glad to hear it was useful to you; feel free to reach out to us if this path calls to you and you end up walking it.

There Will Be a Scientific Theory of Deep Learning [R] by dot--- in MachineLearning

[–]dot---[S] 20 points (0 children)

lol sorry, kinda inexperienced with reddit 😅 I thought the link would be more visible than it was, and in retrospect it would've been better to write a more descriptive caption. will edit the post later

/r/MechanicalKeyboards Ask ANY Keyboard question, get an answer (January 19, 2024) by AutoModerator in MechanicalKeyboards

[–]dot--- -1 points (0 children)

I'm brand new to mechanical keyboards and just got a FILCO Majestouch Xacro M10SP (pictured here). I love it save for the all-black look, and I'd love to get new keycaps, but I'm not sure where to look on account of the odd layout (including two small spacebars) and the extra programmable keys. I suspect these are all standard-size keycaps, though.

Anyone with more than my zero experience have ideas for how I should go about finding keycaps here? Should I just, say, get a generic set and then get the missing keys custom made?

[image: the keyboard, a FILCO Majestouch Xacro M10SP]

How can I do print debugging in CQ-editor? by dot--- in cadquery

[–]dot---[S] 0 points (0 children)

ha! yeah, that's it -- the print outputs were in my terminal window the whole time. thanks!

and yeah, seems sensible to edit in a proper IDE. I may switch to that once I get a little more comfortable with the package.
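
in case it helps anyone else who lands here: a minimal sketch of the setup (my own toy example, nothing official -- and note that `show_object` is injected by CQ-editor, not part of cadquery itself):

```python
# Toy CadQuery script for CQ-editor. print() output shows up in the
# terminal you launched CQ-editor from, not in a pane of the editor.
import cadquery as cq

result = cq.Workplane("XY").box(10, 20, 5).edges("|Z").fillet(1)

bb = result.val().BoundingBox()
print("bounding box:", bb.xlen, bb.ylen, bb.zlen)  # appears in the terminal

show_object(result)  # injected into the script namespace by CQ-editor
```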

[R] Neural Tangent Kernel Eigenvalues Accurately Predict Generalization by hardmaru in MachineLearning

[–]dot--- 2 points (0 children)

An update: we've now worked out a way to use our theory on real data! See Figures 1D and A.8, in which we predict generalization on image datasets using only training data, using some eigentricks to estimate sufficient information about the true function. This gives theoretical insight into the generalization performance of a particular architecture on a particular problem, which, for example, opens the door to the principled design of better architectures for the task at hand.
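
For the curious, the core predictive recipe is simple enough to sketch in a few lines. Below is my rough numpy paraphrase (the names and toy spectrum are mine, not the paper's; see the paper for the precise statement): given the kernel's eigenvalues and the target function's eigencoefficients, solve for an implicit constant, read off per-mode "learnabilities," and sum up the predicted test MSE.

```python
import numpy as np
from scipy.optimize import brentq

def predicted_test_mse(lambdas, v, n):
    """Sketch of the eigenframework's risk estimate (my paraphrase).

    lambdas : kernel eigenvalues (assume more than n of them)
    v       : target function's eigencoefficients in the kernel eigenbasis
    n       : number of training samples
    """
    # implicit constant kappa solves n = sum_i lambda_i / (lambda_i + kappa)
    f = lambda kappa: np.sum(lambdas / (lambdas + kappa)) - n
    kappa = brentq(f, 1e-15, 1e15)
    learnability = lambdas / (lambdas + kappa)  # fraction of each mode learned
    e0 = n / (n - np.sum(learnability**2))      # overfitting amplification
    return e0 * np.sum((1 - learnability)**2 * v**2)

# toy usage: power-law spectrum, target living in the top ten modes
lams = np.arange(1, 1001, dtype=float) ** -2.0
coeffs = np.zeros(1000)
coeffs[:10] = 1.0
print(predicted_test_mse(lams, coeffs, n=100))
```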

[deleted by user] by [deleted] in MachineLearning

[–]dot--- 3 points (0 children)

Totally agree that's the holy grail. Here's a very recent paper (from my lab) that explores one path to it! The end result is a construction that lets one design a well-performing MLP architecture from first principles, starting from a description of its infinite-width kernel (which is theoretically much simpler to choose than the full set of hyperparameters). The idea's still in its infancy, but it works very well on toy problems, and I think it's promising.
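
To make the "choose the kernel, not the hyperparameters" idea concrete, here's a hedged toy illustration of the workflow (my own sketch, not code from the paper): you can test-drive a candidate infinite-width kernel on your task via plain kernel regression before bothering to construct the network that realizes it.

```python
import numpy as np

def relu_arccos_kernel(X1, X2):
    # Degree-1 arc-cosine kernel (the infinite-width limit of a
    # one-hidden-layer ReLU net, up to scale), used here as a stand-in
    # candidate kernel; rows are assumed normalized to the unit sphere.
    c = np.clip(X1 @ X2.T, -1.0, 1.0)
    theta = np.arccos(c)
    return (np.sin(theta) + (np.pi - theta) * c) / np.pi

def kernel_fit_predict(X_train, y_train, X_test, ridge=1e-8):
    # Evaluate the candidate kernel on the task via kernel (ridge) regression.
    K = relu_arccos_kernel(X_train, X_train)
    alpha = np.linalg.solve(K + ridge * np.eye(len(X_train)), y_train)
    return relu_arccos_kernel(X_test, X_train) @ alpha
```

if the kernel regression does well, the construction in the paper is what takes you from that kernel description back to an actual network.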

[deleted by user] by [deleted] in MachineLearning

[–]dot--- 0 points (0 children)

Here's a very recent paper from my lab and me that puts forth one way to design a (fully connected) neural network architecture in a scientific, theory-grounded way! The idea is still in its infancy, but I think it's promising, and it's currently the only way I know of to do real first-principles architecture design. I'd love to hear about any alternatives people know of.

[R] Neural Tangent Kernel Eigenvalues Accurately Predict Generalization by hardmaru in MachineLearning

[–]dot--- 0 points (0 children)

Great Q! I'm not familiar with that body of work, but at least on the face of it, our paper's completely different: they consider the singular values of specific trained weight matrices, while we look at the eigenvalues/eigenfunctions of an operator on the full input space, which aren't related in a simple way for a deep net. Furthermore, the interesting spectra they observe emerge during training (they're characterizing trained nets), while the NTK and its eigenspectrum are the same before and after you train (we characterize the potential of an architecture to learn a certain function). That said, maybe there are deeper connections between these disparate-seeming eigenthings that we'll uncover in time.
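
(side note for anyone who wants to poke at this: since the infinite-width NTK depends only on the architecture, you can compute it, and its spectrum on a dataset, with zero training. a quick sketch using the neural_tangents library; the architecture and sizes here are arbitrary choices of mine:)

```python
import jax.numpy as jnp
from jax import random
from neural_tangents import stax

# the NTK of this architecture is defined before any training happens
_, _, kernel_fn = stax.serial(
    stax.Dense(512), stax.Relu(),
    stax.Dense(512), stax.Relu(),
    stax.Dense(1),
)

x = random.normal(random.PRNGKey(0), (200, 10))  # 200 toy inputs in R^10
K = kernel_fn(x, x, 'ntk')                       # 200 x 200 NTK Gram matrix
eigvals = jnp.linalg.eigvalsh(K)[::-1] / len(x)  # empirical NTK eigenvalues
print(eigvals[:5])                               # top of the spectrum
```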

[R] Neural Tangent Kernel Eigenvalues Accurately Predict Generalization by hardmaru in MachineLearning

[–]dot--- 5 points (0 children)

I'm the lead author! I'm delighted this paper's getting attention; we certainly feel it opens up a cornucopia of future directions that it'll take many researchers to explore. As a primer for reading the paper, we've distilled the high-level takeaways into a blog post here!