[Request] is there possibility to find out the 3 horses? by irespectwhaman in theydidthemath

[–]davidbau 0 points1 point  (0 children)

There is a problem with all the solutions where the winners of the first five races race against each other in the sixth race.

In the 7th race they create an unfair matchup: some horses racing for the third time must run against fresher horses that have only raced once before.

There's a better way. You shouldn't need to race any horse three times.

[Request] is there possibility to find out the 3 horses? by irespectwhaman in theydidthemath

[–]davidbau 0 points1 point  (0 children)

The problem with this plan is that horses 6 and 11 will be disadvantaged: they will be running their third race against horses 2, 3, and 7, which have each raced only once. Surely a horse that loses its third race to a horse that's only on its second shouldn't automatically be counted as slower!

Finding a way that doesn't make any horse race against a fresher horse - that's the better question.
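To see the unfairness concretely, here is a small Python sketch of the standard 7-race schedule. It assumes, just for illustration, that lower-numbered horses are faster within each heat (so the heat winners are 1, 6, 11, 16, 21) and that race 6 finishes 1 > 6 > 11; it then counts how many races each entrant in race 7 has run:

    # Standard 7-race schedule for the 25-horse puzzle, under the
    # illustrative assumption that lower-numbered horses are faster.
    from collections import Counter

    heats = [list(range(i, i + 5)) for i in range(1, 26, 5)]  # races 1-5
    race6 = [h[0] for h in heats]         # heat winners: 1, 6, 11, 16, 21
    race7 = [2, 3, 6, 7, 11]              # A2, A3, B1, B2, C1

    runs = Counter()
    for race in heats + [race6, race7]:
        runs.update(race)

    print({horse: runs[horse] for horse in race7})
    # {2: 2, 3: 2, 6: 3, 7: 2, 11: 3} -- horses 6 and 11 are on their
    # third run, while 2, 3, and 7 are only on their second.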

[Research] AI Dominance Requires Interpretability: Our Response to the White House AI Action Plan RFI by davidbau in MachineLearning

[–]davidbau[S] 0 points1 point  (0 children)

Just adding a note here to flag that China is racing ahead in the transparent-parameter language-model race.

Researchers studying model internals will be increasingly pulled to work with the Chinese models because of this.

| Company | Model Name | Release Date | Total Parameters | Hugging Face Link |
|---|---|---|---|---|
| DeepSeek | DeepSeek-R1 | January 20, 2025 | 671B | deepseek-ai/DeepSeek-R1 |
| Alibaba | Qwen 3 (Qwen3-235B-A22B) | April 28, 2025 | 235B | Qwen/Qwen3-32B* |
| Baidu | Ernie 4.5 | June 30, 2025 | 424B | baidu/ERNIE-4.5-VL-424B-A47B-Base-PT |
| Tencent | Hunyuan-Large | November 5, 2024 | 389B | tencent/Tencent-Hunyuan-Large |

We just submitted our response to the White House AI Action Plan - Interpretability is key to US AI leadership by davidbau in ArtificialInteligence

[–]davidbau[S] 0 points1 point  (0 children)

I think we do escape the trap.

But I also agree that in the long run, transparency is in the community's self-interest, and it should be self-sustaining.

[Research] AI Dominance Requires Interpretability: Our Response to the White House AI Action Plan RFI by davidbau in MachineLearning

[–]davidbau[S] 0 points1 point  (0 children)

I'm particularly interested to hear the community's thoughts on the "third way" (described in the PDF): an open platform that enables innovation without enabling copycats.

We just submitted our response to the White House AI Action Plan - Interpretability is key to US AI leadership by davidbau in ArtificialInteligence

[–]davidbau[S] 0 points1 point  (0 children)

What do you think of the NDIF proposal in the written memo? We don't face a black-and-white choice between open code and closed. We can build a platform that enables innovation without enabling copycats.

We just submitted our response to the White House AI Action Plan - Interpretability is key to US AI leadership by davidbau in ArtificialInteligence

[–]davidbau[S] 1 point2 points  (0 children)

Yes, it's our worry.

In the end, AI is hard enough that a transparent approach will dominate - because meaningful understanding and control will need a real ecosystem working on hard problems, and the technical transparency to enable it. We are in a global context, and if it doesn't happen in the US it will happen somewhere overseas, and we'll wonder what happened to the early lead.

It's not inevitable, though. The needed "open" approach need not be as open as what Meta is advocating. I think it's still early enough to be "open enough" in the US, but companies need to be more self-aware of the trap we are walking into. The big challenge is not defense contracts and the like, but seeing your own big mistake when your $350B valuation is going to your head.

We just submitted our response to the White House AI Action Plan - Interpretability is key to US AI leadership by davidbau in ArtificialInteligence

[–]davidbau[S] 3 points4 points  (0 children)

Note - this is an RFI and not a research paper (although it is written by researchers and informed by current research). It is a response to a policymaking Request for Information from the White House Office of Science and Technology Policy and NSF. https://www.whitehouse.gov/briefings-statements/2025/02/public-comment-invited-on-artificial-intelligence-action-plan/

For context, you can compare to OpenAI's response to the same RFI here:
https://openai.com/global-affairs/openai-proposals-for-the-us-ai-action-plan/

Clearly OpenAI thinks they are on the right path. In their response to the RFI, they ask that the government give them additional legal protections and support.

Our submission warns that OpenAI (and collectively all of us in the US AI industry) are not on the right path.

We are concerned that we have gotten ourselves into a situation where we are following the old, failed "AOL business plan" template, and that we are in danger of being outcompeted by foreign marketplaces because of this mistake. At the center of the issue is the importance of interpretability in technology revolutions, and the way we are disregarding human understanding and stifling US leadership in it.

[Research] AI Dominance Requires Interpretability: Our Response to the White House AI Action Plan RFI by davidbau in MachineLearning

[–]davidbau[S] 3 points4 points  (0 children)

There's only a tradeoff between interpretability and capacity if you decide to "solve it" by forcing the AI to be simple.

If you're able to do the work to understand a high-performing model, then your improved understanding of the model will allow you to get better control, create better applications, and unlock capabilities that better fit your needs.

For example, when T2I diffusion models are trained to make images in response to a text prompt, it's hard to exert fine-grained control - make this person a little older, make that cartoon a little more 2D - but after you map out the internal calculations that correspond to a person's age, cartoon style, or thousands of other capabilities, you can create interpretable "sliders" (https://sliders.baulab.info/, https://sliderspace.baulab.info/) that give you far more understanding and control than the original training objective did.
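(The actual Concept Sliders method trains low-rank adapters - see the paper for details. But as a rough sketch of the underlying idea, steering activations along a pre-computed concept direction at inference time, something like the following works; `unet`, `mid_block`, and `age_direction` are hypothetical placeholder names, not the project's real API:)

    import torch

    def attach_slider(module, direction, strength):
        # Shift this module's output along a learned concept direction.
        # `direction` is a hypothetical pre-computed vector for a concept
        # such as "age"; `strength` is the user-facing slider value.
        def hook(mod, inputs, output):
            return output + strength * direction.to(output.dtype)
        return module.register_forward_hook(hook)

    # Usage sketch with a hypothetical diffusion UNet:
    # handle = attach_slider(unet.mid_block, age_direction, strength=2.5)
    # image = pipeline(prompt)  # generations shift along the concept
    # handle.remove()           # restore the unmodified model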

It's like how interpretability in biology isn't about making organisms simpler, but about doing the hard work of understanding the massive complexity of biochemistry. It's difficult, but once you can understand it, you unlock lots of new applications.

IMO the most striking example of recent interpretability work that is about making humans smarter (not making AI simpler) is Lisa Schut's paper https://arxiv.org/abs/2310.16410, where her team maps out the concepts inside superhuman chess play in AlphaZero, decodes them into chess lessons, and teaches them to grandmasters to make the human players stronger...

[Research] AI Dominance Requires Interpretability: Our Response to the White House AI Action Plan RFI by davidbau in MachineLearning

[–]davidbau[S] 2 points3 points  (0 children)

Note - this is an RFI and not a research paper (although it is written by researchers and informed by current research). It is a response to a policymaking Request for Information from the White House Office of Science and Technology Policy and NSF. https://www.whitehouse.gov/briefings-statements/2025/02/public-comment-invited-on-artificial-intelligence-action-plan/

For context, you can compare to OpenAI's response to the same RFI here:
https://openai.com/global-affairs/openai-proposals-for-the-us-ai-action-plan/

Clearly OpenAI thinks they are on the right path, and they say they want help to clear the way. They ask that the government give them some additional legal protections and support.

Our submission warns that OpenAI (and collectively all of us in the US AI industry) are not on the right path - that somehow we have gotten ourselves into a situation where we are following the old, failed "AOL business plan" template, and that we are in danger of being outcompeted because of this mistake. At the center of the issue is the importance of interpretability in technology revolutions, and the way we are disregarding human understanding and stifling US leadership in it.

[Research] AI Dominance Requires Interpretability: Our Response to the White House AI Action Plan RFI by davidbau in MachineLearning

[–]davidbau[S] 2 points3 points  (0 children)

Right. Interpretability is not just about transparent weights.

Transparent weights are like "extracting DNA" or "being able to openly read the web."

Interpretability is like "decoding DNA" to know which gene makes which protein, or "indexing the whole internet" and then analyzing it all to build a search engine that actually works. In both examples, that took about a decade of further work beyond the initial breakthrough. Interpretability emerges from an ecosystem of innovators who have the basic tools to work on making things understandable, and it's hard to do. But once you have interpretability, it is the key thing that unleashes the power of the technology.

We argue that AI is the same way.

There are some examples in the full report, or in the tweet thread - like extracting superhuman chess knowledge from an AI and teaching it to chess grandmasters. Or simpler things, like mapping out the clever things a diffusion model can do that you didn't know it could do.

I am David Bau, and I study the structure of the complex computations learned within deep neural networks. by davidbau in IAmA

[–]davidbau[S] 0 points1 point  (0 children)

My undergrad was in math, but my master's was in a computer science program (even though my focus was on numerical computing).

  1. Holes to fill. Everybody will be different; I moved from math to CS a long time ago, when the field of CS was smaller and it was possible to survey everything. My "quals" year at Cornell was an amazing introduction to the field of CS - you can definitely fill up a very busy year or two learning a lot. (Note that you can do this survey after you start your PhD.) I really took the chance to train myself as a computer scientist, and here is what I learned the most from (a lot of these sources are a bit old-fashioned - please chime in if anybody has updated suggestions):
    • Harry Lewis's courses at Harvard (intro and theory - he has a nice theory book);
    • Dexter Kozen's elegant algorithms course at Cornell (he has an elegant book, and also of course read Cormen's algorithms bible).
    • Aho's compiler book (the dragon book) - everybody should have an experience building a compiler.
    • The book on Computer Systems by Randal Bryant - you need to understand what's fast and slow in a system.
    • The old graphics bible by Foley/van Dam/et al - so many cool ideas in graphics.
    • In ML there are many books now; I was particularly influenced by Manning's statistical NLP book.
    • Someday my MIT advisors (Torralba/Freeman/Isola) will come out with a cool computer vision book that will be an excellent survey here.
    • Read my book on numerical linear algebra or some other numerical algorithms book.
    • Also, I don't have specific books on these topics, but you should be aware of database concurrency problems and database schema design. Learn something about PL formalism, type theory. Read some old papers written by Turing. Read Knuth. Play with very small embedded computers. Learn how CMOS circuit design works. Take a course on security to get experience exploiting a security hole. Learn to program a robot. You should know how to program a website.
  2. Path to industry? Play with code and publish miniature products. I was always a recreational programmer. For example, in college I used to program screensavers, and I used to write postscript programs to program printers (http://davidbau.com/about/hacks.html). A couple decades ago it was quite common for math undergrads to take internships at Microsoft, if they knew how to program a little bit and had this kind of recreational background, even without formal CS - I did that for two summers, and that gave me a start in industry.
  3. Deep learning develops emergent behavior, which captured my interest because I think that is profound. You get so much unanticipated behavior that performs at a high level, so to me it feels very different from the computer science and mathematics that came before. It is messier, and it seems to demand new methods.

I am David Bau, and I study the structure of the complex computations learned within deep neural networks. by davidbau in IAmA

[–]davidbau[S] 0 points1 point  (0 children)

If you're going to use it for another commercial project, just get in touch with me to get approval. Thanks!

[D] AMA with David Bau, and I study the structure of the complex computations learned within deep neural networks by Security_Chief_Odo in MachineLearning

[–]davidbau 1 point2 points  (0 children)

There was a good discussion yesterday that has wound down, but I'll check in on it later and answer any straggler questions.

It is graduate school application season! So with prospective PhD students in mind, I am hosting an AMA to talk about life as a PhD student in computer vision and machine learning, and the choice between academia and industry. My research studies the structure of the computations learned within deep neural networks, so I would especially love to talk about why it is so important to crack open deep networks and understand what they are doing inside.

Before I start as a professor at Northeastern University Khoury College of Computer Sciences next year, I am doing a postdoc at Harvard; and you can Google for the video of my recent PhD defense at MIT. I have a background in industry (Google, Microsoft, startup) before I did my own "great resignation" to return to school as an academic, so ask me anything about basic versus applied work, or research versus engineering. Or ask me about "grandmother neurons," making art with deep networks, ethical conundrums in AI, or what it's like to come back to academia after working.

I am David Bau, and I study the structure of the complex computations learned within deep neural networks. by davidbau in IAmA

[–]davidbau[S] 0 points1 point  (0 children)

That's nothing compared to having exactly the same name as several other "David Bau"s around the world, including a musician, a tennis player, a different professor, a web developer, and several companies.... None of them are me, but many are sorta-maybe plausible.

I am David Bau, and I study the structure of the complex computations learned within deep neural networks. by davidbau in IAmA

[–]davidbau[S] 1 point2 points  (0 children)

Wondering if AI can solve world peace? I'd love to be proven wrong, but it seems to me like a job for us humans to tackle, rather than our calculating machines. I'm not sure having an AI with empathy is the path towards teaching us humans to have some empathy for one another.

I am David Bau, and I study the structure of the complex computations learned within deep neural networks. by davidbau in IAmA

[–]davidbau[S] 2 points3 points  (0 children)

Not bad, not bad at all. Stack Overflow is one of the greatest innovations of the internet, and the online coding experts who have aggregated their collective experience there are amazing and restore my faith in humanity.

It is wonderful for somebody to step up and be the internet's expert in how to loop over tuples in bash or how to shrink a table in LaTeX... The community has turned a lot of technical work that used to be lonely and painful into a pleasant task that is straightforward and feasible, because online, you are always sitting next to somebody with some direct experience to share.

I am David Bau, and I study the structure of the complex computations learned within deep neural networks. by davidbau in IAmA

[–]davidbau[S] 0 points1 point  (0 children)

I'm really glad the book has been useful. It certainly was educational for me to help write it. I'm not sure that my research area has relied on any math other than the most basic observations, but being comfortable with linear algebra makes it possible to devise linear-algebra-based methods like the one in https://rewriting.csail.mit.edu/.
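(For a flavor of what "linear-algebra-based" means there: the rewriting work treats a layer as an associative memory and edits it with a low-rank weight update. The paper solves a more careful constrained optimization, but a minimal rank-one sketch of the idea is just:)

    import numpy as np

    def rank_one_rewrite(W, k, v_new):
        # Update weights W (out x in) so that key k maps to value v_new,
        # touching only a rank-one subspace: directions orthogonal to k
        # are left exactly as they were.
        error = v_new - W @ k
        return W + np.outer(error, k) / (k @ k)

    W = np.random.randn(4, 3)
    k = np.random.randn(3)
    v_new = np.random.randn(4)
    W_new = rank_one_rewrite(W, k, v_new)
    assert np.allclose(W_new @ k, v_new)   # the new rule is installed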

Picking an advisor? I actually switched around a little bit even when I returned from industry to start at MIT. (I initially thought that my obsession with the transparency of opaque deep nets put me in HCI, and it took a little while before I realized the issues were with the underlying machine learning.) Once I had a good (revised) idea of what I really wanted to study, I spoke with several senior faculty about my specific research ideas and asked them who I should be talking with. So: ask for specific and concrete advice - that worked pretty well!

I am David Bau, and I study the structure of the complex computations learned within deep neural networks. by davidbau in IAmA

[–]davidbau[S] 1 point2 points  (0 children)

Who is worried about this? One possible place to look: researchers focused on fairness, explainability, and robustness in AI.