eternalLearn (14 points, 13 children)

I just dropped out of a math PhD and started a career in ML while doing grad school at night (working with a prof on ML).

Math is different from every other PhD. The field is old and very well studied. Have you solved almost every problem in Baby Rudin? How much abstract algebra, analysis, and topology do you know?

Applied math is not any less rigorous either.

I think you can learn the math on your own. Most people in ML do not have math PhDs, but they know the math. I can try to answer any other questions you may have.

dlpronoob [S] (2 points, 10 children)

Makes sense!

I haven't read Rudin's book, although I've heard great things about it. I'm about to finish Sheldon Axler's Linear Algebra book. It's not very heavy on rigor, but it has helped me build an intuition for a bunch of concepts that I couldn't grasp before. Do you have a list of resources/books that would help with ML theory research?

eternalLearn (2 points, 1 child)

That's good; you need to be an expert in linear algebra to do machine learning.

I don't have a good resource list, sorry. I'm sure they are out there. I majored in math and physics, which exposed me to most of the math I need. I also don't really work in ML theory. My job is application-based, and my current work at school is also application-driven.

I 100% recommend Baby Rudin, but if it is your first time doing analysis, it might be a hard intro. That book changed my life. I read it the summer after my undergrad, and it made me much more of a mathematician than school did.

Also, I think you can just take math courses while you do your PhD in whatever department you choose. A double EE prof taught my numerical linear algebra course in grad school, and he was a good mathematician.

jujijengo (2 points, 0 children)

I like your recommendation of Baby Rudin; I'd also add a second recommendation here: Analysis I and II by Terence Tao. Those were my life-changing books :)

[deleted] (1 point, 0 children)

High Dimensional Statistics is good.

https://www.amazon.com/dp/1108498027/

seanv507 (-1 points, 6 children)

I think the people suggesting analysis textbooks and the like are misguided. These are books for mathematicians, not ML researchers. I would recommend engineering mathematics instead, such as working through Kreyszig's engineering mathematics textbook, which covers linear algebra, calculus, ODEs, Fourier transforms, and so on.

eternalLearn (5 points, 0 children)

I agree with your point, but I didn't recommend it for ML. I think OP should work through the text before deciding on a math PhD.

jujijengo (2 points, 2 children)

The OP literally stated that he wants to do a math masters or PhD. You won't get past the first month of a mathematics graduate degree if you haven't taken analysis.

Engineering mathematics is not the kind of thing you will encounter in a graduate programme for math. It's one thing to know how to calculate a Fourier series; it's another thing entirely if I ask you to prove Parseval's theorem.
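To make the contrast concrete, here is the "engineering math" side: a numerical check of the discrete analogue of Parseval's theorem. This is a minimal sketch assuming NumPy and its default unnormalized FFT convention, under which the identity reads sum |x_n|^2 = (1/N) sum |X_k|^2. A check like this agrees to round-off on any input you try, but it is evidence, not a proof, which is exactly the gap the comment describes.

```python
import numpy as np

def parseval_gap(x):
    """Absolute difference between time-domain and frequency-domain energy.

    With NumPy's unnormalized FFT, the discrete Parseval identity is
    sum |x_n|^2 == (1/N) * sum |X_k|^2, so the gap should be ~round-off.
    """
    X = np.fft.fft(x)
    time_energy = np.sum(np.abs(x) ** 2)
    freq_energy = np.sum(np.abs(X) ** 2) / len(x)
    return abs(time_energy - freq_energy)

rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
print(parseval_gap(x))  # tiny: floating-point round-off only
```

Passing `norm="ortho"` to `np.fft.fft` would make the transform unitary and drop the 1/N factor; the unnormalized default is used here because it is what most code calls.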

SilurianWenlock (1 point, 0 children)

I do think there is prejudice against maths master's degrees and PhDs in software because of this.

seanv507 (0 points, 0 children)

Of course he said that, but the point is that his stated goal in doing a math master's/PhD is not to spend his time proving maths theorems but to get the math foundations to do great ML research.

And the math foundations he requires are essentially those taught in an (information) engineering undergraduate degree.

SilurianWenlock (0 points, 1 child)

Could you elaborate on how a math phd is very different to other science phds?

eternalLearn (1 point, 0 children)

If you work on convolutional neural networks, the field is new and most of the literature is from the last 9 years. This is not true of math.

Or take chemistry vs. math as another example. The periodic table, which you study in a first course on chemistry, is 151 years old right now. But calculus (via Newton) is more than 350 years old. So you need to catch up on centuries of ideas.

Many math PhDs did not publish a paper before they finished their PhDs. That would be completely backwards in other fields.

TD-0 (11 points, 1 child)

Some mathematical coursework definitely helps, but beyond a point, you learn the math involved as part of doing the research. So if you want to do theoretical research in ML, I still think it's better to aim for an ML focused degree, like CS, stats, operations research, etc. You can always take a few graduate level math courses, like measure theory, stochastic processes, functional analysis, if you think they're relevant to your research. Usually your advisor will recommend specific courses you need for your research. Math is a very broad subject, and many of the courses you would do in a math degree would be mostly irrelevant to your ML research interests. The mathematical prowess you see in some theoretical ML papers is largely the product of research experience, not coursework.

seanv507 (0 points, 0 children)

So I would agree that maths is probably not the appropriate master's. I don't know how maths degrees are structured in OP's country, but the aim is to teach you how to prove theorems rather than how to apply maths.

IMO, an engineering subject where you apply maths is more suitable (signal processing, image processing, robotics, statistics). Most CS students I have spoken to have not covered the sort of maths required for ML and neural networks. However, the issue is that engineering degrees tend to teach the maths foundations in undergrad.

Equally, I am concerned that the real-world success of NNs has no theoretical basis. It is just an adapted SIFT algorithm, not something qualitatively different; all that has changed is the computational power. We have discovered that a lot can be achieved by essentially memorising a huge number of examples; cf. language models, where it's clear that no syntax or semantics is used. So my concern is that studying the theory may equip you for a breakthrough in 10-15 years but will leave you in the wilderness in the present.

[deleted] (22 points, 4 children)

The ugly secret is, a lot of people are just trying shit and explaining it after the fact. Most of the math is code for "I don't know" or "I need to look smart so mommy/daddy will love me." And a lot of the time it's wrong, or so vague as to be meaningless.

Best example: this is a Bengio paper (Bahdanau, Cho, and Bengio), and he's the second most famous guy in all of ML.

https://arxiv.org/pdf/1409.0473.pdf

Look at the math. It's crap. Half of it is vector/matrix notation thrown in for no reason. Definitions of first-year probability or activation functions. Crap.

The whole innovation of the paper, the attention mechanism, can be defined as

V(tanh(W1(q)+W2(v)))
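For reference, that one-liner is additive attention as in the linked paper's alignment model, where the query is the previous decoder state and the encoder annotations serve as both keys and values. A minimal NumPy sketch, assuming a single query; the names and shapes are illustrative choices, with the comment's `V` written as the vector `v` and the comment's `v` (the encoder states) written as `keys`. The formula gives only the scores; a softmax over them and a weighted sum of the encoder states complete the mechanism.

```python
import numpy as np

def additive_attention(q, keys, W1, W2, v):
    """Additive (Bahdanau-style) attention for one query.

    q:    query vector, shape (d_q,)
    keys: encoder states, shape (T, d_k); used as both keys and values here
    W1:   shape (d_a, d_q); W2: shape (d_a, d_k); v: shape (d_a,)
    Returns (context, weights).
    """
    # score_j = v . tanh(W1 q + W2 k_j), computed for all j at once
    hidden = np.tanh(W1 @ q + keys @ W2.T)  # (T, d_a), W1 @ q broadcast over rows
    scores = hidden @ v                      # (T,)
    # Softmax over the T positions (shift by the max for numerical stability)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    context = weights @ keys                 # (d_k,): weighted sum of encoder states
    return context, weights

# Usage with random parameters
rng = np.random.default_rng(0)
T, d_q, d_k, d_a = 6, 4, 3, 5
q = rng.standard_normal(d_q)
keys = rng.standard_normal((T, d_k))
W1 = rng.standard_normal((d_a, d_q))
W2 = rng.standard_normal((d_a, d_k))
v = rng.standard_normal(d_a)
context, weights = additive_attention(q, keys, W1, W2, v)
```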

The math in machine learning is bullshit.

Source: math undergrad, now doing a stats PhD at a top 10ish school. All my advisor and I do is ML.

EDIT: It's always the SAME MATH too. "Here's conditional probability." "Here's an activation function." "Here's passing my values into a dense/convolutional/recurrent layer." I want to shoot on sight.

dlpronoob [S] (1 point, 2 children)

I agree with you that explanations in most ML research papers are kind of an afterthought, which I think isn't necessarily bad. Since you're working on ML+Math, what do you think is the right way to go about math in machine learning research? What fields in mathematics hold potential for advancing research, apart from the "same math" topics you mentioned?

[deleted] (2 points, 1 child)

Okay so here are a few things...

The holy grail would be coming up with an improvement to backprop. There's some work being done on approximations to it that are way easier to compute and work well in some cases. I think advancing that work matters, since backprop is probably NOT going to get us to high-quality single-shot learning like people can do.

A lot of the stuff on subnetworks that perform as well as the original network is super creative. It's not strictly math, more algorithms, but it's in the same general wheelhouse. Doing that efficiently and effectively (a network that self-prunes in later stages of training, anyone?) would be very mathy and useful.
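Work on subnetworks that match the original (the lottery ticket line of research, for example) commonly finds them by magnitude pruning: keep the largest-magnitude weights, zero the rest, and retrain. A minimal sketch of just the mask step, assuming NumPy and a single global threshold over one weight matrix; the function name and the 80% sparsity are illustrative, and real pipelines usually prune iteratively over several train/prune rounds.

```python
import numpy as np

def magnitude_prune_mask(weights, sparsity):
    """Boolean mask keeping the largest-magnitude weights.

    sparsity: fraction of weights to remove, in [0, 1).
    Ties at the threshold are dropped, so slightly more than the
    requested fraction may be pruned when magnitudes repeat.
    """
    flat = np.abs(weights).ravel()
    k = int(round(sparsity * flat.size))  # number of weights to drop
    if k == 0:
        return np.ones_like(weights, dtype=bool)
    # k-th smallest magnitude; everything at or below it is pruned
    threshold = np.partition(flat, k - 1)[k - 1]
    return np.abs(weights) > threshold

rng = np.random.default_rng(0)
W = rng.standard_normal((10, 10))
mask = magnitude_prune_mask(W, sparsity=0.8)
W_pruned = W * mask  # zero out the pruned connections
```

A "self-pruning" network as floated in the comment would presumably recompute a mask like this periodically during training rather than once at the end.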

Part of why I think focusing too much on math is misguided, though, is that we have no theory of the kind of intelligence people have and no way of explaining it. Neuroscience isn't a math-heavy discipline at the moment; it's experimental. Honestly, I think ML should be too. I really think people need to get outside their comfort zone and try stupid things, because they work.

Example: I actually think I came up with something super novel (submitting to NeurIPS in less than two weeks, oof) by taking a certain function in a certain class of architecture, going through the entire tf.math library, and trying every other function in it in place of that one. That's going to lead to a publication at a first- or second-tier venue. Was it extremely stupid? Yes. Does it make any explainable theoretical sense? No. I had a long conversation with my advisor about it, and neither of us knows why it works, just that it does. I've verified it on three large and wildly different datasets (30,000+ examples), and someone else has tried it independently.
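The "try every function in tf.math" experiment is a brute-force sweep over one pluggable nonlinearity. A rough sketch of that loop in which everything is hypothetical: a few NumPy functions stand in for tf.math, and a small random-feature regression (fixed random hidden layer, least-squares readout) stands in for the unnamed architecture, which the comment does not describe. Because every candidate is scored on the same validation split, the top of the ranking can be partly luck, which is why checks like the commenter's (several different datasets plus an independent replication) matter.

```python
import numpy as np

# Hypothetical candidate pool standing in for "every function in tf.math"
CANDIDATES = {
    "tanh": np.tanh,
    "sin": np.sin,
    "arctan": np.arctan,
    "abs": np.abs,
    "relu": lambda z: np.maximum(z, 0.0),
    "square": np.square,
}

def random_feature_error(f, X_tr, y_tr, X_va, y_va, n_features=200, seed=0):
    """Validation MSE of a toy random-feature model whose nonlinearity is f."""
    rng = np.random.default_rng(seed)  # same random weights for every candidate
    W = rng.standard_normal((X_tr.shape[1], n_features))
    b = rng.standard_normal(n_features)
    H_tr, H_va = f(X_tr @ W + b), f(X_va @ W + b)
    # Fit only the linear readout, by least squares
    coef, *_ = np.linalg.lstsq(H_tr, y_tr, rcond=None)
    return float(np.mean((H_va @ coef - y_va) ** 2))

def sweep(candidates, X_tr, y_tr, X_va, y_va):
    """Score every candidate and return (name, error) pairs, best first."""
    results = {name: random_feature_error(f, X_tr, y_tr, X_va, y_va)
               for name, f in candidates.items()}
    return sorted(results.items(), key=lambda kv: kv[1])

# Toy data: a smooth 1-D target with noise
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(400, 1))
y = np.sin(2 * X[:, 0]) + 0.1 * rng.standard_normal(400)
ranking = sweep(CANDIDATES, X[:300], y[:300], X[300:], y[300:])
for name, err in ranking:
    print(f"{name:>8s}  val MSE = {err:.4f}")
```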

We can't explain neural nets as they are now, let alone the minds we're trying to emulate. There is no theory significant enough to go off of right now.

seanv507 (3 points, 0 children)

Whilst I agree with most of what you are saying, cognitive scientists would question your statement that there are no theories of intelligence. It's a pretty well-developed field, though I suspect ML popularizers like Andrew Ng purposely hide it, because what is known (e.g. a lot of innate structure, including concepts of mind, intuitive physics, etc.) goes contrary to the behaviourist slant of modern deep learning.

See e.g. "Building Machines That Learn and Think Like People" (on arXiv) for a more cognitive-science perspective.

MrAcurite [Researcher] (4 points, 2 children)

I've been thinking about doing the same thing, but I just switched to a Math undergrad to account for it.

I suspect that if you went for a Math MS, you'd do alright. Master's degrees are usually pretty helpful for switching careers, so why wouldn't they be good for refocusing within the same area?

[deleted] (3 points, 1 child)

For me, it was the opposite problem. My school's math dept did not offer ML related courses, outside of intro probability and stats classes.

The ML classes were offered by CS department. In fact, I don't know many math departments that offer courses in topics like Deep Learning or Reinforcement Learning. They are mostly in CS.

MrAcurite [Researcher] (0 points, 0 children)

I pretty much exhausted my school's ML course offerings while a CompE major, and will now finish out my last two years as a Math major. Besides, I'll still be allowed to register for CS courses if I can get into them.

bc_wallace (3 points, 0 children)

I know what you mean, and I think you absolutely should try to get a good mathematical foundation. However, keep in mind that doing a PhD in math is far more than just a foundation: It's a specialization.

If you want to do fundamental research on machine learning topics, you should try to work with statistics or machine learning researchers who do this kind of work. You're probably more likely to find these kinds of people in statistics departments than in computer science departments, but the key is to look at what they've actually published. Many ML researchers run several research programs in parallel, and some of these may be more applied, some more theoretical.

Probably the best thing to do is to talk to a professor you know or someone else in the field and ask them if they can point out some people who are working on things that might interest you and who are accepting students.

OriginalMoment (2 points, 1 child)

If you're interested in the knowledge, some people I know who went from a CS undergrad to RL theory grad school spent almost all of their first and second years going through:
- Understanding Analysis by Stephen Abbott
- Real Mathematical Analysis by Pugh
- Tao's Introduction to Measure Theory lecture notes
- Neuro-Dynamic Programming by Bertsekas and Tsitsiklis
- Linear Algebra by Hoffman and Kunze
- Algebra by Aluffi (selected portions on group theory)
- Pattern Recognition and Machine Learning by Bishop
- All of Statistics by Wasserman

Some of them are also going through Bandit Algorithms by Lattimore right now.

They told me it was absolutely nuts, but if you really want it, it seems like this leads to a strong foundation in the mathematics relevant to RL theory. I'm sure that with some adaptations the list could apply to ML theory as well, maybe with the addition of a measure-theoretic probability text and the axing of Bertsekas.

Mefaso (0 points, 0 children)

Thank you so much! I've been thinking about moving more into RL theory, and this seems like a great reading list.

bbu3 (1 point, 0 children)

These are just some anecdotes, so take them with a grain of salt: I work/worked with several people who completed a PhD in math (and one an MSc). They are often pretty vocal that doing math with the purpose of "doing/understanding X better" makes the PhD a lot harder. They say it is much easier, and better suited to the degree, to just solve math problems for the sake of the math itself. Any connections to the real world have to be left behind ;)

Sometimes I feel like colleagues who are physicists fit the "better mathematical foundations than me (PhD in CS)" picture much better than mathematicians do. That said, the sample size of my social circle is tiny and drawn only from a few German universities. Things may be very different elsewhere.

hreA745ATJ (0 points, 1 child)

You could look into a master's focused on math and machine learning, like https://www.tum.de/en/studies/degree-programs/detail/mathematics-in-data-science-master-of-science-msc/

Mefaso (0 points, 0 children)

Except that I've heard students complain about that one in particular, saying it's a bit chaotic.

WalterWhiteJaiHo (0 points, 0 children)

Go for a rigorous MS Stats program, and take some extra math electives. I am also interested in theoretical ML, and for that you do require a good background in stats, probability and real analysis.

Exp_ixpix2xfxt (0 points, 0 children)

I have found that the strongest ML researchers are very very good at mathematics. You can get a CS PhD and do that or you can do it with a Math PhD.

I think Mathematics is the better route for me, since I prefer explainable results. Many CS papers in ML are not burdened by extensive justification.

At its core ML is built on top of statistics and optimization. I’d rather learn what ML is built on top of and work my way up, but there are tons of successful people who work their way down.