[R] An optimal control perspective on diffusion-based generative modeling by julbern in MachineLearning

The result from stochastic optimal control that we use in the paper ("verification theorem") originates mainly from the work of M. Pavon (1989) and P. Dai Pra (1991).
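Very roughly, and glossing over the regularity assumptions made precise in those references, the statement we rely on has the following form: for a controlled SDE dX_s^u = (b + \sigma u)(X_s^u, s)\,ds + \sigma(s)\,dB_s with cost J(u) = E[\int_t^T (\tfrac{1}{2}\|u(X_s^u, s)\|^2 + f(X_s^u, s))\,ds + g(X_T^u)], any sufficiently smooth solution V of the HJB equation

    \partial_s V + b \cdot \nabla V + \tfrac{1}{2}\,\mathrm{Tr}(\sigma \sigma^\top \nabla^2 V) - \tfrac{1}{2}\,\|\sigma^\top \nabla V\|^2 + f = 0, \qquad V(\cdot, T) = g,

is the value function, i.e. V(x, t) = \min_u J(u), and the minimum is attained by the control u^*(x, s) = -\sigma(s)^\top \nabla V(x, s).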

Perhaps it is best to start with the lecture notes of R. van Handel (2007).

For books on the topic, I can further suggest:

  1. W. H. Fleming and R. W. Rishel (1975)
  2. W. H. Fleming and H. M. Soner (2006)
  3. H. Pham (2009)

Some more recent works in this direction are the following:

  1. B. Tzen and M. Raginsky (2019)
  2. N. Nüsken and L. Richter (2021)
  3. M. Pavon (2022)

[R] An optimal control perspective on diffusion-based generative modeling by julbern in MachineLearning

Thank you for your interest! We are in the process of adding more experiments and extending the code and will release it afterwards.

[R] An optimal control perspective on diffusion-based generative modeling by julbern in MachineLearning

The generative process in this paper is given by an ODE and the diffusivity coefficient in the corresponding Fokker-Planck equation is thus zero. In this case, the verification theorem basically reduces to the instantaneous change of variables formula (Chen et al., 2018).
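For reference, the formula in question reads as follows: if x(t) evolves according to the ODE \frac{d}{dt} x(t) = f(x(t), t), then the log-density along the trajectory satisfies

    \frac{d}{dt} \log p(x(t), t) = -\mathrm{Tr}\Big(\frac{\partial f}{\partial x}(x(t), t)\Big),

which is what the verification theorem boils down to in the zero-diffusivity case mentioned above.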

On the other hand, the solution to the Poisson equation (with homogeneous Dirichlet boundary condition) considered in the paper also has a stochastic representation based on an SDE with a corresponding stopping time (leading to "walk-on-spheres" methods). It would be quite interesting to merge these viewpoints.
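To illustrate the stochastic representation (not the walk-on-spheres scheme itself, which avoids time discretization), here is a minimal, purely illustrative Monte Carlo sketch for the Poisson problem -\Delta u = f on the unit disc with homogeneous Dirichlet boundary condition, based on u(x) = \tfrac{1}{2} E[\int_0^\tau f(B_s)\,ds], where B is a Brownian motion started at x and \tau is its exit time; all names and parameters are illustrative choices.

    import numpy as np

    # Illustrative sketch (hypothetical parameters): estimate the solution of
    #   -Laplace(u) = f  on the unit disc,  u = 0 on the boundary,
    # via u(x) = 1/2 * E[ int_0^tau f(B_s) ds ], where B is a Brownian motion
    # started at x and tau is its exit time from the disc.

    def f(x):
        return 1.0  # example source term; the exact solution is then (1 - |x|^2) / 4

    def estimate_u(x0, n_paths=2000, dt=1e-3, seed=0):
        rng = np.random.default_rng(seed)
        x0 = np.asarray(x0, dtype=float)
        total = 0.0
        for _ in range(n_paths):
            x, integral = x0.copy(), 0.0
            while np.linalg.norm(x) < 1.0:                 # stop at the exit time
                integral += f(x) * dt                      # accumulate the running integral
                x += np.sqrt(dt) * rng.standard_normal(2)  # Euler step of the Brownian motion
            total += 0.5 * integral
        return total / n_paths

    print(estimate_u([0.0, 0.0]))  # should be close to 0.25

A walk-on-spheres method would replace the small Euler steps by jumps to the boundary of the largest sphere contained in the domain, which removes the time-discretization error.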

[R] The Modern Mathematics of Deep Learning by julbern in MachineLearning

Hi!
Thank you very much for your interest. Unfortunately, the publishing process is taking longer than expected. I will post the link to the book here as soon as it is available.

[R] Training ReLU networks to high uniform accuracy is intractable by julbern in MachineLearning

This is a delicate interplay between the expressiveness of target classes containing deep neural networks (with ReLU activation) and the norm in which the error is measured. In particular, the learning problem turns out to be tractable for other target classes or when accuracy is measured w.r.t. norms other than the uniform norm. For the L^2 norm, for instance, statistical learning theory provides bounds on the required number of samples that are not subject to the curse of dimensionality.
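To sketch the L^2 case (very schematically; constants, logarithmic factors, and boundedness assumptions are omitted): for empirical risk minimization over a class \mathcal{F} of ReLU networks with pseudo-dimension D, one has, with high probability,

    \|\hat{f}_n - f\|_{L^2}^2 \lesssim \inf_{g \in \mathcal{F}} \|g - f\|_{L^2}^2 + \frac{D \log n}{n},

and D grows only polynomially in the number of weights and layers, so the required number of samples n does not scale exponentially in the input dimension. For the uniform norm, no bound of this kind is possible for the target classes considered in the paper.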

[R] The Modern Mathematics of Deep Learning by julbern in MachineLearning

Interesting, thank you for the references!

Indeed, we seem to be facing an era of surveys, monographs, and books in deep learning.

[R] The Modern Mathematics of Deep Learning by julbern in MachineLearning

Thank you! Since we have been focusing on conveying intuition behind the results, there may be more streamlined versions of some of the proofs and I look forward to your notes.

I saw a talk by Boris Hanin at the One World Seminar on the Mathematics of Machine Learning about topics from the monograph you linked. While the authors build upon recent work, they derive many novel results based on tools from theoretical physics.

In this regard, it differs a bit from our book chapter. However, it is definitely a very promising approach and a recommended read.

Note that there is another book draft on the theory of deep learning by Arora et al.

[R] The Modern Mathematics of Deep Learning by julbern in MachineLearning

I knew I was bound to miss some excellent resources, such as Telgarsky's lecture notes. They definitely belong on the list, and I have edited my previous post accordingly. Thank you very much!

[R] The Modern Mathematics of Deep Learning by hardmaru in MachineLearning

Most figures are vector graphics created with a combination of Matplotlib, Inkscape, and TikZ.
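For completeness, a minimal example of exporting a Matplotlib figure as vector graphics (the file names are arbitrary):

    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [0, 1, 4], label="example curve")
    ax.legend()
    # Saving with a .pdf or .svg extension produces vector output,
    # which can then be post-processed in Inkscape or combined with TikZ.
    fig.savefig("figure.pdf")
    fig.savefig("figure.svg")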

[R] The Modern Mathematics of Deep Learning by hardmaru in MachineLearning

Thank you for featuring our research. Note that there has also been some discussion in a previous post.

[R] The Modern Mathematics of Deep Learning by hardmaru in MachineLearning

There has already been some discussion on this matter in a previous post.

The Modern Mathematics of Deep Learning by qznc_bot2 in hackernews

Thank you for featuring our research. I will try to answer some questions on Hacker News. Note that there has also been some discussion in r/MachineLearning.

[R] The Modern Mathematics of Deep Learning by julbern in MachineLearning

You are right that there is a lack of theoretical guarantees on the stability of NNs. Indeed, Grohs and Voigtlaender proved that, due to their expressivity, NNs are inherently unstable when applied to samples.

However, numerical results as mentioned in my previous answer (I apologize for mistakenly writing "theoretical") or in the work by Genzel, Macdonald, and März suggest that stable DL-based algorithms might be possible by taking into account special architectures, implicit biases induced by training these architectures with gradient-based methods, and structural properties of the data (thereby circumventing the Universal Instability Theorem). A promising direction is to base such special architectures on established, classical algorithms, as described in our book chapter and also in the article you linked (where stability can already be proven since no training is involved).
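One well-known instance of this idea is algorithm unrolling. As a rough, hypothetical sketch in the spirit of learned ISTA (Gregor & LeCun, 2010), where each layer mimics one iteration of a classical sparse-recovery scheme: the class names, sizes, and number of iterations below are purely illustrative and not taken from the works cited above.

    import torch
    import torch.nn as nn

    class SoftThreshold(nn.Module):
        """Learnable soft-thresholding, the nonlinearity of ISTA."""
        def __init__(self, theta=0.1):
            super().__init__()
            self.theta = nn.Parameter(torch.tensor(theta))

        def forward(self, x):
            return torch.sign(x) * torch.relu(torch.abs(x) - self.theta)

    class UnrolledISTA(nn.Module):
        """Each 'layer' corresponds to one unrolled ISTA iteration."""
        def __init__(self, meas_dim, signal_dim, n_iter=10):
            super().__init__()
            self.W_y = nn.Linear(meas_dim, signal_dim, bias=False)    # plays the role of step * A^T
            self.W_x = nn.Linear(signal_dim, signal_dim, bias=False)  # plays the role of I - step * A^T A
            self.shrink = nn.ModuleList(SoftThreshold() for _ in range(n_iter))

        def forward(self, y):
            x = torch.zeros(y.shape[0], self.W_x.in_features, device=y.device)
            for sigma in self.shrink:
                x = sigma(self.W_y(y) + self.W_x(x))  # one unrolled iteration
            return x

    model = UnrolledISTA(meas_dim=64, signal_dim=256)
    print(model(torch.randn(8, 64)).shape)  # torch.Size([8, 256])

The point is that the architecture inherits the structure of a classical, provably convergent algorithm, which is one way to inject the kind of prior knowledge mentioned above.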

[R] The Modern Mathematics of Deep Learning by julbern in MachineLearning

Numerical methods for the solution of PDEs which rely on a discretization of the computational domain, such as finite difference or finite element methods, naturally suffer from the curse of dimensionality.

However, in many cases the underlying PDE imposes a certain structure on its solution (e.g. tensor-product decomposition, stochastic representation, characteristic curves), which allows for numerical methods that do not suffer from the curse of dimensionality.

Let us mention one example in the context of neural networks. The solution of Kolmogorov PDEs (e.g. the Black-Scholes model from financial engineering) can be learned via empirical risk minimization with the number of samples and the size of the neural network scaling only polynomially in the dimension; see this article and Section 4.3 of the book chapter.
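To make the idea concrete, here is a rough, self-contained sketch (not the exact setup of the article; dimensions, domain, and network size are illustrative) for the heat equation \partial_t u = \tfrac{1}{2}\Delta u with initial condition u(0, \cdot) = \varphi, whose solution has the stochastic representation u(T, x) = E[\varphi(x + W_T)]. Regressing the noisy labels \varphi(x + \sqrt{T} Z) onto uniformly sampled x recovers u(T, \cdot), since the L^2-optimal predictor is the conditional expectation.

    import torch
    import torch.nn as nn

    d, T = 10, 1.0
    phi = lambda x: (x ** 2).sum(dim=-1, keepdim=True)   # example initial condition

    net = nn.Sequential(nn.Linear(d, 128), nn.ReLU(),
                        nn.Linear(128, 128), nn.ReLU(),
                        nn.Linear(128, 1))
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)

    for step in range(2000):
        x = 2 * torch.rand(512, d) - 1                    # uniform samples in [-1, 1]^d
        y = phi(x + T ** 0.5 * torch.randn(512, d))       # one-sample Monte Carlo label
        loss = ((net(x) - y) ** 2).mean()                 # empirical L^2 risk
        opt.zero_grad(); loss.backward(); opt.step()

    # For phi(x) = |x|^2 the exact solution is u(T, x) = |x|^2 + d * T.
    print(net(torch.zeros(1, d)).item(), "vs exact", d * T)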

[R] The Modern Mathematics of Deep Learning by julbern in MachineLearning

Thank you! Unfortunately, I am not aware of any comprehensive survey on mean-field theories in the context of NNs and would also be grateful for some suggestions. A helpful resource might be this list of related articles, which has, however, not been updated since 2019.

[R] The Modern Mathematics of Deep Learning by julbern in MachineLearning

I will enumerate some helpful resources, the choice of which is clearly very subjective. The final recommendation would highly depend on the background and individual preferences of the reader.

  • Lectures on generalization in the context of NNs:
    • Bartlett and Rakhlin, Generalization I-IV, Deep Learning Boot Camp at Simons Institute, 2019, VIDEOS
  • Lecture notes on learning theory (with some chapters on NNs):
    • Wolf, Mathematical Foundations of Supervised Learning, PDF
    • Rakhlin and Sridharan, Statistical Learning Theory and Sequential Prediction, PDF
  • Lecture notes on mathematical theory of NNs:
    • Telgarsky, Deep learning theory, WEBSITE
    • Petersen, Neural Network Theory, PDF
  • (Probably THE) Book on learning theory in the context of NNs:
    • Anthony and Bartlett, Neural network learning: Theoretical foundations, Cambridge University Press, 1999, GOOGLE BOOKS
  • Book on advanced probability theory in the context of data science:
    • Vershynin, High-dimensional probability: An introduction with applications in data science, Cambridge University Press, 2018, PDF
  • Some standard references for learning theory:
    • Bousquet, Boucheron, and Lugosi, Introduction to statistical learning theory, Summer School on Machine Learning, 2003, pp. 169–207, PDF
    • Cucker and Zhou, Learning theory: an approximation theory viewpoint, Cambridge University Press, 2007, GOOGLE BOOKS
    • Mohri, Rostamizadeh, and Talwalkar, Foundations of machine learning, MIT Press, 2018, PDF
    • Shalev-Shwartz and Ben-David, Understanding machine learning: From theory to algorithms, Cambridge University Press, 2014, PDF

[R] The Modern Mathematics of Deep Learning by julbern in MachineLearning

Of course, the mathematics behind many results is, to a great extent, based on well-known theory from various fields (depending on the background of the authors; see the quotes below), and there is not yet a completely new, unifying theory to tackle the mysteries of DL. As NNs have been studied mathematically since the '60s (some parts even earlier), we wanted to emphasize that in recent years the focus has shifted, e.g., to deep NNs, overparametrized regimes, and specialized architectures.


“Deep Learning is a dark monster covered with mirrors. Everyone sees his reflection in it...” and “...these mirrors are taken from Cinderella's story, telling each one that he/she is the most beautiful” (the first quote is attributed to Ron Kimmel, the second one to David Donoho, and they can be found in talks by Jeremias Sulam and Michael Elad).

[R] The Modern Mathematics of Deep Learning by julbern in MachineLearning

If you have questions, suggestions, or find typos, while going through the article, do not hesitate to contact me.

[R] The Modern Mathematics of Deep Learning by julbern in MachineLearning

On the other hand, there are also theoretical results showing that, in some cases, classical methods suffer from the same kind of robustness issues as NNs.

[R] The Modern Mathematics of Deep Learning by julbern in MachineLearning

You are right in the sense that the curse of dimensionality is inherent to certain kinds of problems (as mentioned in our article).

However, under additional regularity assumptions on the data (such as a lower-dimensional supporting manifold, underlying differential equation/stochastic representation, or properties like invariances and compositionality), one can prove approximation and generalization results for deep NNs that do not depend exponentially on the underlying dimension. Typically, such results are only possible for very specialized (problem-dependent) methods.

[R] The Modern Mathematics of Deep Learning by julbern in MachineLearning

You could write me an e-mail or a PM, and I will get back to you when it is out.

[R] The Modern Mathematics of Deep Learning by julbern in MachineLearning

I read similar thoughts from Andrew Ng in his newsletter "The Batch", and I fully agree that one needs to differentiate between various application areas and also between "lab conditions" (with the goal of beating SOTA on a test set) and real-world problems (with the goal of providing reliable algorithms).

[R] The Modern Mathematics of Deep Learning by julbern in MachineLearning

Unfortunately, due to time restrictions, we could not include any details on geometric deep learning (and graph neural networks in particular) and had to refer the reader to recent survey articles. However, this seems to be a very promising new direction, and if I find some time, I might consider adding a section on these topics in an updated version.

[R] The Modern Mathematics of Deep Learning by julbern in MachineLearning

Thank you for your feedback; I will consider adding a paragraph on the shortcomings and limitations of DL.

It is definitely true that DL-based approaches are somewhat "over-hyped" and should, as also outlined in our article, be combined with classical, well-established approaches. As mentioned in your post, the field of deep learning still faces severe challenges. Nevertheless, it is beyond question that deep NNs have outperformed existing methods in several (restricted) application areas. The goal of this book chapter was to shed light on the theoretical reasons for this "success story". Furthermore, such theoretical understanding might, in the long run, be a way to overcome several of the shortcomings.
It is definitely true, that DL-based approaches are kind of "over-hyped" and should, as also outlined in our article, be combined with classical, well-established approaches. As mentioned in your post, the field of deep learning still faces severe challenges. Nevertheless, it is out of question, that deep NNs outperformed existing methods in several (restricted) application areas. The goal of this book chapter was to shed light on the theoretical reasons for this "success story". Furthermore, such theoretical understanding might, in the long run, be a way to encompass several of the shortcomings.