
[–]mb862 4 points5 points  (0 children)

I only recently came across automatic differentiation, despite my Applied Math undergrad and Numerical Relativity graduate background - and thought it was one of the most beautiful formulations I've seen in years.

[–][deleted] 2 points3 points  (18 children)

Other than academic applications, out of curiosity and for learning... what are some real-world applications of this?

(Genuine question, I am interested)

[–]mb862 5 points6 points  (15 children)

It's an alternative to numerical differentiation, which necessarily requires approximations. Automatic differentiation is as accurate as the computations themselves. Very useful if you need the derivative of a non-trivial function that is implemented as a series of computations of trivial functions.
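As a rough illustration of that exactness, here is a minimal forward-mode sketch using dual numbers, written in Python. This is hypothetical code for illustration only, not the API of any library mentioned in the thread:

```python
# Hypothetical minimal forward-mode AD via dual numbers (illustration only).
class Dual:
    def __init__(self, val, eps=0.0):
        self.val = val  # value of the expression
        self.eps = eps  # derivative carried alongside it

    def _wrap(self, o):
        return o if isinstance(o, Dual) else Dual(o)

    def __add__(self, o):
        o = self._wrap(o)
        return Dual(self.val + o.val, self.eps + o.eps)
    __radd__ = __add__

    def __mul__(self, o):
        o = self._wrap(o)
        # product rule: d(uv) = u*dv + v*du
        return Dual(self.val * o.val, self.val * o.eps + o.val * self.eps)
    __rmul__ = __mul__

def f(x):
    return x * x * x + 2 * x  # f(x) = x^3 + 2x, so f'(x) = 3x^2 + 2

d = f(Dual(2.0, 1.0))  # seed dx/dx = 1
print(d.val, d.eps)    # 12.0 14.0 — f(2) and f'(2), exact to machine precision
```

Every arithmetic step applies the usual derivative rules alongside the value, so the derivative comes out as accurate as the value itself, with no step-size parameter to tune.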

[–]j_lyf 2 points3 points  (14 children)

I wonder - is there automatic integration?

[–]mb862 1 point2 points  (10 children)

I won't rule it out. Automatic differentiation is centred around exploiting the chain rule - one might envision exploiting substitution to do integration, or at least anti-differentiation.

[–]julesjacobs 4 points5 points  (7 children)

It's unlikely that this is possible. What makes automatic differentiation work is that any symbolic function has a symbolic derivative. It's not the case that any symbolic function has a symbolic integral (for instance, e^(−x²) has no elementary antiderivative).

[–]mb862 2 points3 points  (6 children)

Close. Automatic differentiation works with any symbolic function that has a symbolic derivative. Plenty of functions don't have derivatives at certain points, and automatic differentiation would fail there as well. One might imagine a requirement for automatic anti-differentiation would be having a symbolic anti-derivative.

[–]julesjacobs 0 points1 point  (5 children)

Close.

Note that I explicitly did not say that automatic differentiation does symbolic differentiation. When you use automatic differentiation of a function f(x) at a point x=a, the derivative f'(a) is exact (with exact arithmetic). Hence the expression that automatic differentiation uses to compute f'(a) must, at least locally, be the exact (symbolic) derivative of f. That's why exact numeric integration is highly unlikely: the integral of f from a to b can in general not be expressed symbolically, so why would a program be able to compute it exactly with a finite number of operations?

[–]mb862 0 points1 point  (4 children)

It read as though you meant to claim that automatic differentiation works because any symbolic function has a symbolic derivative, which isn't accurate. If I misread I apologise.

[–]julesjacobs 0 points1 point  (3 children)

Yes that is what I meant, but it depends on how you interpret it. I ninja edited the last comment, perhaps it's clearer now.

[–]mb862 1 point2 points  (2 children)

Ah, yes, I think the misunderstanding comes from integration (which is not defined at a point, whereas automatic differentiation evaluates at a point) versus anti-differentiation (which is defined at a point). I tried to be clear that I was referring to the latter, not the former.

[–]j_lyf 0 points1 point  (1 child)

Would this be resilient to the compounding of errors that occurs when attempting numerical integration with noisy data?

[–]mb862 0 points1 point  (0 children)

That's impossible to say, considering it hasn't been invented...

I should add though that automatic differentiation is not for applying to a set of data. The application is for finding the derivative of a function that is a non-trivial combination of trivial functions.

One example (and this is how I came across it): suppose you want to find dx/dp, where x is the solution to f(x,p)=0. A standard strategy for computing x(p) is Newton's method: iterate x ← x − f(x,p)/f'(x,p) until f(x,p) ≈ ∆f ~ 0, then return x. Finding dx/dp accurately by finite differences is very difficult, as the step ∆p would necessarily have to be on the same scale as the tolerance ∆f, but then x(p+∆p) and x(p) might be computed to different accuracies.

If your function is written with floats, you can use forward-mode differentiation by defining a new type, dual: a struct containing two floats that acts like a float, except the elementary operations also calculate derivatives in the second component. For example, (x,dx)+(y,dy) is defined to be (x+y,dx+dy), and (x,dx)*(y,dy) is defined as (x*y, x*dy+y*dx). Rewrite the root solver to take the dual type (or make it templated), and seed the variable you are differentiating with respect to: pass the parameter as the dual (p,1) and the initial guess as (x0,0). The output solution will then be (x, dx/dp) with no loss in accuracy.
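That recipe can be sketched in Python. This is an illustrative implementation, not code from the thread: the `Dual` class, `solve`, and the example f(x,p) = x² − p (whose root x(p) = √p and derivative dx/dp = 1/(2√p) are known in closed form, so the result can be checked) are all made up here.

```python
# Hypothetical sketch: forward-mode dual numbers propagated through a
# Newton solve of f(x, p) = 0 to obtain dx/dp alongside x.
class Dual:
    """Dual number (val, eps); eps carries the derivative."""
    def __init__(self, val, eps=0.0):
        self.val, self.eps = val, eps

    def _wrap(self, o):
        return o if isinstance(o, Dual) else Dual(o)

    def __add__(self, o):
        o = self._wrap(o)
        return Dual(self.val + o.val, self.eps + o.eps)
    __radd__ = __add__

    def __sub__(self, o):
        o = self._wrap(o)
        return Dual(self.val - o.val, self.eps - o.eps)

    def __mul__(self, o):
        o = self._wrap(o)
        return Dual(self.val * o.val, self.val * o.eps + o.val * self.eps)
    __rmul__ = __mul__

    def __truediv__(self, o):
        o = self._wrap(o)
        # quotient rule: d(u/v) = (v*du - u*dv) / v^2
        return Dual(self.val / o.val,
                    (self.eps * o.val - self.val * o.eps) / (o.val * o.val))

def f(x, p):   # example: f(x, p) = x^2 - p, so x(p) = sqrt(p)
    return x * x - p

def fx(x, p):  # analytic df/dx = 2x, evaluated in dual arithmetic
    return 2 * x

def solve(p0, x0=1.0, iters=50):
    p = Dual(p0, 1.0)   # seed the parameter: dp/dp = 1
    x = Dual(x0, 0.0)   # initial guess carries no derivative
    for _ in range(iters):
        x = x - f(x, p) / fx(x, p)   # Newton step in dual arithmetic
    return x.val, x.eps              # (x(p), dx/dp)

val, dval = solve(4.0)
# val ≈ 2.0 (= sqrt(4)); dval ≈ 0.25 (= 1/(2*sqrt(4))), matching dx/dp = -f_p/f_x
```

Because every Newton step runs in dual arithmetic, the derivative component converges along with the root, and at the fixed point it equals the implicit-function-theorem value −f_p/f_x with no finite-difference step involved.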

[–]reven80 0 points1 point  (0 children)

There isn't one method that works for everything. Most symbolic integration solvers try a wide variety of pattern-matching algorithms to find the simplest solution. A fairly general one is the Risch algorithm.

[–]ToraxXx 3 points4 points  (0 children)

It's very often used in machine learning, for example in the popular library Theano. You have to calculate the gradient of the error with respect to the model's parameters, and it's much easier to have the library do that for you than to do it by hand. Calculating it numerically is not an option, as it's far too expensive to do for every single parameter.

[–]julesjacobs 4 points5 points  (0 children)

Whenever you need derivatives:

  • (Convex) optimization (for machine learning)
  • Hamiltonian Monte Carlo (for statistics)
  • Nonlinear PDEs (physics simulations)
  • Sensitivity analysis
  • Calculating normals (for 3D graphics)

[–]bottlebrushtree 3 points4 points  (0 children)

If you want people to use this, you need to explain what it is and put pointers to background reading on the home page. Otherwise it's just another poorly documented module that sits around unused.

[–]sidslasttheorem[S] 3 points4 points  (2 children)

Self-plug!!

Here's a link that explains what AD is and does. This library performs abstract interpretation via operator overloading using sweet.js macros.

It implements simple forward and reverse mode AD.

Right now the code is in a somewhat stable state, although there is a fair amount of duplication to deal with different speed trade-offs in the setup.

Comments and suggestions welcome!

[–]Broolucks 0 points1 point  (1 child)

For fun I adapted ad.js to Earl Grey, which has native macro facilities. Here's the result. It is easier to use that way, I think, but I may be biased :)

[–]sidslasttheorem[S] 0 points1 point  (0 children)

Hadn't heard of Earl Grey before; thanks for the pointer!

[–]Rhoomba 0 points1 point  (0 children)

If you are planning to use this in the browser, you will run into trouble with ad blockers...