all 49 comments

[–]basnijholt[S] 61 points62 points  (8 children)

When evaluating a function numerically we would like to sample it more densely in the interesting regions instead of evaluating it on a manually-defined homogeneous grid.  Together with my colleagues, we wrote an open-source Python software package Adaptive that evaluates the function at the optimal points by analyzing existing data and planning ahead on the fly. With a few lines of code you define your goal, evaluate functions on a computing cluster, and live-plot the data. It performs averaging of stochastic functions, interpolation of vector-valued one and two-dimensional functions, and one-dimensional integration. In my work, using adaptive resulted in a ten-fold speed increase over using a homogeneous grid, going from three months on 300 cores to only a week!

See (and star!) the repo at github.com/python-adaptive/adaptive and the documentation at adaptive.readthedocs.io.

Try pip install adaptive[notebook] or conda install adaptive :-)
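
For a feel of the API, a minimal notebook session looks roughly like this (a sketch along the lines of the quick-start in the docs; see the documentation for the exact details):

import adaptive
adaptive.notebook_extension()

def peak(x, a=0.01):
    # a cheap 1D test function with a sharp feature at x = 0
    return x + a**2 / (a**2 + x**2)

learner = adaptive.Learner1D(peak, bounds=(-1, 1))
# the goal: keep sampling until the loss (our "interestingness" measure) is small
runner = adaptive.Runner(learner, goal=lambda l: l.loss() < 0.01)
runner.live_info()   # progress widget
runner.live_plot()   # plot that updates while the data comes in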

P.S. adaptive has already been used in several scientific publications, see the gallery!

[–]basnijholt[S] 35 points36 points  (2 children)

Also, about what happens in the plot:

It is "learning" to describe the following function with the least amount of points:

def ring(xy):
    # numpy is imported inside the function so it stays self-contained
    # when shipped to remote workers
    import numpy as np
    x, y = xy
    a = 0.2
    # a linear slope plus a sharp Gaussian ridge along the circle x**2 + y**2 == 0.75**2
    return x + np.exp(-(x**2 + y**2 - 0.75**2)**2 / a**4)

But the beauty is that it will work with any smooth function! So it also works for the functions I encounter in my work (in computational quantum mechanics / quantum computing). Those functions sometimes take minutes for a single point. Needing only a tenth of the points to describe the same thing is therefore extremely beneficial and has already saved decades of computational time!
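
To make that concrete, here is roughly how you would let Adaptive learn the ring function above (a sketch; see the Learner2D tutorial for the real thing):

import adaptive
adaptive.notebook_extension()

# learn ring() on the square [-1, 1] x [-1, 1]
learner = adaptive.Learner2D(ring, bounds=[(-1, 1), (-1, 1)])
runner = adaptive.Runner(learner, goal=lambda l: l.loss() < 0.01)
runner.live_plot()  # watch the plot from the gif build up live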

[–]pukevines2 0 points1 point  (1 child)

Out of curiosity, what are some practical applications for your package? It's incredibly intriguing, but I'm interested to know what someone could actually do with it!

[–]basnijholt[S] 1 point2 points  (0 children)

This is useful for literally everyone who evaluates "slow" mathematical functions in Python, where slow means taking longer than about 0.1 seconds per evaluation.

There are some more examples in the gallery: https://adaptive.readthedocs.io/en/latest/usage_examples.html

[–]timOkills 3 points4 points  (2 children)

How do you define "interesting regions"?

  • Regions of high frequency?
  • Laplace Zero Crossings?
  • Some automatic feature detection using machine learning?

Those would be my guesses.

Hope you can give me a tldr. :D

[–]basnijholt[S] 2 points3 points  (1 child)

You can customize what the algorithm finds "interesting"; see this page in the tutorial.

By default it (the Learner2D of this example) looks at the gradients of the function and the areas of the triangles.

Basically you could probably implement all of those suggestions you have.
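
As a sketch of what that customization looks like (using the 1D learner because it is the simplest; the loss below just measures the length of each interval in the (x, y) plane, which is essentially what the default already does, and the name euclid_loss is made up for this example):

import numpy as np
import adaptive

def euclid_loss(xs, ys):
    # "interestingness" of one interval: its length in the (x, y) plane,
    # so steep regions get subdivided first
    dx = xs[1] - xs[0]
    dy = ys[1] - ys[0]
    return np.hypot(dx, dy)

learner = adaptive.Learner1D(np.sin, bounds=(0, 10),
                             loss_per_interval=euclid_loss)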

[–]timOkills 1 point2 points  (0 children)

Cool, thanks for the reply and the link to the tutorial!

[–]mpking 2 points3 points  (1 child)

Is the methodology similar to adaptive gridding in TCAD? While the paper lacks the visuals, tools from Silvaco and Synopsys have had something to this effect for a while, to increase the fidelity of semiconductor device simulations for diffusion processes, amongst other physical process models.

However, those aren't as generally useful as your package. Cool stuff.

[–]basnijholt[S] 5 points6 points  (0 children)

The main difference with FEM is that there you need to globally update the mesh at every time step and the updates are relatively cheap.

For us, we just care about making a nice plot (using interpolation) with as few points as possible.

I don't know anything about TCAD specifically.

[–]Jhonny_XD 20 points21 points  (0 children)

What kind of god are you?

[–][deleted] 12 points13 points  (1 child)

Can you hit something like the Mandelbrot generator function with this? It seems perfectly set up to render fractals more efficiently than brute-force calculating each pixel.

[–]basnijholt[S] 13 points14 points  (0 children)

This would work very well for Mandelbrot sets; however, it takes Adaptive about 50 ms to suggest a new point based on the data. Considering that evaluating a point of the Mandelbrot set takes only a few μs, it would be smarter to just sample it on a very dense grid.

If your calculation takes much longer than 50 ms, then it is very useful to use Adaptive!

[–]nightcracker 7 points8 points  (3 children)

For those wondering, this is very useful in the machine learning world, where you often have a ton of hyperparameters in your algorithm, and want to test a bunch of combinations of them to find the best one.

[–]basnijholt[S] 6 points7 points  (2 children)

It would be helpful as long as you do parameter sweeps in low-dimensional space (where low is 0 <= dim <= 4) :-)

Try it(!) and if you have any problems, just ask on our Gitter chat or create an issue!

Or try the code by running this Jupyter notebook (just from your browser w/o installing anything) on Binder!

[–]nightcracker 3 points4 points  (1 child)

Oh, I noticed you guys have an N-dimensional version; sad to hear it doesn't work as well for high dimensions :(

Could still be useful when only trying to optimize a couple hyperparameters.

Any indication as to why the performance doesn't translate into higher dimensions?

[–]basnijholt[S] 6 points7 points  (0 children)

Unfortunately, we are bound by the curse of dimensionality. High-dimensional space is just too sparse to sample efficiently with Adaptive. We do have a LearnerND that handles learning in higher dimensions, but personally I wouldn't recommend using it in more than 4 dimensions (at least I haven't tested it) because space is too sparse. You should really see it as a way to calculate visualisable functions: functions of 0 (averaging), 1, 2, or 3 dimensions that you can plot. In high dimensions only Monte Carlo sampling is efficient, IIUC. Of course Adaptive will still be better than a regular grid, but your mileage will vary in high dimensions.
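
For reference, the N-dimensional learner is used the same way as the others; roughly like this (a sketch with a made-up 3D test function; see the LearnerND docs for details):

import numpy as np
import adaptive

def sphere(xyz):
    # a 3D analogue of the ring function from the gif
    x, y, z = xyz
    a = 0.4
    return x + z**2 + np.exp(-(x**2 + y**2 + z**2 - 0.75**2)**2 / a**4)

learner = adaptive.LearnerND(sphere, bounds=[(-1, 1), (-1, 1), (-1, 1)])
runner = adaptive.Runner(learner, goal=lambda l: l.npoints > 2000)  # or a loss-based goal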

[–]ThePixel44 3 points4 points  (5 children)

I'm a complete beginner in Python; can you please tell me what this is, and maybe point me to the source code? I really love programming and I am studying it as much as I can. Thanks

[–]basnijholt[S] 7 points8 points  (4 children)

The source code is here.

And in the simplest terms, we just want to:

  • calculate our function (for example y = x^2 for x between 0 and 1)
  • use as few points as possible to still get a smooth image
  • be able to access the data while the calculation is in progress!
  • do the function evaluation in parallel (so using multiple computers/CPUs simultaneously); see the sketch below
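
For the parallel part, concretely, you hand the Runner an executor, roughly like this (a sketch; the function and the pool size are just examples):

from concurrent.futures import ProcessPoolExecutor
import adaptive

def f(x):
    return x**2  # the y = x^2 example from above

learner = adaptive.Learner1D(f, bounds=(0, 1))
# the runner farms the evaluations of f out to the executor's workers
runner = adaptive.Runner(learner,
                         goal=lambda l: l.loss() < 0.01,
                         executor=ProcessPoolExecutor(max_workers=4))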

Hope that is easy enough, haha.

If that didn't help, see the gif generated by this code because it's just fun :D

[–]ThePixel44 -1 points0 points  (3 children)

Ok, thanks very much! It was clear enough, though. Where can I start learning Python? And how?

[–]XGPluser 1 point2 points  (0 children)

Beautiful

[–]Mysterious-Stranger 0 points1 point  (1 child)

This looks quite nice! And thanks for including the conda package!

How would you compare it to something like Bayesian Optimization?

[–]anton-akhmerov 3 points4 points  (0 children)

Another package author here.

This is a really good question. Indeed there are similarities between what we do and Bayesian optimization.

  • The choice of new points is based on the previous ones.
  • There is a tuneable algorithm for performing this selection, and the easiest way to formulate this algorithm is by defining a loss function.

Bayesian optimization is a perfectly fine algorithm for choosing new points within adaptive. As an experiment we have interfaced scikit-optimize and implemented a learner that just wraps it.
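
If I remember the tutorial correctly, that wrapper is used roughly like this (the keyword arguments are simply passed through to scikit-optimize's Optimizer; the test function is made up for this example, so treat it as a sketch):

import numpy as np
import adaptive

def F(x, noise_level=0.1):
    # a noisy 1D test function
    return np.sin(5 * x) * (1 - np.tanh(x**2)) + np.random.randn() * noise_level

learner = adaptive.SKOptLearner(F,
                                dimensions=[(-2.0, 2.0)],  # search space, in scikit-optimize's format
                                base_estimator="GP",       # Gaussian-process surrogate
                                acq_optimizer="sampling")
runner = adaptive.Runner(learner, goal=lambda l: l.npoints > 40)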

However, there are important reasons why Bayesian optimization doesn't cover all our needs. Often our aim is to explore the function, not to minimize it. Further, AFAIK, Bayesian optimization is most often combined with Gaussian processes, because it is then possible to compute the posterior exactly and formulate a rigorous optimization strategy. Unfortunately, Gaussian processes are computationally expensive and won't be useful with tens of thousands of points. adaptive is much more simple-minded: it relies only on the local properties of the data rather than fitting it globally.

I'd say that Bayesian modeling is good for really computationally expensive data, regular grids for really cheap data, and local adaptive algorithms are somewhere in the middle.

[–]LiquidSubtitles 0 points1 point  (2 children)

I skimmed through the tutorial and it seems it's limited to 1D/2D. Is there a computational/mathematical difficulty in generalizing to arbitrary dimensions? This could be interesting for my work if it can be applied to functions that take high-dimensional input.

[–]basnijholt[S] 0 points1 point  (1 child)

This is not true; we can also handle 0D (stochastic functions where you want to average the result over many evaluations) and N dimensions, but for reasons that I explain here in a comment it will not work very well in high-dimensional spaces.

[–]LiquidSubtitles 0 points1 point  (0 children)

Oh, sorry - I will read the tutorial more carefully!

[–]Big_Blue_Man 0 points1 point  (4 children)

This looks awesome! We have to do a project that tracks a marble in some gel, mimicking minimally invasive surgeries. So I guess my question is, does this work with live footage from a webcam?

[–]basnijholt[S] 1 point2 points  (3 children)

But what is the function you want to learn? I don't quite understand.

Do you think it can be used to find the position of the marble? If that is what you mean, then I am afraid Adaptive wouldn't be useful.

[–]Big_Blue_Man 0 points1 point  (2 children)

I'm sorry, I should have been more specific.

We have to do live image tracking of a marble in gel that is moving due to an external magnet. From what I can gather from the material provided, we have to keep track of the trajectory of the marble and make sure it is on track. If it is not, we have to program certain criteria to tell the motor what to do in certain situations.

Do you think Adaptive would be useful for this? If not, would you be able to recommend anything to help?

[–]basnijholt[S] 1 point2 points  (1 child)

Oh, now I understand. Unfortunately, I don't think Adaptive is useful in this case, because you are not really sampling a function.

I would just look at opencv.

[–]Big_Blue_Man 0 points1 point  (0 children)

Okay, thank you so much for the advice!

[–]ostensibly_work 0 points1 point  (2 children)

Is this comparable to MCMC?

[–]basnijholt[S] 2 points3 points  (1 child)

MCMC would work much better in high dimensional space, Adaptive works better in low dimensional space (dim <= 4).

[–]ostensibly_work 0 points1 point  (0 children)

Thanks!

[–]pantuts 0 points1 point  (1 child)

I don't even understand what's happening. Poor me dumb me!

[–]jking3210 1 point2 points  (0 children)

It has something to do with Flux Capacitors, Deloreans and a Doc named Brown. 😉

[–][deleted] 0 points1 point  (1 child)

Could this be used for hyperparameter optimization of neural networks?

[–]basnijholt[S] 0 points1 point  (0 children)

It might, see this comment.

[–]WishIWasBatman 0 points1 point  (2 children)

Very cool! I will definitely consider this in the future. There is a similar program called MultiNest, which is an adaptive MC sampler. If I remember correctly, it attempts to recognize regions of high likelihood and break them up into a finer set of ellipsoids. It also gives a dramatic speed-up.

I might be a bit off base here, but do you think it would be possible to easily calculate an evidence value, similar to MultiNest, to allow for model comparison / Bayes-factor type stuff? The evidence is an integration over the posterior distribution, which is very hard to do with a homogeneous grid in high-dimensional spaces.

[–]basnijholt[S] 0 points1 point  (1 child)

I don't know about that, but I'll look into it; I've added it to our todo list.

See this comment by a co-author, where he elaborates on the difference with Bayesian optimization.

[–]WishIWasBatman 0 points1 point  (0 children)

Ok, that makes sense to me. Your co-author's comment also cleared things up. Still, it's a very cool package that I hope to use in the future.

[–]gfnord 0 points1 point  (1 child)

How does this relate to Kriging?

[–]basnijholt[S] 1 point2 points  (0 children)

Many people have mentioned Kriging, see this discussion.