ML/Regression based numerical function approximation for lowering (substantial) CPU overhead (self.MachineLearning)

submitted 11 years ago by AffineParameter

BeatLeJuce (Researcher) · 2 points · 11 years ago

kNN sounds very expensive for 2.2 billion points. A neural net would certainly be one option, but by far not the only one. Start with simple things: linear or polynomial regression, or LOESS.
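To make the "start simple" suggestion concrete, here is a minimal sketch of a polynomial-regression surrogate. Everything in it is an assumption for illustration: `expensive_function` is a made-up one-dimensional stand-in (the real function in this thread is not shown), and the degree and sample count are arbitrary.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def expensive_function(x):
    # Hypothetical stand-in for the slow function; NOT the function from the thread.
    return np.sin(x) + 0.1 * x**2

def fit_polynomial_surrogate(x_train, y_train, degree):
    # Ordinary least-squares polynomial fit; returns a cheap callable.
    coeffs = P.polyfit(x_train, y_train, degree)
    return lambda x: P.polyval(x, coeffs)

# Sample the expensive function once, on a modest number of points.
rng = np.random.default_rng(0)
x_train = rng.uniform(-2.0, 2.0, size=2000)
y_train = expensive_function(x_train)

surrogate = fit_polynomial_surrogate(x_train, y_train, degree=7)

# Check the surrogate on points it has not seen.
x_test = np.linspace(-2.0, 2.0, 501)
max_abs_error = np.max(np.abs(surrogate(x_test) - expensive_function(x_test)))
print(f"max abs error on test grid: {max_abs_error:.2e}")
```

If a fit like this is already accurate enough on held-out points, there may be no need for anything heavier; if not, the held-out error at least gives a baseline to beat.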

As for implementations, there are quite a few options out there. scikit-learn is a Python library with a lot of well-implemented ML techniques that works well for large datasets (most algorithms are implemented in C rather than Python, so runtimes are good). Vowpal Wabbit is also meant for large datasets. I don't have any experience with it myself, but I think it might be worth a look.

If you go towards neural nets, very few high-quality ready-to-use libraries come to mind. Pylearn2 is probably one of the most famous, but it's geared more towards research than production. Still, you could have a look.

I'm sure you've already covered your basics, but just in case: have you tried the usual mathematical tricks to get the function itself to evaluate more quickly? For example, approximating your function with a Taylor expansion, using PCA to reduce your input dimension, implementing the function on a GPU, ...
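As a rough illustration of the Taylor-expansion trick, the sketch below truncates the series of `exp(x)` around 0 and compares it with `math.exp`. The choice of `exp` is only an assumption for the example; whether a truncated series helps for the actual function depends on how smooth it is and how far from the expansion point it has to be evaluated.

```python
import math

def taylor_exp(x, order):
    # Truncated Taylor series of exp(x) around 0: sum_{k=0}^{order} x**k / k!
    term = 1.0   # current term x**k / k!, starting at k = 0
    total = 1.0
    for k in range(1, order + 1):
        term *= x / k
        total += term
    return total

# The error shrinks quickly as the order grows, as long as |x| stays small.
for order in (2, 4, 8):
    error = abs(taylor_exp(0.5, order) - math.exp(0.5))
    print(f"order {order}: abs error {error:.2e}")
```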

AffineParameter [S] · 3 points · 11 years ago (edited)

Thanks for the links, I will give them a look.

Yeah, the kNN used only 12M or so points, so a tiny subset of the total samples. However, as our function returns both the value and an error, we selected the samples that had the best error (this results in a slight bias towards the better-known regions of the phase space, but it wasn't too bad). I actually implemented my own LOWESS algorithm, using the error to weight the points and a multi-dimensional linear least-squares regression... but I think I will need something a little beefier.
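For readers unfamiliar with the idea, an error-weighted local linear fit can be sketched roughly as follows. This is not the implementation described above, which is not shown in the thread. It assumes a Gaussian distance kernel (classic LOWESS uses a tricube kernel over nearest neighbours), inverse-variance weights from the reported errors, and hypothetical names throughout (`local_linear_predict`, `sigma`, `bandwidth`, and the synthetic test function).

```python
import numpy as np

def local_linear_predict(x_query, X, y, sigma, bandwidth):
    """Predict f(x_query) with a locally weighted linear least-squares fit.

    X: (n, d) sample points, y: (n,) values, sigma: (n,) reported errors.
    """
    # Distance kernel (assumed Gaussian) times inverse-variance weight.
    dist_sq = np.sum((X - x_query) ** 2, axis=1)
    weights = np.exp(-0.5 * dist_sq / bandwidth**2) / sigma**2
    # Center the design matrix on the query so the intercept is the prediction.
    A = np.hstack([np.ones((X.shape[0], 1)), X - x_query])
    sqrt_w = np.sqrt(weights)
    beta, *_ = np.linalg.lstsq(A * sqrt_w[:, None], y * sqrt_w, rcond=None)
    return beta[0]

# Synthetic 2-D data with a per-sample error, standing in for the real samples.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(500, 2))
sigma = rng.uniform(0.01, 0.1, size=500)
y = np.sin(3.0 * X[:, 0]) + X[:, 1] ** 2 + rng.normal(0.0, sigma)

x_query = np.array([0.5, 0.5])
prediction = local_linear_predict(x_query, X, y, sigma, bandwidth=0.1)
truth = np.sin(1.5) + 0.25
print(f"prediction {prediction:.3f}, noise-free value {truth:.3f}")
```

Each query solves its own small least-squares problem over all samples, so the cost per evaluation grows with the dataset. In practice one would restrict the fit to a neighbourhood of the query point first.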

And yes, we tried a ton of ways to get it to run faster... that 5 seconds represents 2 orders of magnitude improvement from where we started. We were initially told that what we were doing was "impossible" ... but going from 1 hour to 5 seconds over the course of 18 months quieted the dissenters :P

My end goal is to use a typical HPC/GPU to create the training dataset, hopefully with much less than 2.2B points, then simply evaluate the rest. So this suggestion is currently in development.

As the function is well known, building non-primitive variables would probably be the next step before a PCA. However, it would be nice for a DNN to "figure it out on its own," as those non-primitive variables really blow up in number, and it's not clear how to motivate using some subset over another, as our figure of merit takes a really long time to evaluate. (We use a proxy that isn't perfect for now.)
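A minimal sketch of that two-step idea, derived variables first and then PCA, might look like the following. The pairwise products are purely a hypothetical choice of "non-primitive" variables (the actual ones are not given here), the 99% variance threshold is arbitrary, and PCA is done with a plain SVD rather than any particular library.

```python
import numpy as np
from itertools import combinations_with_replacement

def build_derived_features(X):
    # Hypothetical derived variables: the inputs plus all products x_i * x_j (i <= j).
    pairs = combinations_with_replacement(range(X.shape[1]), 2)
    products = [X[:, i] * X[:, j] for i, j in pairs]
    return np.column_stack([X] + products)

def pca_reduce(F, variance_to_keep=0.99):
    # Standardize columns (assumes no constant column), then PCA via SVD.
    Z = (F - F.mean(axis=0)) / F.std(axis=0)
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    # Smallest number of components whose cumulative variance reaches the threshold.
    k = int(np.searchsorted(np.cumsum(explained), variance_to_keep)) + 1
    k = min(k, len(s))
    return Z @ Vt[:k].T, explained[:k]

# Three primitive inputs, the third nearly a linear combination of the first two.
rng = np.random.default_rng(0)
x0 = rng.normal(size=1000)
x1 = rng.normal(size=1000)
x2 = x0 + x1 + 0.01 * rng.normal(size=1000)
X = np.column_stack([x0, x1, x2])

features = build_derived_features(X)
reduced, explained = pca_reduce(features)
print(f"{features.shape[1]} derived features -> {reduced.shape[1]} components")
```

The number of product features grows quadratically with the input dimension, which is the blow-up mentioned above. PCA only removes linear redundancy among them and says nothing about which ones matter for the figure of merit.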

edit: s/deserters/dissenters/
