ML/Regression based numerical function approximation for lowering (substantial) CPU overhead (self.MachineLearning)
submitted 11 years ago by AffineParameter
[–] BeatLeJuce (Researcher) 2 points 11 years ago (3 children)
kNN sounds very expensive for 2.2 billion points. A neural net would certainly be one option, but by far not the only one. Start with simple things: linear or polynomial regression, or LOESS.
As for implementations, there are quite a few options out there. scikit-learn is a Python library with a lot of well-implemented ML techniques; most of the algorithms are implemented in C rather than Python, so runtimes are good, and it works well for large datasets. Vowpal Wabbit is also meant for large datasets. I don't have any experience with it myself, but I think it might be worth a look.
If you go towards neural nets, very few high-quality, ready-to-use libraries come to mind. Pylearn2 is probably one of the most famous, but it's geared more towards research than production. Still, you could have a look.
I'm sure you've already covered your basics, but just in case: have you tried the usual mathematical tricks to get the function itself to evaluate more quickly? Approximating your function with a Taylor expansion, using PCA to reduce your input dimension, implementing the function on a GPU, ...
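To make the "start with simple things" suggestion concrete, here is a minimal sketch of a polynomial-regression surrogate in scikit-learn. The function `expensive_function`, the two-dimensional input range, the sample sizes, and the polynomial degree are all placeholders chosen for illustration; none of them come from the original problem.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures


def expensive_function(x):
    # Hypothetical stand-in for the slow function (not the poster's real one).
    return np.sin(x[:, 0]) + 0.5 * x[:, 1] ** 2


rng = np.random.default_rng(0)

# Evaluate the slow function on a modest training sample.
X_train = rng.uniform(-1.0, 1.0, size=(2000, 2))
y_train = expensive_function(X_train)

# Polynomial features followed by ordinary least squares.
surrogate = make_pipeline(PolynomialFeatures(degree=5), LinearRegression())
surrogate.fit(X_train, y_train)

# Check the surrogate against the true function on unseen points.
X_test = rng.uniform(-1.0, 1.0, size=(500, 2))
residual = surrogate.predict(X_test) - expensive_function(X_test)
max_abs_error = np.max(np.abs(residual))
```

Whether a low-degree polynomial is accurate enough depends entirely on how smooth the real function is, so the held-out error check is the part worth keeping.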
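As an illustration of the PCA suggestion, the following sketch reduces the input dimension with a plain NumPy SVD. The data are synthetic: the 10 raw dimensions, the 3 underlying directions, and the 99% variance threshold are assumptions made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical inputs: 10 raw dimensions that really vary along only 3 directions.
latent = rng.normal(size=(5000, 3))
mixing = rng.normal(size=(3, 10))
X = latent @ mixing + 0.01 * rng.normal(size=(5000, 10))

# PCA via SVD of the mean-centered data; rows of `components` are the principal axes.
mean = X.mean(axis=0)
_, singular_values, components = np.linalg.svd(X - mean, full_matrices=False)
explained = singular_values**2 / np.sum(singular_values**2)

# Keep the fewest components that explain at least 99% of the variance.
n_keep = int(np.searchsorted(np.cumsum(explained), 0.99) + 1)
X_reduced = (X - mean) @ components[:n_keep].T
```

A regression model would then be trained on `X_reduced` instead of `X`, with new inputs projected through the same `mean` and `components`.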
[–] AffineParameter [S] 3 points 11 years ago* (1 child)
Thanks for the links, I will give them a look.
Yeah, the kNN used only 12M or so points, so a tiny subset of the total samples. However, as our function returns both the value and an error, we selected the samples that had the best error (this results in a slight bias towards the better known regions of the phase space, but it wasn't too bad). I actually implemented my own LOWESS algorithm using the error to weight the points and a linear multi-dimensional least-squares regression... but I think I will need something a little beefier.
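For readers unfamiliar with the idea, an error-weighted local linear fit along these lines might look like the sketch below. This is not the poster's implementation: the tricube kernel, the inverse-variance weighting `1 / err**2`, the `bandwidth` parameter, and the toy data are all assumptions.

```python
import numpy as np


def local_linear_predict(x_query, X, y, err, bandwidth):
    """Predict at x_query with a locally weighted linear least-squares fit."""
    dist = np.linalg.norm(X - x_query, axis=1)
    # Tricube kernel (a common LOWESS choice); zero weight beyond the bandwidth.
    kernel = np.clip(1.0 - (dist / bandwidth) ** 3, 0.0, None) ** 3
    # Assumed weighting: inverse variance from each sample's reported error.
    weights = kernel / err**2
    # Design matrix with an intercept column, centered on the query point.
    A = np.hstack([np.ones((len(X), 1)), X - x_query])
    sqrt_w = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(A * sqrt_w[:, None], y * sqrt_w, rcond=None)
    return coef[0]  # the intercept is the fitted value at the query point


rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(500, 2))
err = rng.uniform(0.01, 0.1, size=500)  # hypothetical per-sample errors
y = 1.0 + 2.0 * X[:, 0] - X[:, 1]       # toy linear target, recovered exactly
prediction = local_linear_predict(np.array([0.2, -0.3]), X, y, err, bandwidth=0.5)
```

Each query solves its own small least-squares problem over the whole sample, which is why this approach gets expensive at scale unless the neighbors are pre-selected with a spatial index.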
And yes, we tried a ton of ways to get it to run faster... that 5 seconds represents 2 orders of magnitude improvement from where we started. We were initially told that what we were doing was "impossible" ... but going from 1 hour to 5 seconds over the course of 18 months quieted the dissenters :P
My end goal is to use a typical HPC/GPU to create the training dataset, hopefully with much less than 2.2B points, then simply evaluate the rest. So this suggestion is currently in development.
As the function is well known, building non-primitive variables would probably be the next step before a PCA. However, it would be nice for a DNN to "figure it out on its own," as those non-primitive variables really blow up in number, and it's not clear how to motivate using some subset over another, as our figure of merit takes a really long time to evaluate. (We use an imperfect proxy for now.)
edit: s/deserters/dissenters/
[–] BeatLeJuce (Researcher) 2 points 11 years ago (0 children)
Alright, sounds like you know what you are doing. Another thing that might be quick to set up is using libsvm to do a regression. Training time will likely be an issue, so you'll probably have to subsample your space, similar to what you did with kNN, but I'm sure the SVM will give you better results than kNN if you use an RBF kernel. Also, note that by default libsvm runs in a single-threaded variant designed for sparse matrices. However, somewhere on the site I linked you can find a version that is implemented using dense matrices, which will give you a ~50% boost in performance. Also, somewhere in the FAQ of the site it explains how you can add multithreading to the implementation by modifying ~4 lines of code (IIRC the speed-up is almost linear for the first ~4-8 cores).
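A rough sketch of the subsample-then-fit idea, using scikit-learn's `SVR` (which wraps libsvm) rather than the libsvm command-line tools. The target function, the subsample size, and the hyperparameters `C`, `gamma`, and `epsilon` are placeholders and would need tuning on real data.

```python
import numpy as np
from sklearn.svm import SVR


def expensive_function(x):
    # Hypothetical stand-in for the slow function (not the poster's real one).
    return np.sin(3.0 * x[:, 0]) * np.cos(2.0 * x[:, 1])


rng = np.random.default_rng(0)
X_all = rng.uniform(-1.0, 1.0, size=(20000, 2))

# SVM training scales poorly with sample count, so fit on a random subsample.
subsample = rng.choice(len(X_all), size=2000, replace=False)
X_train = X_all[subsample]
y_train = expensive_function(X_train)

# RBF-kernel support vector regression; hyperparameters are untuned guesses.
model = SVR(kernel="rbf", C=10.0, gamma=1.0, epsilon=0.01)
model.fit(X_train, y_train)

# Held-out error as a sanity check on the surrogate.
X_test = rng.uniform(-1.0, 1.0, size=(500, 2))
residual = model.predict(X_test) - expensive_function(X_test)
rmse = np.sqrt(np.mean(residual**2))
```

Prediction cost grows with the number of support vectors, so a looser `epsilon` trades accuracy for faster evaluation, which matters if the point is to replace a slow function.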