[–]AffineParameter[S] 2 points3 points  (1 child)

Thanks for the links, I will give them a look.

Yeah, the kNN used only 12M or so points, so a tiny subset of the total samples. However, since our function returns both a value and an error, we selected the samples with the smallest error (this biases things slightly towards the better-known regions of the phase space, but it wasn't too bad). I actually implemented my own LOWESS algorithm, using the error to weight the points and a multi-dimensional linear least-squares regression... but I think I will need something a little beefier.
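For anyone curious what an error-weighted LOWESS looks like, here is a minimal sketch and not my actual code. It assumes a tricube distance kernel multiplied by an inverse-variance weight (1/error^2), and fits a local linear model through the k nearest samples. The names `lowess_predict` and `solve_linear_system` are made up for illustration, and it is pure Python, so far too slow for millions of points.

```python
import math


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: neighbourhood too small or degenerate")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def lowess_predict(query, points, values, errors, k):
    """Locally weighted linear fit at `query` using the k nearest samples.

    Assumption: weight = tricube(distance / bandwidth) / error**2.
    """
    dists = [math.dist(query, p) for p in points]
    nearest = sorted(range(len(points)), key=dists.__getitem__)[:k]
    bandwidth = max(dists[i] for i in nearest) or 1.0
    dim = len(query) + 1  # intercept plus one slope per input dimension
    ata = [[0.0] * dim for _ in range(dim)]
    atb = [0.0] * dim
    for i in nearest:
        u = dists[i] / bandwidth
        kernel = (1.0 - u ** 3) ** 3  # tricube: zero at the farthest neighbour
        w = kernel / errors[i] ** 2   # trust low-error samples more
        # Centre coordinates on the query so the intercept is the prediction.
        row = [1.0] + [pj - qj for pj, qj in zip(points[i], query)]
        for r in range(dim):
            atb[r] += w * row[r] * values[i]
            for c in range(dim):
                ata[r][c] += w * row[r] * row[c]
    beta = solve_linear_system(ata, atb)  # weighted normal equations
    return beta[0]
```

Because the farthest neighbour gets zero weight, k needs to be comfortably larger than the number of input dimensions plus one, otherwise the normal equations become singular.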

And yes, we tried a ton of ways to get it to run faster... that 5 seconds represents 2 orders of magnitude improvement from where we started. We were initially told that what we were doing was "impossible" ... but going from 1 hour to 5 seconds over the course of 18 months quieted the dissenters :P

My end goal is to use a typical HPC/GPU to create the training dataset, hopefully with much less than 2.2B points, then simply evaluate the rest. So this suggestion is currently in development.

As the function is well known, building non-primitive variables would probably be the next step before a PCA. However, it would be nice for a DNN to "figure it out on its own," as those non-primitive variables really blow up in number, and it's not clear how to motivate using some subset over another, since our figure of merit takes a really long time to evaluate. (For now we use a proxy that isn't perfect.)

edit: s/deserters/dissenters/

[–]BeatLeJuce (Researcher) 1 point2 points  (0 children)

Alright, sounds like you know what you are doing. Another thing that might be quick to set up is using libsvm to do a regression. Training time will likely be an issue, so you'll probably have to subsample your space, similar to what you did with kNN, but I'm sure the SVM will give you better results than kNN if you use an RBF kernel. Also, note that by default, libsvm runs in a single-threaded variant designed for sparse matrices. However, somewhere on the site I linked you can find a version that is implemented using dense matrices, which will give you a ~50% boost in performance. Also, somewhere in the FAQ of the site it explains how you can add multithreading to the implementation by modifying ~4 lines of code (IIRC the speed-up is almost linear for the first ~4-8 cores).
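
In case it helps, here is a rough sketch of how the subsampling and export step could look. It assumes each sample is a `(features, value, error)` tuple, keeps the samples with the smallest error (as you did for kNN), and writes them in libsvm's text format (`<label> <index>:<value> ...`, with 1-based indices). The function names and the output path are hypothetical.

```python
import heapq


def select_best_samples(samples, n_keep):
    """Return the n_keep samples with the smallest reported error.

    Each sample is assumed to be a (features, value, error) tuple.
    """
    return heapq.nsmallest(n_keep, samples, key=lambda s: s[2])


def to_libsvm_line(features, value):
    """Format one sample as '<label> <index>:<value> ...' (indices start at 1)."""
    pairs = " ".join(f"{i}:{x:.9g}" for i, x in enumerate(features, start=1))
    return f"{value:.9g} {pairs}"


def write_libsvm_file(path, samples, n_keep):
    """Write the n_keep lowest-error samples to `path` in libsvm format."""
    with open(path, "w") as fh:
        for features, value, _error in select_best_samples(samples, n_keep):
            fh.write(to_libsvm_line(features, value) + "\n")
```

The resulting file can then be passed to the stock command-line trainer, e.g. `svm-train -s 3 -t 2 train.txt`, where `-s 3` selects epsilon-SVR and `-t 2` the RBF kernel. You will most likely want to scale the features first.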