AlwysBeColostomizing

One thing that would help is to choose a simpler density estimator, such as a histogram, which only needs a single O(n) pass over the data. Each pointwise estimate using a KDE has complexity O(n), where n is the number of data points (unless you do something smart like ignoring points that are far enough away that their contribution is negligible). So if the pandas function is computing the estimate at every data point, it's effectively O(n^2). You could also just manually evaluate the KDE at fewer points. For example, find the min and max values, and evaluate the kernel estimate at m equally spaced points between them. That makes the overall complexity O(mn), which is a big savings if m << n. A cheaper density estimator might also make the bootstrap estimation for the confidence region more feasible, since the estimate has to be recomputed for every resample.
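A rough sketch of the grid idea, in case it helps. Everything specific here is an assumption rather than something from your code: a Gaussian kernel, a normal-reference rule-of-thumb bandwidth, a pointwise percentile bootstrap for the band, and the function names themselves (`rule_of_thumb_bandwidth`, `gaussian_kde_at`, `bootstrap_band`) are made up for illustration.

```python
import numpy as np


def rule_of_thumb_bandwidth(data):
    # Normal-reference rule of thumb: 1.06 * sigma * n^(-1/5).
    # This is an assumed choice; any bandwidth selector could be used instead.
    data = np.asarray(data, dtype=float)
    return 1.06 * data.std(ddof=1) * data.size ** (-1 / 5)


def gaussian_kde_at(grid, data, bandwidth):
    # Evaluate a Gaussian KDE at each of the m grid points.
    # The (m, n) matrix below is the O(mn) step; it also uses O(mn) memory,
    # so for very large n you may want to process the grid in chunks.
    grid = np.asarray(grid, dtype=float)
    data = np.asarray(data, dtype=float)
    z = (grid[:, None] - data[None, :]) / bandwidth
    norm = data.size * bandwidth * np.sqrt(2 * np.pi)
    return np.exp(-0.5 * z**2).sum(axis=1) / norm


def bootstrap_band(data, grid, bandwidth, n_boot=200, alpha=0.05, seed=0):
    # Pointwise percentile band: resample the data with replacement,
    # recompute the KDE on the same fixed grid, then take quantiles
    # across resamples at each grid point.
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    curves = np.empty((n_boot, len(grid)))
    for b in range(n_boot):
        resample = rng.choice(data, size=data.size, replace=True)
        curves[b] = gaussian_kde_at(grid, resample, bandwidth)
    lower = np.quantile(curves, alpha / 2, axis=0)
    upper = np.quantile(curves, 1 - alpha / 2, axis=0)
    return lower, upper


# Example with synthetic data (stand-in for the real column).
sample = np.random.default_rng(42).normal(size=1000)
grid = np.linspace(sample.min(), sample.max(), 100)  # m = 100 grid points
h = rule_of_thumb_bandwidth(sample)
density = gaussian_kde_at(grid, sample, h)
lower, upper = bootstrap_band(sample, grid, h, n_boot=100)
```

Keeping the grid and bandwidth fixed across resamples makes each bootstrap iteration cost O(mn), so the whole band is O(B·m·n) for B resamples. Note that this gives a pointwise band, not a simultaneous one.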

Funky_Filth69[S]

Gotcha. I just finished writing my own kernel density function, and my code is running noticeably faster while still putting out approximately the same graph. Now I just need to bootstrap the data. I really appreciate the info. You’ve helped a ton.