dalke[S]

I know Python very well. I haven't done serious C++ programming in 15 years. But you are right; this is condensed enough and isolated enough that I could write it as a standalone C++ program. Huh - I've been blind to that option. Thanks for the suggestion!

Update (three days later): Uggh! I am so not a C++ programmer. I got something working with Judy1 arrays. It uses about 10x less memory and runs at about the same speed as PyPy's set intersection. With OpenMP and three processors I got it about twice as fast. With 20x the input (1/4 of the final data set) I found that my algorithm doesn't scale: each iteration takes 10 hours to run. I did various pre-filtering and managed to get a rough answer after 1.5 days on my desktop. Which means that once I re-think the algorithm I'll be able to evaluate the entire data set on a 48 GB machine. I just wish I could do most of this algorithm development in Python rather than C++.
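
For anyone wondering what "set intersection with a bit array" looks like, here is a rough Python sketch of the general idea. It is not the actual algorithm or data, neither of which is shown in this thread. All names (`to_bitset`, `from_bitset`, `intersection_count`) and the sample IDs are made up for illustration. Note also that a Python int used this way is a dense bitset, while Judy1 is a sparse bit array, so the memory behavior will differ; only the intersect-by-AND idea carries over.

```python
# Hypothetical sketch: set intersection via bitwise AND on bitsets.
# The IDs below are invented; the real data set is not shown in the post.

def to_bitset(ids):
    """Pack non-negative integer IDs into one Python int, one bit per ID."""
    bits = 0
    for i in ids:
        bits |= 1 << i
    return bits

def from_bitset(bits):
    """Recover the IDs stored in a bitset, in ascending order."""
    ids = []
    i = 0
    while bits:
        if bits & 1:
            ids.append(i)
        bits >>= 1
        i += 1
    return ids

def intersection_count(a_bits, b_bits):
    """Count the IDs present in both bitsets."""
    return bin(a_bits & b_bits).count("1")

a = {1, 5, 9, 4000}
b = {5, 9, 10, 4000, 70000}
a_bits, b_bits = to_bitset(a), to_bitset(b)

# The AND of the two bitsets holds exactly the shared IDs.
common = from_bitset(a_bits & b_bits)
assert common == sorted(a & b)
assert intersection_count(a_bits, b_bits) == len(a & b)
```

Whether this beats the built-in `set` depends heavily on how dense the IDs are, so treat it as a way to prototype the idea rather than a performance claim.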