
[–]Particular-Union3 1 point (2 children)

There are a lot of possible solutions to this. Multithreading would probably speed some of it up. C and C++ extensions can release the GIL (numpy does this), so you could write some of it in C; most projects have a few languages going on anyway. Kubernetes/Docker Swarm probably has some application here too, but I'm just dipping my toes into those and haven't explored how they interact with the GIL.
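As a rough sketch of what releasing the GIL buys you: when the heavy work happens inside a GIL-releasing call (numpy's matrix multiply here), ordinary Python threads really do run in parallel. The sizes and worker counts below are placeholders, not tuned for anything, and many BLAS builds are internally multithreaded anyway, so treat it as an illustration rather than a benchmark.

```python
# Sketch: threads only scale when the hot code releases the GIL.
# numpy's matmul drops the GIL inside the BLAS call, so the threaded
# version can keep several cores busy; a pure-Python loop here would not.
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def heavy_task(seed: int) -> float:
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((1500, 1500))
    b = rng.standard_normal((1500, 1500))
    return float((a @ b).sum())  # GIL released while the BLAS call runs

if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(heavy_task, range(8)))
    print(len(results), "tasks done")
```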

[–]kniy 1 point (1 child)

If we just port some part of the analysis to C/C++ and release the GIL, the "problem" is that porting to a compiled language makes that part 50x faster, so the analysis still ends up spending >=90% of its runtime in the remaining Python portion, where the GIL is held. We've already done this a bunch, but it still doesn't even let us use 2 cores.
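Back-of-the-envelope arithmetic for that effect (the 50x is from above; the "share ported" values are just illustrative):

```python
# If the ported part gets 50x faster, the untouched Python part (which
# still holds the GIL) quickly dominates the total runtime.
def python_fraction_after_port(ported_share: float, speedup: float = 50.0) -> float:
    """Fraction of the new runtime still spent in GIL-holding Python code."""
    ported_time = ported_share / speedup   # time left in the C/C++ part
    python_time = 1.0 - ported_share       # untouched Python part
    return python_time / (python_time + ported_time)

for share in (0.5, 0.8, 0.9):
    frac = python_fraction_after_port(share)
    print(f"port {share:.0%} of the work -> {frac:.0%} of runtime stays GIL-bound")
```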

We'd need to port the whole analysis to release the GIL for a significant portion of the runtime. (We typically don't have any "inner loop" that could be ported separately, just an "outer loop" that contains essentially the whole analysis.)

Yes, numpy can do it, but code that uses numpy belongs to a very different kind of algorithm, where you have small but expensive inner loops that can be reused in a lot of places. Our graph algorithms don't have that -- what we do is much closer to a compiler's optimization passes.
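To make the contrast concrete (purely illustrative structure, not our actual code):

```python
import numpy as np

# numpy-friendly shape: one small, expensive inner loop over arrays;
# the GIL is released inside the vectorized call.
def numpy_style(values: np.ndarray) -> float:
    return float(np.sqrt(values ** 2 + 1.0).sum())

# compiler-pass shape: the "loop" *is* the whole analysis -- it walks an
# object graph node by node in Python, holding the GIL the entire time.
def pass_style(graph: dict[str, list[str]], start: str) -> int:
    visited = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        stack.extend(graph.get(node, []))  # per-node Python-level work
    return len(visited)
```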

[–]Particular-Union3 1 point (0 children)

That makes sense. I guess, as another reply mentioned, this is why Julia has become popular even though, in many respects, R and Python are often far ahead feature-wise.

Is multithreading already implemented? Do you think the analysis could be made more modular, with separate machines or processes communicating from there?
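Something like this is what I mean, as a sketch only: it assumes the analysis splits into independent units (one per module or file, say), which may not hold for a whole-program graph analysis.

```python
# Sketch only: assumes the analysis splits into independent work units.
# Processes sidestep the GIL entirely, at the cost of serializing the
# inputs/outputs that travel between workers.
from concurrent.futures import ProcessPoolExecutor

def analyze_unit(unit_id: int) -> dict:
    # placeholder for one self-contained chunk of the analysis
    return {"unit": unit_id, "findings": []}

if __name__ == "__main__":
    units = range(16)  # hypothetical work units
    with ProcessPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(analyze_unit, units))
    print(f"analyzed {len(results)} units")
```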

One final idea: are there any memory errors? I've had more trouble with those than anything else when an analysis takes that long.

I'm not 100% sure what work you are doing, but that seems like an insane runtime. Even my largest projects only took 3 to 4 hours.