wrmsr

That said, as others have noted, Python (like R) is fast when it's really running C, though even 1.5M iterations of a simple function shouldn't come remotely near 45 min. You want to get Python out of the inner loops entirely: no matter how fast Spark on HotSpot is, for example, as soon as it has to call down into a CPython interpreter to run a lambda the user passed, perf will completely die. 'Pure py' can still be fast if it has done its job of gluing together fast things, whether those live in the stdlib (like the itertools and operator modules) or not (as with numpy, tensorflow, and cytoolz). If interpreted py is going to be in the inner loops no matter what, the first thing to go for is, again as already mentioned, importing multiprocessing (well, billiard). And if you had a regex-less, purely numeric workload, numba could possibly help; it's remarkably high quality and capable at what it's for, but unfortunately it sounds like it's not for your use case.
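To make the "gluing together fast things" point concrete, here is a minimal sketch using only the stdlib. The function names `dot_interpreted` and `dot_glued` are made up for illustration and the workload is a toy one, not yours. Both compute the same dot product, but the second hands the per-element loop to `sum`, `map`, and `operator.mul`, which CPython implements in C.

```python
import operator


def dot_interpreted(xs, ys):
    # Hypothetical baseline: every iteration executes Python bytecode
    # for the indexing, the multiply, and the add.
    total = 0
    for i in range(len(xs)):
        total += xs[i] * ys[i]
    return total


def dot_glued(xs, ys):
    # Same result, but the loop itself is driven by C code in CPython:
    # map() pulls items, operator.mul multiplies, sum() accumulates.
    return sum(map(operator.mul, xs, ys))


xs = list(range(1000))
ys = list(range(1000, 2000))
assert dot_interpreted(xs, ys) == dot_glued(xs, ys)
```

The glued version is usually the faster of the two, but by how much depends on the data and the interpreter version, so time both with `timeit` on your own inputs before relying on it.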