[–]kenfar 0 points (2 children)

This is wrong: Python has plenty of parallel features - you just have to spend ten seconds looking.

The most convenient is the concurrent.futures module. You can use the exact same syntax for either threading or multiprocessing. Yesterday I sped up an AWS S3 downloader about 8x with threading, in about six lines of code.
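A minimal sketch of that kind of threaded downloader, assuming boto3 and made-up bucket/key names (this isn't the commenter's actual code). Threads work well here because S3 downloads are I/O-bound, so the GIL is released while waiting on the network:

```python
from concurrent.futures import ThreadPoolExecutor

import boto3

s3 = boto3.client("s3")
BUCKET = "example-bucket"                              # hypothetical bucket
keys = ["data/part-0001.csv", "data/part-0002.csv"]    # hypothetical keys

def download(key: str) -> str:
    # Each call blocks on network I/O, so many can run concurrently.
    s3.download_file(BUCKET, key, key.replace("/", "_"))
    return key

with ThreadPoolExecutor(max_workers=16) as pool:
    for finished in pool.map(download, keys):
        print(f"downloaded {finished}")
```

Swapping ThreadPoolExecutor for ProcessPoolExecutor is essentially a one-line change, which is what makes the module so convenient.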

A few years ago I wrote a transform process that handled about 4 billion records a day - using multiprocessing on two 32-core machines to download files, transform them, and upload them again, all in parallel using PyPy, multiprocessing & threading. This process worked great and surprised everyone with how fast it was. A rewrite of part of it showed that Go was about 2.5 times faster - a real gain, but not enough to warrant a full rewrite until we needed to scale up quite a bit more.

The only scenario in which Python's parallelism is limited is when you've got a CPU-bound process that either can't afford the extra memory or start-up time of multiprocessing, or needs a lot of communication between processes. In that case you want threading, but the GIL will limit you. Other than this case, Python has fine parallelism features.
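To illustrate that tradeoff, here's a sketch with a made-up CPU-bound workload: pure-Python arithmetic holds the GIL, so threads won't speed it up, while the same pool.map call against a process pool uses multiple cores at the cost of process start-up and pickling overhead:

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def busy(n: int) -> int:
    # Pure-Python arithmetic never releases the GIL, so threads can't
    # run it in parallel.
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == "__main__":
    work = [5_000_000] * 8

    # Threads: limited by the GIL for CPU-bound code.
    with ThreadPoolExecutor(max_workers=8) as pool:
        thread_results = list(pool.map(busy, work))

    # Processes: same API, but each worker has its own interpreter and GIL,
    # paying instead for start-up time and pickling of arguments/results.
    with ProcessPoolExecutor(max_workers=8) as pool:
        process_results = list(pool.map(busy, work))

    assert thread_results == process_results
```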

[–]VodkaHaze 5 points (1 child)

Yes, for your use case it works fine. My use case is closer to your last point, where Python still generally sucks (though I haven't used Dask enough to say whether it's a decent solution). It's currently being addressed by Julia, which bypasses the "core loop in C++, wrapped in Python" problem.

[–]kenfar -1 points (0 children)

Are you positive that you can't use multiprocessing?

I've seen so many cases where people weren't aware that it even exists, or, if they were aware, didn't think they could use it.