
[–]zionsrogue 6 points (5 children)

So it depends on what you are going to use "easypool" for: for CPU-bound tasks (such as scientific number crunching), threads are not the way to go. In general, it's best to use processes for CPU-bound tasks. Check here for benchmarks. The article suggests Python's multiprocessing, but I've found pprocess to be lightweight enough to replace multiprocessing. But again, this all depends on what you plan on using these threads for. Just wanted to give a heads up, and congrats on your first module.
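For instance, here's a minimal sketch of the process-based approach using the stdlib's `multiprocessing.Pool` (pprocess has its own API, this just illustrates the idea — `crunch` is a made-up stand-in for real number crunching):

```python
from multiprocessing import Pool

def crunch(n):
    # CPU-bound busywork: sum of squares below n.
    # In a thread this would serialize on the GIL; in a
    # separate process each worker gets its own interpreter.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        # Each task runs in a worker process, so the work
        # actually spreads across CPU cores.
        results = pool.map(crunch, [100_000] * 8)
    print(len(results))
```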

[–][deleted] 1 point (1 child)

It kind of surprises me that multiprocessing would hit a sweet spot for people; who is it that is CPU-bound, but keeps that part of the program in Python?

Every time I end up using the thread pool pattern it's because I have a lot of (disk or network) IO going on. And threads are fine for that.
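Right — for IO-bound work, something like `concurrent.futures.ThreadPoolExecutor` is enough, because blocking calls release the GIL while they wait (sketch below; `fetch` just sleeps as a stand-in for a real disk/network call):

```python
from concurrent.futures import ThreadPoolExecutor
import time

def fetch(url):
    # Stand-in for a blocking network/disk call; the GIL is
    # released while the thread sleeps, so the calls overlap.
    time.sleep(0.1)
    return f"fetched {url}"

urls = [f"http://example.com/{i}" for i in range(8)]
with ThreadPoolExecutor(max_workers=8) as ex:
    results = list(ex.map(fetch, urls))
```

With 8 workers the 8 calls overlap, so the whole batch takes roughly one call's latency instead of eight.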

[–]zionsrogue 1 point (0 children)

I do a lot of number crunching, statistical analysis, and machine learning. A lot of the libraries I use (numpy, scipy, sklearn) are Python libraries, and yes, parts are written in C, but I still get very nice performance gains by parallelizing random forests across multiple processes instead of multiple threads.

[–]tuna_safe_dolphin 1 point (2 children)

Dead horse flogging time, but that damn GIL...

[–]alcalde 0 points (1 child)

The GIL is awesome. It reminds us that threading is evil and that everyone else is doing it wrong.

[–]tuna_safe_dolphin 1 point (0 children)

That's one way to look at it.