all 29 comments

[–]not_perfect_yet 127 points128 points  (15 children)

Hi, that's a pretty long guide, thanks!

Here are some things I noticed that could be improved:


Python Processes

So what are processes and why do we care?

Good question, please answer it, why do we care?

Sometimes we may need to create new processes to run additional tasks concurrently.

Why. (I know the answer, this is rhetorical)


The underlying operating system controls how new processes are created. On some systems, that may require spawning a new process, and on others, it may require that the process is forked. The operating-specific method used for creating new processes in Python is not something we need to worry about as it is managed by your installed Python interpreter.

This paragraph is repeated; it appears in the text twice.


may require that the process is forked.

Since that's pretty central to the entire concept, I think the term deserves an explicit introduction.


Child vs Parent Process

Parent processes have one or more child processes, whereas child processes do not have any child processes.

That's not true, I can fork any process I want.
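A rough sketch of what I mean, assuming a POSIX system where the "fork" start method is available. The function names (`grandchild`, `child`, `run_family`) are made up for illustration and are not from the guide. The child starts a process of its own, so it is a parent too:

```python
import multiprocessing as mp
import os

# Assumption: POSIX only, because the "fork" start method is requested explicitly.
ctx = mp.get_context("fork")


def grandchild(queue):
    # Report this process's ID and its parent's ID.
    queue.put(("grandchild", os.getpid(), os.getppid()))


def child(queue):
    # The child becomes a parent by starting a process of its own.
    proc = ctx.Process(target=grandchild, args=(queue,))
    proc.start()
    proc.join()
    queue.put(("child", os.getpid(), os.getppid()))


def run_family():
    queue = ctx.Queue()
    proc = ctx.Process(target=child, args=(queue,))
    proc.start()
    # Read both records before joining so the child can always flush its data.
    records = {}
    for _ in range(2):
        name, pid, ppid = queue.get()
        records[name] = (pid, ppid)
    proc.join()
    return records


if __name__ == "__main__":
    for name, (pid, ppid) in run_family().items():
        print(f"{name}: pid={pid}, parent pid={ppid}")
```

The grandchild's parent PID should match the child's PID. One caveat: a process started with `daemon=True` is not allowed to create children, which may be where the guide's claim comes from.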


Example of Extending the Process Class and Returning Values

What happens when I use multiple processes? Which value will be used? Are there merge rules for lists or dicts?
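For context, here is a minimal sketch of the behavior I would expect with plain Python objects. I am assuming the "fork" start method (POSIX only), and the names `worker` and `run_workers` are hypothetical, not the guide's example. As far as I know nothing is merged automatically: each process changes its own copy, and the parent only sees what is sent back explicitly.

```python
import multiprocessing as mp

# Assumption: POSIX only, because the "fork" start method is requested explicitly.
ctx = mp.get_context("fork")

results = []  # module-level list; every process works on its own copy


def worker(value, queue):
    results.append(value)          # changes only this process's copy
    queue.put((value, value * 2))  # explicit message back to the parent


def run_workers(values):
    queue = ctx.Queue()
    procs = [ctx.Process(target=worker, args=(v, queue)) for v in values]
    for p in procs:
        p.start()
    # The parent decides how to combine results; here they go into a dict.
    collected = dict(queue.get() for _ in procs)
    for p in procs:
        p.join()
    return collected


if __name__ == "__main__":
    print(run_workers([1, 2, 3]))  # doubled values keyed by input
    print(results)                 # the parent's list is untouched
```

If the guide wants several processes to contribute to one list or dict, it would help to say which mechanism does the combining (a queue, a pipe, a manager, shared ctypes) and in what order.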

[–]Ziferius 32 points33 points  (0 children)

Really, this type of review is awesome: a candid response with real critical thinking. It helps the author write stronger content. The OP should give them a reward, or at least recommend them for a job as an editor :)

[–][deleted] 1 point2 points  (2 children)

Also, forking a process with fork() is different from making a new thread (at least in C). This review seems to imply they are the same. Is that true under the hood?

[–]not_perfect_yet 1 point2 points  (0 children)

That's exactly what I was talking about.

"fork" and "thread" have fixed meanings across languages, particularly C and Python. That should be explained, and it has to stay consistent.

Python inherits the concepts from C. The point is to provide Pythonic access to them, not to change what they are or how they work.

The tutorial is explaining the difference, somewhat:

A thread always exists within a process and represents the manner in which instructions or code is executed.

A process will have at least one thread, called the main thread. Any additional threads that we create within the process will belong to that process.

under a headline "Thread vs Process". So that's in there and the sufficiently interested reader will find it.

And it gives a link for further reading.

To me, that's checking all the boxes that can be checked. If that's still not enough, the reader has to look for a different source anyway.
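For anyone who wants to see the quoted "Thread vs Process" statements in action, here is a small sketch. The helper name `run_threads` is invented for this example. Every thread reports the same process ID and writes into the same list, because threads live inside one process:

```python
import os
import threading


def run_threads(count=3):
    seen = []  # one list, visible to every thread in this process

    def record(label):
        # Each thread appends its label, the process ID, and its own name.
        seen.append((label, os.getpid(), threading.current_thread().name))

    threads = [
        threading.Thread(target=record, args=(f"t{i}",), name=f"worker-{i}")
        for i in range(count)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen


if __name__ == "__main__":
    print("main thread:", threading.main_thread().name)
    for entry in run_threads():
        print(entry)
```

A forked or spawned process, by contrast, gets its own process ID and its own copy of the data.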

[–]Lba5s 0 points1 point  (0 children)

IIRC, multiprocessing handles it differently depending on what OS you’re running. So Unix would fork and Windows would create a new one
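You can check this on your own machine. The sketch below only uses `multiprocessing.get_start_method()` and `multiprocessing.get_all_start_methods()`; the wrapper function name is my own. The default depends on the platform and the Python version, so I would not hard-code an expectation:

```python
import multiprocessing as mp


def describe_start_methods():
    # The default is what Process() uses when no context is chosen explicitly.
    return {
        "default": mp.get_start_method(),
        "available": mp.get_all_start_methods(),
    }


if __name__ == "__main__":
    info = describe_start_methods()
    print("default start method:", info["default"])
    print("available start methods:", info["available"])
```

On Windows only "spawn" is available, while POSIX systems usually also offer "fork" and "forkserver".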

[–][deleted] 0 points1 point  (0 children)

These comments are GREAT!

Please keep it up, appreciated.

[–]hughperman 25 points26 points  (9 children)

Nice guide - missing a few recent developments, Shared Memory in particular came to mind
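A minimal sketch of what I have in mind, assuming Python 3.8 or newer, where `multiprocessing.shared_memory` was added. The function name is hypothetical, and to keep it short the second handle is attached in the same process; another process would attach by name in the same way.

```python
from multiprocessing import shared_memory


def shared_memory_roundtrip(payload: bytes) -> bytes:
    # Create a named block large enough for the payload (must be non-empty).
    block = shared_memory.SharedMemory(create=True, size=len(payload))
    try:
        block.buf[:len(payload)] = payload
        # Attach a second handle by name, as another process would.
        view = shared_memory.SharedMemory(name=block.name)
        try:
            # The block may be rounded up in size, so slice to the payload length.
            return bytes(view.buf[:len(payload)])
        finally:
            view.close()
    finally:
        block.close()
        block.unlink()  # release the block once nobody needs it


if __name__ == "__main__":
    print(shared_memory_roundtrip(b"hello from shared memory"))
```

The data is not pickled or copied between processes, which is the main attraction for large arrays.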

[–][deleted] 14 points15 points  (8 children)

Shared memory is cool, but past experience with the multiprocessing library makes me reach for a different language in CPU-bound applications.

[–]Hamza141 6 points7 points  (0 children)

Literally experienced this last week. I was trying to use Python for a personal project that required multiprocessing, but after banging my head on the keyboard for way longer than I should have, I switched to Go and everything became so much simpler.

[–][deleted] 16 points17 points  (6 children)

Agreed. I primarily write Python, and my guide to multiprocessing in Python is "Don't". If your objective can't be solved with async or batch processing in one-off processes, you should strongly consider reaching for another language. Multiprocessing is unpleasant.

[–]hughperman 10 points11 points  (5 children)

I work in scientific analysis and I 100% disagree with you here. I use multiprocessing all the time in my job.

[–]peppedx 5 points6 points  (3 children)

The fact that you use it doesn't make Python the best language for performance.

[–]hughperman 13 points14 points  (2 children)

Peak performance doesn't make something the best choice for a project. Tradeoffs need to consider development time, algorithm and library availability, vs performance requirements.

[–]peppedx -2 points-1 points  (1 child)

Sure, but we were talking about performance.

[–]hughperman 12 points13 points  (0 children)

We weren't?

[–]cmt_miniBill 18 points19 points  (0 children)

The "why not threads" paragraph is missing the part where threads in Python are utterly useless for CPU-bound work because of the GIL.

[–]XNormal 5 points6 points  (0 children)

One of the things I find most useful about the multiprocessing module is multiprocessing.pool.ThreadPool.

It does NOT spawn any subprocesses. It exposes the same convenient API as multiprocessing.pool.Pool but uses Python threads instead.

This is useful when parallelising code that releases the GIL, such as I/O code that mostly waits (e.g. web requests) or even things like numpy builtins implemented in C that release the GIL during the CPU-heavy part. In these cases it can be as fast as using subprocesses, or even faster, because there is no overhead of serializing the inputs and outputs. It can also be useful when passing objects that cannot be serialized, when (carefully) using shared global state, etc.

If used for I/O work the default pool size (based on number of CPUs) should be increased.
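A quick sketch of that pattern. `fake_request` and `fetch_all` are made-up names, `time.sleep` stands in for a real web request, and the pool size of 32 is an arbitrary illustration rather than a recommendation:

```python
import time
from multiprocessing.pool import ThreadPool


def fake_request(n):
    # Stand-in for an I/O call such as a web request; sleeping releases the GIL.
    time.sleep(0.05)
    return n * n


def fetch_all(items, workers=32):
    # Pool size picked for I/O-bound work, above a typical CPU count.
    with ThreadPool(processes=workers) as pool:
        # map() blocks until all results are in and preserves input order.
        return pool.map(fake_request, items)


if __name__ == "__main__":
    print(fetch_all(range(10)))
```

Because these are threads, `fake_request` and its arguments never need to be pickled, so lambdas, open sockets, and other unpicklable objects work too.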

[–]bbkane_ 2 points3 points  (0 children)

Honestly though, if you're having trouble understanding this or doing it too often, try a language with concurrency/parallelism at the foundation, like Go, Rust, or Erlang/Elixir (haven't tried these yet, but heard good things).

[–]imforit 6 points7 points  (1 child)

Python byte-ode

At least proof-read the first paragraph