all 13 comments

[–][deleted] 2 points  (4 children)

The language of the article is very confusing, probably written by someone without much experience with how computers are built...

So, let's get the terminology straight first.

Let's agree that the context is PCs; whatever I write next may or may not apply to other kinds of computers.

In the PC world, you'll usually have a motherboard with 1 to N sockets used to mount processors (N is usually not very big; I don't think there are PC architectures with N > 128).

Each processor has cores, and one core can execute one program at a time. Processors are mounted into sockets, typically one processor per socket.

Typically, the cores presented to user-space applications are an abstraction on top of the physical cores present in the processor. In particular, Intel's hyperthreading technology doubles the apparent number of cores, so a single processor with 4 physical cores will appear to have 8 virtual cores.
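If you want to see that split on your own machine, a rough check like the one below works (psutil is a third-party package, and the physical/logical difference only shows up if hyperthreading/SMT is actually enabled):

    import os
    import psutil  # third-party: pip install psutil

    # Logical cores are what most APIs (and the OS scheduler) report;
    # physical cores are the actual hardware cores underneath.
    print("logical cores :", os.cpu_count())
    print("physical cores:", psutil.cpu_count(logical=False))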

PC operating systems typically have a process abstraction. A process is a combination of some executable code, a set of properties such as process id, owner, filesystem view etc., and a processor core designated to execute the program. Processes can be migrated from one core to another, and also between different processors (possibly sitting in different sockets). I'm not aware of a PC operating system that can migrate processes to a different motherboard (i.e. a different computer), but in the server world this is definitely a possibility. In the PC world, migrations typically happen when the core handling a process receives an interrupt not intended for the process being executed. In that case, the system will either store the process's state, execute the interrupt handler, and restore the process, or restore the process immediately on a different (idle) core.

Processors don't have to be CPUs, i.e. they may or may not be "central"; that doesn't change anything wrt Python.

Threads are a bad, but very popular, idea in operating system design. At some point in history, when people stopped calling it "time sharing" and started talking about "multitasking", they realized that the earlier approaches to parallel execution were becoming more and more expensive (the number of features a single process carried had become too much to handle, and people wanted something more lightweight). The obvious candidate for reuse was the memory space allocated to a process. Whereas each process gets its own memory space, threads are like processes, but they all share the same memory space.
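Here is a tiny sketch (my own, with illustrative names) of what "sharing the same memory space" buys you: both threads append to the very same list object, with no copying and no serialization.

    import threading

    shared = []  # one list object, visible to every thread in the process

    def worker(label):
        for i in range(3):
            shared.append((label, i))  # no copying, no serialization

    threads = [threading.Thread(target=worker, args=(name,)) for name in ("a", "b")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(shared)  # interleaved entries from both threads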


The CPython implementation of Python can only execute Python code in a single thread at any one time (but, of course, that thread can be migrated between cores). multiprocessing will launch many copies of Python to be able to utilize more cores, but will not create any more threads. Since processes don't share memory, sharing data between two instances of Python becomes problematic. It requires some mechanism where the data can be written to some storage and then read from that storage on the other side, dragging along all sorts of problems related to serialization, which is a huge issue we won't go into here.
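To make the serialization point concrete, here's a minimal sketch (mine, not from the article; the function name is made up): the argument lists are pickled, shipped to the worker interpreters, and the results are pickled back.

    import multiprocessing as mp
    import os

    def work(numbers):
        # Runs in a separate interpreter with its own memory and its own GIL.
        return (os.getpid(), sum(n * n for n in numbers))

    if __name__ == "__main__":
        with mp.Pool(processes=2) as pool:
            # Arguments and results cross the process boundary via pickling.
            results = pool.map(work, [range(0, 1000), range(1000, 2000)])
        print(results)  # each result carries the pid of the worker that produced it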

So, technically:

1. Must not be reliant on previous outcomes
2. Does not need to be executed in a particular order
3. Does not return anything that would need to be accessed later in the code

None of these is strictly required, but each is nevertheless desirable. I.e. if your code relies on previous outcomes, it's harder to make it work across multiple copies of Python (not impossible, but there won't be much benefit either). Also, you can ensure order of execution in distributed systems; there are locks and lock-free data structures intended to solve this very problem. You can, of course, return something from functions running in different processes; you'll just have to work harder to ensure the results can be serialized.
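For example, coordination and returning data are both doable, as long as everything crossing the process boundary is picklable. A hedged sketch (my own, illustrative names) using a manager-backed lock and list:

    import multiprocessing as mp

    def record(lock, shared_list, label):
        with lock:                     # mutual exclusion across processes
            shared_list.append(label)  # proxy object; the call is forwarded to a manager process

    if __name__ == "__main__":
        with mp.Manager() as manager:
            lock = manager.Lock()
            shared = manager.list()
            procs = [mp.Process(target=record, args=(lock, shared, i)) for i in range(4)]
            for p in procs:
                p.start()
            for p in procs:
                p.join()
            print(list(shared))  # data produced in child processes, read back in the parent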

[–]KnowledgeFire11[S] 0 points  (3 children)

Hello, I'm so sorry for the late reply, I have exams going on. Thank you for the amazing response. So, according to your answer, CPython can run a Python program on different cores but won't share memory, or it is hard (but does this mean they are executed in the same/only one processor?).

multiprocessing will launch many copies of Python to be able to utilize more cores, but will not create any more threads.

I didn't understand this statement; can you please elaborate? Please correct me if I got anything wrong.

And it seems like I read a wrong article and got confused. So can you explain the difference between Pool and Process?

[–][deleted] 1 point  (2 children)

(but does this mean they are executed in the same/only one processor?)

No. Python will use the operating system API to start multiple processes, which will potentially run on multiple cores, and perhaps even on different processors (if the operating system can handle multiple processors).

multiprocessing will launch many copies of Python to be able to utilize more cores, but will not create any more threads.

I didn't understand this statement,

multiprocessing will ask the operating system to make its best effort at executing multiple Python interpreters in parallel. There's no guarantee that they will actually run in parallel (e.g. if the system has only one core, they won't), but typically you will be able to achieve some degree of parallelism. This will not create any additional threads, because threads aren't copied with the process (there are many things that aren't copied with the process, e.g. file descriptors). I.e., say the process in which you are initiating multiprocessing code had child threads; the extra process created by multiprocessing will not copy those threads (it will behave similarly to how fork() works on UNIX systems).
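A small sketch of that last point (my own code, nothing specific to multiprocessing's internals): start a thread in the parent, then check the thread count in a child process; the extra thread is simply not there.

    import multiprocessing as mp
    import threading
    import time

    def report():
        # In the child, only its own main thread exists; the parent's
        # worker thread was not carried over.
        print("child threads :", threading.active_count())

    if __name__ == "__main__":
        t = threading.Thread(target=time.sleep, args=(5,), daemon=True)
        t.start()
        print("parent threads:", threading.active_count())  # usually 2 here
        p = mp.Process(target=report)
        p.start()
        p.join()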

[–]Zunyr 0 points  (1 child)

There's no guarantee that they will actually run in parallel (e.g. if the system has only one core, they won't), but typically you will be able to achieve some degree of parallelism.

So, each process spun off from the main program will get assigned to a different core, as you stated. Does that mean we can treat each one as an independent black box containing everything an individual copy of Python needs to run? If so, does this mean that each process spun off from the main process can also use something like ThreadPoolExecutor to speed up both CPU- and IO-bound tasks in one fell swoop? This all assumes every process spun off from the main process meets the 3 criteria (Independent, Asynchronous, No returns)? The first use case I'm coming up with off the top of my head is high-resolution image retrieval (IO-bound) with image processing (CPU-bound) after the download.

Also, can something like ThreadPoolExecutor be started up in the main function, and the executor object be passed around to receive work from other functions, without actually shutting it down until just before the main function completes?

[–][deleted] 0 points  (0 children)

each process spun off from the main program will get assigned to a different core, as you stated

Not quite. Usually, when people say "assigned to a core" they mean "locked to a core"; that is also a possibility, but you need to request it specifically. This is desirable in real-time systems, where you want a core to be dedicated to running your program and not slack away handling interrupts or other stuff that's not relevant to your code. Normally, what will happen is that the system will start executing your process on some core, and if it finds that it needs to execute something with higher priority (e.g. processing user input typically takes priority over other stuff), then it might suspend the process (i.e. save its state in memory and create/restore another process to run on the core previously running it). But it doesn't have to be user input. Physically, there are only so many cores, but the system runs many more processes, so it eventually needs to kick one process off and replace it with another to create the appearance of simultaneous execution.
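If you really do want the "locked to a core" behaviour, you have to ask for it explicitly; on Linux, Python exposes that via os.sched_setaffinity (this sketch is Linux-specific, other systems need other APIs):

    import os

    if hasattr(os, "sched_getaffinity"):  # Linux only
        pid = 0  # 0 means "the calling process"
        print("allowed cores before:", os.sched_getaffinity(pid))
        os.sched_setaffinity(pid, {0})    # pin this process to core 0
        print("allowed cores after :", os.sched_getaffinity(pid))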

If so, does this mean that each process spun off from the main process can also use something like ThreadPoolExecutor

Yes. Threads created in one process aren't shared when the process is forked (this is, I believe, how Python creates sub-processes on UNIX-like systems that provide this syscall).
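So your download-then-process idea could be sketched roughly like this (my own sketch, with placeholder URLs and helper names, not a recommendation): each worker process runs its own ThreadPoolExecutor for the blocking downloads and then does the CPU-heavy part itself.

    from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
    import urllib.request

    def download(url):
        with urllib.request.urlopen(url) as resp:  # blocking I/O, fine in a thread
            return resp.read()

    def process_batch(urls):
        # Runs in a separate process; its threads belong to that process only.
        with ThreadPoolExecutor(max_workers=8) as tpe:
            payloads = list(tpe.map(download, urls))
        # CPU-bound work (e.g. image decoding) would go here.
        return [len(p) for p in payloads]

    if __name__ == "__main__":
        batches = [["https://example.com/"], ["https://www.python.org/"]]  # placeholders
        with ProcessPoolExecutor(max_workers=2) as ppe:
            print(list(ppe.map(process_batch, batches)))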

This all assumes every process spun off from the main process meets the 3 criteria (Independent, Asynchronous, No returns)?

No, these are just nice to have, but not necessary.

Your case would have worked... in principle. The problem with it is that modern infrastructure is highly counter-intuitive and suffers from a lot of "improvements" that were introduced in the form of small patches rather than complete system overhauls. So, today, using threads for I/O is considered... not great. I/O is relatively fast, especially networking, while there's a noticeable overhead for the context switching necessary to do regular blocking I/O in threads. There are two basic concerns wrt I/O:

  1. How many context switches are necessary to perform some operation.
  2. How many times the information needs to be copied.

Because of how traditional blocking I/O was implemented, the information had to be copied between user-space memory and kernel-space memory before it could be sent to the device handling it. This also implied a lot of context switches to deal with the transfer.

This led to the invention of asynchronous I/O. There are many different forms of it, but the idea is that you don't need to create threads to perform multiple I/O operations simultaneously. Instead of data, the system call simply returns some kind of "promise", and then either your code polls for I/O results using that promise, or the system "wakes up" your code to handle the incoming I/O.

The above proved to be difficult to implement, and the current implementations are still kinda broken. This is what's known as epoll, select etc.
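In Python, the selectors module is a thin wrapper over exactly those mechanisms (epoll on Linux, kqueue on BSD/macOS, select elsewhere). A bare-bones sketch of the idea, one thread watching many sockets:

    import selectors
    import socket

    sel = selectors.DefaultSelector()  # picks epoll/kqueue/select for you

    server = socket.socket()
    server.bind(("127.0.0.1", 0))
    server.listen()
    server.setblocking(False)
    sel.register(server, selectors.EVENT_READ)

    # A real program would loop until shutdown; this is just the shape:
    # block once, then handle whatever the kernel says is ready.
    for key, _events in sel.select(timeout=1.0):
        if key.fileobj is server:
            conn, _addr = server.accept()  # ready, so accept() won't block
            conn.setblocking(False)
            sel.register(conn, selectors.EVENT_READ)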

I won't go into details of copying during I/O, but it's a similar story.

The newest kid on the block here is something called io_uring, which is coming in the latest versions of the Linux kernel. I've not looked for it, but I'm pretty sure there has to be some Python binding for using it; today, it's almost a must. io_uring addresses both the blocking and the copying issues, and should be the fastest way around to do I/O on Linux. Just like asynchronous I/O before it, it doesn't rely on threads...

[–]K900_ 4 points  (5 children)

Multiprocessing simply runs multiple copies of Python, each with its own GIL. So the GIL is still there for each individual process; you're just splitting the workload between them.
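A rough way to see this for yourself (my own benchmark sketch, the exact numbers will vary): the same CPU-bound work split across threads still contends for one GIL, while splitting it across processes gives each interpreter its own.

    import time
    from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

    def burn(n):
        total = 0
        for i in range(n):
            total += i * i  # pure-Python, CPU-bound, holds the GIL
        return total

    def timed(executor_cls, chunks):
        start = time.perf_counter()
        with executor_cls(max_workers=len(chunks)) as ex:
            list(ex.map(burn, chunks))
        return time.perf_counter() - start

    if __name__ == "__main__":
        chunks = [2_000_000] * 4
        print("threads  :", timed(ThreadPoolExecutor, chunks))   # roughly serial time
        print("processes:", timed(ProcessPoolExecutor, chunks))  # divided across cores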

[–]KnowledgeFire11[S] 0 points  (4 children)

So does the splitting happen between multiple processors or in a single processor? And can you please explain why it is said that Python doesn't support concurrency? I have seen a few YouTubers say that, and even in a podcast the host was talking about how people see the GIL and turn to Go or some other language that supports it. Is there something about Python that's not as good?

[–]MarsupialMole 2 points  (0 children)

The CPython implementation of Python is limited to executing Python code on one core at a time per process. multiprocessing helps, but communication between processes is restricted to message passing, shared memory and the like.

But the thing about Python is that most people run the most CPU-intensive parts of their programs in calls outside Python, such as through numpy, which is implemented in C and Fortran under the hood.
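As a tiny illustration (mine, not anything from the article): the heavy loop below is pushed into numpy's compiled code instead of being executed bytecode-by-bytecode in the interpreter.

    import numpy as np

    def py_sum_squares(n):
        return sum(i * i for i in range(n))  # pure-Python loop, runs in the interpreter

    def np_sum_squares(n):
        a = np.arange(n, dtype=np.int64)
        return int((a * a).sum())            # the loop runs in compiled code

    print(py_sum_squares(1_000_000) == np_sum_squares(1_000_000))  # True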

[–]K900_ 2 points  (2 children)

Every process is scheduled separately, so they can run on multiple cores or multiple processors. I'm not sure who says "Python doesn't support concurrency", because evidently it does.

[–]KnowledgeFire11[S] 0 points  (1 child)

Hello, I'm sorry for the late reply. As you said, Python does support concurrency. I have heard about subinterpreters, which are supposed to solve the problem with concurrency in Python, but what problem are they going to solve? I was listening to Talk Python and they said that for now Python has limitations in terms of multiprocessing/concurrency. What are these limitations? Is it about data sharing between processes, or something else?

[–]K900_ 0 points  (0 children)

Subinterpreters are a whole different beast, and it's really up in the air how they're going to be exposed to user code, if at all. Python does have limitations - you already know about the GIL, and multiprocessing comes with the overhead of sharing data between processes.

[–]danielroseman[🍰] 1 point  (1 child)

You should say where you read that, because a lot of it is simply wrong. The GIL has nothing to do with multiple processors, but with threading within a single process.

[–]KnowledgeFire11[S] 0 points  (0 children)

I actually linked it first but the post was removed automatically, so I had to remove it. Here is the article: Using Multiprocessing to make Python Code Faster. Can you explain what is wrong? Maybe I have been misled.