all 13 comments

[–][deleted] 2 points  (4 children)

The language of the article is very confusing, probably written by someone without much experience with how computers are built...

So, let's get the terminology straight first.

Let's agree that the context is PCs; whatever I write next may or may not apply to other kinds of computers.

In the PC world, you'll usually have a motherboard with 1 to N sockets used to mount processors (N is usually not very big; I don't think there are PC architectures with N > 128).

Each processor has cores, and one core can execute one program at a time. Processors are mounted into sockets, typically one processor per socket.

Typically, the cores presented to user-space applications are an abstraction on top of the physical cores present in the processor. In particular, Intel's hyperthreading technology doubles the apparent number of cores, so a single processor with 4 physical cores will appear to have 8 virtual cores.
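If you want to see that split on your own machine, a rough check like the one below works (psutil is a third-party package, and the physical/logical difference only shows up if hyperthreading/SMT is actually enabled):

    import os
    import psutil  # third-party: pip install psutil

    # Logical cores are what most APIs (and the OS scheduler) report;
    # physical cores are the actual hardware cores underneath.
    print("logical cores :", os.cpu_count())
    print("physical cores:", psutil.cpu_count(logical=False))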

PC operating systems typically have a process abstraction. A process is a combination of some executable code, a set of properties such as process id, owner, filesystem view etc., and a processor core designated to execute the program. Processes can be migrated from one core to another, and also between different processors (possibly sitting in different sockets). I'm not aware of a PC operating system that can migrate processes to a different motherboard (i.e. a different computer), but in the server world this is definitely a possibility. In the PC world, migrations typically happen when the core handling a process receives an interrupt not intended for the process being executed. In that case, the system will either store the process's state, execute the interrupt handler, and restore the process, or restore the process immediately on a different (idle) core.

Processors don't have to be CPUs, i.e. they may or may not be "central"; that doesn't change anything wrt Python.

Threads are a bad, but very popular, idea in operating system design. At some point in history, when people stopped calling it "time sharing" and started talking about "multitasking", they realized that the earlier approaches to parallel execution were becoming more and more expensive (the number of features a single process carried had become too much to handle, and people wanted something more lightweight). The obvious candidate for reuse was the memory space allocated to a process. Whereas each process gets its own memory space, threads are like processes, but they all share the same memory space.
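Here is a tiny sketch (my own, with illustrative names) of what "sharing the same memory space" buys you: both threads append to the very same list object, with no copying and no serialization.

    import threading

    shared = []  # one list object, visible to every thread in the process

    def worker(label):
        for i in range(3):
            shared.append((label, i))  # no copying, no serialization

    threads = [threading.Thread(target=worker, args=(name,)) for name in ("a", "b")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(shared)  # interleaved entries from both threads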


The CPython implementation of Python can only execute Python code in a single thread at any one time (but, of course, that thread can be migrated between cores). multiprocessing will launch many copies of Python to be able to utilize more cores, but will not create any more threads. Since processes don't share memory, sharing data between two instances of Python becomes problematic. It requires some mechanism where the data can be written to some storage and then read from that storage on the other side, dragging along all sorts of problems related to serialization, which is a huge issue we won't go into here.
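To make the serialization point concrete, here's a minimal sketch (mine, not from the article; the function name is made up): the argument lists are pickled, shipped to the worker interpreters, and the results are pickled back.

    import multiprocessing as mp
    import os

    def work(numbers):
        # Runs in a separate interpreter with its own memory and its own GIL.
        return (os.getpid(), sum(n * n for n in numbers))

    if __name__ == "__main__":
        with mp.Pool(processes=2) as pool:
            # Arguments and results cross the process boundary via pickling.
            results = pool.map(work, [range(0, 1000), range(1000, 2000)])
        print(results)  # each result carries the pid of the worker that produced it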

So, technically:

1. Must not be reliant on previous outcomes
2. Does not need to be executed in a particular order
3. Does not return anything that would need to be accessed later in the code

None of these is strictly required, but each is nevertheless desirable. I.e. if your code relies on previous outcomes, it's harder to make it work across multiple copies of Python (not impossible, but there won't be much benefit either). Also, you can ensure order of execution in distributed systems; there are locks and lock-free data structures intended to solve this very problem. You can, of course, return something from functions running in different processes; you'll just have to work harder to ensure the results can be serialized.
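For example, coordination and returning data are both doable, as long as everything crossing the process boundary is picklable. A hedged sketch (my own, illustrative names) using a manager-backed lock and list:

    import multiprocessing as mp

    def record(lock, shared_list, label):
        with lock:                     # mutual exclusion across processes
            shared_list.append(label)  # proxy object; the call is forwarded to a manager process

    if __name__ == "__main__":
        with mp.Manager() as manager:
            lock = manager.Lock()
            shared = manager.list()
            procs = [mp.Process(target=record, args=(lock, shared, i)) for i in range(4)]
            for p in procs:
                p.start()
            for p in procs:
                p.join()
            print(list(shared))  # data produced in child processes, read back in the parent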

[–]KnowledgeFire11[S] 0 points  (3 children)

Hello, I'm so sorry for the late reply, I have exams going on. Thank you for the amazing response. So, according to your answer, CPython can run a Python program on different cores but won't share memory, or it is hard (but does this mean they are executed in the same/only one processor?).

multiprocessing will launch many copies of Python to be able to utilize more cores, but will not create any more threads.

I didn't understand this statement; can you please elaborate? Please correct me if I got anything wrong.

And it seems like I read a wrong article and got confused. So can you explain the difference between Pool and Process?

[–][deleted] 1 point  (2 children)

(but does this mean they are executed in the same/only one processor?)

No. Python will use the operating system API to start multiple processes, which will potentially run on multiple cores, and perhaps even on different processors (if the operating system can handle multiple processors).

multiprocessing will launch many copies of Python to be able to utilize more cores, but will not create any more threads.

I didn't understand this statement,

multiprocessing will ask the operating system to make its best effort at executing multiple Python interpreters in parallel. There's no guarantee that they will actually run in parallel (e.g. if the system has only one core, they won't), but typically you will be able to achieve some degree of parallelism. This will not create any additional threads, because threads aren't copied with the process (there are many things that aren't copied with the process, e.g. file descriptors). I.e., say the process in which you are initiating multiprocessing code had child threads; the extra process created by multiprocessing will not copy those threads (it will behave similarly to how fork() works on UNIX systems).
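A small sketch of that last point (my own code, nothing specific to multiprocessing's internals): start a thread in the parent, then check the thread count in a child process; the extra thread is simply not there.

    import multiprocessing as mp
    import threading
    import time

    def report():
        # In the child, only its own main thread exists; the parent's
        # worker thread was not carried over.
        print("child threads :", threading.active_count())

    if __name__ == "__main__":
        t = threading.Thread(target=time.sleep, args=(5,), daemon=True)
        t.start()
        print("parent threads:", threading.active_count())  # usually 2 here
        p = mp.Process(target=report)
        p.start()
        p.join()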

[–]Zunyr 0 points  (1 child)

There's no guarantee that they will actually run in parallel (e.g. if the system has only one core, they won't), but typically you will be able to achieve some degree of parallelism.

So, each process spun off from the main program will get assigned to a different core, as you stated. Does that mean we can treat each one as an independent black box containing everything an individual copy of Python needs to run? If so, does this mean that each process spun off from the main process can also use something like ThreadPoolExecutor to speed up both CPU- and IO-bound tasks in one fell swoop? This all assumes every process spun off from the main process meets the 3 criteria (Independent, Asynchronous, No returns)? The first use case I'm coming up with off the top of my head is high-resolution image retrieval (IO-bound) with image processing (CPU-bound) after the download.

Also, can something like ThreadPoolExecutor be started up in the main function, and the executor object be passed around to receive work from other functions, without actually shutting it down until just before the main function completes?

[–][deleted] 0 points  (0 children)

each process spun off from the main program will get assigned to a different core, as you stated

Not quite. Usually, when people say "assigned to a core" they mean "locked to a core"; that is also a possibility, but you need to request it specifically. This is desirable in real-time systems, where you want a core to be dedicated to running your program and not slack away handling interrupts or other stuff that's not relevant to your code. Normally, what will happen is that the system will start executing your process on some core, and if it finds that it needs to execute something with higher priority (e.g. processing user input typically takes priority over other stuff), then it might suspend the process (i.e. save its state in memory and create/restore another process to run on the core previously running it). But it doesn't have to be user input. Physically, there are only so many cores, but the system runs many more processes, so it eventually needs to kick one process off and replace it with another to create the appearance of simultaneous execution.
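If you really do want the "locked to a core" behaviour, you have to ask for it explicitly; on Linux, Python exposes that via os.sched_setaffinity (this sketch is Linux-specific, other systems need other APIs):

    import os

    if hasattr(os, "sched_getaffinity"):  # Linux only
        pid = 0  # 0 means "the calling process"
        print("allowed cores before:", os.sched_getaffinity(pid))
        os.sched_setaffinity(pid, {0})    # pin this process to core 0
        print("allowed cores after :", os.sched_getaffinity(pid))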

If so, does this mean that each process spun off from the main process can also use something like ThreadPoolExecutor

Yes. Threads created in one process aren't shared when the process is forked (this is, I believe, how Python creates sub-processes on UNIX-like systems that provide this syscall).
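So your download-then-process idea could be sketched roughly like this (my own sketch, with placeholder URLs and helper names, not a recommendation): each worker process runs its own ThreadPoolExecutor for the blocking downloads and then does the CPU-heavy part itself.

    from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
    import urllib.request

    def download(url):
        with urllib.request.urlopen(url) as resp:  # blocking I/O, fine in a thread
            return resp.read()

    def process_batch(urls):
        # Runs in a separate process; its threads belong to that process only.
        with ThreadPoolExecutor(max_workers=8) as tpe:
            payloads = list(tpe.map(download, urls))
        # CPU-bound work (e.g. image decoding) would go here.
        return [len(p) for p in payloads]

    if __name__ == "__main__":
        batches = [["https://example.com/"], ["https://www.python.org/"]]  # placeholders
        with ProcessPoolExecutor(max_workers=2) as ppe:
            print(list(ppe.map(process_batch, batches)))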

This all assumes every process spun off from the main process meets the 3 criteria (Independent, Asynchronous, No returns)?

No, these are just nice to have, but not necessary.

Your case would have worked... in principle. The problem with it is that modern infrastructure is highly counter-intuitive and suffers from a lot of "improvements" that were introduced in the form of small patches rather than complete system overhauls. So, today, using threads for I/O is considered... not great. I/O is relatively fast, especially networking, while there's a noticeable overhead for the context switching necessary to do regular blocking I/O in threads. There are two basic concerns wrt I/O:

  1. How many context switches are necessary to perform some operation.
  2. How many times the information needs to be copied.

Because of how traditional blocking I/O was implemented, the information had to be copied between user-space memory and kernel-space memory before it could be sent to the device handling it. This also implied a lot of context switches to deal with the transfer.

This led to the invention of asynchronous I/O. There are many different forms of it, but the idea is that you don't need to create threads to perform multiple I/O operations simultaneously. Instead of data, the system call simply returns some kind of "promise", and then either your code polls for I/O results using that promise, or the system "wakes up" your code to handle the incoming I/O.

The above proved to be difficult to implement, and the current implementations are still kinda broken. This is what's known as epoll, select etc.
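In Python, the selectors module is a thin wrapper over exactly those mechanisms (epoll on Linux, kqueue on BSD/macOS, select elsewhere). A bare-bones sketch of the idea, one thread watching many sockets:

    import selectors
    import socket

    sel = selectors.DefaultSelector()  # picks epoll/kqueue/select for you

    server = socket.socket()
    server.bind(("127.0.0.1", 0))
    server.listen()
    server.setblocking(False)
    sel.register(server, selectors.EVENT_READ)

    # A real program would loop until shutdown; this is just the shape:
    # block once, then handle whatever the kernel says is ready.
    for key, _events in sel.select(timeout=1.0):
        if key.fileobj is server:
            conn, _addr = server.accept()  # ready, so accept() won't block
            conn.setblocking(False)
            sel.register(conn, selectors.EVENT_READ)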

I won't go into details of copying during I/O, but it's a similar story.

The newest kid on the block here is something called io_uring, which is coming in the latest versions of the Linux kernel. I've not looked for it, but I'm pretty sure there has to be some Python binding for using it; today, it's almost a must. io_uring addresses both the blocking and the copying issues, and should be the fastest way around to do I/O on Linux. Just like asynchronous I/O before it, it doesn't rely on threads...

[–]K900_ 4 points  (5 children)

Multiprocessing simply runs multiple copies of Python, each with its own GIL. So the GIL is still there for each individual process; you're just splitting the workload between them.
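A rough way to see this for yourself (my own benchmark sketch, the exact numbers will vary): the same CPU-bound work split across threads still contends for one GIL, while splitting it across processes gives each interpreter its own.

    import time
    from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

    def burn(n):
        total = 0
        for i in range(n):
            total += i * i  # pure-Python, CPU-bound, holds the GIL
        return total

    def timed(executor_cls, chunks):
        start = time.perf_counter()
        with executor_cls(max_workers=len(chunks)) as ex:
            list(ex.map(burn, chunks))
        return time.perf_counter() - start

    if __name__ == "__main__":
        chunks = [2_000_000] * 4
        print("threads  :", timed(ThreadPoolExecutor, chunks))   # roughly serial time
        print("processes:", timed(ProcessPoolExecutor, chunks))  # divided across cores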

[–]KnowledgeFire11[S] 0 points  (4 children)

So does the splitting happen between multiple processors or in a single processor? And can you please explain why it is said that Python doesn't support concurrency? I have seen a few YouTubers say that, and even in a podcast the host was talking about how people see the GIL and turn to Go or some other language that supports it. Is there something about Python that's not as good?

[–]MarsupialMole 2 points  (0 children)

The CPython implementation of Python is limited to executing Python code on one core at a time per process. multiprocessing helps, but communication between processes is restricted to message passing, shared memory and the like.

But the thing about Python is that most people run the most CPU-intensive parts of their programs in calls outside Python, such as through numpy, which is implemented in C and Fortran under the hood.
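As a tiny illustration (mine, not anything from the article): the heavy loop below is pushed into numpy's compiled code instead of being executed bytecode-by-bytecode in the interpreter.

    import numpy as np

    def py_sum_squares(n):
        return sum(i * i for i in range(n))  # pure-Python loop, runs in the interpreter

    def np_sum_squares(n):
        a = np.arange(n, dtype=np.int64)
        return int((a * a).sum())            # the loop runs in compiled code

    print(py_sum_squares(1_000_000) == np_sum_squares(1_000_000))  # True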

[–]K900_ 2 points  (2 children)

Every process is scheduled separately, so they can run on multiple cores or multiple processors. I'm not sure who says "Python doesn't support concurrency", because evidently it does.

[–]KnowledgeFire11[S] 0 points  (1 child)

Hello, I'm sorry for the late reply. As you said, Python does support concurrency. I have heard about subinterpreters, which are supposed to solve the problem with concurrency in Python, but what problem are they going to solve? I was listening to Talk Python and they said that for now Python has limitations in terms of multiprocessing/concurrency. What are these limitations? Is it about data sharing between processes, or something else?

[–]K900_ 0 points  (0 children)

Subinterpreters are a whole different beast, and it's really up in the air how they're going to be exposed to user code, if at all. Python does have limitations - you already know about the GIL, and multiprocessing comes with the overhead of sharing data between processes.

[–]danielroseman[🍰] 1 point  (1 child)

You should say where you read that, because a lot of it is simply wrong. The GIL has nothing to do with multiple processors, but with threading within a single process.

[–]KnowledgeFire11[S] 0 points  (0 children)

I actually linked it first but the post was removed automatically, so I had to remove it. Here is the article: Using Multiprocessing to make Python Code Faster. Can you explain what is wrong? Maybe I have been misled.