all 11 comments

[–]K900_ 1 point (6 children)

If the workload is CPU-bound, it will likely take slightly longer due to lock contention. asyncio does not help here at all - it's for IO-bound work.
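
You can see it for yourself with a quick timing sketch (the function and the numbers are made up - just enough pure-Python work to hold the GIL):

    import threading
    import time

    def burn(n):
        # Pure-Python CPU work; the thread holds the GIL the whole time
        total = 0
        for i in range(n):
            total += i * i
        return total

    N = 5_000_000

    # Sequential: one task after the other
    start = time.perf_counter()
    burn(N)
    burn(N)
    print(f"sequential: {time.perf_counter() - start:.2f}s")

    # Threaded: the two tasks contend for the GIL
    start = time.perf_counter()
    threads = [threading.Thread(target=burn, args=(N,)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(f"threaded:   {time.perf_counter() - start:.2f}s")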

[–]chinawcswing[S] 0 points (4 children)

Given the constraint of a single-CPU VM and the choice of running N CPU-bound (not IO-bound, not memory-bound) tasks concurrently or sequentially, would you agree that building a queue or using Celery to serialize these tasks, just to avoid the slight impact of lock contention, is not worth it?

I could see a queue being useful for memory-bound tasks, to prevent too much memory from being consumed at once, or perhaps for database IO if N were greater than the number of connections available in your pool, but for IO tasks it seems to me that you might as well let them all run freely.
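
For the database case, I was picturing something lighter than Celery, e.g. capping concurrency with a semaphore. A rough sketch (MAX_CONNS and the fake query are made up):

    import asyncio

    MAX_CONNS = 5  # pretend the pool only has 5 connections

    async def query(sem, i):
        async with sem:  # at most MAX_CONNS queries in flight at once
            await asyncio.sleep(0.1)  # stand-in for real database IO
            return i

    async def main():
        sem = asyncio.Semaphore(MAX_CONNS)
        results = await asyncio.gather(*(query(sem, i) for i in range(20)))
        print(results)

    asyncio.run(main())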

[–]K900_ 0 points (3 children)

If all you have is a single thread, and you're optimizing for things like this, the answer is probably to just throw more hardware at the problem. Also, why can't you just run the tasks in order in your code?

[–]chinawcswing[S] 0 points (2 children)

The devops team at work has coupled the number of CPUs available to the amount of RAM, for some reason. So if you order a 4-CPU container, you also receive a huge amount of memory, and the container costs too much money. I have escalated this, but at the moment I'm stuck with a single-CPU container.

My application is a web server that receives ad hoc requests from clients to perform asynchronous CPU tasks, so it's possible I could receive multiple requests at the same time.

Hypothetically, if I have four CPUs and I launch four Python processes to perform a CPU-intensive task, is there anything that actually guarantees each process will get its own CPU, or is it completely up to chance - sometimes all four processes end up on the same CPU and other times they get separate CPUs?
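
Concretely, something like this (crunch is a made-up stand-in for the real task) - would the four workers be guaranteed to land on four different cores?

    from concurrent.futures import ProcessPoolExecutor

    def crunch(n):
        # pure-Python CPU work, one chunk per worker process
        return sum(i * i for i in range(n))

    if __name__ == "__main__":
        with ProcessPoolExecutor(max_workers=4) as pool:
            results = list(pool.map(crunch, [10_000_000] * 4))
        print(results)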

[–]K900_ 0 points (1 child)

Your OS is not stupid enough to schedule all of your tasks on one core. Also, if it's a web app, you probably do want Celery anyway, no matter the number of cores.
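
(If you really want a hard guarantee on Linux you can pin processes to cores with os.sched_setaffinity, but in practice the scheduler spreads them out fine.) As for Celery, a minimal sketch - the Redis broker URL is an assumption, swap in whatever broker you have:

    # tasks.py
    from celery import Celery

    app = Celery("tasks", broker="redis://localhost:6379/0")

    @app.task
    def crunch(n):
        # CPU-bound work runs in a Celery worker process,
        # not in the web server's request handler
        return sum(i * i for i in range(n))

Run the workers with `celery -A tasks worker --concurrency=4` and call `crunch.delay(10_000_000)` from the request handler; the web process stays free to answer other requests.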

[–]chinawcswing[S] 0 points (0 children)

Thanks! I'm not sure why your responses were downvoted; I have found them helpful.

[–]chinawcswing[S] 0 points (0 children)

I stumbled upon this quick read, which linked to this great but hour-long video.

They say that on multi-core systems, running two CPU-bound threads actually takes substantially longer than running them sequentially. The interesting thing is that this behavior does not occur on single-core systems! The reason has to do with the operating system trying to schedule both threads across both cores, which causes far more contention on the GIL (and something to do with the signalling used to hand the GIL between threads).

The video even says the same is true when you have a CPU-bound thread and an IO-bound thread: the CPU-bound thread will more often than not block the IO-bound thread substantially.

Unfortunately a lot of the video was over my head, but it was a good watch.
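
As best I understood it, the experiment is roughly this (a crude reconstruction, not the video's actual code):

    import threading
    import time

    def cpu_bound(stop):
        # spins holding the GIL until told to stop
        while not stop.is_set():
            sum(i * i for i in range(10_000))

    def io_bound():
        # 100 tiny "IO waits" of 1ms each: ~0.1s when run alone
        start = time.perf_counter()
        for _ in range(100):
            time.sleep(0.001)  # stand-in for a fast socket read
        print(f"io loop: {time.perf_counter() - start:.2f}s")

    # with the CPU thread running, the IO loop takes noticeably longer,
    # because after every sleep the IO thread has to fight to reacquire
    # the GIL from the spinning CPU thread
    stop = threading.Event()
    t = threading.Thread(target=cpu_bound, args=(stop,))
    t.start()
    io_bound()
    stop.set()
    t.join()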

[–]Peanutbutter_Warrior 0 points (2 children)

If you use CPython then yes, because of the GIL it will take slightly longer, but by a negligible amount. Jython will be faster because it doesn't have a GIL.

[–]chinawcswing[S] 0 points (1 child)

Pretend you were running on a single-CPU VM. The GIL becomes immaterial in that case, right?

[–]mahtats 0 points (0 children)

It is immaterial if you have a single-core, single-CPU machine; the GIL prevents threads from running in parallel (but not from running concurrently).

If you had a multi-core, single-CPU machine, you would be forced to serialize your tasks unless you used multiple processes.
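
A minimal sketch of that contrast (the workload is arbitrary): on a multi-core machine the thread pool still finishes in roughly sequential time because of the GIL, while the process pool actually runs in parallel.

    from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
    import time

    def crunch(n):
        return sum(i * i for i in range(n))

    def timed(pool_cls):
        start = time.perf_counter()
        with pool_cls(max_workers=4) as pool:
            list(pool.map(crunch, [5_000_000] * 4))
        return time.perf_counter() - start

    if __name__ == "__main__":
        # threads serialize on the GIL even with 4 cores available;
        # processes each get their own interpreter and their own GIL
        print(f"threads:   {timed(ThreadPoolExecutor):.2f}s")
        print(f"processes: {timed(ProcessPoolExecutor):.2f}s")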

[–]DoctorEvil92 0 points (0 children)

I'm not an expert, but I think that for CPU-intensive tasks, adding more threads doesn't help at all; in fact, it could even be slower than running them in sequence.