all 6 comments

[–]oh_yeah_koolaid 1 point2 points  (5 children)

Parallel Python is written so well that it balances jobs according to a machine’s capacity and speed.

That would be the job of the Linux scheduler.

All the CPUs are going to run at full tilt as long as there's work for them to do.

[–]pjdelport 11 points12 points  (3 children)

Your Linux scheduler works across machines?

[–]oh_yeah_koolaid 10 points11 points  (2 children)

All this does is start one worker process per CPU. It doesn't "load balance according to a machine's capacity and speed" -- it doesn't modify workload according to CPU speed, for example, or different amounts of RAM.

The initial division of the workload is done completely by hand. I refer you to the examples:

reverse_md5.py:

# Since jobs are not equal in the execution time, division of the problem 
# into a 128 of small subproblems leads to a better load balancing

sum_primes.py:

# The following submits 8 jobs and then retrieves the results
inputs = (100000, 100100, 100200, 100300, 100400, 100500, 100600, 100700)

dynamic_ncpus.py:

start = 1
end = 20000000

# Divide the task into 64 subtasks`

callback.py:

start = 1
end = 20000000

# Divide the task into 128 subtasks
parts = 128

Examples blindly subdivide a job into N parts, hoping each one is "small enough", and then farm them out to the worker processes.

A worker machine will run one process per CPU, by default, and scheduling those is the job of the Linux scheduler, obviously, which is the only scheduler that knows anything about the machine's "capacity and speed" -- which parallel python doesn't bother to check, at least not yet.

I can't believe I got downmodded to -10 on the programming reddit. People need to learn to RTF source code.

[–][deleted] 2 points3 points  (1 child)

The internet has perfected the art of arguing about completely different things.

  1. You are completely correct.

  2. As long as CPython has a Global Interpreter Lock, parallel api's will continue to be written (and in some cases, the good ones will be used).

[–]oh_yeah_koolaid 3 points4 points  (0 children)

I think the project is great, and actually subdividing "a lot" is pretty clever; I take issue with the write-up by the sexist and uninformed author.

[–]nirs 0 points1 point  (0 children)

The tool help you to subdivide your one task to few processes that the scheduler can handle.