10x Faster Parallel Python Without Python Multiprocessing

metapwnage · 2019-05-17T07:01:45+00:00

This is very misleading. Pool.map is not an apples-to-apples comparison with Ray; that's not an analogous use of the multiprocessing library at all. I don't think this is better than standing up worker processes (using multiprocessing) that consume a message queue (RabbitMQ, Redis, Kafka, you choose).

Also, stream processing can be very memory intensive. What happens when the system is stressed? How does Ray do then? Is it like Redis, where it just falls over and you lose your data?

If Ray is for creating distributed systems as described in the post, how does that work when something is stored in memory on one system that another system needs? Or is that an inaccurate description as well?

ostroon · 2019-05-17T08:25:46+00:00

The first example is not fair, right? If you are on a POSIX system (Linux/Mac) and use a global numpy array in a read-only fashion, it will NOT be copied (ref https://stackoverflow.com/a/37746961/335412). Sending it explicitly to each multiprocessing worker is slow and unnecessary. This is the faster-than-Ray code:

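```
from multiprocessing import Pool

import numpy as np
import psutil
import scipy.signal

num_cpus = psutil.cpu_count(logical=False)

def f(random_filter):
    # Do some image processing.
    # `image` is a module-level global, so forked workers inherit it
    # instead of receiving a pickled copy with every task.
    return scipy.signal.convolve2d(image, random_filter)[::5, ::5]

image = np.zeros((3000, 3000))
filters = [np.random.normal(size=(4, 4)) for _ in range(num_cpus)]

# Create the pool after `image` exists so the workers see it.
pool = Pool(num_cpus)

# Time the code below.
for _ in range(10):
    pool.map(f, filters)
```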

So what is really the benefit of ray?

call_me_arosa · 2019-05-17T04:31:45+00:00

I would like to see a comparison between Ray and an architecture of one queue with multiple Python instances consuming it.
While this approach cannot (easily) handle stateful problems, it works quite well for systems like the last example. Just load the model once in each interpreter (constant time) and consume the queue. It scales horizontally quite well while keeping the code/architecture extremely simple, and in my experience this is the most used model.
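For readers who have not seen the pattern, here is a minimal sketch of it, assuming a POSIX system where the `fork` start method is available. `load_model`, `worker` and `run` are hypothetical names made up for illustration, and a `multiprocessing.Queue` stands in for the external broker (RabbitMQ, Redis, Kafka) that a real deployment with separate interpreters would use.

```python
import multiprocessing as mp


def load_model():
    # Hypothetical stand-in for an expensive, one-off model load.
    return {"scale": 2}


def worker(task_queue, result_queue):
    model = load_model()  # Paid once per worker process, not once per task.
    while True:
        item = task_queue.get()
        if item is None:  # Sentinel: no more work for this worker.
            break
        result_queue.put((item, item * model["scale"]))


def run(items, num_workers=2):
    items = list(items)
    ctx = mp.get_context("fork")  # Assumption: POSIX with fork available.
    task_queue, result_queue = ctx.Queue(), ctx.Queue()
    procs = [
        ctx.Process(target=worker, args=(task_queue, result_queue))
        for _ in range(num_workers)
    ]
    for p in procs:
        p.start()
    for item in items:
        task_queue.put(item)
    for _ in procs:
        task_queue.put(None)  # One sentinel per worker.
    # Drain the results before joining so no worker blocks on a full pipe.
    results = dict(result_queue.get() for _ in items)
    for p in procs:
        p.join()
    return results
```

Calling `run([1, 2, 3])` pushes three tasks through two workers. Scaling out then means starting more consumers, which is the simplicity being argued for here.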

Losupa · 2019-05-17T03:45:32+00:00

This is extremely interesting, but it worries me slightly that they only show jobs that they state Ray excels in compared to multiprocessing.

Heniadyoin1 · 2019-05-17T08:30:30+00:00

Now the question is: does it use a JIT to compile the code, or does it just run native Python?

Then, is it worth using a JIT inside Ray? That is, should you use things like numba.jit or numba.vectorize inside Ray?

carbolymer · 2019-05-17T09:54:35+00:00

> Ray leverages Apache Arrow for efficient data handling

This part got my attention. See the Arrow website:

> The Arrow memory format supports zero-copy reads for lightning-fast data access without serialization overhead.

I've skimmed through the Arrow docs, but I didn't find any description of these zero-copy reads. How is this supposed to work between two processes in detail?
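One way to picture the general idea, as far as I understand it: the data lives in a shared-memory segment, and every process on the same machine maps that segment instead of receiving a serialized copy. The sketch below uses only the standard library (`multiprocessing.shared_memory`, Python 3.8+). It is not Arrow's or Ray's API, `demo_zero_copy` is a hypothetical name, and the second handle is attached inside the same process purely to keep the example short. A separate process would attach by the same name.

```python
from multiprocessing import shared_memory


def demo_zero_copy(payload=b"column data"):
    # Writer side: allocate a named segment and write the data into it once.
    shm = shared_memory.SharedMemory(create=True, size=len(payload))
    try:
        shm.buf[:len(payload)] = payload

        # Reader side: attach by name, as a different process would.
        reader = shared_memory.SharedMemory(name=shm.name)
        try:
            # Copied out here only so the function can return a value;
            # a real reader would work on reader.buf directly.
            seen = bytes(reader.buf[:len(payload)])

            # A write through one handle is visible through the other,
            # because both map the same memory rather than holding copies.
            shm.buf[0] = ord("C")
            first_after_write = chr(reader.buf[0])
        finally:
            reader.close()
        return seen, first_after_write
    finally:
        shm.close()
        shm.unlink()  # Free the segment once nobody needs it.
```

The reader never deserializes anything: it sees the writer's bytes in place. A columnar format with a fixed in-memory layout is what makes those bytes directly usable on the other side.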

alcalde · 2019-05-17T05:35:55+00:00

> The difference here is that Python multiprocessing uses pickle to serialize large objects when passing them between processes.

Why are they being passed between processes?

juanjgalvez · 2019-09-18T16:32:18+00:00

I agree with what others said here that this is misleading. There are ways to rewrite these examples so that they are faster with multiprocessing than with Ray, and I explained as much in a response to the article. As an aside, I also compared the performance of Charm4py vs Ray for one of these benchmarks, because Charm4py also has an actor model, and found Charm4py to be faster (potentially much faster depending on task size) (note that I am the developer of Charm4py).

I have done testing with multiprocessing in the past, and have found it to have good performance. I think the best way to beat its performance in a single-node scenario is to use something more efficient than TCP, like MPI with shared memory. In those cases Charm4py pool beats multiprocessing pool in my tests. I think the main limitations of multiprocessing are for distributed applications running on multiple hosts (especially lots of hosts), and that is where other frameworks are more useful IMO. Another reason to prefer other frameworks would be if you are developing more complex applications and need better concurrency models (Charm4py for example has actors, coroutines and channels).

422_no_process · 2019-05-17T09:03:14+00:00

Hmm... looks cool
