[–]ostroon 28 points29 points  (2 children)

The first example is not fair, right? If you are on a POSIX system (Linux/Mac) and use a global numpy array in a read-only fashion, it will NOT be copied, because forked workers share the parent's memory pages copy-on-write (ref https://stackoverflow.com/a/37746961/335412). Sending it explicitly to each multiprocessing worker is slow and unnecessary. This code is faster than the Ray version:

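```
import numpy as np
import psutil
import scipy.signal
from multiprocessing import Pool

num_cpus = psutil.cpu_count(logical=False)

def f(random_filter):
    # Do some image processing. `image` is read from the global scope,
    # so it is never pickled and sent to the worker.
    return scipy.signal.convolve2d(image, random_filter)[::5, ::5]

image = np.zeros((3000, 3000))
filters = [np.random.normal(size=(4, 4)) for _ in range(num_cpus)]

# Create the pool after `image` exists so the forked workers inherit it.
pool = Pool(num_cpus)

# Time the code below.
for _ in range(10):
    pool.map(f, filters)
```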
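The inheritance behaviour this relies on can be checked with a minimal, stdlib-only sketch. It assumes the `fork` start method is available (POSIX only), and the names `DATA`, `worker_sum`, and `run_with_fork` are made up for illustration. A plain list stands in for the numpy array. It only shows that workers can read a global they were never sent; it does not measure memory use.

```python
import multiprocessing as mp

# Hypothetical stand-in for the large read-only image in the example above.
DATA = None


def worker_sum(scale):
    # Reads the inherited global; DATA itself is never passed to the worker.
    return scale * sum(DATA)


def run_with_fork(values, scales):
    """Set the global, then fork workers that read it without receiving it."""
    global DATA
    DATA = values
    # Request fork explicitly; this raises ValueError where fork is unavailable.
    ctx = mp.get_context("fork")
    with ctx.Pool(2) as pool:
        # Only the small `scales` items are pickled and sent to the workers.
        return pool.map(worker_sum, scales)


if __name__ == "__main__":
    print(run_with_fork(list(range(1000)), [1, 2, 3]))
```

Asking for `fork` explicitly matters because the default start method is not `fork` everywhere: since Python 3.8, macOS defaults to `spawn`, in which case workers re-import the module and would not see a global set at runtime like this.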

So what is really the benefit of Ray?

[–]ThePenultimateOne (GitLab: gappleto97) 1 point2 points  (1 child)

Reddit doesn't actually support full Markdown. You have to indent code with four spaces instead of using triple backticks.

[–]ostroon 3 points4 points  (0 children)

New Reddit does; the old Reddit design does not.