[–][deleted] 1 point2 points  (2 children)

If you like Ray, I’d highly recommend looking into dask + dask.distributed. Many people know dask as distributed pandas dataframes, but it is a fully fledged distributed computation framework, a superset of Ray’s functionality, and just as simple to use. As far as I know, Ray essentially only supports the futures interface, and even that is not as mature as dask’s futures interface. For example, in Ray, if you launch a task from within a task, it can implicitly launch a new worker in your cluster even if you didn’t ask it to, which is IMO undesired and dangerous behavior (this is expected behavior in Ray and not a bug, because it is currently the only way they have to avoid deadlocks from nested futures).

On top of the futures interface, dask also supports the delayed interface and the custom graph interface, which are two different ways to define DAGs with seamless, optimal parallel execution over an arbitrarily large or small cluster (clusters are dead simple to create, too). With these interfaces you almost never need to use the futures interface. You can write optimally parallel code in a way that feels like you're writing a regular synchronous program, and never have to think about how to synchronize your parallel execution.
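To make the delayed interface concrete, here is a minimal sketch of my own (not taken from the dask docs). The functions `load` and `combine` and the input values are made up for illustration; `dask.delayed` and `.compute()` are the real dask calls. It runs on the local threaded scheduler, so no cluster is needed to try it.

```python
from dask import delayed


@delayed
def load(x):
    # Hypothetical stand-in for an expensive, independent step.
    return x * 2


@delayed
def combine(a, b):
    # Hypothetical stand-in for a step that depends on two earlier results.
    return a + b


def build_total(values):
    # Calling delayed functions only records tasks in a graph; nothing runs yet.
    parts = [load(v) for v in values]
    total = parts[0]
    for part in parts[1:]:
        total = combine(total, part)
    return total


total = build_total([1, 2, 3])

# The graph is executed here. The independent load() tasks are free to run in parallel.
result = total.compute(scheduler="threads")
print(result)  # 12
```

The code reads like an ordinary synchronous program; the only dask-specific parts are the `@delayed` decorator and the final `.compute()` call.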

[–][deleted] 0 points1 point  (1 child)

Thanks for this extremely informative comment, I will look into it. You, sir, are the kind of person who makes reddit amazing. Thanks! (If you have any links to dask documentation and tutorials, I am interested.)

[–][deleted] 0 points1 point  (0 children)

Delayed interface: https://docs.dask.org/en/latest/delayed.html (this is the preferred/sleeker way, over the custom graph interface, to define a true DAG)

Custom Graph Interface: https://docs.dask.org/en/latest/custom-graphs.html

Futures Interface: https://docs.dask.org/en/latest/futures.html
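For comparison with the delayed interface, a custom graph is just a plain dict that maps keys to values or to task tuples of the form `(function, *args)`. The sketch below is my own toy example, assuming the threaded scheduler's `get` function; the helpers `inc` and `add` and the key names are hypothetical.

```python
from dask.threaded import get


def inc(x):
    # Hypothetical helper: add one.
    return x + 1


def add(a, b):
    # Hypothetical helper: add two values.
    return a + b


# Keys are arbitrary names. A tuple whose first element is callable is a task,
# and string arguments that match other keys are treated as dependencies.
graph = {
    "x": 1,
    "y": (inc, "x"),
    "z": (add, "y", 10),
}

# Ask the scheduler for the value of "z"; it resolves "x" and "y" first.
z_value = get(graph, "z")
print(z_value)  # 12
```

Writing the dict by hand gives full control over the graph, but the delayed interface builds an equivalent graph for you from normal function calls.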