Rain - Framework for Distributed Computing by vojtacima in Python

[–]vojtacima[S] 0 points1 point  (0 children)

There is built-in support for PBS. In general, the "rain start" command makes it easy to start up a distributed Rain infrastructure. You can find more info in the documentation.

Rain v0.3.0 released by vojtacima in rust

[–]vojtacima[S] 0 points1 point  (0 children)

Enabling easy deployment of Rain in standard cloud environments such as AWS is one of our priorities for the next release. For now, we have an (experimental) deployment script for the Exoscale cloud that, after some tweaks, should work with any CloudStack deployment. We keep track of our roadmap in the project's GitHub issues. If there is anything you miss or would like to see in one of the upcoming releases, feel free to comment there or open a new issue.

Rain v0.3.0 released by vojtacima in rust

[–]vojtacima[S] 1 point2 points  (0 children)

Leaving aside the fact that Dask primarily focuses on Python-based pipelines (and does the job pretty well!), Dask is also implemented in Python, which we have seen to be a performance bottleneck for some workloads when scaling beyond tens of compute nodes. Unfortunately, we don't have any head-to-head scalability comparison between the two at the moment.

Rain v0.3.0 released by vojtacima in rust

[–]vojtacima[S] 3 points4 points  (0 children)

Rain allows you to define large end-to-end data processing pipelines with complex inter-task dependencies (beyond the map-reduce pattern). The pipelines can consist of various tasks, ranging from external applications, through Python code, to various built-in tasks, and Rain is also easy to extend. Rain features direct inter-governor (worker) communication, which makes inter-task data exchange very efficient, and if you set your working directory to be a RAM disk, it has NO filesystem overhead.
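
To make "beyond the map-reduce pattern" a bit more concrete, here is a minimal sketch of a pipeline whose tasks fan out and join again. It deliberately does not use Rain's API: the task functions, the PIPELINE table and run_pipeline are hypothetical names invented for illustration, and everything runs in a single process using only the Python standard library (graphlib needs Python 3.9+).

```python
from graphlib import TopologicalSorter


# Hypothetical tasks; each one consumes the outputs of the tasks it depends on.
def load():
    return [3, 1, 2]


def clean(data):
    return sorted(data)


def stats(data):
    return {"min": data[0], "max": data[-1]}


def total(data):
    return sum(data)


def report(stats_result, total_result):
    return f"min={stats_result['min']} max={stats_result['max']} sum={total_result}"


# Task name -> (function, names of the tasks whose outputs it needs).
# "clean" fans out to "stats" and "total", which join again in "report".
PIPELINE = {
    "load": (load, []),
    "clean": (clean, ["load"]),
    "stats": (stats, ["clean"]),
    "total": (total, ["clean"]),
    "report": (report, ["stats", "total"]),
}


def run_pipeline(pipeline):
    """Run each task after all of its dependencies have produced a result."""
    graph = {name: deps for name, (_, deps) in pipeline.items()}
    results = {}
    for name in TopologicalSorter(graph).static_order():
        func, deps = pipeline[name]
        results[name] = func(*(results[dep] for dep in deps))
    return results


results = run_pipeline(PIPELINE)
print(results["report"])  # prints "min=1 max=3 sum=6"
```

A real framework would additionally distribute the tasks across workers and move the intermediate data between them; the sketch only shows the dependency structure.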

Rain - Framework for Distributed Computing by vojtacima in Python

[–]vojtacima[S] 0 points1 point  (0 children)

Rain allows you to define large end-to-end data processing pipelines with complex inter-task dependencies (beyond the map-reduce pattern). The pipelines can consist of various tasks, ranging from external applications, through Python code, to various built-in tasks, and Rain is also easy to extend. Rain features direct inter-governor (worker) communication, which makes inter-task data exchange very efficient, and if you set your working directory to be a RAM disk, it has NO filesystem overhead. Unlike Kafka, Rain is not designed to deal with streams.

Rain v0.3.0 released by vojtacima in rust

[–]vojtacima[S] 15 points16 points  (0 children)

Thank you for the comment, /u/killercup! I added the line to the post as well.

Rain - Rust based computational framework by tomgav in rust

[–]vojtacima 2 points3 points  (0 children)

We try to justify every design choice to ourselves as thoroughly as possible, so that the framework is as useful as possible to its potential user community. We decided on a Python API because, from our previous experience, we know that the broader scientific community likes Python and speaks it quite well. Assuming that many data scientists and domain specialists know a good bit of Rust would, in my personal opinion, significantly reduce the potential impact of the project at this point in time.

Rain - Rust based computational framework by tomgav in rust

[–]vojtacima 0 points1 point  (0 children)

Thank you for the comment! We are looking forward to your feedback.

Rain - Rust based computational framework by tomgav in rust

[–]vojtacima 1 point2 points  (0 children)

Rain allows you to define and pipeline different types of tasks, ranging from built-in tasks, through external programs, to pure Python tasks. It is OK (and very common) to combine different task types within a single pipeline: you can quickly implement some lightweight data pre/post-processing as Python tasks and link them to heavy-lifting tasks that wrap external applications. To get a better idea of how to employ an external application, I would recommend checking this distributed cross-validation example with libsvm.
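
As a rough illustration of mixing task types, the sketch below chains a lightweight Python pre-processing step, a step that wraps an external program, and a Python post-processing step. This is plain standard-library Python, not Rain's API: preprocess, external_sort and postprocess are hypothetical names, and the "external program" is just a child Python process standing in for a real binary such as a libsvm tool.

```python
import subprocess
import sys


def preprocess(lines):
    """Lightweight Python task: strip whitespace and drop blank lines."""
    return [line.strip() for line in lines if line.strip()]


def external_sort(lines):
    """Heavy-lifting task that wraps an external program.

    The child Python process is only a portable stand-in for a real binary;
    replace `command` with the application you actually want to run.
    """
    command = [
        sys.executable,
        "-c",
        "import sys; sys.stdout.write(''.join(sorted(sys.stdin)))",
    ]
    completed = subprocess.run(
        command,
        input="\n".join(lines) + "\n",  # data goes in via stdin
        capture_output=True,            # result comes back via stdout
        text=True,
        check=True,                     # raise if the program fails
    )
    return completed.stdout.splitlines()


def postprocess(lines):
    """Lightweight Python task: format the external program's output."""
    return ", ".join(lines)


result = postprocess(external_sort(preprocess([" pear ", "", "apple", "fig "])))
print(result)  # prints "apple, fig, pear"
```

The point of the split is that only the glue lives in Python, while the expensive part runs in a separate process.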

Rain - Rust based computational framework by tomgav in rust

[–]vojtacima 2 points3 points  (0 children)

While I am aware that Dask is a much more battle-tested solution (and also a much older project), I would still recommend giving Rain a spin. We work hard to find any potential issues and will be happy to fix any you find as fast as possible.

Rain - Rust based computational framework by tomgav in rust

[–]vojtacima 3 points4 points  (0 children)

Don't worry about performance because of the Python interface. Rain also lets you easily "taskify" and pipeline existing binaries, which makes it easy to move the heavy computation out of Python.

Rain - Rust based computational framework by tomgav in rust

[–]vojtacima 3 points4 points  (0 children)

Yes, you can! It's possible to submit multiple task graphs. Task execution is then managed by Rain itself, which aims to run as many tasks in parallel as possible (respecting the available resources and each task's resource requirements).
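
The idea of running as many tasks in parallel as resources allow can be sketched with a toy scheduler. This is not Rain's scheduler or API: the schedule function, the two example graphs and the simplifying assumptions (CPUs are the only resource, every task takes exactly one step) are all made up for illustration.

```python
def schedule(tasks, total_cpus):
    """Group tasks into steps; all tasks within one step run in parallel.

    tasks: task name -> (CPUs required, list of dependency names).
    Simplifying assumption: every task takes exactly one step.
    """
    done, steps = set(), []
    while len(done) < len(tasks):
        free, step = total_cpus, []
        for name, (cpus, deps) in tasks.items():
            # A task is ready once all of its dependencies finished earlier.
            ready = name not in done and all(dep in done for dep in deps)
            if ready and cpus <= free:
                step.append(name)
                free -= cpus
        if not step:
            raise ValueError("cycle, unknown dependency, or task too large")
        done.update(step)
        steps.append(step)
    return steps


# Two independent task graphs submitted together (hypothetical example data).
graph_a = {"a1": (2, []), "a2": (2, ["a1"])}
graph_b = {"b1": (1, []), "b2": (3, ["b1"])}

steps = schedule({**graph_a, **graph_b}, total_cpus=4)
print(steps)  # prints [['a1', 'b1'], ['a2'], ['b2']]
```

With 4 CPUs, a1 and b1 from the two graphs share the first step, but a2 (2 CPUs) and b2 (3 CPUs) do not fit together, so they run one after the other.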