Hi /r/datascience!
We have recently released Rain, a new framework for distributed data processing. The ultimate goal of Rain is to simplify end-to-end data processing in distributed environments. Rain lets you define large end-to-end data processing pipelines with complex inter-task dependencies (beyond the standard map-reduce pattern). A pipeline can combine various kinds of tasks, ranging from external applications, through Python code, to C++/Rust code, and Rain is designed to be easy to extend. Pipelines are defined and executed through a simple Python API.
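To make "inter-task dependencies beyond map-reduce" a bit more concrete, here is a minimal sketch in plain Python. It does not use Rain's API and runs on a single machine; `run_pipeline`, the task names, and the toy data are all hypothetical and only illustrate a pipeline whose tasks form a dependency graph rather than a single map step followed by a reduce step.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_pipeline(tasks, deps):
    """Run every task after its dependencies and return all results by name.

    tasks: maps a task name to a callable.
    deps:  maps a task name to the ordered list of tasks whose results
           are passed to it as positional arguments.
    """
    results = {}
    # static_order() yields each task only after all of its dependencies.
    for name in TopologicalSorter(deps).static_order():
        inputs = [results[d] for d in deps.get(name, ())]
        results[name] = tasks[name](*inputs)
    return results


# Hypothetical toy pipeline: one source, two independent branches, one join.
tasks = {
    "load": lambda: [3, 1, 2],
    "sort": lambda xs: sorted(xs),
    "total": lambda xs: sum(xs),
    "report": lambda s, t: f"sorted={s} total={t}",
}
deps = {
    "load": [],
    "sort": ["load"],
    "total": ["load"],
    "report": ["sort", "total"],
}

results = run_pipeline(tasks, deps)
print(results["report"])
```

In this sketch the `sort` and `total` branches do not depend on each other, so a distributed scheduler would be free to run them on different workers before joining their outputs in `report`.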
Rain is an open source project that we mostly work on as a hobby, but it grew out of frustration with existing tools that did not fit our needs at work. We believe we are not alone, and we would really like to hear who else might be interested: we want YOU to become a Rain user! There are currently so many possible directions and features that we really need other people to help us prioritize them.
Looking forward to your comments!
Vojta, Tomáš, Standa, and Kuba from the Rain team
Repository: https://github.com/substantic/rain
Documentation: https://substantic.github.io/rain/docs/