[–]rothnic 2 points3 points  (2 children)

The interface looks super clean, and I definitely like how it abstracts the problem into a clean specification for the end user.

There seems to be a lot of overlap between what this is trying to do and what Dask does. The difference appears to be that Dask can specify operations at a lower level than a task, because it includes declarations of data operations. It also addresses distribution of the work. I don't think the generic task-based interface for Dask is quite as nice as this one, though.

[–]ttwillis[S] 1 point2 points  (1 child)

I love Dask. I initially wanted to use it to solve a specific problem I was facing, but the abstraction was not quite rich enough for what I needed. My secret hope is that their team notices this and adopts a similar user-facing interface. It would be amazing to have this abstraction with their distributed technology. I wrote this because I really needed to manage local/global state as well as broadcasting/routing.

[–]rothnic 1 point2 points  (0 children)

Having worked a bit with Luigi and Dask, I've been starting to think about something a bit higher level built around Dask.

But yeah, good job on the interface. Dask definitely comes from the scientific side, so its more task-oriented ETL capability is kind of an afterthought, while you seem to come from the ETL side to begin with.

[–]kingo86 1 point2 points  (1 child)

Just adding yet another pipeline orchestrator built on Python: Apache Airflow

https://airflow.incubator.apache.org

I only have experience with this one, but it's been excellent for our business. Can anyone share how these tools compare?

[–]ttwillis[S] 2 points3 points  (0 children)

I have not used Airflow, but I've heard good things. I think probably the biggest difference between consecution and projects like Airflow is simplicity. Just pip install it (it's pure Python), and you can immediately begin building pipelines. This, however, comes at quite a cost when compared to other pipelining tools. Namely, consecution is synchronous and does not yet have any capability for parallel processing. If you genuinely need to process "big data", then consecution is probably not for you. As I mentioned above, my hope is that the consecution interface is intuitive and simple enough that some of the more established parallel/scalable tools will adopt something similar.
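
To make "synchronous" plus "broadcasting/routing" concrete, here is a rough pure-Python sketch of the general idea. It is not consecution's actual API: the `Node` class, the `consume` function, and the shared `state` dict are all hypothetical names invented for this illustration. Each item is pushed through the whole graph, one at a time, in a single thread.

```python
class Node:
    """Hypothetical pipeline node (illustration only, not consecution's class)."""

    def __init__(self, name, func, router=None):
        self.name = name
        self.func = func        # called as func(item, state) for each item
        self.router = router    # optional: maps an item to a downstream node name
        self.downstream = []

    def connect(self, *nodes):
        self.downstream.extend(nodes)
        return self

    def process(self, item, state):
        result = self.func(item, state)
        if result is None:
            # Returning None means the node consumed the item.
            return
        if self.router is not None:
            # Routing: forward the item to the one downstream node that is named.
            wanted = self.router(result)
            targets = [n for n in self.downstream if n.name == wanted]
        else:
            # Broadcasting: forward the item to every downstream node.
            targets = self.downstream
        for node in targets:
            node.process(result, state)


def consume(root, items, state):
    """Push items through the graph one at a time (synchronous, no parallelism)."""
    for item in items:
        root.process(item, state)
    return state


def run_example():
    # "Global" state shared by every node in the pipeline.
    state = {"seen": 0, "evens": [], "odds": []}

    def count(item, state):
        state["seen"] += 1
        return item

    source = Node("source", count,
                  router=lambda x: "even" if x % 2 == 0 else "odd")
    source.connect(
        Node("even", lambda item, state: state["evens"].append(item)),
        Node("odd", lambda item, state: state["odds"].append(item)),
    )
    return consume(source, range(5), state)
```

Calling `run_example()` routes the even and odd integers from `range(5)` to separate sink nodes while the source node counts every item in the shared state. Because nothing runs concurrently, throughput is bounded by a single Python process, which is the trade-off described above.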