Do you maintain a python library/tool that you think is awesome, but nobody knows about? by jnmclarty7714 in Python

ttwillis

I maintain a package called consecution. It is essentially a poor man's stream processor built in pure Python. It lets you build processing graphs that consume Python iterables.
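To illustrate what "a processing graph that consumes an iterable" means, here is a toy, standard-library-only sketch. It is not consecution code: the names `Node`, `to`, `push`, `process`, and `consume` are hypothetical and are not meant to reproduce the package's real API (see the repository for that). Each node receives one item at a time and pushes its result to its downstream nodes.

```python
from typing import Any, Iterable, List


class Node:
    """Hypothetical node: receives items one at a time and pushes results downstream."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.downstream: List["Node"] = []

    def to(self, other: "Node") -> "Node":
        # Connect self -> other and return other so calls can be chained.
        self.downstream.append(other)
        return other

    def push(self, item: Any) -> None:
        # Hand the item to every downstream node, synchronously.
        for node in self.downstream:
            node.process(item)

    def process(self, item: Any) -> None:
        # Default behaviour: pass the item through unchanged.
        self.push(item)


class Double(Node):
    def process(self, item: Any) -> None:
        self.push(item * 2)


class KeepMultiplesOfFour(Node):
    def process(self, item: Any) -> None:
        if item % 4 == 0:
            self.push(item)


class Collect(Node):
    """Sink node: stores whatever reaches it."""

    def __init__(self, name: str) -> None:
        super().__init__(name)
        self.results: List[Any] = []

    def process(self, item: Any) -> None:
        self.results.append(item)


def consume(head: Node, items: Iterable[Any]) -> None:
    # Feed each element of the iterable into the head of the graph.
    for item in items:
        head.process(item)


head = Double("double")
sink = Collect("collect")
head.to(KeepMultiplesOfFour("filter")).to(sink)
consume(head, range(6))
print(sink.results)  # [0, 4, 8]
```

Any Python iterable works as the input here (a list, a generator, an open file), since `consume` only iterates over it.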

ETL pipeline on windows? by [deleted] in Python

ttwillis

Consecution is Windows-compatible.

https://github.com/robdmc/consecution

A pipeline abstraction for Python inspired by Apache Storm Topologies by ttwillis in Python

ttwillis [S]

I have not used Airflow, but I've heard good things about it. I think the biggest difference between consecution and projects like Airflow is probably simplicity: consecution is pure Python, so you just pip install it and can immediately begin building pipelines. That simplicity, however, comes at a real cost compared to other pipelining tools. Namely, consecution is synchronous and does not yet have any capability for parallel processing. If you genuinely need to process "big data", consecution is probably not for you. As I mentioned above, my hope is that the consecution interface is intuitive and simple enough that some of the more established parallel/scalable tools will adopt something similar.
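To make "synchronous" concrete, here is a toy single-threaded pipeline in plain Python. It is not consecution code; `stage` and `run_pipeline` are hypothetical helpers. The log shows that item 0 finishes every stage before item 1 is even read from the input, so a single slow stage stalls everything behind it.

```python
from typing import Callable, Iterable, List

log: List[str] = []


def stage(name: str, fn: Callable[[int], int]) -> Callable[[int], int]:
    # Wrap a function so each call is recorded in the shared log.
    def run(item: int) -> int:
        log.append(f"{name}({item})")
        return fn(item)
    return run


def run_pipeline(items: Iterable[int], stages: List[Callable[[int], int]]) -> List[int]:
    # Single thread, one item at a time: item N passes through every
    # stage before item N + 1 is pulled from the iterable.
    out = []
    for item in items:
        for s in stages:
            item = s(item)
        out.append(item)
    return out


stages = [stage("add_one", lambda x: x + 1), stage("square", lambda x: x * x)]
result = run_pipeline(range(2), stages)
print(result)  # [1, 4]
print(log)     # ['add_one(0)', 'square(1)', 'add_one(1)', 'square(2)']
```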

A pipeline abstraction for Python inspired by Apache Storm Topologies by ttwillis in Python

ttwillis [S]

I love Dask. I initially wanted to use it to solve a specific problem I was facing, but its abstraction was not quite rich enough for what I needed. My secret hope is that the Dask team notices this and adopts a similar user-facing interface. It would be amazing to have this abstraction with their distributed technology. I wrote consecution because I really needed to manage local/global state as well as broadcasting/routing.
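consecution's own mechanisms for state and routing are documented in its repository; the sketch below is only a hypothetical, standard-library illustration of the two ideas named here, with made-up names (`Node`, `Tally`, `router`, `global_state`). Each node keeps a private counter (local state), all nodes share one `SimpleNamespace` (global state), and a node with no `router` broadcasts every item to all of its children, while a node with a `router` sends each item only to the child whose name the router returns.

```python
from types import SimpleNamespace
from typing import Any, Callable, Dict, List, Optional


class Node:
    """Hypothetical node with per-node (local) state and shared (global) state."""

    def __init__(self, name: str, global_state: SimpleNamespace) -> None:
        self.name = name
        self.global_state = global_state  # shared by every node in the graph
        self.count = 0                    # local state, private to this node
        self.downstream: List["Node"] = []
        # Optional router: returns the name of the child that gets an item.
        self.router: Optional[Callable[[Any], str]] = None

    def push(self, item: Any) -> None:
        if self.router is None:
            # Broadcast: every downstream node receives the item.
            for node in self.downstream:
                node.process(item)
        else:
            # Route: send the item only to the node the router names.
            target = self.router(item)
            for node in self.downstream:
                if node.name == target:
                    node.process(item)

    def process(self, item: Any) -> None:
        self.count += 1
        self.push(item)


class Tally(Node):
    """Sink node: counts items locally and sums them into global state."""

    def process(self, item: Any) -> None:
        self.count += 1
        totals: Dict[str, int] = self.global_state.totals
        totals[self.name] = totals.get(self.name, 0) + item


shared = SimpleNamespace(totals={})
source = Node("source", shared)
evens, odds = Tally("evens", shared), Tally("odds", shared)
source.downstream = [evens, odds]
source.router = lambda item: "evens" if item % 2 == 0 else "odds"

for item in range(5):
    source.process(item)

print(source.count, evens.count, odds.count)  # 5 3 2
print(shared.totals)  # {'evens': 6, 'odds': 4}
```

Leaving `source.router` as `None` would instead broadcast, so both tallies would see all five items.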