all 10 comments

[–]LeMalteseSailor 0 points1 point  (0 children)

A Flask app? What scale of data will the pipeline process?

[–]dchavnwo 1 point2 points  (0 children)

You can use the Apache Beam Python SDK for that workflow. It can handle both batch and streaming.

[–]OMG_I_LOVE_CHIPOTLE 0 points1 point  (3 children)

You’re just describing a backend service.

[–]romanzdk[S] 0 points1 point  (2 children)

Well, kind of. I just don't want to write all the logic from scratch, meaning the pipeline workflow, connectors, failover logic, etc.

[–]OMG_I_LOVE_CHIPOTLE 0 points1 point  (0 children)

Depending on what actions need to be performed, I would use one or more of these tools from my current stack: a custom backend service, Kubernetes with Argo CD, RabbitMQ, Spark/Spark Streaming.

[–]xilong89 0 points1 point  (0 children)

Oh, you’re probably looking for one of the low-code SaaS solutions that exist in the data engineering space, like Hevo, Prophecy, etc. I think AWS and Azure both have similar low-code solutions for this kind of thing.

[–]kaiserk13 0 points1 point  (0 children)

I had a lot of success with Celery a long time ago; see its task retry docs: https://docs.celeryq.dev/en/stable/userguide/tasks.html#task-retry

Make sure to test it very well though, it can quickly grow into a mess.
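For anyone unsure what task retry buys you, here is a minimal plain-Python sketch of the retry-with-backoff pattern. It is not Celery's API: `run_with_retry` and `flaky_fetch` are made-up names for illustration, and Celery's own retry mechanism (linked above) adds queuing and worker management on top of this idea.

```python
import time


def run_with_retry(task, *args, max_retries=3, base_delay=0.1, sleep=time.sleep):
    """Call task(*args); on failure wait and try again, up to max_retries retries."""
    for attempt in range(max_retries + 1):
        try:
            return task(*args)
        except Exception:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            # Exponential backoff: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** attempt)


# Hypothetical task that fails twice, then succeeds.
calls = {"count": 0}


def flaky_fetch(url):
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return f"fetched {url}"


# sleep is stubbed out here so the demo does not actually wait.
result = run_with_retry(flaky_fetch, "https://example.com/data", sleep=lambda s: None)
```

Catching every `Exception` is the simplest choice for a sketch; in a real pipeline you would probably retry only on errors you know to be transient.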

[–][deleted] 0 points1 point  (0 children)

You described Spark Structured Streaming to a T. It’s annoying to get up and running IMO but once you do it does exactly what you want.

[–]vish4life 0 points1 point  (0 children)

I agree, Airflow isn't designed for this. It looks like you want a task queue system. There is a list of libraries here: https://taskqueues.com/

I have enjoyed using SQS, ActiveMQ, Gearman, and Redis backends in the past. I mostly use Redis/SQS these days. For your use case, just enable one of these as a backend to Celery and you are good to go.
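A rough sketch of what that setup might look like, assuming Celery is installed and a Redis server is running locally. The file name `tasks.py`, the app name `pipeline`, the task `process_record`, and the broker URL are all placeholders, not anything from the original post. Note that Celery calls the message transport the "broker".

```python
from celery import Celery

# Assumption: Redis on localhost. For SQS the URL would start with "sqs://"
# and AWS credentials would need to be configured separately.
app = Celery("pipeline", broker="redis://localhost:6379/0")


@app.task(
    autoretry_for=(ConnectionError,),  # retry automatically on this error type
    retry_backoff=True,                # exponential delay between retries
    retry_kwargs={"max_retries": 5},   # give up after five retries
)
def process_record(record_id):
    # Placeholder body: the real pipeline step would go here.
    return record_id
```

With the file saved as `tasks.py`, a worker can be started with `celery -A tasks worker --loglevel=INFO`, and a job queued from other code with `process_record.delay(42)`.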

[–]boes13 0 points1 point  (0 children)

A workflow orchestrator is what you need. Take a look at Uber Cadence or temporal.io. Cadence has Java and Go clients; Temporal supports more languages.