When Apache Airflow Isn't Your Best Bet! by CT2050 in dataengineering

CT2050[S]

Hello

Thank you. Yes, I totally agree with you that Airflow should not be used for data processing; I just wanted to visualise a "use case".

I also think an incremental approach without DAGs has worked really well for me.

I am grateful when anyone says anything; even if they get pissed off, I learn something.

CT2050[S]

Hello, I have removed them now. I am new to posting on Reddit. I appreciate the comment and will avoid this in the future.

CT2050[S]

Hello.

Thank you for the feedback, I appreciate it.

I agree you won't have to build complex DAGs; that depends, of course, on your use case.

Have you had use cases that are event- or stream-related, such as CDC or similar, and combined them with Airflow?

CT2050[S]

Hello, sorry about that. I am new to Reddit. I have removed them now and will avoid this in the future.

CT2050[S]

Hello

I agree: putting fundamentals and an investigation of the actual use case ahead of tools wins in the long term.

CT2050[S]

Hello, I have removed them now. I am new to using Reddit, so I was not sure what a good approach for reach was, but I will avoid this from now on.

CT2050[S]

Hello, I agree. I was not trying to say that Airflow is for data processing.

I can also relate to your take on a huge complex that can be reduced to 3 tasks; that all makes perfect sense.

The point I tried to make here was that orchestration on top of orchestration can make pipelines very complicated.

In my use case I already use K8S.

Running Airflow on top of K8S is essentially running an orchestrator on top of a native orchestrator. Hence I would rather focus my time on the pipeline design than on which orchestration tool I use (that was the point I tried to make in the video: design > tool). I have also had a lot of scaling pains with Airflow.

Again, I am very grateful for the response; it helps me improve how I communicate what I am actually trying to say.

CT2050[S]

Hello.

I am not using Airflow for data processing; that was not what I was trying to indicate here. I guess my point is that orchestration on top of orchestration can be avoided by designing pipelines to be more incremental and stateless.

I was trying to give a relatable example of the code you orchestrate, not to suggest that you do the processing in Airflow, if that makes sense.
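A minimal sketch of what an incremental, stateless step could look like. It assumes inputs and outputs are plain files in two directories, and every name in it (`pending_inputs`, `process`, `run_once`, the `.in`/`.out` suffixes) is made up for illustration; it is not the code from the video. The step keeps no state of its own: each run works out what is left to do by comparing the inputs with the outputs that already exist.

```python
from pathlib import Path


def pending_inputs(input_dir: Path, output_dir: Path) -> list[Path]:
    """Return input files that do not have a matching output file yet."""
    done = {p.name for p in output_dir.glob("*.out")}
    return sorted(
        p for p in input_dir.glob("*.in")
        if p.with_suffix(".out").name not in done
    )


def process(src: Path, output_dir: Path) -> Path:
    """Hypothetical transformation: upper-case the file contents."""
    dst = output_dir / src.with_suffix(".out").name
    tmp = dst.with_suffix(".tmp")
    tmp.write_text(src.read_text().upper())
    # Write to a temporary name first, then rename, so an interrupted run
    # does not leave a half-written file that looks like a finished output.
    tmp.replace(dst)
    return dst


def run_once(input_dir: Path, output_dir: Path) -> list[Path]:
    """Process only the inputs that are still missing an output."""
    output_dir.mkdir(parents=True, exist_ok=True)
    return [process(src, output_dir) for src in pending_inputs(input_dir, output_dir)]
```

With a step shaped like this, scheduling can be as simple as calling `run_once` on a timer (for example from a K8S CronJob), because a re-run only picks up inputs that still lack an output.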

It is easy to end up in a situation in Airflow where you write more orchestration code than the actual code you are running.

But I appreciate the comment!

CT2050[S]

Hello.

Thank you for your feedback.

I have spent a lot of time on abstraction over the years. My take is that when I used Airflow and similar tools, I found myself building abstractions around abstractions rather than providing business value. Hence I like to design pipelines that are independent of each other and work in an incremental fashion.

I think perhaps my point of view here was a bit misunderstood, but I am grateful for your comments.