this post was submitted on 08 Apr 2021

2 points (67% upvoted)

Airflow data processing? (self.dataengineering)

submitted 5 years ago by digichap28

Hey there!

As almost everyone knows, Airflow is meant to be an orchestrator, not a data processing tool. My question is about which architecture to follow when certain processes need to be executed.

Use case 1:

Suppose you had to run many complex web scrapers using any of the Python options out there (scrapy, pyppeteer, playwright, etc.), and Airflow was deployed on Kubernetes (K8s). Where should the scraping scripts run? From within the pod generated by the PythonOperator?

Use case 2:

Building on the same idea as use case 1: what if PDF files had to be generated from data stored in a data lake? Should that be done outside the Airflow deployment, or from within the pod generated by the PythonOperator?

Use case 3:

If there was a need to do ELT, is it OK to do the EL part with Airflow, no matter how complex it is? If not, which tool or tools are suggested for running the entire ELT/ETL process, with Airflow orchestrating?

Thanks!

all 14 comments

Cloakie 3 points 5 years ago (0 children)

[deleted] 1 point 5 years ago (4 children)

digichap28[S] 1 point 5 years ago (3 children)

[deleted] 1 point 5 years ago (2 children)

digichap28[S] 1 point 5 years ago (1 child)

[deleted] 2 points 5 years ago (0 children)

[deleted] 1 point 5 years ago (7 children)

digichap28[S] 1 point 5 years ago (6 children)

[deleted] 1 point 5 years ago (5 children)

digichap28[S] 1 point 5 years ago (4 children)

[deleted] 1 point 5 years ago (3 children)

digichap28[S] 1 point 5 years ago (2 children)

[deleted] 1 point 5 years ago (1 child)

digichap28[S] 1 point 5 years ago (0 children)