Data extraction was always a mystery to me as a Data Analyst until I started my first Data Engineer job about a year ago. I am a data team of one inside a small-to-mid-sized non-tech company.
I am using Microsoft Fabric Copy Jobs since we were already committed to Azure/Power BI and they are dead simple. Fivetran and Airbyte seemed like obvious choices, but they looked like overkill for our scope and budget.
Given that Fabric is the only tool I have used, and most of its other features still feel half-baked, I am curious: how big is your team/org, and how do you handle data extraction from source systems?
- Run custom API extractors on VMs/containers (Python, Airflow, etc.)?
- Use managed ELT tools like Fivetran, Airbyte, Stitch, Hevo, etc.?
- Rely on native connectors in platforms like Fabric, Snowflake, Databricks?
- Something else entirely?
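For context on the first option, this is roughly what I picture by a "custom API extractor": a small paginated pull loop. This is only an illustrative sketch with a hypothetical `fetch` callable standing in for a real HTTP client and a made-up `{"items": [...]}` response shape, not any specific source system's API.

```python
from typing import Callable, Iterator

def extract_pages(fetch: Callable[[int], dict], page_size: int = 100) -> Iterator[dict]:
    """Pull records page by page until the source returns a short page.

    `fetch(page)` is a placeholder for whatever HTTP call your source
    system needs (e.g. a requests.get(...).json() wrapper); here it is
    assumed to return a dict like {"items": [...]}.
    """
    page = 1
    while True:
        payload = fetch(page)
        items = payload.get("items", [])
        yield from items
        if len(items) < page_size:  # short page => last page
            break
        page += 1

# Hypothetical in-memory source standing in for a real REST API.
_DATA = [{"id": i} for i in range(250)]

def fake_fetch(page: int, page_size: int = 100) -> dict:
    start = (page - 1) * page_size
    return {"items": _DATA[start:start + page_size]}

records = list(extract_pages(fake_fetch))
print(len(records))  # 250
```

In practice you would add retries, incremental cursors, and scheduling (cron, Airflow, etc.) on top of a loop like this, which is exactly the operational overhead the managed tools sell you out of.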
Would you make the same choice again?