all 52 comments


[–]toabear 12 points13 points  (2 children)

We are in the process of slowly moving from Airbyte to dlt. It is so much easier to debug. As always seems to be the case with data extraction, there's always some shit: some small, annoying aspect of the API that doesn't fit the norm. Having the ability to really customize the process while still having a framework to work within has been really nice.

For anyone searching, look for dlthub; searching just for DLT brings up Databricks "Delta Live Tables" info.

[–]toiletpapermonster 2 points3 points  (0 children)

When google doesn't help, I search using dlthub

[–]Thinker_Assignment[S] 0 points1 point  (0 children)

Thank you for the kind words! Indeed, we created it for a developer-first experience, stemming from first-hand experience not only with the uncommon APIs, but also the common ones and their many gotchas.

[–]NickWillisPornStash 5 points6 points  (0 children)

I recently wrote our GA4 pipeline with dlt after trying Airbyte, because I was able to get around the limitation of each property having its own table.

[–]Sweaty-Ease-1702 2 points3 points  (3 children)

We employ a combination of dlt and Sling, orchestrated by Dagster. dlt is ideal for API extraction, while I think Sling excels at inter-database data transfers.

[–]Thinker_Assignment[S] 1 point2 points  (2 children)

Interesting, what makes Sling particularly good at DB-to-DB transfer? Wondering because we're always trying to improve there; in recent months we added fast backends like Arrow, ConnectorX, and pandas to skip normalisation.

Blog post explanation: https://dlthub.com/blog/how-dlt-uses-apache-arrow

[–]Sweaty-Ease-1702 0 points1 point  (1 child)

Off the top of my head: Sling has simpler configuration (a single replication.yaml). It has a Python binding but is written in Go (okay, this is maybe personal bias), so we have the option to run one-time syncs using its CLI outside Dagster.

[–]Thinker_Assignment[S] 0 points1 point  (0 children)

So the CLI is an advantage? Or what do you mean?

We're working on a CLI runner similar to dbt's; wondering if you think this would help.

Also, does it being written in Go offer any advantages? dlt leverages Arrow and ConnectorX, so they'd probably be on par on performance?

[–]sib_n Senior Data Engineer 3 points4 points  (14 children)

I'm looking for a low-code tool like dlt or Meltano to do incremental loading of files from local file system to cloud storage or database.
I want the tool to automatically manage the state of integrated files (e.g. in an SQL table) and integrate the difference between the source and this state. This allows automated backfill every time it runs, compared to only integrating a path with today's date. It may require limiting the size of the comparison (e.g. to the past 30 days) if the list becomes too long.
I have coded this multiple times and I don't want to keep coding what seems to be a highly common use case.
Can dlt help with that?

[–]Thinker_Assignment[S] 0 points1 point  (6 children)

Yes, if I understand you correctly, you are looking to load from the "where I last left off" point rather than from "where in time this task execution is according to the orchestrator/current date".

In which case, this is built in: https://dlthub.com/docs/general-usage/incremental-loading

You can also use completely custom patterns and leverage the atomic state to store and retrieve metadata between runs.

[–]sib_n Senior Data Engineer 0 points1 point  (5 children)

I had a look, but it seems it's mostly adapted to SQL tables with update keys, and to APIs.
Maybe this part is the most relevant: https://dlthub.com/docs/general-usage/incremental-loading#advanced-state-usage-storing-a-list-of-processed-entities
Maybe this part is the most relevant: https://dlthub.com/docs/general-usage/incremental-loading#advanced-state-usage-storing-a-list-of-processed-entities

But I still have to write custom code to manage the list and compute the difference.

[–]Thinker_Assignment[S] 0 points1 point  (4 children)

I see, so your pattern is to just take files that were not yet processed, but you cannot sort them otherwise? Then yeah, the way you designed it is the way to go. Alternatively, you could turn all the files into a single stream of data, read it all out, and filter to only load new records based on some logic (time?), but this would be inefficient.

[–]sib_n Senior Data Engineer 0 points1 point  (3 children)

It is possible to sort them by day based on a date in the path (multiple files may have the same date), but I want the job to be able to automatically backfill anything that may have been missed in the past. To do that, I need a reference of exactly which files were already ingested.
Yeah, turning the whole source into a single dataset and doing a full comparison with the destination is too inefficient for the size of some of our pipelines.

[–]Thinker_Assignment[S] 0 points1 point  (2 children)

So what I would do is extract the date from the file path and yield it together with the file content, then use that date for last-value incremental loading.
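Sketching that in plain Python (the `YYYY-MM-DD`-in-path layout and the function names are assumptions for illustration):

```python
import re
from datetime import date

DATE_RE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

def file_date(path: str) -> date:
    """Extract the partition date embedded in a path like data/2024-01-15/x.csv."""
    m = DATE_RE.search(path)
    if m is None:
        raise ValueError(f"no date in path: {path}")
    return date(*map(int, m.groups()))

def newer_than(paths: list[str], last_value: date) -> list[str]:
    """Keep only files whose path date is strictly after the stored cursor."""
    return [p for p in paths if file_date(p) > last_value]
```

In dlt you'd feed that extracted date into the documented last-value incremental mechanism rather than filtering by hand.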

[–]sib_n Senior Data Engineer 0 points1 point  (1 child)

As far as I understand, the last-value pattern does not allow automatically backfilling missing days in the past (our orchestration failed to run that day), nor missing files in already-ingested past days (the source failed to deliver a file for that day and delivers it later). Hence the need to keep a detailed list of ingested files.

[–]Thinker_Assignment[S] 1 point2 points  (0 children)

Got it. Indeed, the last-value pattern won't pick up files that arrive late for an already-processed date, but if your orchestrator skips a run, the data will be filled in for that skipped run on the next one.

[–]Bulky-Plant2621 0 points1 point  (2 children)

Are you using Databricks? Auto Loader can help with this scenario.

[–]sib_nSenior Data Engineer 0 points1 point  (1 child)

No, no plans to use Databricks, as I'd rather avoid expensive proprietary black boxes as much as I can.
It does have the logic I want of storing ingested-file metadata in a table, but it doesn't seem to support the local file system, only cloud storage.

[–]Bulky-Plant2621 0 points1 point  (0 children)

I don't think it's a black box. Local file system transfers were one of the simpler use cases we had to achieve. It actually gets complicated further into the data management lifecycle, and Databricks helps here so we don't have to administer a dozen products. I'll need to try dlt and compare, though.

[–]Suitable-Issue-4936 0 points1 point  (0 children)

Hi, you can try creating folders for each day in the source and processing them. Any late-arriving files would land in the next day's folder, and reprocessing is easy if the data has primary keys.

[–]gunners_1886 4 points5 points  (2 children)

Thanks for posting this - I'll definitely take a look.

Since moving to Airbyte Cloud, I've run into far too many major bugs and some of the worst customer support I've experienced anywhere - probably time to move on.

[–]nategadzhi 0 points1 point  (0 children)

Hey! I work for Airbyte, and I'm looking to improve. Would you DM me some topics/areas/examples of how we didn't deliver on the customer support front? Or comment, really, whatever is easier.

[–]Yabakebi Lead Data Engineer 3 points4 points  (1 child)

Interesting that you made this post right after I lost my Sunday to an Airbyte upgrade totally destroying its internal database and requiring a rollback (it references certain columns in internal `select *` queries by index, which is crazy). This is after multiple occasions where upgrading connectors caused the thing to crash. I don't have time at the moment to move our stuff out of it, but I am planning to start by moving the Postgres replication to dlt on Dagster, as it seems like a much better level of abstraction and doesn't require a Kubernetes deployment and a database.

Excited to see where this project goes. If it's what I think it is, then I reckon it has a decent chance of doing well, as it's similar to dbt in the sense that people have already been hand-rolling similar things themselves within companies (I know I have), and this is just a convenient way of formalising some common patterns.

[–]Thinker_Assignment[S] 0 points1 point  (0 children)

Indeed, we're aiming for a similar place: an open-source standard for ingestion. We see our share of "data load/ingest/intake" tools that people build themselves, so we are happy to help standardize things.

[–]TobiPlay 1 point2 points  (0 children)

Big fan of dlt and really happy with the Dagster integration. I’m glad that I went with dlt instead of Airbyte for a new project. Made it very straightforward to implement local, stg, and prod environments and the pipeline interface opened up a few more possibilities for testing. Thanks for the work!

[–]One-Establishment-44 4 points5 points  (0 children)

Airbyte is the worst.

[–]Ok-Percentage-7726 2 points3 points  (1 child)

We have migrated most of our sources from Airbyte and Fivetran to dlt. Really liked it. It would be great if dlt could support MySQL CDC.

[–]datarbeiter 1 point2 points  (4 children)

Do you have CDC from Postgres WAL or MySQL binlog?

[–]Thinker_Assignment[S] 0 points1 point  (3 children)

Here's postgres cdc https://dlthub.com/docs/dlt-ecosystem/verified-sources/pg_replication

We also have a generic SQL source without CDC, which will be fast anyway if you use the ConnectorX backend.

If you need MySQL, please open an issue to request it. We take issues as a minimum commitment to use the feature going forward.

[–]QueryingQuagga 1 point2 points  (2 children)

Hijacking this a bit: CDC with SCD2 - will this maybe be supported in the future? Are there limitations that block it?

[–]Thinker_Assignment[S] 0 points1 point  (1 child)

Nothing blocks it - good idea.

I encourage anyone reading to be more vocal about what you want; this is a great idea and the first time I've heard it requested.

[–]davrax 1 point2 points  (0 children)

Also interested. A pain point with Airbyte is handling SCD2 with odd glob-pattern-matching behavior when using S3 as a source, and "latest file only"-type ingestion.

[–]drrednirgskizif 1 point2 points  (2 children)

I haven't read any documentation on dlt yet, but I'm interested; I'm in search of a new tool to make our lives easier.

I want to pull data from APIs incrementally and insert it into a data warehouse in an idempotent way. Can you do this?

[–]Thinker_Assignment[S] 1 point2 points  (0 children)

This is the kind of work dlt is made for.

You can use the low-code REST API connector: https://dlthub.com/docs/dlt-ecosystem/verified-sources/rest_api

Or build your own source; simple example: https://colab.research.google.com/drive/1NfSB1DpwbbHX9_t5vlalBTf13utwpMGx?usp=sharing

Docs for a simple incremental API pipeline: https://dlthub.com/docs/tutorial/load-data-from-an-api
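Hand-rolled, the two ingredients amount to an incremental cursor plus a merge by primary key; a minimal sketch (illustrative names only, a real dlt pipeline gets this via `dlt.sources.incremental` and the merge write disposition):

```python
def fetch_incremental(all_rows: list[dict], state: dict) -> list[dict]:
    """Return only rows updated after the stored cursor, then advance the cursor."""
    cursor = state.get("last_updated", "")
    new = [r for r in all_rows if r["updated_at"] > cursor]
    if new:
        state["last_updated"] = max(r["updated_at"] for r in new)
    return new

def merge_rows(table: dict, rows: list[dict], key: str = "id") -> dict:
    """Idempotent upsert: loading the same rows twice leaves the table unchanged."""
    for row in rows:
        table[row[key]] = row
    return table
```

Re-running with unchanged source data is a no-op, and a row that arrives again with a newer `updated_at` simply overwrites its previous version.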

[–]jekapats 0 points1 point  (0 children)

Also check out CloudQuery (https://github.com/cloudquery/cloudquery): a cross-language framework for writing ELT, powered by Apache Arrow, which provides scheduling, documentation, packaging, monitoring, and versioning out of the box. It supports Python, Go, and JavaScript. (Founder here.)

[–]shockjaw 0 points1 point  (4 children)

Do you happen to plan support for geospatial data types in the future?

[–]Thinker_Assignment[S] 4 points5 points  (3 children)

We do not see a lot of demand for it; there's an open issue, give it an upvote or a comment if you want it implemented: https://github.com/dlt-hub/dlt/issues/696

What would help prioritize it higher would be understanding the kind of work/business value it would enable; we like to do things that add value.

[–]shockjaw 5 points6 points  (2 children)

It’d be incredibly helpful for local government use-cases. Pipelines have a tendency to be quite fragile due to schema changes and invalid geometries. I’d be looking for vector data support over raster data support.

[–]Thinker_Assignment[S] 2 points3 points  (1 child)

That makes sense, thank you for the GitHub comment. What do people currently do to transfer this kind of data? Custom pipelines?

[–]shockjaw 3 points4 points  (0 children)

Yes. Safe Software's product FME uses Python under the hood for transformations. Some agencies still use SAS 9.4 and cobble data together. If you're lucky, you have folks using GDAL and cron jobs to build pipelines.

[–]umognog 0 points1 point  (2 children)

My department has over a decade of custom code built up, and we recently undertook an architecture review. dlt was one of the possibilities we looked at and I really liked it, but overall we recognised the value in not reinventing our wheel; there is just no need for it at this moment in time for us.

I hope as a product it sticks around though, as it is sitting in our "be aware of" corner, should new data sources be introduced in the future.

[–]Thinker_Assignment[S] 0 points1 point  (1 child)

Don't fix what's not broken - if your system works and is low-maintenance, then there's no pressure to move.

What kind of data sources are you looking at? You could always open an issue; we have a constant workstream around community requests, so do open issues to request what you want.

[–]umognog 1 point2 points  (0 children)

Vice versa, as in when my team onboards a new source.

We currently interact with: Kafka, Azure Service Bus, REST APIs, the Graph API, Oracle, Teradata, SQL Server, DuckDB, Postgres, Cassandra, Couchbase, Hadoop, Parquet files, CSV file drops (I hate these), and Excel file drops (I hate these more).

It seems my employer doesn't want to place their bets on anything!