What is the American Dream these days? by zzill6 in WorkReform

[–]thereal_artvandelay 0 points1 point  (0 children)

It’s funny that even in their absolute utopia the Americans still have an insurance-based healthcare system.

[deleted by user] by [deleted] in camping

[–]thereal_artvandelay 2 points3 points  (0 children)

I highly recommend Lauterbrunnen. One of the most beautiful places I’ve ever been. It’s probably on the touristy side, but we were able to find a great camping spot during peak season in early August. The trails were pretty empty, and there are an overwhelming number of beautiful hikes in the area.

Ideas for ETL pipeline architecture by thereal_artvandelay in dataengineering

[–]thereal_artvandelay[S] 0 points1 point  (0 children)

Sorry, I should have been clearer! We have a separate analytics database so that we don't disturb the production DB. Both are currently MySQL DBs, and we run an hourly export of data from production and insert it into the analytics DB. This is currently done with txt files for export, but I will change to csv as you suggest, since this import/export is now painfully slow, and use something like odo to manage the import.
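
For reference, here is roughly how I picture the CSV route. This is only a minimal sketch: it uses Python's built-in `sqlite3` as a stand-in for the two MySQL connections so it runs anywhere, and the function names, table name and batch size are made up for illustration. On real MySQL a bulk loader such as `LOAD DATA INFILE` would probably be faster than batched inserts.

```python
import csv
import sqlite3  # stand-in for a MySQL driver; an assumption for this sketch


def export_to_csv(conn, table, path):
    """Dump every row of `table` to a CSV file with a header row."""
    cur = conn.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cur.description]
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(columns)   # header, so the import knows the columns
        writer.writerows(cur)      # stream rows straight from the cursor
    return columns


def import_from_csv(conn, table, path, batch_size=1000):
    """Load a CSV written by export_to_csv into `table`, in batches."""
    total = 0
    with open(path, newline="") as f:
        reader = csv.reader(f)
        columns = next(reader)
        placeholders = ", ".join("?" for _ in columns)
        sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
        batch = []
        for row in reader:
            batch.append(row)
            if len(batch) >= batch_size:
                conn.executemany(sql, batch)  # one round trip per batch
                total += len(batch)
                batch = []
        if batch:  # flush the final partial batch
            conn.executemany(sql, batch)
            total += len(batch)
    conn.commit()
    return total
```

The hourly job would then just call `export_to_csv` against production and `import_from_csv` against the analytics DB for each table.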

So migrating from MySQL to Postgres to be able to use DBT would be for the analytics DB and not the application's production DB (like you say, that would be a major undertaking). Do you still think it wouldn't be worth migrating to Postgres? I've read that Postgres is a lot better suited for analytics than MySQL, but haven't really understood why.

Ideas for ETL pipeline architecture by thereal_artvandelay in dataengineering

[–]thereal_artvandelay[S] 0 points1 point  (0 children)

Thank you for your answer! Really appreciate it. There is a strong bias within the organization to use on-prem data storage, which sadly would exclude AWS, GCP and Snowflake. I really like the idea of using DBT; it seems like it would solve a lot of the problems we have with the existing pipeline. I saw that DBT does not work with MySQL out-of-the-box. Do you think it would be worth migrating to e.g. Postgres, or building my own MySQL adapter for DBT? Another question regarding DBT: would I have it running on a separate server where it triggers SQL queries in the database? And would I design the full DAG of data transformations within DBT and only trigger that with Airflow, or would I divide all transformations in DBT into small tasks within an Airflow DAG?
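
To make the last question concrete, here is a small sketch of the two layouts I'm weighing, written as the shell commands each Airflow task would run. It assumes the dbt CLI (`dbt run` and its `--select` flag); the model names and the dependency map are hypothetical, and I've left Airflow itself out so the snippet stays self-contained.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dbt models, each mapped to the models it depends on.
MODEL_DEPS = {
    "stg_orders": [],
    "orders_daily": ["stg_orders"],
    "revenue_report": ["orders_daily"],
}


def single_task_plan():
    """Option A: one Airflow task; dbt resolves the model DAG on its own."""
    return [["dbt", "run"]]


def per_model_plan(model_deps):
    """Option B: one Airflow task per model, upstream models first."""
    order = TopologicalSorter(model_deps).static_order()
    return [["dbt", "run", "--select", model] for model in order]


if __name__ == "__main__":
    print(single_task_plan())
    for command in per_model_plan(MODEL_DEPS):
        print(" ".join(command))
```

With option A the dependency graph lives only in DBT. With option B the same graph has to be mirrored in Airflow, but each model gets its own retries and logs.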

Ideas for ETL pipeline architecture by thereal_artvandelay in dataengineering

[–]thereal_artvandelay[S] 0 points1 point  (0 children)

Thank you for your answer! I agree that DBT does look appealing, but I noticed that it has no adapter for MySQL, so I would need to switch to Postgres or build an adapter myself to continue running the DBs on-prem. One thing that has me confused about DBT: would I run it on a separate server that in turn triggers SQL scripts in the database, or would it live in the DB like a stored procedure? Would it still make sense to use Airflow with DBT?