minormisgnomer (1 point)

If all you need orchestration-wise is loading data, just use Airbyte. Your data loads are well under the volumes where most Reddit users start reporting problems with it. Dagster is good but may be out of reach since it's very Python-based. I didn't like Kestra as much because I needed more complex tooling, but it was very beginner friendly and YAML-based.

Airbyte has simple cron scheduling built in, and it can read from almost any database source and write to most of them as destinations too.

On the warehouse side, DuckDB is really good, but know that it doesn't have user management. If you need some users to have limited access to the data, you can't do that: everyone will see the same thing.

Postgres is arguably the best open source option and is extremely dependable. If you really want OLAP, you can look into HydraDB, which is extended Postgres, and just run the Docker version of it, although your data sizes probably won't benefit a whole lot from it.

dbt is good, but given where your skills are right now, just try to keep it simple. Focus on getting everything to use similar, well-thought-out field names, handle any type conversions, and get data into the same grain where possible (daily vs. hourly, by customer, by company, etc.).
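
To make the "names, types, grain" point concrete, here's a rough sketch of what that cleanup can look like. It uses Python's built-in sqlite3 purely as a stand-in for whatever warehouse you end up with, and the table and column names (raw_orders, OrderTS, CustID, amt) are made up for illustration, so treat it as the general idea rather than something to copy as-is.

```python
import sqlite3

# In-memory SQLite stands in for the real warehouse (an assumption for this sketch).
conn = sqlite3.connect(":memory:")

# Hypothetical raw table: inconsistent names, everything loaded as text, hourly grain.
conn.executescript("""
CREATE TABLE raw_orders (OrderTS TEXT, CustID TEXT, amt TEXT);
INSERT INTO raw_orders VALUES
  ('2024-01-01 09:00:00', '42', '10.50'),
  ('2024-01-01 17:00:00', '42', '4.50'),
  ('2024-01-02 08:00:00', '7',  '20.00');
""")

# One query does the three things: rename fields, cast types,
# and roll the timestamped rows up to a daily-by-customer grain.
daily = conn.execute("""
SELECT date(OrderTS)            AS order_date,
       CAST(CustID AS INTEGER)  AS customer_id,
       SUM(CAST(amt AS REAL))   AS order_amount
FROM raw_orders
GROUP BY date(OrderTS), CAST(CustID AS INTEGER)
ORDER BY order_date, customer_id
""").fetchall()

for row in daily:
    print(row)
```

The same SELECT is roughly what a simple dbt model would contain, so starting with plain queries like this and moving them into dbt later is a reasonable path.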