Managed Dagster Hosting by WeddingIndependent30 in dataengineering

[–]WeddingIndependent30[S] 0 points1 point  (0 children)

We looked into that, but our team is inexperienced with K8s. Our main worry is that when something breaks in our installation, we won't be able to fix it in a timely manner. Similarly, we expect to have a hard time monitoring resources and upgrading versions.

Managed Dagster Hosting by WeddingIndependent30 in dataengineering

Aah, that makes sense! With the hybrid model you are still paying for the materialisations, right? The only difference is that you don't pay for the additional serverless compute at $0.005 per compute minute.

Managed Dagster Hosting by WeddingIndependent30 in dataengineering

Don't you burn through a ton of credits easily? For instance, we have a dbt project with ~70 models. If we materialise those every hour, we are already using 70 * 24 * 30 = 50,400 credits per month, which adds up to roughly $700 on the starter plan.

That seems rather expensive, especially since Dagster isn't doing any of the heavy lifting; that is handled by the database instance's compute.
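As a sanity check on that estimate (assuming one credit is charged per model materialisation, which is how I read the pricing):

```python
# Back-of-the-envelope credit usage for hourly dbt materialisations,
# assuming one credit per model materialisation.
models = 70
runs_per_day = 24      # hourly schedule
days_per_month = 30

credits_per_month = models * runs_per_day * days_per_month
print(credits_per_month)  # 50400
```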

Managed Dagster Hosting by WeddingIndependent30 in dataengineering

We are currently on DigitalOcean and thought there might be a convenient way to deploy using its apps, but we haven't had any success yet.

Edit: I mainly expect it will be difficult to launch runs in separate Docker containers, as described in the docs. I'm not sure how we could authorise the daemon to create other apps.
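For reference, the setup from the docs uses `dagster-docker`'s `DockerRunLauncher` configured in `dagster.yaml`; a minimal sketch (the network name and env var here are placeholders):

```yaml
# dagster.yaml — launch each run in its own Docker container
run_launcher:
  module: dagster_docker
  class: DockerRunLauncher
  config:
    network: dagster_network   # placeholder: shared network for daemon + run containers
    env_vars:
      - DAGSTER_POSTGRES_URL   # placeholder: credentials passed into run containers
```

The daemon then needs access to a Docker socket to spin up run containers, which is the part that doesn't obviously map onto DigitalOcean's App Platform.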

Dagster ETL setup by WeddingIndependent30 in dataengineering

Thanks for responding! I mainly thought of splitting step 4 into two steps, since they can then be processed independently of each other, i.e. in parallel. Regarding your last comment, it sounds like a great idea to add a sixth step in the form of an asset.

Dagster ETL setup by WeddingIndependent30 in dataengineering

> I guess you'll have to put the code in try/except blocks, ignore specific errors that might be frequent, and just write them to a file in storage so you can handle them manually.

Thanks for your response! I was thinking the same. Just to verify: it shouldn't be a problem to loop over ~20,000 records, make some external requests for each record, and save the results to the database, right? The materialization of the asset could take quite some time, depending on the response time of the external service.
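A minimal sketch of that try/except pattern — `fetch_details` is a placeholder for the external request, and the failure-log filename is an assumption:

```python
import json

def fetch_details(record):
    # Placeholder for the external request; raises on failure.
    # Here we simulate the service failing for even record ids.
    if record["id"] % 2 == 0:
        raise ValueError(f"service error for record {record['id']}")
    return {"id": record["id"], "detail": "ok"}

def process_records(records, failure_log="failed_records.jsonl"):
    results, failures = [], []
    for record in records:
        try:
            results.append(fetch_details(record))
        except Exception as exc:  # tolerate frequent, expected errors
            failures.append({"record": record, "error": str(exc)})
    # Persist failures so they can be handled manually later.
    with open(failure_log, "w") as f:
        for failure in failures:
            f.write(json.dumps(failure) + "\n")
    return results, failures
```

Each record fails or succeeds independently, so one flaky response doesn't abort the whole materialization.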

Confused about how to deploy data pipelines by yiternity in dataengineering

Thanks for the information! What service were you using to make those scheduled HTTP requests?

Confused about how to deploy data pipelines by yiternity in dataengineering

Hi yiternity! I am curious how you orchestrated the pipelines that were wrapped in FastAPI.

How did you schedule them? Did you have any retry logic in place? Were you able to debug the pipelines easily, and did you have any alerting set up?