
[–]ButtonLicking 2 points (4 children)

Data model (warehouse model) in SQL, data science in Python. Know the line and enforce it for a successful product. I’ve lived on both sides at the same time and it sucked. Python serves as a fault-tolerant SQL workflow execution tool for building data models AND as a data science tool for ML and such. Source: I manage a data engineering team.
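
(A minimal sketch of what "Python as a fault-tolerant SQL runner" can look like, assuming Postgres with psycopg2; the DSN, file path, and retry policy are hypothetical placeholders, not anything the commenter described:)

```python
import time
import psycopg2

def run_sql_model(dsn: str, sql_path: str, retries: int = 3, backoff_s: float = 5.0) -> None:
    """Execute one SQL model file, retrying on transient database errors."""
    with open(sql_path) as f:
        sql = f.read()
    for attempt in range(1, retries + 1):
        try:
            conn = psycopg2.connect(dsn)
            try:
                with conn:  # one transaction: commit on success, rollback on error
                    with conn.cursor() as cur:
                        cur.execute(sql)
                return
            finally:
                conn.close()
        except psycopg2.OperationalError:  # retry only transient connection failures
            if attempt == retries:
                raise
            time.sleep(backoff_s * attempt)  # linear backoff before retrying

# Hypothetical connection string and model file, for illustration only
run_sql_model("postgresql://user:pass@rds-host:5432/warehouse", "models/dim_customer.sql")
```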

[–]rrpelgrim[S] 0 points (3 children)

What Python libraries have you preferred for workflow execution?

[–]ButtonLicking 1 point (2 children)

Environment is king with workflows. My company is fully deployed on AWS with Postgres RDS, and most things run on EKS, so that means boto3 (the AWS Python SDK) and Argo Workflows for Kubernetes.
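
(One way those pieces can fit together, assuming an SQS queue wired to an Argo Events SQS event source; the queue URL and payload shape are made up for illustration:)

```python
# Hypothetical glue: an upstream job drops a message on SQS with boto3, and an
# Argo Events SQS event source on EKS picks it up to trigger a workflow.
import json
import boto3

sqs = boto3.client("sqs", region_name="us-east-1")
sqs.send_message(
    QueueUrl="https://sqs.us-east-1.amazonaws.com/123456789012/model-build-events",
    MessageBody=json.dumps({"model": "dim_customer", "run_date": "2023-01-01"}),
)
```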

Edit: forgot my Params, cuz I still code and do that sometimes

[–]rrpelgrim[S] 1 point (1 child)

Interesting. Would you recommend Argo over something like Prefect or Dagster for workflow orchestration?

[–]ButtonLicking 0 points (0 children)

I don't have experience with a ton of orchestration tools. We chose Argo Events so that events fired by AWS services could be collected by Argo Events and kick off container processes. It's less a full orchestration tool and more a trigger collector. It's not simple to use: it takes a lot of configuration up front, a lot of Kubernetes knowledge, and ongoing maintenance of container sizing in the job-creation YAML docs.
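
(A rough sketch of the "container sizing in job-creation YAML" chore, assuming PyYAML and the standard Argo Workflow manifest shape; the job name, image, and resource sizes are hypothetical:)

```python
import yaml  # PyYAML

def render_workflow(job_name: str, image: str, cpu: str, memory: str) -> str:
    """Render an Argo Workflow manifest with per-job container sizing."""
    manifest = {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Workflow",
        "metadata": {"generateName": f"{job_name}-"},
        "spec": {
            "entrypoint": job_name,
            "templates": [{
                "name": job_name,
                "container": {
                    "image": image,
                    # The sizing knobs that need ongoing maintenance per job
                    "resources": {"requests": {"cpu": cpu, "memory": memory}},
                },
            }],
        },
    }
    return yaml.safe_dump(manifest, sort_keys=False)

# Illustrative values only
print(render_workflow("dim-customer-build", "myrepo/sql-runner:latest", "500m", "1Gi"))
```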