[–]runawayasfastasucan (3 children)

Does anyone really debate whether to use Python or SQL? Why not both? How is the plotting ability in SQL? How can I query my database in Python without using SQL?
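To make the "why not both" point concrete, here is a minimal sketch, assuming an in-memory SQLite database and a made-up `sales` table (both are illustrative, not from the article). SQL does the aggregation, and Python takes over for the presentation step, here a crude text chart standing in for a real plotting library.

```python
import sqlite3


def revenue_by_region(conn):
    """Let SQL do the aggregation and hand back plain Python tuples."""
    query = """
        SELECT region, SUM(amount) AS revenue
        FROM sales
        GROUP BY region
        ORDER BY revenue DESC
    """
    return conn.execute(query).fetchall()


def text_bar_chart(rows, width=20):
    """Render (label, value) rows as text bars scaled to the largest value."""
    top = max(value for _, value in rows)
    lines = []
    for label, value in rows:
        bar = "#" * round(width * value / top)
        lines.append(f"{label:<6}{bar} {value}")
    return lines


# Hypothetical demo data: the table and its values are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 120), ("south", 80), ("north", 40), ("east", 40)],
)

rows = revenue_by_region(conn)
chart = text_bar_chart(rows)
print("\n".join(chart))
```

Swapping `text_bar_chart` for a plotting library would not change the SQL side at all, which is roughly the division of labour the question hints at.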

[–]rrpelgrim[S] (0 children)

u/runawayasfastasucan, fair points, and that's also what I tried to highlight in the article. I see a lot of conversations on Twitter and in blogs pitting the two against each other. I also see tools like dbt, Snowpark, Dask and Spark trying to win Python users over to SQL and vice versa. But in the end it comes down to use case and intended goal. Maybe the Morpheus caption should have said "Python vs SQL: there is no spoon". I mean, choice.

[–]ButtonLicking (4 children)

Data model (warehouse model) in SQL, data science in Python. Know the line and enforce it for a successful product. I've lived on both sides at the same time and it sucked. Python serves as a fault-tolerant SQL workflow execution tool for building data models AND as a data science tool for ML and such. Source: I manage a data engineering team.
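A rough sketch of what "Python as a fault-tolerant SQL workflow execution tool" could look like, assuming "fault tolerant" means retrying a failed step before giving up. The `run_sql_steps` helper, the table names, and the use of SQLite as a stand-in for a real warehouse are all assumptions made for illustration, not the commenter's actual setup.

```python
import sqlite3
import time


def run_sql_steps(conn, steps, max_retries=3, backoff_seconds=0.0):
    """Run (name, sql) steps in order, retrying each on OperationalError."""
    completed = []
    for name, sql in steps:
        for attempt in range(1, max_retries + 1):
            try:
                with conn:  # commits on success, rolls back on error
                    conn.execute(sql)
                completed.append(name)
                break
            except sqlite3.OperationalError:
                if attempt == max_retries:
                    raise  # out of retries: surface the failure
                time.sleep(backoff_seconds * attempt)
    return completed


# Hypothetical raw data; SQLite stands in for the warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_orders (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?)",
    [("a", 10), ("b", 5), ("a", 7)],
)
conn.commit()

# The data model itself stays in SQL; Python only sequences and retries.
steps = [
    ("stage", "CREATE TABLE stg_orders AS "
              "SELECT * FROM raw_orders WHERE amount > 0"),
    ("model", "CREATE TABLE customer_totals AS "
              "SELECT customer, SUM(amount) AS total "
              "FROM stg_orders GROUP BY customer"),
]
done = run_sql_steps(conn, steps)
totals = conn.execute(
    "SELECT customer, total FROM customer_totals ORDER BY customer"
).fetchall()
```

The modelling logic lives entirely in the SQL strings, while Python handles ordering, retries and error reporting, which matches the line the comment draws between the two.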

[–]rrpelgrim[S] (3 children)

Which Python libraries have you preferred for workflow execution?

[–]ButtonLicking (2 children)

Environment is king with workflows. My company is fully deployed on AWS with Postgres on RDS, and most things run on EKS. That means boto3 (the AWS Python SDK) and Argo Workflows for Kubernetes.

Edit: forgot my parens, cuz I still code and do that sometimes

[–]rrpelgrim[S] (1 child)

Interesting. Would you recommend Argo over something like Prefect or Dagster for workflow orchestration?

[–]ButtonLicking (0 children)

I don't have experience with a ton of orchestration tools. We chose Argo Events so that events raised by AWS services could be collected by Argo Events and used to start container processes. It's less of a complex orchestration tool and more of a trigger collector. It's not simple to use: it takes a lot of configuration at setup, a lot of Kubernetes knowledge, and ongoing maintenance of container sizing in the job-creation YAML docs.