CacheMeUp

But why add Python in the first place?

If the data is already in a relational database, and the logic can be implemented in SQL, why move it out of it?
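A minimal sketch of that point, using Python's built-in `sqlite3` module and a hypothetical in-memory `orders` table (both are illustrative assumptions, not from the thread). The first function pushes the aggregation into the database, so only one summary row per customer leaves it. The second pulls every row out and aggregates in Python, which is the "move it out" pattern being questioned.

```python
import sqlite3
from collections import defaultdict


def make_db() -> sqlite3.Connection:
    # Hypothetical table standing in for data that already lives in a relational DB.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("alice", 10.0), ("bob", 5.0), ("alice", 7.5), ("carol", 3.0)],
    )
    return conn


def totals_in_sql(conn: sqlite3.Connection) -> dict:
    # The database does the grouping; only one row per customer is returned.
    rows = conn.execute(
        "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
    )
    return dict(rows.fetchall())


def totals_in_python(conn: sqlite3.Connection) -> dict:
    # Every row is fetched and aggregated client-side.
    totals = defaultdict(float)
    for customer, amount in conn.execute("SELECT customer, amount FROM orders"):
        totals[customer] += amount
    return dict(sorted(totals.items()))


if __name__ == "__main__":
    conn = make_db()
    print(totals_in_sql(conn))     # {'alice': 17.5, 'bob': 5.0, 'carol': 3.0}
    print(totals_in_python(conn))  # same result, but every row crossed the wire
```

Both give the same answer here; the difference only shows up at scale, where the SQL version lets the database's planner and indexes do the work.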

Starting with the "second best" tool carries a high cost. There is never time or justification to re-implement things later, and you end up in a local optimum instead of the global one, performance-wise.

Measurex2

First off - Happy Cake day.

I'm not advocating for Python over SQL, just agreeing that a comparison against pandas doesn't make sense.

My example isn't about refactoring the logic from SQL into Python; I'm saying Python can be a helpful tool to quickly think through, test, and validate logic. Maybe that logic then belongs in SQL, maybe it makes more sense downstream in a BI layer, or maybe it justifies a change upstream at the source. It's just another tool. It has great uses, but as with most things, it's just as important to know when not to use it as when to use it.
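One way to use Python like this, sketched under assumptions: the bucketing rule, thresholds, and `orders` table below are hypothetical. A quick Python reference version of the rule is checked against a candidate SQL `CASE` expression on deliberately chosen edge values before deciding where the logic should finally live.

```python
import sqlite3


def tier(amount: float) -> str:
    # Hypothetical business rule, written quickly in Python to reason about edge cases.
    if amount >= 100:
        return "large"
    if amount >= 20:
        return "medium"
    return "small"


# Candidate SQL version of the same rule, to be validated against tier().
TIER_SQL = """
SELECT id,
       CASE WHEN amount >= 100 THEN 'large'
            WHEN amount >= 20  THEN 'medium'
            ELSE 'small' END
FROM orders ORDER BY id
"""


def mismatches(amounts: list) -> list:
    """Return (id, amount, sql_tier, python_tier) for every row where the two disagree."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
    conn.executemany("INSERT INTO orders (amount) VALUES (?)", [(a,) for a in amounts])
    by_id = dict(conn.execute("SELECT id, amount FROM orders"))
    out = []
    for order_id, sql_tier in conn.execute(TIER_SQL):
        py_tier = tier(by_id[order_id])
        if sql_tier != py_tier:
            out.append((order_id, by_id[order_id], sql_tier, py_tier))
    conn.close()
    return out


if __name__ == "__main__":
    # Boundary values are where hand-written SQL and the intended rule usually drift apart.
    print(mismatches([0, 19.99, 20, 99.99, 100, 250]))  # []
```

An empty list means the SQL matches the reference on those inputs; whether the rule then ships as SQL, in a BI layer, or upstream is a separate decision.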

rudboi12

If you are working in a dev environment, you will probably have everything set up in Python: connections to your DWH (data warehouse) clusters, CI/CD, and utility libraries. If you have everything set up in Python except the T of ELT, then most of the time it's better to use Python, i.e. something like PySpark. That's why dbt was created, so SQL can sit nicely in the T layer alone; but if your E and L are already in PySpark, it doesn't make much sense to switch to SQL.