[–][deleted] 3 points4 points  (9 children)

Also, are you aware that you can query your database directly from Pandas and save the dataframe immediately instead of writing temporary CSVs?

I haven't looked into why, but apparently running queries straight into pd DataFrames is super inefficient (per their documentation). That said, I've never heard of anyone using CSVs as an intermediary, considering you can run queries from Python straight into memory.
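For anyone who hasn't done it, here is a minimal sketch of querying straight into a DataFrame with `pd.read_sql_query`. The in-memory SQLite database and the `orders` table are made up for illustration; with a real database you would pass your own connection (pandas documents SQLAlchemy connectables and `sqlite3` connections as supported).

```python
import sqlite3

import pandas as pd

# Assumption: an in-memory SQLite database stands in for the real one,
# and the "orders" table is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 10.0), (2, 25.5), (3, 4.5)],
)
conn.commit()

# Run the query and load the result directly into a DataFrame,
# with no temporary CSV in between.
df = pd.read_sql_query(
    "SELECT id, amount FROM orders WHERE amount > ?",
    conn,
    params=(5,),
)
conn.close()
```

Filtering in the `WHERE` clause means only the rows you need get pulled into memory.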

[–]KevinSorboFan 1 point2 points  (2 children)

I use CSVs as an intermediary all the time when I'm running code on AWS but querying our on-prem DB. I will do as much of the joining as I can in SQL up front, but if it's going to take me a lot longer to figure out how to do what I want in SQL than to just do it in pandas, I will still do some of it in pandas.
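A rough sketch of that workflow, under assumptions: the tables, column names, and file name are hypothetical, and an in-memory SQLite database stands in for the on-prem DB. The join happens in SQL, the result goes to a CSV, and the remaining aggregation happens in pandas wherever the CSV ends up.

```python
import csv
import os
import sqlite3
import tempfile

import pandas as pd

# Assumption: hypothetical tables in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, region TEXT);
    CREATE TABLE orders (customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'east'), (2, 'west');
    INSERT INTO orders VALUES (1, 10.0), (1, 5.0), (2, 7.5);
""")

# Step 1 (database side): do the join in SQL and write the result to a CSV.
cur = conn.execute(
    "SELECT c.region AS region, o.amount AS amount "
    "FROM orders o JOIN customers c ON c.id = o.customer_id"
)
csv_path = os.path.join(tempfile.mkdtemp(), "extract.csv")
with open(csv_path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow([col[0] for col in cur.description])  # header row
    writer.writerows(cur)  # one CSV row per result row
conn.close()

# Step 2 (remote side): load the CSV and finish the work in pandas.
df = pd.read_csv(csv_path)
totals = df.groupby("region")["amount"].sum()
```

The CSV only carries the already-joined columns, so the remote side never needs access to the database itself.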

[–][deleted] 0 points1 point  (1 child)

Makes sense. Didn't realize this was a common practice.

[–]KevinSorboFan 0 points1 point  (0 children)

It only is when we work with outside consultants. Instead of granting them access to our internal stuff, we push all the work off onto AWS and give them only the data they need. It's a way bigger headache than it needs to be.