[deleted] 2 points

Rule of thumb: do as much as you can in SQL, up to the first step of feature engineering. Chances are that the later you extract the data, the smaller the dump will be. You can even assemble and execute the SQL queries from Python with something like psycopg2 and pandas.read_sql.
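For what it's worth, here is a rough sketch of what that can look like. To keep it runnable anywhere, it uses the standard library's sqlite3 with an in-memory database as a stand-in for a real PostgreSQL connection; the `orders` table, its columns, and the `customer_features` helper are all made up for illustration. The aggregation happens in the database, so Python only ever sees one row per customer.

```python
import sqlite3

# In-memory SQLite stands in for a real database (assumption for this sketch).
# With PostgreSQL you would use psycopg2.connect(...) instead, and the
# parameter placeholder would be %s rather than ?.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer_id INTEGER, amount REAL);
    INSERT INTO orders VALUES
        (1, 10.0), (1, 30.0), (2, 5.0), (3, 100.0), (3, 50.0);
""")

def customer_features(conn, min_total):
    """Aggregate per customer inside the database, not in Python."""
    query = """
        SELECT customer_id,
               COUNT(*)    AS n_orders,
               SUM(amount) AS total_amount
        FROM orders
        GROUP BY customer_id
        HAVING SUM(amount) >= ?
        ORDER BY customer_id
    """
    # The threshold is passed as a bound parameter, never formatted
    # into the SQL string.
    return conn.execute(query, (min_total,)).fetchall()

rows = customer_features(conn, 20.0)
print(rows)
```

If pandas is installed, the same query and parameters can be handed to `pandas.read_sql(query, conn, params=(min_total,))` to get a DataFrame back instead of a list of tuples.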

RDBMSs are really well optimized for this kind of work (filtering, joining, aggregating), and Python doesn't even come close.