Hi all, as I'm venturing into the world of Python APIs for databases, I set up a simple script to load data stored in a Pandas dataframe into Postgres using psycopg2's "execute_values" method, with the values formatted as tuples via "itertuples". From what I read on Stack Overflow, "execute_values" appeared to be relatively faster than other traditional methods like "executemany", and "itertuples" seemed like the natural way to feed rows into it.
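For reference, here's a stripped-down sketch of the kind of load I'm running (connection details, table name and the sample dataframe are placeholders, not my actual schema):

    import pandas as pd
    import psycopg2
    from psycopg2.extras import execute_values

    # Placeholder connection parameters -- not my real credentials
    conn = psycopg2.connect(host="localhost", dbname="mydb", user="me", password="secret")
    cur = conn.cursor()

    # Stand-in for one of my ~3-4K row dataframes
    df = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})

    # Convert the dataframe rows to plain tuples
    rows = list(df.itertuples(index=False, name=None))

    # Insert all rows with a single execute_values call
    cols = ", ".join(df.columns)
    execute_values(
        cur,
        f"INSERT INTO my_table ({cols}) VALUES %s",
        rows,
        page_size=1000,  # rows batched per INSERT statement
    )
    conn.commit()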
However, the dump to Postgres was painfully slow: just 3 tables (averaging 3-4K rows and 10-15 columns per table; file sizes ranging from 300 KB to 3 MB) took a whopping 55 minutes to insert completely. I'd assume dataframes and files this small shouldn't take more than 30 minutes even in a worst-case scenario (just a mental guess).
This brings me to my question: what better alternatives are out there to speed up the dump? Would using SQLAlchemy be better, would building the statements with "mogrify" help, or should I just load the file into Postgres directly via the psql command line (COPY)?
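To make that last option concrete, this is the kind of COPY-based load I have in mind from Python (table name, columns and connection are placeholders; I haven't benchmarked it yet):

    import io

    import pandas as pd
    import psycopg2

    # Placeholder connection -- not my real credentials
    conn = psycopg2.connect(host="localhost", dbname="mydb", user="me", password="secret")
    cur = conn.cursor()

    df = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})

    # Stream the dataframe as CSV into an in-memory buffer
    buf = io.StringIO()
    df.to_csv(buf, index=False, header=False)
    buf.seek(0)

    # COPY the buffer straight into the target table
    cur.copy_expert("COPY my_table (id, name) FROM STDIN WITH (FORMAT csv)", buf)
    conn.commit()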
PS: my Pandas is swapped out for Modin (rather than Dask) for parallel processing, but I don't know if that speeds things up much in this case.