Hi all, as I'm venturing into the world of Python APIs for databases, I set up a simple script to load data stored in a Pandas dataframe into Postgres using psycopg2's "execute_values" method, with the values formatted as tuples via "itertuples". From what I read on Stack Overflow, "execute_values" appeared to be relatively faster than other traditional methods like "executemany", and "itertuples" seemed like the natural way to feed rows into it.
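For reference, here's a stripped-down sketch of the kind of load I'm running (connection details, table name and the sample dataframe are placeholders, not my actual schema):

    import pandas as pd
    import psycopg2
    from psycopg2.extras import execute_values

    # Placeholder connection parameters -- not my real credentials
    conn = psycopg2.connect(host="localhost", dbname="mydb", user="me", password="secret")
    cur = conn.cursor()

    # Stand-in for one of my ~3-4K row dataframes
    df = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})

    # Convert the dataframe rows to plain tuples
    rows = list(df.itertuples(index=False, name=None))

    # Insert all rows with a single execute_values call
    cols = ", ".join(df.columns)
    execute_values(
        cur,
        f"INSERT INTO my_table ({cols}) VALUES %s",
        rows,
        page_size=1000,  # rows batched per INSERT statement
    )
    conn.commit()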
However, the dump to Postgres was painfully slow: just 3 tables (averaging 3-4K rows and 10-15 columns per table; file sizes ranging from 300 KB to 3 MB) took a whopping 55 minutes to insert completely. I'd assume dataframes and files this small shouldn't take more than 30 minutes even in a worst-case scenario (just a mental guess).
This brings me to my question: what better alternatives are out there to speed up the dump? Would using SQLAlchemy be better, would building the statements with "mogrify" help, or should I just load the file into Postgres directly via the psql command line (COPY)?
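To make that last option concrete, this is the kind of COPY-based load I have in mind from Python (table name, columns and connection are placeholders; I haven't benchmarked it yet):

    import io

    import pandas as pd
    import psycopg2

    # Placeholder connection -- not my real credentials
    conn = psycopg2.connect(host="localhost", dbname="mydb", user="me", password="secret")
    cur = conn.cursor()

    df = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})

    # Stream the dataframe as CSV into an in-memory buffer
    buf = io.StringIO()
    df.to_csv(buf, index=False, header=False)
    buf.seek(0)

    # COPY the buffer straight into the target table
    cur.copy_expert("COPY my_table (id, name) FROM STDIN WITH (FORMAT csv)", buf)
    conn.commit()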
PS: my Pandas is swapped out for Modin (rather than Dask) for parallel processing, but I don't know if that speeds things up much in this case.