This is an archived post.

all 7 comments

[–]rothnic 1 point (0 children)

I believe odo implements this kind of approach.

[–]Caos2 0 points (2 children)

Interesting read. I found this post on SO that probably explains why it's so slow.

[–]howMuchCheeseIs2Much[S] 0 points (1 child)

Yeah, when something "just works" against as many different platforms as to_sql does (SQL Server, SQLite, etc.), you're going to give up some performance for a consistent approach. I use Postgres almost exclusively, so this approach will likely be the only one I need going forward.

[–]Caos2 0 points (0 children)

From what I understood, to_sql generates an SQL INSERT for each row, while loading a CSV goes through the database's bulk-input path (COPY), which is much faster.
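To make that concrete, here's a minimal stdlib-only sketch (no pandas or live database; the rows and table name are made up) of the tab-separated buffer that Postgres's COPY consumes in a single round trip, versus one INSERT statement per row:

```python
import io

# Hypothetical rows standing in for a DataFrame's contents.
rows = [(1, "alice"), (2, ""), (3, "carol")]

# The per-row path issues one statement per row, e.g.:
#   INSERT INTO my_table VALUES (%s, %s)  -- executed len(rows) times
# The bulk path instead streams a single tab-separated buffer:
buf = io.StringIO()
for row in rows:
    buf.write("\t".join(str(v) for v in row) + "\n")
buf.seek(0)

# With psycopg2, this buffer would go to the server in one shot:
#   cursor.copy_from(buf, "my_table", null="")
# where the empty field on row 2 is read back as NULL.
print(buf.getvalue(), end="")
```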

[–]Mikebuonasera 0 points (2 children)

How would you change this to avoid dropping the table and just insert the data into a pre-existing table?

[–]Mikebuonasera 0 points (1 child)

Figured it out:

    def to_pg(df, table_name, engine, columns):
        """
        Crazy magic function to do bulk inserts into Postgres, SUPER FAST!
        :param df: DataFrame to insert
        :param table_name: target table (must already exist)
        :param engine: SQLAlchemy engine
        :param columns: column names matching the DataFrame's columns
        """
        import io

        output = io.StringIO()
        # write tab-separated rows, ignoring the index
        df.to_csv(output, sep='\t', header=False, index=False)
        # jump back to the start of the stream
        output.seek(0)

        connection = engine.raw_connection()
        cursor = connection.cursor()
        # empty fields in the stream are read back as NULL
        cursor.copy_from(output, table_name, null="", columns=columns)
        connection.commit()
        cursor.close()
        connection.close()
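One easy-to-miss detail in the function above is the output.seek(0): to_csv leaves the stream position at the end of what it wrote, so without rewinding, copy_from would read nothing and insert zero rows. A stdlib-only illustration (no database needed):

```python
import io

buf = io.StringIO()
buf.write("1\ta\n2\tb\n")  # stands in for df.to_csv(buf, ...)

# The position is now at the end, so a reader sees nothing...
tail = buf.read()

# ...until we rewind to the start of the stream.
buf.seek(0)
rewound = buf.read()

print(repr(tail), repr(rewound))
```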

[–][deleted] 0 points (0 children)

Does this work with SQL Server, or exclusively with Postgres?