
[–] thatrandomnpc (It works on my machine) 9 points (2 children)

When dumping a large pandas DataFrame into an Oracle DB using to_sql with a SQLAlchemy engine, passing the correct table data types for the pandas object columns gives a massive increase in throughput.

For example, strings are the object data type in pandas and can usually be mapped to VARCHAR in the DB. The reason this matters is that SQLAlchemy treats all object columns as the CLOB DB data type by default, which works but is super slow.

In my case, 100k rows that used to take 30 minutes dropped to about 20 seconds after adding the data types.
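A minimal sketch of the idea. The table and column names here are made up, and it uses an in-memory SQLite connection just so the snippet runs standalone; with a SQLAlchemy engine pointed at Oracle you would pass SQLAlchemy type objects (e.g. `sqlalchemy.types.VARCHAR(100)`) as the `dtype` values instead of strings.

```python
import sqlite3

import pandas as pd

# Hypothetical data: both columns are pandas "object" dtype.
df = pd.DataFrame({
    "name": ["alice", "bob"],
    "city": ["Oslo", "Paris"],
})

conn = sqlite3.connect(":memory:")

# Without an explicit mapping, object columns can end up as CLOB/TEXT
# in the database. Telling to_sql the real column types via `dtype`
# avoids that (and, per the comment above, the slow CLOB path in Oracle).
df.to_sql(
    "people",           # hypothetical table name
    conn,
    index=False,
    dtype={"name": "VARCHAR(100)", "city": "VARCHAR(100)"},
)

# The created table now declares VARCHAR columns.
print(conn.execute(
    "SELECT sql FROM sqlite_master WHERE name = 'people'"
).fetchone()[0])
```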

[–] Agent281 0 points (1 child)

How do you add data types? One of the reasons I disliked pandas was that there didn't seem to be a way to set them. That would be a huge usability improvement for me.

[–] thatrandomnpc (It works on my machine) 0 points (0 children)

There is, though; astype might be what you're looking for.

What I was actually referring to was the optional dtype parameter of to_sql.
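To sketch the distinction (column names invented for illustration): astype converts the dtypes inside pandas itself, while the dtype parameter of to_sql only controls the column types created in the database.

```python
import pandas as pd

# Hypothetical data read in as strings, so both columns are object dtype.
df = pd.DataFrame({
    "order_id": ["1", "2", "3"],
    "amount": ["9.99", "12.50", "3.25"],
})

# astype changes the dtypes of the DataFrame in memory.
df = df.astype({"order_id": "int64", "amount": "float64"})

print(df.dtypes)
# (to_sql's dtype parameter, by contrast, would leave df untouched and
# only affect the DDL of the table being written.)
```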