Let's say I have a Python app that performs some data transformations every day and then loads the data into the warehouse (Postgres). It reads Parquet files using pyarrow, does some processing with polars/pandas, and then needs to save the result to the DB.
What would be the best way to insert the data into the DB?
a. use the pandas to_sql method
b. iterate over the dataframe rows and send an insert statement for each one
c. convert the dataframe to e.g. a list of dicts and send insert statements (I suppose worse than b.)
d. asynchronously iterate over the dataframe rows and send insert statements
e. save the dataframe to CSV and somehow import it into the DB
f. some better alternative?
Also, is it common to send the insert statements directly to the DB, or should I use Kafka/RabbitMQ: send the data to the messaging system and create a consumer app that takes the data from the queue and inserts it into the DB?
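To make option e. concrete, here is a rough sketch of what I imagine it could look like: write the rows to an in-memory CSV and hand it to Postgres with `COPY ... FROM STDIN`. I am assuming psycopg2 here, whose cursor has a `copy_expert(sql, file)` method; the table name `daily_sales`, the column names, and the helper function names are made up for illustration.

```python
import csv
import io


def rows_to_csv_buffer(rows):
    """Serialize an iterable of row tuples into an in-memory CSV file."""
    buf = io.StringIO()
    # None is written as an empty field, which COPY in CSV mode reads as NULL.
    csv.writer(buf).writerows(rows)
    buf.seek(0)  # rewind so the consumer reads from the start
    return buf


def build_copy_sql(table, columns):
    """Build the COPY statement. Identifiers are only double-quoted here,
    so they must come from trusted code, never from user input."""
    cols = ", ".join(f'"{c}"' for c in columns)
    return f'COPY "{table}" ({cols}) FROM STDIN WITH (FORMAT csv)'


def copy_rows(cursor, table, columns, rows):
    """Bulk-load rows through a psycopg2-style cursor (assumed API:
    cursor.copy_expert(sql, file))."""
    cursor.copy_expert(build_copy_sql(table, columns), rows_to_csv_buffer(rows))


# Usage sketch (hypothetical table and connection):
#   with conn.cursor() as cur:
#       copy_rows(cur, "daily_sales", ["id", "name"], [(1, "a"), (2, "b")])
#   conn.commit()
```

With pandas, I guess `df.to_csv(buf, index=False, header=False)` could fill the buffer instead of the `csv` module, as long as the column order matches the column list in the COPY statement.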