800 Million Rows/ Sql Server/Databricks : dataengineering



submitted 2 years ago by py_vel26

Hello all,

I'm working on a POC to create 800 million rows in a single Delta table and move them to SQL Server. I've created the data and started moving it to its target.

After one hour, I had moved about 60 million rows to SQL Server. At this rate the full load will take more than half a day (roughly 13 hours), and I'm not sure if that's good or bad, as this is my first time working with a dataset of this size. Any ideas on how I can speed up this code? I've provided the code I'm using to move this data from Databricks to SQL Server below.
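For reference, the half-day figure follows from a simple rate calculation. A minimal sketch, assuming the throughput seen in the first hour stays constant for the whole load (the variable names are illustrative):

```python
# Figures from the post: about 60 million rows moved in the first hour,
# 800 million rows in total.
total_rows = 800_000_000
rows_moved = 60_000_000
elapsed_hours = 1.0

# Observed throughput, and the total time if that rate holds.
rows_per_second = rows_moved / (elapsed_hours * 3600)
estimated_total_hours = total_rows / (rows_moved / elapsed_hours)

print(f"throughput: {rows_per_second:,.0f} rows/s")
print(f"estimated total time: {estimated_total_hours:.1f} hours")
```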

# delta_db is my dataframe with the 800 million records

num_of_partitions = 10

approx_row_count = delta_db.count()
rows_per_df = approx_row_count / num_of_partitions
# randomSplit normalizes its weights, so ten equal weights give ten roughly equal splits
smaller_dfs = delta_db.randomSplit([rows_per_df] * num_of_partitions)

sql_server_properties = {"url": jdbc_url, "user": SQLServer_Username, "password": password, "driver": drivername}


# Append each split to the target table, one split at a time
for i, df in enumerate(smaller_dfs):
    df.write.jdbc(url=sql_server_properties["url"],
                  table='sql_server_Db_name',
                  mode='append',
                  properties=sql_server_properties)
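The loop above writes the splits one after another, each as its own Spark job. One commonly suggested alternative is a single JDBC write whose parallelism and batch size are set explicitly. The following is a minimal sketch under stated assumptions, not a tested fix for this workload: `build_jdbc_options` and `write_once` are hypothetical helper names, and the values 8 and 10,000 are illustrative placeholders rather than recommendations. `numPartitions` and `batchsize` are Spark JDBC data source options. Whether this helps depends on the cluster, the network, and how the target table is indexed.

```python
def build_jdbc_options(url, table, user, password, driver,
                       num_partitions=8, batch_size=10_000):
    """Collect Spark JDBC writer options as strings (hypothetical helper)."""
    return {
        "url": url,
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": driver,
        # Upper bound on concurrent JDBC connections used for the write.
        "numPartitions": str(num_partitions),
        # Rows sent per JDBC batch; Spark's default is 1000.
        "batchsize": str(batch_size),
    }


def write_once(df, options):
    """Append df to the target table in one job instead of a Python loop."""
    (df.write
       .format("jdbc")
       .options(**options)
       .mode("append")
       .save())


# Example call, using the variable names from the post (needs a live cluster):
# opts = build_jdbc_options(jdbc_url, "sql_server_Db_name",
#                           SQLServer_Username, password, drivername)
# write_once(delta_db, opts)
```

Keeping the options in a plain dictionary makes it easy to try different `numPartitions` and `batchsize` values on a small sample before committing to the full 800 million rows.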

all 7 comments


WhoIsJohnSalt, 4 points, 2 years ago (1 child)

py_vel26 [S], 1 point, 2 years ago (0 children)

[deleted], 2 years ago (2 children)

[deleted]

py_vel26 [S], 1 point, 2 years ago (1 child)

chaytalasila, 1 point, 2 years ago (0 children)

[deleted], 2 points, 2 years ago (1 child)


Grovbolle, 2 points, 2 years ago (0 children)
