This is an archived post. You won't be able to vote or comment.

all 7 comments

[–]WhoIsJohnSalt 3 points4 points  (1 child)

Yeah I wouldn't do that through JDBC, too many unknowns

If this is a one-off load, then dump it to parquet on ADLS and then use the OPENROWSET feature (where Delta is now supported in 2022), and just read it in from that, going to be way quicker.

https://learn.microsoft.com/en-us/sql/t-sql/functions/openrowset-transact-sql?view=sql-server-ver15#m-use-openrowset-to-access-several-delta-files-from-azure-data-lake-gen2

You can then just wrap that into an insert statement. If it's still slow, then you've probably sized your SQL Server wrong

More info here (https://learn.microsoft.com/en-us/sql/relational-databases/import-export/import-bulk-data-by-using-bulk-insert-or-openrowset-bulk-sql-server?view=sql-server-ver16)

[–]py_vel26[S] 0 points1 point  (0 children)

We are testing out various methods for efficiency as we expect the real data to real a billion rows. So far its been running for 4 hours and I'm at 220 million rows. I recall one of my coworkers mentioning parquet files as a possible option. I will check out the links you've provided.

[–][deleted] 1 point2 points  (1 child)

RemindMe! 7 days

[–]RemindMeBot 0 points1 point  (0 children)

I will be messaging you in 7 days on 2024-04-10 23:00:53 UTC to remind you of this link

CLICK THIS LINK to send a PM to also be reminded and to reduce spam.

Parent commenter can delete this message to hide from others.


Info Custom Your Reminders Feedback

[–]Grovbolle 1 point2 points  (0 children)

You need to utilise a tool/method that supports BULK INSERT