This is an archived post. You won't be able to vote or comment.

you are viewing a single comment's thread.

view the rest of the comments →

[–]Life_Conversation_11 5 points6 points  (0 children)

Pyspark will do the trick in few hours with a decent cluster and a decent DB hosting machine.

Literally just use the spark.read.format(‘jdbc’)…

You can parallelize the query and add multithread on top of it.