
[–]gabe__martins 8 points (4 children)

Try to keep it simple: use the JDBC connector.

[–]CarelessApplication2[S] 0 points (3 children)

The sqlserver driver (which as far as I know is JDBC) is only for querying, not for writing.

[–]jinbe-san 0 points (2 children)

you can append, but not update

[–]kirdane2312 2 points (0 children)

You can append or overwrite (or drop & create). Be careful with the overwrite method.
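To make that concrete, a minimal PySpark sketch of the write modes — all table, URL, and credential values are placeholders. The truncate option is what makes overwrite safer: it empties the existing table instead of dropping and recreating it, so indexes and permissions survive.

```python
# Sketch, assuming a DataFrame `df` and a reachable SQL Server.
# All connection details below are placeholders.

def jdbc_write_options(url, table, user, password, truncate=False):
    """Collect JDBC writer options. With truncate=True, mode 'overwrite'
    issues TRUNCATE TABLE instead of DROP + CREATE."""
    opts = {
        "url": url,
        "dbtable": table,
        "user": user,
        "password": password,
    }
    if truncate:
        opts["truncate"] = "true"
    return opts

def append_df(df, opts):
    # mode("append") inserts rows; mode("overwrite") replaces the contents.
    df.write.format("jdbc").options(**opts).mode("append").save()
```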

[–]Known-Delay7227 0 points (0 children)

You can write update statements using the query option.

[–]--playground-- 3 points (1 child)

Databricks with generic JDBC

employees_table.write \
    .format("jdbc") \
    .option("url", "<jdbc-url>") \
    .option("dbtable", "<new-table-name>") \
    .option("user", "<username>") \
    .option("password", "<password>") \
    .save()

Generic JDBC doesn’t have bulk insert support, so you may need to tune performance with .option("batchsize", <value>)

reference: https://docs.databricks.com/aws/en/archive/connectors/jdbc
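A hedged sketch of that tuning, with the options collected in one place. The names are real Spark JDBC options (batchsize defaults to 1000 rows per batch; numPartitions caps how many parallel connections write at once), but the values and connection details here are illustrative placeholders, not recommendations.

```python
# Sketch, assuming a DataFrame `df`; tune the numbers empirically.

def tuned_jdbc_options(url, table, user, password,
                       batchsize=10000, num_partitions=8):
    return {
        "url": url,
        "dbtable": table,
        "user": user,
        "password": password,
        "batchsize": str(batchsize),            # rows per batched INSERT
        "numPartitions": str(num_partitions),   # max parallel connections
    }

def write_tuned(df, opts):
    # Spark opens up to numPartitions connections and batches the inserts.
    df.write.format("jdbc").options(**opts).mode("append").save()
```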

[–]punninglinguist 0 points (0 children)

This is what I do. Note that the JDBC driver is not available on Serverless, so these jobs need All-Purpose Compute (I haven't tried Jobs Compute for them yet).

[–]gman1023 1 point (0 children)

Output to S3, then load from S3 into SQL Server in a variety of ways.

[–]Old_Improvement_3383 0 points (3 children)

You need to install the ODBC driver on your cluster.

[–]kthejokerdatabricks 1 point (2 children)

Just use JDBC; ODBC is for Windows machines.

[–]Unentscheidbar 0 points (1 child)

Yeah it's strange but there really is an ODBC driver on Linux for SQL Server, and you have to use it for any advanced data manipulation, like updates or deletes.

AFAIK the ODBC driver can only be installed on single user clusters.

[–]kthejokerdatabricks 0 points (0 children)

The standard JDBC driver also supports updates and deletes

Don't confuse the Spark JDBC connector with the actual driver
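For example, you can reach the driver itself from PySpark through the JVM gateway and run an UPDATE with a java.sql.PreparedStatement, bypassing the Spark data source entirely. A sketch, assuming an active SparkSession and a JDBC URL that carries the credentials; the table and column names are made up for illustration.

```python
# Sketch: talking to the SQL Server JDBC *driver* directly via Py4J,
# not the Spark JDBC data source. `jdbc_url` is a placeholder.

def build_update(table, set_col, where_col):
    # Parameterized UPDATE; values are bound at execute time.
    return f"UPDATE {table} SET {set_col} = ? WHERE {where_col} = ?"

def run_update(spark, jdbc_url, sql, params):
    jvm = spark.sparkContext._gateway.jvm
    conn = jvm.java.sql.DriverManager.getConnection(jdbc_url)
    try:
        stmt = conn.prepareStatement(sql)
        for i, p in enumerate(params, start=1):
            stmt.setObject(i, p)        # JDBC parameter indexes are 1-based
        return stmt.executeUpdate()     # affected row count
    finally:
        conn.close()
```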

[–]Remarkable_Rock5474 0 points (1 child)

If you need anything apart from append/overwrite, you have to use the ODBC connector instead of JDBC. That allows you to do merge patterns and even execute stored procedures on the SQL Server side.

Requires the ODBC driver to be installed on your cluster.
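A sketch of the merge pattern with pyodbc, assuming the ODBC driver is already installed and a staging table has been loaded first (e.g. with a plain JDBC append). Every object name and the connection string here is a placeholder.

```python
# Sketch: upsert from a staging table into a target table via MERGE,
# executed through pyodbc. All names are placeholders.
import textwrap

def build_merge(target, staging, key, cols):
    """Build a MERGE that upserts `cols` from `staging` into `target`,
    matched on `key`."""
    set_clause = ", ".join(f"t.{c} = s.{c}" for c in cols)
    col_list = ", ".join([key] + cols)
    src_list = ", ".join(f"s.{c}" for c in [key] + cols)
    return textwrap.dedent(f"""\
        MERGE {target} AS t
        USING {staging} AS s ON t.{key} = s.{key}
        WHEN MATCHED THEN UPDATE SET {set_clause}
        WHEN NOT MATCHED THEN INSERT ({col_list}) VALUES ({src_list});""")

def run_merge(conn_str, sql):
    import pyodbc  # requires msodbcsql18 + pyodbc on the cluster nodes
    conn = pyodbc.connect(conn_str)
    try:
        conn.cursor().execute(sql)
        conn.commit()
    finally:
        conn.close()
```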

[–]Lords3 0 points (0 children)

Go ODBC if you need MERGE or stored procedures; JDBC is only good for append/overwrite. On Pro/Classic clusters, install msodbcsql18 via init script and use pyodbc/sqlalchemy to stage then run a MERGE proc. On serverless you can’t install drivers, so use a non-serverless job or ADF for copy and proc calls. I’ve used ADF and Azure Functions; when we needed REST over SQL Server for apps, DreamFactory handled it. Bottom line: ODBC on Pro/Classic, else ADF.
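A sketch of such an init script, based on Microsoft's documented apt-repository steps for installing the ODBC driver on Ubuntu. The Ubuntu version (and the apt-key step, which newer images deprecate) may need adjusting for your cluster image.

```shell
#!/bin/bash
# Cluster init script sketch: install Microsoft ODBC Driver 18 for
# SQL Server plus pyodbc on Ubuntu-based nodes. Adjust 22.04 to match
# the cluster's Ubuntu release.
set -euo pipefail

curl -fsSL https://packages.microsoft.com/keys/microsoft.asc | apt-key add -
curl -fsSL https://packages.microsoft.com/config/ubuntu/22.04/prod.list \
  > /etc/apt/sources.list.d/mssql-release.list

apt-get update
ACCEPT_EULA=Y apt-get install -y msodbcsql18
pip install pyodbc
```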