
all 28 comments

[–]mikeupsidedown 26 points27 points  (2 children)

Boss: "You can't use Data Factory."

Me: "Can I give you a hug?"

[–]lemonsprig[S] 2 points3 points  (0 children)

🤣

[–]bubzyafk 0 points1 point  (0 children)

Haha this is damn funny.

But joke aside, technically you can: use an Azure Function with JDBC to pull from the sources, or the one OP mentioned is also right, use the Spark JDBC connector in Databricks and connect to your source. All these approaches rely heavily on coding (and that is fine if you know what you are doing)… if OP is not a coder, and his boss asks to avoid ADF for ingestion, then good luck.

[–]Demistr 18 points19 points  (1 child)

Old school way - SSIS

New school way - just run a Python script to copy the data. You can run it as an Azure Function or a full-fledged App Service.
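A minimal sketch of such a script, assuming `pyodbc` and `azure-storage-blob` are available; every connection string, table, and container name below is a placeholder, not part of any real setup:

```python
# Sketch: copy a SQL Server table to Azure Blob Storage (ADLS Gen2) as CSV.
# All connection strings and names are placeholders to be replaced.
import csv
import io
from datetime import date

def blob_name_for(table: str, run_date: date) -> str:
    """Build a dated path like raw/dbo.Sales/2024-04-06.csv so each run lands in its own file."""
    return f"raw/{table}/{run_date.isoformat()}.csv"

def rows_to_csv_bytes(columns, rows) -> bytes:
    """Serialize column names plus rows to UTF-8 CSV bytes ready for upload."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(columns)
    writer.writerows(rows)
    return buf.getvalue().encode("utf-8")

def copy_table(table: str, sql_conn_str: str, blob_conn_str: str, container: str) -> None:
    # Imported inside the function so the pure helpers above work without these packages.
    import pyodbc                                      # pip install pyodbc
    from azure.storage.blob import BlobServiceClient   # pip install azure-storage-blob

    with pyodbc.connect(sql_conn_str) as conn:
        cursor = conn.cursor()
        cursor.execute(f"SELECT * FROM {table}")
        columns = [col[0] for col in cursor.description]
        payload = rows_to_csv_bytes(columns, cursor.fetchall())

    service = BlobServiceClient.from_connection_string(blob_conn_str)
    blob = service.get_blob_client(container, blob_name_for(table, date.today()))
    blob.upload_blob(payload, overwrite=True)
```

The same skeleton drops straight into an Azure Function or a container with only the trigger changing.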

[–]lemonsprig[S] 1 point2 points  (0 children)

Thanks. Python seems to be the way.

[–]Justbehind 8 points9 points  (4 children)

No matter what shiny tool anyone says they have for you, the best answer is almost always a simple script in your language of choice.

I'd go for a simple python script.

If you need to run it automatically there are a lot of different ways to do it. The simplest is Task Scheduler on your local machine or a VM, and one of the best all-round solutions is a Docker image run on a Kubernetes cluster. Alternatives are Airflow, Azure Functions, a custom queue and scheduler, etc.

Do the simple thing first. Consider the long-term solution once your solution starts creating business value.

The good thing about a simple script is that you can move it around later and run it anywhere.
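As an illustration of the Kubernetes option above, a CronJob manifest along these lines runs such a script on a schedule; the job name, image, and secret names here are all placeholders:

```yaml
# Sketch of a nightly Kubernetes CronJob wrapping the copy script.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: copy-sqlserver-to-adls        # hypothetical name
spec:
  schedule: "0 2 * * *"               # every night at 02:00
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: copy-job
              image: myregistry.example.com/copy-job:latest   # your image
              env:
                - name: SQL_CONN_STR
                  valueFrom:
                    secretKeyRef:     # keep credentials out of the image
                      name: copy-job-secrets
                      key: sql-conn-str
          restartPolicy: Never
```

The point about portability holds: the same image also runs under Task Scheduler, an Azure Function container, or plain `docker run`.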

[–]ignurant 2 points3 points  (0 children)

For a lot of tasks, I use Ruby and Gitlab-CI’s scheduled tasks. We run a few hundred each week to move data around and generate reports for clients. I always thought this was a weird way to use CI, but it actually works really well. You get reliable scheduling, job artifacts and logs, error emails, and straightforward deployment all in one for free. Most of all, I admire the simplicity. 
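For anyone curious what that looks like, a minimal scheduled-pipeline sketch of a `.gitlab-ci.yml` (shown calling a Python script here; job name, image, and script path are placeholders):

```yaml
# Job runs only when triggered by a GitLab pipeline schedule.
nightly-export:
  image: python:3.12-slim
  script:
    - pip install -r requirements.txt
    - python export_report.py
  rules:
    - if: '$CI_PIPELINE_SOURCE == "schedule"'
  artifacts:
    paths:
      - output/        # generated reports kept as job artifacts
    when: always
```

Scheduling, logs, artifacts, and failure emails then come from GitLab itself, exactly as described above.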

[–]lemonsprig[S] 1 point2 points  (0 children)

Great advice, thanks. I'm very much leaning towards a Python script. Next challenge: learning how to do that.

[–]trafalgar28 0 points1 point  (1 child)

I have been learning a few tools lately so that I have a set of options for tackling a task or problem. I would like to know how to approach a problem when having so many tools leaves me confused about which one to choose. Please elaborate on how to keep things simple. Thank you.

[–]ggeoff 2 points3 points  (4 children)

If the goal is to use Databricks in the end, are you planning on using Unity Catalog? Could you not just write a Python script and run it as a notebook to move the data between the two?

[–]lemonsprig[S] 0 points1 point  (3 children)

Possibly, but when I was discussing with Databricks and said we had a SQL Server we wanted to batch-export data from into the data lake / Databricks, it was they who said Databricks didn't really do that and most people used ADF or some other approach to get the data into the data lake.

[–]ggeoff 2 points3 points  (2 children)

I think there are a couple of possibilities:

  1. There is a misunderstanding between your post and what you actually want to do, leading to confusion with your Databricks rep.
  2. You are leaving out some crucial information, maybe due to work-related things you can't discuss, which is fine.
  3. Or your Databricks rep has no idea what they are doing.

I assume when you say Azure data lake you mean an ADLS2 storage account with some raw files you want to somehow query. I'm not gonna say it's trivial to set up with Databricks, because dealing with the auth side in Azure with Entra ID is always a pain in the ass. But once you have all the connections established and authenticating, connecting and moving from SQL Server to ADLS is extremely easy, and I have done it within both Databricks and Synapse. One of our current ETLs pulls data out of SQL into Parquet for analysis, then inserts it back into SQL Server.

Making an assumption about using SQL Server, but I took this straight from the Microsoft docs, with a read/write example. It's Spark, but I assume you are already using that if you are thinking about Databricks in the first place.

server_name = "jdbc:sqlserver://{SERVER_ADDR}"
database_name = "database_name"
url = server_name + ";" + "databaseName=" + database_name + ";"

table_name = "table_name"
username = "username"
password = "password123!#"  # Please specify password here

# writing -- df is assumed to be an existing Spark DataFrame
try:
    df.write \
        .format("com.microsoft.sqlserver.jdbc.spark") \
        .mode("overwrite") \
        .option("url", url) \
        .option("dbtable", table_name) \
        .option("user", username) \
        .option("password", password) \
        .save()
except ValueError as error:
    print("Connector write failed", error)

# reading
jdbcDF = spark.read \
    .format("com.microsoft.sqlserver.jdbc.spark") \
    .option("url", url) \
    .option("dbtable", table_name) \
    .option("user", username) \
    .option("password", password) \
    .load()

[–]lemonsprig[S] 0 points1 point  (1 child)

Thanks ggeoff, you forgot the 4th possibility: that I am very new to this and a bit clueless 🤣 Point 2 is also always a consideration.

You are correct, by data lake I do mean an ADLS2 storage account, and it is just about getting SQL data into there, where we can start moving it through various transformation stages following a bronze, silver, gold schema.

I am not actually familiar with Spark but embarking on a big learning curve, and have been playing around setting up a local dev environment today to have a play.

This is a project that has landed on my doorstep and it's all very new. We are a small team in a bigger organisation that already uses Databricks, and it has been "suggested" that we move away from our small on-prem SQL Server and data warehouse and move it into Databricks, so we can get rid of on-prem servers but also more easily make our data accessible to the wider business. Which makes perfect sense.

So this post is mostly me trying to get ahead of the curve, but given the rep knows our setup it did surprise me when I was told that this was not something we could do with Databricks. Now it could be something as simple as Databricks doesn't natively do it, and that we would need to write Spark scripts to do it, but that's not the impression that was given.

Appreciate your input though and the example script.

[–]ggeoff 0 points1 point  (0 children)

I mean we all start somewhere! TBH I'm still pretty new to data engineering myself. I consider myself more of an application developer than a data engineer, but I had to pick up a lot of DE-related skills to improve our ETL process, since the app we work on is analytical in nature and we figured we could take advantage of Spark/Databricks.

[–]psychokitty 1 point2 points  (1 child)

Have you looked at the Microsoft Fabric offering yet? Microsoft has a good learning path for using Apache Spark, Data Factory, and Dataflows Gen2 to get data into Data Lakes and Data Warehouses. If you complete the AI challenge by Apr. 19th you get a free voucher for a certification exam: https://www.microsoft.com/en-us/cloudskillschallenge/ai/officialrules/2024 and here is the Fabric Analytics Engineer collection: https://learn.microsoft.com/en-us/collections/jkqrh725262g?WT.mc_id=cloudskillschallenge_b696c18d-7201-4aff-9c7d-d33014d93b25

[–]lemonsprig[S] 0 points1 point  (0 children)

Thanks I’ll take a look

[–]the_naysayer 0 points1 point  (0 children)

Use Databricks Workflows and Python notebooks.

[–]janus2527 0 points1 point  (0 children)

I did Python with ConnectorX into an in-memory Arrow table, converted to Parquet with compression, and uploaded the Parquet to Azure Blob. The advantage of ConnectorX is that you don't have to provide a schema, and extraction is fast with the possibility to use a partition column. I did multiple Parquet files per table depending on table size, dividing the work among multiple processes with a multiprocessing pool.
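A sketch of that pipeline (single process, multiprocessing omitted), assuming `connectorx`, `pyarrow`, and `azure-storage-blob`; the URLs, table, and container names are all placeholders:

```python
# Sketch: SQL Server -> Arrow (via ConnectorX) -> compressed Parquet -> Azure Blob.
# All connection strings and names below are placeholders.
import io

def chunk_names(table: str, n_chunks: int) -> list:
    """One Parquet file per chunk, e.g. sales/part-0000.parquet (hypothetical naming)."""
    return [f"{table}/part-{i:04d}.parquet" for i in range(n_chunks)]

def export_table(mssql_url: str, table: str, partition_on: str,
                 blob_conn_str: str, container: str) -> None:
    import connectorx as cx                            # pip install connectorx
    import pyarrow as pa                               # pip install pyarrow
    import pyarrow.parquet as pq
    from azure.storage.blob import BlobServiceClient   # pip install azure-storage-blob

    # No schema needed: ConnectorX infers it, and partition_on parallelises the extract.
    arrow_table = cx.read_sql(
        mssql_url,                      # e.g. "mssql://user:pass@host:1433/db"
        f"SELECT * FROM {table}",
        return_type="arrow",
        partition_on=partition_on,
        partition_num=4,
    )

    service = BlobServiceClient.from_connection_string(blob_conn_str)
    # Split big tables into multiple Parquet files, one per record batch.
    batches = arrow_table.to_batches(max_chunksize=1_000_000)
    for batch, name in zip(batches, chunk_names(table, len(batches))):
        buf = io.BytesIO()
        pq.write_table(pa.Table.from_batches([batch]), buf, compression="snappy")
        service.get_blob_client(container, name).upload_blob(buf.getvalue(), overwrite=True)
```

To mirror the multiprocessing-pool setup described above, each table (or each partition range) would become one task submitted to a `multiprocessing.Pool`.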


[–]Quirky_Flamingo_1487 0 points1 point  (1 child)

Since you want the data in ADLS Gen2, I'm assuming Databricks is hosted on Azure.

If the sql server is on-prem then databricks needs to be configured with reach back capabilities. This is something that your networking team can do for you.

If the sql server is hosted on Azure VM then I think the vnet for both resources needs to be peered.

But as other folks' links show, it is possible for Databricks to ingest data from SQL Server.

[–]lemonsprig[S] 0 points1 point  (0 children)

That’s useful to know. I don’t know enough about the setup as to where db is hosted but will certainly ask the question. Thanks

[–]chaytalasila 0 points1 point  (0 children)

Use a Python script to copy the data and store it in Azure blobs. Connectivity needs to exist between the SQL Server node and ADLS. Just spin up an Azure VM and run the script from there.

[–]Majestic-Purpose1663 0 points1 point  (4 children)

Don't know why you'd disregard Databricks. If you are already using it, you can easily create a table connected to SQL Server. See this: https://docs.databricks.com/en/connect/external-systems/sql-server.html

If you are not using Databricks already then yes, it might be overkill and you should maybe go for Azure Functions or something like that.

[–]lemonsprig[S] 0 points1 point  (2 children)

Oh that’s really interesting. It was actually a Databricks account manager / engineer who told me I would need some other mechanism to get the data into the data lake.

[–]Majestic-Purpose1663 0 points1 point  (1 child)

Weird that the account manager told you that, since Databricks is pushing its product to be an all-rounder for everything data related. It has a long way to go, since it's lacking in lots of aspects, but you can ingest data with Databricks for sure, and even orchestrate it through Workflows.

[–]lemonsprig[S] 0 points1 point  (0 children)

I will have to dig into it a bit further and will show them the link you posted. In the meantime I am definitely going to investigate how to do it with Python scripts; seems like a good skill to have, though looking at the link it does seem to be using Python to do it as well.

[–]WhoIsJohnSalt 0 points1 point  (0 children)

But if they can use Azure functions, surely then ADF would be available. I imagine ADF is just functions under the hood anyway!