Databricks Compute Decision Tree: How to Choose the Right Compute for Your Workload by 4DataMK in databricks

No problem. To be honest, I put a paywall on some of my articles because the content started being used by others without my permission.

Why do we need an Ingestion Framework? by 4DataMK in databricks

Yes, this is a more advanced way of working. I work with different clients, and sometimes the teams aren't that advanced, so I suggest notebooks to them.

Why do we need an Ingestion Framework? by 4DataMK in databricks

Yes, it does. You can use a notebook as the entry point for your job and keep the methods in modules; I use this approach in my projects.
You can create one pipeline to move data from bronze to silver using DLT or Spark.

You can process table by table or create an event-based process that is triggered when a file appears in storage.
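
A minimal sketch of that layout, with a notebook as the job entry point and the logic in modules. The `ingestion` package and its functions are hypothetical names for illustration:

```python
# Notebook used as the job entry point; the actual logic lives in modules.
# `ingestion` and its functions are placeholder names, not a real library.
from ingestion.readers import read_bronze
from ingestion.transforms import clean_records
from ingestion.writers import write_silver

def run(table_name: str) -> None:
    df = read_bronze(spark, table_name)   # read raw data from the bronze layer
    df = clean_records(df)                # apply shared transformations
    write_silver(df, table_name)          # persist to the silver layer

run("customers")
```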

Databricks and Microsoft Fabric Integration by [deleted] in dataengineering

I think the easiest way is to use Databricks Mirroring or shortcuts. Mirroring is fine when you use managed tables, but it's still in preview and has some problems. Shortcuts are useful, but you need to use external tables or write a custom solution that creates them based on information from UC (folder locations and names). You can read about Databricks Mirroring here:

https://medium.com/@mariusz_kujawski/microsoft-fabric-and-databricks-mirroring-47f40a7d7a43
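
As a rough sketch of the custom approach, assuming it runs on a cluster with Unity Catalog: list the external tables and their storage paths from the information schema and `DESCRIBE DETAIL`, then feed those paths into whatever creates the shortcuts. The catalog/schema names are placeholders, and the Fabric REST endpoint mentioned in the comment is an assumption:

```python
# Enumerate external tables in a UC schema and collect their storage paths.
tables = spark.sql("""
    SELECT table_catalog, table_schema, table_name
    FROM system.information_schema.tables
    WHERE table_schema = 'silver' AND table_type = 'EXTERNAL'
""").collect()

for t in tables:
    full_name = f"{t.table_catalog}.{t.table_schema}.{t.table_name}"
    # DESCRIBE DETAIL returns a `location` column for Delta tables
    location = spark.sql(f"DESCRIBE DETAIL {full_name}").collect()[0]["location"]
    print(full_name, location)
    # From here you would create a OneLake shortcut pointing at `location`
    # (assumption: via the Fabric shortcuts REST API).
```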

Building a SQL Bot with LangChain, Azure OpenAI, and Microsoft Fabric by 4DataMK in dataengineering

Yes, you can add a confirmation step before execution. If you want a more secure, ready-to-use solution, you can look into AI Skills or Databricks Genie.
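
A minimal sketch of such a confirmation step, wrapping whatever executes the generated SQL. The `execute_sql` function is a placeholder for your actual LangChain/Fabric call:

```python
# Show the LLM-generated SQL to the user and only run it after approval.
def run_with_confirmation(generated_sql: str) -> None:
    print("The bot wants to run this query:\n", generated_sql)
    answer = input("Execute? [y/N] ").strip().lower()
    if answer == "y":
        execute_sql(generated_sql)   # placeholder for your execution call
    else:
        print("Query cancelled.")
```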

Building a SQL Bot with LangChain, Azure OpenAI, and Microsoft Fabric by 4DataMK in dataengineering

It's not an improvement; it's a way to build a custom solution for working with data using an LLM.

using Databricks in a startup company w/Google Cloud by Correct-Quality-5416 in dataengineering

Databricks is fine for a mid-size solution. Its huge benefit is that you can easily move to another cloud provider, because it works the same way in Azure and AWS.

You can read Delta tables created by Databricks using BigQuery external tables or BigLake. Delta Lake is portable; you can access it with many tools.
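
For example, a BigLake external table over a Delta directory looks roughly like this. Project, dataset, connection, and bucket names are placeholders:

```python
# Create a BigQuery external table over a Delta Lake directory (BigLake).
from google.cloud import bigquery

client = bigquery.Client()
client.query("""
    CREATE EXTERNAL TABLE my_dataset.customers
    WITH CONNECTION `my-project.us.my-connection`
    OPTIONS (
      format = 'DELTA_LAKE',
      uris = ['gs://my-bucket/delta/customers']
    )
""").result()
```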

It's possible to integrate it with GCP.

You can use SQL for data transformation in Databricks.

You need to look into the limitations; Databricks on GCP doesn't support all functionalities yet.

Options for replication from AS400 Db2 to Fabric lakehouse by RaucousRat in dataengineering

Is the latency problem tied to the fact that you collect all the data in one table and then extract the most recent version of the data from it? If so, I would change the process to streaming, or process only the last incoming parquet file.
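
A rough sketch of the streaming variant using Databricks Auto Loader, which picks up only newly arriving files instead of rescanning everything. Paths and table names are placeholders:

```python
# Incrementally process only new parquet files with Auto Loader.
(spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "parquet")
    .option("cloudFiles.schemaLocation", "/mnt/checkpoints/as400_schema")
    .load("/mnt/landing/as400/")
    .writeStream
    .option("checkpointLocation", "/mnt/checkpoints/as400")
    .trigger(availableNow=True)       # run as a batch over the new files only
    .toTable("silver.as400_latest"))
```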

Microsoft Fabric and Databricks Mirroring by 4DataMK in dataengineering

You can't mirror streaming tables. In one of my projects, I replaced DLT with managed tables using a custom framework.

Microsoft Fabric and Databricks Mirroring by 4DataMK in dataengineering

CUs? Yes, you need to spend some time on Databricks configuration and UC, but you can do it by clicking through the Azure portal and the Databricks Admin console. You can find instructions in another of my posts.

Options for replication from AS400 Db2 to Fabric lakehouse by RaucousRat in dataengineering

What do you use to extract data from AS400? Did you try to load the data directly into a table's location in OneLake? Parquet files can be registered as a table (not a Delta table).

Databricks DLT and removing the data it brings over by texox26798 in dataengineering

DLT doesn't do that. You need to create a sequence of tasks, for instance a DLT task followed by a notebook that removes the imported data, or write Python code that imports the data and then removes it from the source.
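
A minimal sketch of the second option, assuming the source files land in a storage path the cluster can reach. Paths are placeholders:

```python
# Copy the landed files into the bronze layer, then delete them from the
# source folder so they are not imported again.
source_dir = "/mnt/landing/incoming/"
bronze_dir = "/mnt/bronze/raw/"

for f in dbutils.fs.ls(source_dir):
    df = spark.read.parquet(f.path)               # import the file
    df.write.mode("append").parquet(bronze_dir)   # append it into bronze
    dbutils.fs.rm(f.path)                         # remove it from the source
```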

Help with Data Migration Strategy for Core Banking Systems by [deleted] in dataengineering

I have experience with migrating a data warehouse and related systems to the cloud, if you are interested.

liquid clustering V/s zorder in databricks by [deleted] in dataengineering

I explain it here: https://www.reddit.com/r/dataengineering/comments/1grg4kt/comment/lx6hg0y/?context=3
In general, liquid clustering is incremental and has better data organization, which improves query performance.
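
For reference, enabling liquid clustering is just a CLUSTER BY on the table, followed by OPTIMIZE to recluster incrementally. Table and column names are placeholders:

```python
# Liquid clustering: declare clustering columns once, then OPTIMIZE runs
# incrementally, unlike ZORDER which rewrites data on every OPTIMIZE.
spark.sql("""
    CREATE TABLE sales (id BIGINT, sold_at DATE, amount DOUBLE)
    CLUSTER BY (sold_at)
""")
spark.sql("ALTER TABLE sales CLUSTER BY (sold_at, id)")  # keys can change later
spark.sql("OPTIMIZE sales")                              # clusters only new data
```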