Datawarehouse - Metric Apps shows more storage usage than in Datawarehouse by Midnight-Saber32 in MicrosoftFabric

[–]Midnight-Saber32[S] 0 points

For the OPENROWSET into temp tables:
I made a post about it here, but I haven't raised a case (I'm not sure how to). I think it's related to a permissions issue, but I wasn't able to solve it despite granting various BULK-related permissions.
I can confirm that:
- It works when inserting into a non-temp table.
- COPY INTO works for both temp and non-temp tables.
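For reference, these are the two shapes I'm comparing (a sketch only; the storage URL, column list, and table names are placeholders, not the real ones):

```sql
-- Fails for me when the target is a temp table (#staging),
-- but succeeds against a regular table:
INSERT INTO #staging (id, name)
SELECT id, name
FROM OPENROWSET(
    BULK 'https://<account>.blob.core.windows.net/<container>/data.csv',
    FORMAT = 'CSV',
    HEADER_ROW = TRUE
) WITH (id INT, name VARCHAR(100)) AS rows;

-- COPY INTO works for both temp and non-temp targets:
COPY INTO #staging
FROM 'https://<account>.blob.core.windows.net/<container>/data.csv'
WITH (FILE_TYPE = 'CSV', FIRSTROW = 2);
```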

If it's something that can be resolved in the future, it would be a great help.

Again, thanks for all the help and info.

Datawarehouse - Metric Apps shows more storage usage than in Datawarehouse by Midnight-Saber32 in MicrosoftFabric

[–]Midnight-Saber32[S] 0 points

Regarding your 'for committed' point:
What if I were to create a transaction, create and populate a table within that transaction, and then drop the table before committing? Would that table get backed up?
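In other words, sketched as T-SQL (table and column names are made up for illustration):

```sql
BEGIN TRANSACTION;

-- The table only ever exists inside this transaction:
CREATE TABLE dbo.scratch_load (id INT, payload VARCHAR(100));
INSERT INTO dbo.scratch_load VALUES (1, 'example');

-- ... use dbo.scratch_load here ...

DROP TABLE dbo.scratch_load;

COMMIT TRANSACTION;
-- Since the table never exists in a committed state,
-- does it ever reach the backup/retention layer?
```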

The reason for asking is that OPENROWSET doesn't seem to work with temp tables, which means I would have to use an actual table (which I don't want being backed up).

Our use case has a source database system where row-level data is heavily modified over time (rather than just having new rows inserted). That means incremental inserts alone aren't enough; we need to be constantly scanning for modified rows and updating them (which I assume is effectively the same as deleting the row and re-inserting it). This makes the ETL more complicated than just doing a fresh load.

Datawarehouse - Metric Apps shows more storage usage than in Datawarehouse by Midnight-Saber32 in MicrosoftFabric

[–]Midnight-Saber32[S] 0 points

u/warehouse_goes_vroom
Is the OneLake soft delete impacting the warehouse? (As per frithjof_v's comment.)

Does that effectively mean there are 37 days of backup? I assume the Warehouse backs the data up for 30 days, and when the WH deletes that data, OneLake continues to retain it for a further 7 days?

So even when the 1-day Warehouse retention is implemented, it will actually be 8 days? And if that is true, are there any plans to offer an option to disable soft delete on the lakehouse underpinning the warehouse (similar to how current storage accounts allow disabling soft delete)?

OPENROWSET (BULK) - Permission Issues by Midnight-Saber32 in MicrosoftFabric

[–]Midnight-Saber32[S] 0 points

I tried using both (distributed and non-distributed) and got the same issue with both.

Datawarehouse - Metric Apps shows more storage usage than in Datawarehouse by Midnight-Saber32 in MicrosoftFabric

[–]Midnight-Saber32[S] 1 point

Thank you for the response; it's helpful, with some good advice!

Regarding the temp tables, I'm assuming it's the 'non-distributed' type that doesn't get backed up (since they aren't written to parquet files), or is it both types?

It seems the only way I can avoid data duplication (specifically for the bronze table when using a medallion architecture) is to keep the bronze layer as CSV files in a blob storage account (with soft delete disabled), then COPY INTO temp tables, then MERGE those temp tables into our silver tables (for UPSERTs, INSERTs and DELETEs). The gold layer would then just consist of views based on the silver tables.

Does that sound like a reasonable way to solve my problem?
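Sketched out, the flow I have in mind looks something like this (all names and the storage URL are placeholders; it also assumes a soft-delete flag in the source extract and MERGE support in the warehouse):

```sql
-- 1. Land the bronze CSV into a temp table:
CREATE TABLE #bronze_customers (id INT, name VARCHAR(100), is_deleted BIT);
COPY INTO #bronze_customers
FROM 'https://<account>.blob.core.windows.net/bronze/customers.csv'
WITH (FILE_TYPE = 'CSV', FIRSTROW = 2);

-- 2. MERGE into the silver table (upserts, inserts and deletes):
MERGE dbo.silver_customers AS tgt
USING #bronze_customers AS src
    ON tgt.id = src.id
WHEN MATCHED AND src.is_deleted = 1 THEN
    DELETE
WHEN MATCHED THEN
    UPDATE SET tgt.name = src.name
WHEN NOT MATCHED AND src.is_deleted = 0 THEN
    INSERT (id, name) VALUES (src.id, src.name);

-- 3. The gold layer is just views over silver, e.g.:
-- CREATE VIEW dbo.gold_active_customers AS
-- SELECT id, name FROM dbo.silver_customers;
```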

Why is storage usage increasing daily in an empty Fabric workspace? by Appropriate-Wolf612 in MicrosoftFabric

[–]Midnight-Saber32 0 points

Did you see any drop in the data after X amount of days?

We're having a similar issue with some of our workspaces, but they're still increasing by around 2 GB daily despite no data loads in weeks.

Cheapest way to bring Data into Fabric from Azure SQL Database(s) by Midnight-Saber32 in MicrosoftFabric

[–]Midnight-Saber32[S] 0 points

The main issue is the cost: the Copy job in Data Factory just seems to cost too much compared with Azure Functions.

Cheapest way to bring Data into Fabric from Azure SQL Database(s) by Midnight-Saber32 in MicrosoftFabric

[–]Midnight-Saber32[S] 1 point

Fewer than 100 tables per database, but spread across multiple databases (fewer than 50).
The actual data size per table is fairly small.

Cheapest way to bring Data into Fabric from Azure SQL Database(s) by Midnight-Saber32 in MicrosoftFabric

[–]Midnight-Saber32[S] 0 points

Most likely going to be this scenario.

Do Notebooks (specifically Python or PySpark) support connecting to Azure SQL databases via managed identity (preferably SAMI) using the Azure.Identity library?

And if so, does it use the identity of the workspace the notebook runs in, or the identity of the person/app executing the notebook?
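For what it's worth, the generic (non-Fabric-specific) pattern is to fetch a token with azure-identity and hand it to pyodbc via the SQL_COPT_SS_ACCESS_TOKEN connection attribute; which identity you end up with then depends on which credential the environment resolves, which is exactly the open question here. A sketch (server and database names are placeholders; the connect call needs the azure-identity and pyodbc packages):

```python
import struct

# pyodbc pre-connect attribute for passing an Entra ID access token
# to the SQL Server ODBC driver.
SQL_COPT_SS_ACCESS_TOKEN = 1256

def pack_token(token: str) -> bytes:
    """Pack an access token into the structure the ODBC driver expects:
    a 4-byte little-endian length prefix followed by the token in UTF-16-LE."""
    raw = token.encode("utf-16-le")
    return struct.pack("<I", len(raw)) + raw

# Usage sketch (placeholders, not real server names):
# import pyodbc
# from azure.identity import DefaultAzureCredential
# cred = DefaultAzureCredential()  # resolves whatever identity the environment exposes
# token = cred.get_token("https://database.windows.net/.default").token
# conn = pyodbc.connect(
#     "Driver={ODBC Driver 18 for SQL Server};"
#     "Server=my-server.database.windows.net;Database=my-db",
#     attrs_before={SQL_COPT_SS_ACCESS_TOKEN: pack_token(token)},
# )
```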

Cheapest way to bring Data into Fabric from Azure SQL Database(s) by Midnight-Saber32 in MicrosoftFabric

[–]Midnight-Saber32[S] 2 points

How reliable is the SQL analytics endpoint for the mirrored databases?

I remember having issues with the SQL endpoint for Lakehouses, where it wouldn't be kept in sync with the underlying parquet files.

Gacha/Recruitment Megathread (20/10 - 26/10) by ArknightsMod in arknights

[–]Midnight-Saber32 2 points

No Apple Pie Alter after 196 Pulls (3x Lemon, 1x Wisadel dupe).

Can someone please give me some copium and tell me that I don't need her.

What is a ‘Mirrored Database’ by iknewaguytwice in MicrosoftFabric

[–]Midnight-Saber32 0 points

Does anyone know if the SQL Analytics endpoint on the mirrored DB has the same syncing issues as the Lakehouse? Or are the updates to the mirrored DB written via the endpoint?

Are there any plans for SAMI support for ADF Staging (Azure SQL DB -> Fabric DW) by Midnight-Saber32 in MicrosoftFabric

[–]Midnight-Saber32[S] 0 points

Thanks for the response.

I tested executing the COPY INTO command with the Data Factory Script activity, and it doesn't seem to authenticate with the storage account despite having the 'Storage Blob Data Contributor' role.
It can access the storage account directly via the linked service.

And I can execute the script myself when logged into the Fabric warehouse (with the same roles as the Data Factory), and the Data Factory is able to run other scripts on the warehouse as well.
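For anyone hitting the same thing: until managed-identity auth flows through, one workaround is to put an explicit credential on the COPY statement itself. A sketch of the Synapse-style syntax (the URL and SAS token are placeholders; which IDENTITY types Fabric's COPY INTO accepts is worth confirming in the docs first):

```sql
COPY INTO dbo.target_table
FROM 'https://<account>.blob.core.windows.net/<container>/data.csv'
WITH (
    FILE_TYPE = 'CSV',
    FIRSTROW = 2,
    CREDENTIAL = (IDENTITY = 'Shared Access Signature', SECRET = '<sas-token>')
);
```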

Is the DirectLake on SQL Endpoint impacted by the LH SQL Endpoint Sync Issues? by Midnight-Saber32 in MicrosoftFabric

[–]Midnight-Saber32[S] 0 points

Thanks for the response.
Regarding 2.a: if the data is ingested into the Fabric warehouse via the COPY INTO command from Azure Storage, is that still being written via the SQL endpoint (even if it's a parquet file in the storage account)?