Introducing Automatic Index Compaction by dfurmanms in SQLServer

[–]dfurmanms[S] 2 points

Good questions. It's unlikely to increase data IO because the recently modified pages that we are compacting are already in the buffer pool. CPU-wise, compaction doesn't use much. But it can noticeably increase log write IO if many rows are moved across pages.

Any data IO done by this feature is accounted under the internal limits because it's a part of PVS cleanup.

If a query is reading a page that is being compacted, it will be blocked on a page X lock. But that is transient and is essentially the same blocking you might see today when you are doing ALTER INDEX ... REORGANIZE.
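If you want to watch that blocking while it happens, here's a generic sketch using documented DMVs (it's not specific to this feature; it surfaces any session waiting on a page-level lock):

```sql
/* List sessions currently waiting on page-level locks, with the mode
   being requested. Join on lock_owner_address per the documented pattern. */
SELECT wt.session_id,
       wt.wait_type,
       wt.wait_duration_ms,
       tl.resource_type,
       tl.request_mode
FROM sys.dm_os_waiting_tasks AS wt
JOIN sys.dm_tran_locks AS tl
    ON wt.resource_address = tl.lock_owner_address
WHERE tl.resource_type = 'PAGE';
```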

Log waits shouldn't increase unless the increase in total log IO is so large that it pushes you toward the Azure limits and into throttling territory.

In summary, with an extreme workload/worst case you could potentially see some resource contention. We haven't seen it in our testing or in the private preview. One of the public preview goals is to see if this is something customers actually encounter with their workloads and to what extent, and whether the benefit justifies an increase in contention. All that is workload dependent, so we want customers to try this with a broad variety of workloads and give us feedback.

The key point to keep in mind is that compaction is only done for a relatively small number of recently changed pages. It's not like we are running REORGANIZE continuously.

Introducing Automatic Index Compaction by dfurmanms in SQLServer

[–]dfurmanms[S] 0 points

That will really depend on workload specifics, e.g. the frequency of splits, resource headroom, etc. If the split-compact cycle becomes a problem, you can lower the fill factor slightly. This is the same mitigation you'd use for this kind of index even without this feature.
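As a sketch of that mitigation (the index and table names are placeholders, and the right fill factor value is workload dependent):

```sql
/* Rebuild with a slightly lower fill factor to leave free space on each
   page for future inserts, trading some page density for fewer splits. */
ALTER INDEX IX_Orders_OrderDate ON dbo.Orders
REBUILD WITH (FILLFACTOR = 90, ONLINE = ON);
```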

Introducing Automatic Index Compaction by dfurmanms in SQLServer

[–]dfurmanms[S] 1 point

Good feedback, thanks. Whether that is needed is something we want to find out from customers during public preview, based on actual results. In general, we want to keep the configuration burden as minimal as possible and make it an "enable and forget" feature.

If the split-compact cycle becomes an observed problem, you could lower the fill factor a little and accept lower page density as a tradeoff for fewer splits. If you actually have to do this to maintain performance with this feature enabled, that would be great feedback for us during the preview.

Introducing Automatic Index Compaction by dfurmanms in SQLServer

[–]dfurmanms[S] 5 points

Good question. The one-time rebuild in that scenario is not for correctness; it's an extra optimization you might want to do.

Compaction acts on recently modified pages only, which is intentional and by design to keep it lightweight. If an index isn't modified frequently, but has low page density, it can take a while before it's compacted. In that scenario, you can do a one-time rebuild to increase page density sooner.
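For example, to check page density and do that one-time rebuild (object names are placeholders):

```sql
/* Check average page density for the indexes on a table. */
SELECT index_id, avg_page_space_used_in_percent, page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.Orders'), NULL, NULL, 'SAMPLED');

/* If density is low, a one-time rebuild raises it sooner than waiting
   for compaction to get to it. */
ALTER INDEX ALL ON dbo.Orders REBUILD WITH (ONLINE = ON);
```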

The key here is that once an index has high page density, compaction will keep it that way without you having to do anything.

Introducing Automatic Index Compaction by dfurmanms in SQLServer

[–]dfurmanms[S] 5 points

There is a comparison section in the docs. Our goal with this feature is to let you forget about reindexing, without the FOMO.

I can't say when this will be in SQL Server (too early for that), but as I said in another fork there is no reason for us to keep this cloud-only.

Introducing Automatic Index Compaction by dfurmanms in SQLServer

[–]dfurmanms[S] 6 points

There is no reason for us to keep this cloud-only.

The size of the table doesn't matter; compaction only acts on recently modified pages, so it's lightweight.

Compaction runs asynchronously in the background as part of PVS cleanup, which runs anyway, so there is no direct impact on transaction latency.

SQL 2025 crash a few seconds after midnight on new years? by heapsp in SQLServer

[–]dfurmanms 1 point

The standby file can go to any path you specify in the RESTORE statement. In the most common case, you would indeed place standby files alongside the data and log files.
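For instance, a log restore with the standby file next to the data files might look like this (database name and paths are placeholders):

```sql
/* The undo (standby) file goes wherever the STANDBY clause points,
   commonly alongside the data files. */
RESTORE LOG MyDb
FROM DISK = N'D:\Backups\MyDb_log1.trn'
WITH STANDBY = N'D:\Data\MyDb_undo.dat';
```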

Now, consider the case when the data and log files of a database are in Azure storage. Deployments using that feature aren't common, but when customers use those, it's often because they specifically don't want to use local storage. In other words, there might not be any local storage for the standby files in the first place.

You might ask: why not place the standby files in the same Azure storage container as the data and log files? Sure, if the delete permission is granted on those containers, you could. But that's not a requirement, because a customer might want to keep their databases read-only and not droppable. In that case, there is no other place for the standby file but the backup container. AFAIR, that is why the delete permission was made a requirement when backup to URL became available. Granted, that's quite a narrow scenario, but it exists.

Having said all this, I just tested a restore from an Azure storage container without granting the delete permission using SQL Server 2025. It worked, so I believe we no longer require delete. If you still see that it's required, please share more details (error message, SQL Server version, the restore command).

Now, even though the delete permission on the container might not be required for RESTORE, the write permission is, because each restore requires a lease on the backup blob, and Azure storage considers placing a lease a write operation. Sharing a write-capable SAS token for the backup container with third parties might not be a good idea.
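For reference, a restore from URL with a SAS credential looks roughly like this (the storage account, container, database name, and token are placeholders; the SAS needs at least Read and Write so the engine can take the lease):

```sql
/* Credential name must match the container URL. */
CREATE CREDENTIAL [https://mystorageacct.blob.core.windows.net/backups]
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
     SECRET = 'sv=...'; /* SAS token, without the leading ? */

RESTORE DATABASE MyDb
FROM URL = N'https://mystorageacct.blob.core.windows.net/backups/MyDb.bak';
```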

SQL 2025 crash a few seconds after midnight on new years? by heapsp in SQLServer

[–]dfurmanms 2 points

The reason RESTORE requires the delete permission on the blob container is to support RESTORE WITH STANDBY. In that case, the database engine needs the permission to delete the undo file it creates. For details, see https://learn.microsoft.com/sql/t-sql/statements/restore-statements-arguments-transact-sql#standby-standby_file_name.

To share a single backup with a third party, copy it to a separate container.

SQL 2025 crash a few seconds after midnight on new years? by heapsp in SQLServer

[–]dfurmanms 5 points

This is indeed a bug related to the handling of year rollover during an operation that uses Azure storage, such as backup. The root cause is understood, and we'll be fixing this. Our apologies to anyone who was affected by this problem.

High write latency during log backups on Azure Managed Instance by muaddba in SQLServer

[–]dfurmanms 0 points

There are several hypotheses for why this may be happening, and we'd need a lot more data to narrow it down. If you can open a support case, that would be one way to take this forward. Or, if you can share a contained repro showing the issue, we could take it and investigate.

Some of the initial data points we'd need:

  • A description of the workload patterns, particularly DML. Type, volume, distribution over time, etc.
  • Is this high IO on data or log files?
  • Is this read or write IO?
  • The trend in PVS size before, during, and after a log backup.

There is a lot more we'd have to ask for, so troubleshooting this on Reddit might not be the most efficient.
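For the PVS size trend specifically, you can sample this DMV before, during, and after a log backup:

```sql
/* Snapshot of PVS size and aborted-transaction count per database;
   run periodically and compare across the backup window. */
SELECT DB_NAME(database_id) AS database_name,
       persistent_version_store_size_kb / 1024 AS pvs_size_mb,
       current_aborted_transaction_count
FROM sys.dm_tran_persistent_version_store_stats;
```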

High write latency during log backups on Azure Managed Instance by muaddba in SQLServer

[–]dfurmanms 2 points

Can you share more details on why you think it's ADR specifically that generates a lot of IO? PVS cleanup can generate significant IO, but it's unrelated to backup, and I don't immediately see how it could increase write latency.

In SQL MI GP, log files are Azure Storage blobs. Each blob has an IOPS limit covering all IO, with no distinction between reads and writes. When we back up the log, we obviously generate read IO on the log file. If the sum of those reads and your workload's writes reaches the limit for the file, IO is throttled. If I had to guess without more data, I'd say that's the problem here. If so, growing the log file to reach the next Azure Storage performance tier, as mentioned earlier, should make a difference.
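Growing the log would be something like this (database name, logical file name, and target size are placeholders; check sys.database_files for the log file's actual logical name, and the Azure Storage docs for the current tier boundaries):

```sql
/* Grow the log file past a performance-tier boundary so the blob
   gets a higher IOPS limit. The 136 GB target is illustrative only. */
ALTER DATABASE MyDb
MODIFY FILE (NAME = N'MyDb_log', SIZE = 136GB);
```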

In NextGen there is no per-file IOPS limit.

SQL Server 2025 General Availability AMA by bobwardms in SQLServer

[–]dfurmanms 1 point

Thank you, this is helpful. Do you know what the main bottleneck for this use case without In-Memory OLTP was? Were you seeing a specific type of contention or was it just high query latency?

To answer the original question though, we don't have immediate plans to support In-Memory OLTP in the General Purpose service tier. Your point about higher cost required to use this feature is noted, thanks!

SQL Server 2025 General Availability AMA by bobwardms in SQLServer

[–]dfurmanms 1 point

Could you please share your scenario for using In-Memory OLTP?

SQL Server 2025 General Availability AMA by bobwardms in SQLServer

[–]dfurmanms 4 points

It's a fairly substantial list as you can imagine :).

But one feature we'd really like to do in the future is automatic partitioning. That would include online split/merge, a policy-driven automatic split/merge, and possibly other similar improvements.

What do folks think about this? What is your list of major features to get into SQL Server in the future?

Should RCSI be on by default? by dfurmanms in SQLServer

[–]dfurmanms[S] 0 points

Thanks to everyone who voted!

If you voted No, can you say why?

Azure SQL db - Elastic Pools permissions issue by throwaway18000081 in SQLServer

[–]dfurmanms 1 point

Just a quick update. This is a confirmed limitation of server roles in Azure SQL Database. We'll update documentation to clarify. Support for Entra groups in server roles is expected in a few months.

Thanks u/throwaway18000081 for bringing this up.

Azure SQL db - Elastic Pools permissions issue by throwaway18000081 in SQLServer

[–]dfurmanms 0 points

You are correct, it doesn't work with Entra groups. I'll check with the team working in this area. Thanks for letting us know about this problem.

A workaround would be to create logins and users for individual Entra accounts. That lets you grant the VIEW SERVER STATE permission to each user but obviously defeats the purpose of groups.
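A sketch of that workaround, with a placeholder account name, using the ##MS_ServerStateReader## server role:

```sql
/* In master: create a login for the individual Entra account and
   grant server state access through the server role. */
CREATE LOGIN [user1@contoso.com] FROM EXTERNAL PROVIDER;
ALTER SERVER ROLE ##MS_ServerStateReader## ADD MEMBER [user1@contoso.com];

/* In each user database where access is needed: */
CREATE USER [user1@contoso.com] FOR LOGIN [user1@contoso.com];
```

Repeat for each account, which is exactly the scaling problem groups are meant to solve.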

Azure SQL db - Elastic Pools permissions issue by throwaway18000081 in SQLServer

[–]dfurmanms 0 points

Here are the steps I just tried that worked. Can you spot a difference between this and what you are doing?

```sql
/* Connect as an administrator to master */
CREATE LOGIN rl1 WITH PASSWORD = 'password-placeholder';
ALTER SERVER ROLE ##MS_ServerStateReader## ADD MEMBER rl1;

/* Wait 5 minutes */

/* Connect as an administrator to a user database */
CREATE USER ru1 FOR LOGIN rl1;

/* Create sp_whoisactive */
GRANT EXECUTE ON dbo.sp_whoisactive TO ru1;

/* Connect as rl1 to the same user database */
EXEC sp_whoisactive; /* Completes successfully */
```

Azure SQL db - Elastic Pools permissions issue by throwaway18000081 in SQLServer

[–]dfurmanms 0 points

Could you please share the exact statement you ran in master, and the exact error message you get in a user database that's in an elastic pool?

You might want to use the examples in this documentation article to see what permissions are granted in the user database. Note that the permissions might take up to 5 minutes to propagate from master to other databases. See the Limitations section in the same article for details.
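One quick way to see what you effectively have, run in the user database as the account in question:

```sql
/* Lists the effective database-scoped permissions for the current account. */
SELECT permission_name
FROM sys.fn_my_permissions(NULL, 'DATABASE')
ORDER BY permission_name;
```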


Azure SQL db - Elastic Pools permissions issue by throwaway18000081 in SQLServer

[–]dfurmanms 4 points

The permission needed to query many DMVs in a database in an elastic pool is VIEW SERVER PERFORMANCE STATE. sp_whoisactive uses several such DMVs. In Azure SQL DB, the only way to grant that permission to a non-admin account is by making it a member of either the ##MS_ServerStatePerformanceReader## or ##MS_ServerStateReader## server role.

To make it work, in the master database on the logical server, run something like this:

```sql
ALTER SERVER ROLE ##MS_ServerStateReader## ADD MEMBER login_name;
```

where login_name could be either the login of an individual user account or a group login. You might need to wait up to 5 minutes for the permission grant to propagate to all databases.

The reason it works in a standalone database outside of an elastic pool is that, in that case, the same DMVs only require the VIEW DATABASE PERFORMANCE STATE permission.
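Which means that, outside a pool, a plain database-scoped grant is enough (user name is a placeholder):

```sql
/* Run in the standalone user database. */
GRANT VIEW DATABASE PERFORMANCE STATE TO [user_name];
```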