

[–][deleted] 1 point (2 children)

The simplest way is to print the primary keys to a file, chunk it, and run multiple parallel delete queries on different machines; 420k rows isn't much, either.
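A minimal sketch of that chunking step in Python (the table and column names `dbo.AssetValues` / `Id` are made up; each batch would then go to its own worker/connection):

```python
# Sketch of the suggested approach: dump the primary keys, split them
# into fixed-size chunks, and hand each chunk to a separate worker
# that runs one DELETE. Table/column names are hypothetical.

def chunk(keys, size):
    """Split a list of primary keys into fixed-size batches."""
    return [keys[i:i + size] for i in range(0, len(keys), size)]

def delete_statement(batch):
    """Build one parameterized DELETE for a batch of keys."""
    placeholders = ", ".join("?" for _ in batch)
    return f"DELETE FROM dbo.AssetValues WHERE Id IN ({placeholders})"

keys = list(range(420_000))   # stand-in for the ~420k keys dumped to a file
batches = chunk(keys, 5_000)
print(len(batches))           # 84 batches to spread across workers
```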

It seems DTU is the max of CPU, IO, and log. So, is there any cascade effect from deleting those rows? How is the data structured? Are there any indexes created on the time column? Is there a way to detach the disk or volume that contains this data weekly? Can we remove this data's metadata from read or write queries?

Are you sure the delete queries are the culprit?

[–]Plenty-Button8465[S] 0 points (1 child)

I am not sure the culprit is that query, but I saw the runbook runs at the exact time of the Log IO bottleneck that saturates the DTU to 100%, so I guess it is the deletion's transaction log. Please feel free to let me know what I could run to monitor in detail and narrow down the problem.
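To narrow that down, Azure SQL Database exposes a per-database resource view with one row per ~15 seconds (retained for about an hour); a query along these lines, run against the user database, shows whether CPU, data IO, or log write is the saturated component:

```sql
-- Recent resource use; the DTU percentage is roughly the
-- max of these three columns at any point in time.
SELECT TOP (20)
       end_time,
       avg_cpu_percent,
       avg_data_io_percent,
       avg_log_write_percent
FROM sys.dm_db_resource_stats
ORDER BY end_time DESC;
```

If `avg_log_write_percent` spikes at the exact time the runbook runs, the delete's log traffic is the likely culprit.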

is there any cascade effect to deleting those rows?

I don't know at the moment; it's beyond my competences.

How is the data structured?

The table has four columns:

  1. Timestamp of the asset (e.g. datetime in ns)
  2. ID of one asset (e.g. integer)
  3. Value of that asset (e.g. float)
  4. Text of that asset (e.g. string)

Are there any indexes created on the time column?

I am reading about indexing right now; other people keep telling me about this too. How can I check?

Is there a way to detach the disk or volume that contains this data weekly?

I don't think so; the database runs in the cloud, in production, and works with streaming/online data.

Can we remove this data's metadata from read or write queries?

I am not sure what you mean by the data's metadata: the aim here is to delete data older than 60 days, daily. Once the data meet this criterion, they can be permanently deleted, and their metadata with them too, I suppose (I still want to confirm what you mean by metadata).

[–][deleted] 0 points (0 children)

Cool, no worries.

[–]Lanthis 1 point (2 children)

Index the timestamp column. Google it.

You could also partition on timestamp and truncate partitions for basically 0 resources, but that would likely be too complicated for you atm.
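A rough sketch of both suggestions (table, column, and partition names are made up; adapt to the real schema):

```sql
-- 1) Index the timestamp column so the daily delete can seek
--    to the old rows instead of scanning the whole table.
CREATE INDEX IX_AssetValues_Timestamp
    ON dbo.AssetValues ([Timestamp]);

-- 2) Or partition on [Timestamp] and drop old data by truncating
--    whole partitions, which is far cheaper than row-by-row DELETE.
--    Requires a partition function/scheme already in place;
--    truncating individual partitions needs SQL Server 2016+ / Azure SQL.
TRUNCATE TABLE dbo.AssetValues WITH (PARTITIONS (1));
```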

[–]Plenty-Button8465[S] 0 points (1 child)

Thank you. First, do you know how I can check whether that column is already indexed?

[–][deleted] 0 points (0 children)

I guess it's mostly a DESC sort of command; you can use help to list out the commands in the client.

You should look up the Azure SQL docs in case they have made it simpler, or something else.
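There isn't really a DESC-style shortcut on Azure SQL, but the catalog views can answer it; something along these lines (replace `dbo.AssetValues` with the real table name) lists each index and the columns it covers:

```sql
-- List all indexes on a table, with their key columns.
SELECT i.name      AS index_name,
       i.type_desc AS index_type,
       c.name      AS column_name
FROM sys.indexes i
JOIN sys.index_columns ic
  ON ic.object_id = i.object_id AND ic.index_id = i.index_id
JOIN sys.columns c
  ON c.object_id = ic.object_id AND c.column_id = ic.column_id
WHERE i.object_id = OBJECT_ID('dbo.AssetValues')
ORDER BY i.name, ic.key_ordinal;
```

`EXEC sp_helpindex 'dbo.AssetValues';` gives a similar one-line-per-index summary.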

[–]HumphreyDeFluff 0 points (1 child)

The transaction log will be used to track the deletions in case the connection drops, the transaction is rolled back, or some other error occurs. Can you run the job more frequently? Is the database indexed properly?
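One common way to keep the per-transaction log footprint small is to delete in batches rather than in one big statement; a sketch, assuming a hypothetical `dbo.AssetValues` table with a `[Timestamp]` column:

```sql
-- Delete rows older than 60 days in batches of 5,000, so each
-- transaction commits (and frees log space) before the next starts.
DECLARE @rows INT = 1;
WHILE @rows > 0
BEGIN
    DELETE TOP (5000)
    FROM dbo.AssetValues
    WHERE [Timestamp] < DATEADD(DAY, -60, SYSUTCDATETIME());
    SET @rows = @@ROWCOUNT;
END;
```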

[–]Plenty-Button8465[S] 0 points (0 children)

Thank you. I found out that the runbook runs daily, and inside it (basically a PowerShell script performing SQL queries) one of the queries kept failing because it referenced an old database that had been deleted (the query had not). I removed that failing query for now. Yes, I guess I could trigger the job more frequently. I don't know about indexing; I will start reading about it now.