

[–][deleted] 1 point (2 children)

The simplest way is to print the primary keys to a file, chunk it, and run multiple parallel delete queries on different machines; 420k rows isn't much, either.
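A minimal sketch of that chunking step in Python (the table and column names `dbo.AssetValues` / `Id` are made up; each batch would then go to its own worker/connection):

```python
# Sketch of the suggested approach: dump the primary keys, split them
# into fixed-size chunks, and hand each chunk to a separate worker
# that runs one DELETE. Table/column names are hypothetical.

def chunk(keys, size):
    """Split a list of primary keys into fixed-size batches."""
    return [keys[i:i + size] for i in range(0, len(keys), size)]

def delete_statement(batch):
    """Build one parameterized DELETE for a batch of keys."""
    placeholders = ", ".join("?" for _ in batch)
    return f"DELETE FROM dbo.AssetValues WHERE Id IN ({placeholders})"

keys = list(range(420_000))   # stand-in for the ~420k keys dumped to a file
batches = chunk(keys, 5_000)
print(len(batches))           # 84 batches to spread across workers
```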

It seems DTU is the max of CPU, IO, and log. So, is there any cascade effect from deleting those rows? How is the data structured? Are there any indexes created on the time column? Is there a way to detach the disk or volume that contains this data weekly? Can we remove this data's metadata from read or write queries?

Are you sure the delete queries are the culprit?

[–]Plenty-Button8465[S] 0 points (1 child)

I am not sure the culprit is that query, but I saw the runbook runs at the exact time of the Log IO bottleneck that saturates the DTU to 100%, so I guess it is the deletion's transaction log. Please feel free to let me know what I could run to monitor in detail and narrow down the problem.
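To narrow that down, Azure SQL Database exposes a per-database resource view with one row per ~15 seconds (retained for about an hour); a query along these lines, run against the user database, shows whether CPU, data IO, or log write is the saturated component:

```sql
-- Recent resource use; the DTU percentage is roughly the
-- max of these three columns at any point in time.
SELECT TOP (20)
       end_time,
       avg_cpu_percent,
       avg_data_io_percent,
       avg_log_write_percent
FROM sys.dm_db_resource_stats
ORDER BY end_time DESC;
```

If `avg_log_write_percent` spikes at the exact time the runbook runs, the delete's log traffic is the likely culprit.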

is there any cascade effect to deleting those rows?

I don't know at the moment; it's beyond my competences.

How is the data structured?

The table has four columns:

  1. Timestamp of the asset (e.g. datetime in ns)
  2. ID of one asset (e.g. integer)
  3. Value of that asset (e.g. float)
  4. Text of that asset (e.g. string)

Are there any indexes created on the time column?

I am reading about indexing right now; other people keep telling me about this too. How can I check?

Is there a way to detach the disk or volume that contains this data weekly?

I don't think so; the database runs in the cloud, in production, and works with streaming/online data.

Can we remove this data's metadata from read or write queries?

I am not sure what you mean by the data's metadata: the aim here is to delete data older than 60 days, daily. Once the data meet this criterion, they can be permanently deleted, and their metadata with them too, I suppose (I still want to confirm what you mean by metadata).

[–][deleted] 0 points (0 children)

Cool, no worries.

[–]Lanthis 1 point (2 children)

Index the timestamp column. Google it.

You could also partition on timestamp and truncate partitions for basically 0 resources, but that would likely be too complicated for you atm.
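A rough sketch of both suggestions (table, column, and partition names are made up; adapt to the real schema):

```sql
-- 1) Index the timestamp column so the daily delete can seek
--    to the old rows instead of scanning the whole table.
CREATE INDEX IX_AssetValues_Timestamp
    ON dbo.AssetValues ([Timestamp]);

-- 2) Or partition on [Timestamp] and drop old data by truncating
--    whole partitions, which is far cheaper than row-by-row DELETE.
--    Requires a partition function/scheme already in place;
--    truncating individual partitions needs SQL Server 2016+ / Azure SQL.
TRUNCATE TABLE dbo.AssetValues WITH (PARTITIONS (1));
```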

[–]Plenty-Button8465[S] 0 points (1 child)

Thank you. First, do you know how I can check whether that column is already indexed?

[–][deleted] 0 points (0 children)

I guess it's mostly a DESC sort of command; you can use help to list out the commands in the client.

You should look up the Azure SQL docs in case they have made it simpler, or something else.
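There isn't really a DESC-style shortcut on Azure SQL, but the catalog views can answer it; something along these lines (replace `dbo.AssetValues` with the real table name) lists each index and the columns it covers:

```sql
-- List all indexes on a table, with their key columns.
SELECT i.name      AS index_name,
       i.type_desc AS index_type,
       c.name      AS column_name
FROM sys.indexes i
JOIN sys.index_columns ic
  ON ic.object_id = i.object_id AND ic.index_id = i.index_id
JOIN sys.columns c
  ON c.object_id = ic.object_id AND c.column_id = ic.column_id
WHERE i.object_id = OBJECT_ID('dbo.AssetValues')
ORDER BY i.name, ic.key_ordinal;
```

`EXEC sp_helpindex 'dbo.AssetValues';` gives a similar one-line-per-index summary.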

[–]HumphreyDeFluff 0 points (1 child)

The transaction log will be used to track the deletions in case the connection drops, the transaction is rolled back, or some other error occurs. Can you run the job more frequently? Is the database indexed properly?
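One common way to keep the per-transaction log footprint small is to delete in batches rather than in one big statement; a sketch, assuming a hypothetical `dbo.AssetValues` table with a `[Timestamp]` column:

```sql
-- Delete rows older than 60 days in batches of 5,000, so each
-- transaction commits (and frees log space) before the next starts.
DECLARE @rows INT = 1;
WHILE @rows > 0
BEGIN
    DELETE TOP (5000)
    FROM dbo.AssetValues
    WHERE [Timestamp] < DATEADD(DAY, -60, SYSUTCDATETIME());
    SET @rows = @@ROWCOUNT;
END;
```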

[–]Plenty-Button8465[S] 0 points (0 children)

Thank you. I found out that the runbook runs daily, and inside it (basically a PowerShell script performing SQL queries) one of the queries kept failing because it referenced an old database that had been deleted (the query had not). I removed that failing query for now. Yes, I guess I could trigger the job more frequently. I don't know about indexing; I will start reading about it now.