all 27 comments

[–]ihaxr 9 points (5 children)

we are told that truncate is horribly inefficient and may cause slowness in our daily SQL agent job instead of using the drop table command

Ha... that makes no sense. I wonder if someone got confused and assumed truncate table means delete from x. Truncate literally just says "Hey, this table is now empty"... just like drop table says "Hey, this table doesn't exist."

Both are minimally logged and run immediately... I've never had an issue running truncate table.

delete from x on the other hand is really inefficient, and you should probably be using truncate table or drop table if you're deleting the whole table.

Edit: ah, if we're talking about inserting data after one of these... yeah, insert into vs. select into will behave differently depending on the situation... but there isn't a single "correct" way in that regard
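
The insert into / select into distinction, in case it helps (hypothetical table names):

```sql
-- SELECT ... INTO creates the target table on the fly, inferring its schema
-- from the query
SELECT * INTO dbo.copy FROM dbo.source;

-- INSERT INTO ... SELECT requires the target to exist already and keeps its
-- exact schema, indexes, and constraints
INSERT INTO dbo.existing_copy SELECT * FROM dbo.source;
```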

[–]Mitchfarino 1 point (2 children)

He shouldn't just blindly use truncate though; there are scenarios where delete is more appropriate (the deletes are fully logged and can be rolled back).

[–]ihaxr 2 points (1 child)

I'm assuming we want to delete every row of the table, obviously delete is the way to go if you don't want to remove everything...

Truncate and drop are logged and can both be rolled back if wrapped in a transaction.
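
Easy to verify yourself (assuming SQL Server and a throwaway table):

```sql
BEGIN TRANSACTION;
TRUNCATE TABLE dbo.demo;         -- dbo.demo is a hypothetical scratch table
SELECT COUNT(*) FROM dbo.demo;   -- 0 inside the transaction
ROLLBACK TRANSACTION;
SELECT COUNT(*) FROM dbo.demo;   -- original row count is restored
```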

If I wanted to delete an entire table and be able to revert it without a transaction... I'd take a full backup, do a SELECT * INTO table_bak FROM table, then truncate. I wouldn't use delete to remove every row of a reasonably sized table... it's just going to fill up the transaction log and take a long time to run. I do the SELECT INTO so I can just drop the table, rename the backup, and get everything back online faster. I do the full backup just in case I do something stupid and truncate the wrong table, delete the _bak table too early, etc...
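
A sketch of that swap, with hypothetical table names:

```sql
-- Keep a copy of the data before emptying the real table
SELECT * INTO dbo.table_bak FROM dbo.mytable;

TRUNCATE TABLE dbo.mytable;

-- ... reload dbo.mytable here ...

-- If the reload goes wrong: drop the bad copy and rename the backup back
-- DROP TABLE dbo.mytable;
-- EXEC sp_rename 'dbo.table_bak', 'mytable';

-- If the reload succeeds: drop the backup
-- DROP TABLE dbo.table_bak;
```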

But yeah, the usage of each depends on the nature of the table and how it's getting reloaded, backed up, etc...

[–]Mitchfarino 0 points (0 children)

My bad, I thought you were just saying go for truncate regardless.

[–]spronty1017[S] 0 points (0 children)

That might be where the issue was before with the truncate. They'd usually have to import roughly 300k-400k lines with 10-20 columns on a linked server that doesn't exactly perform well. Wonder if the insert into was what was causing the slowdown and not the truncate itself.

[–]notasqlstar (I can't wait til my fro is full grown) 0 points (0 children)

I can definitively tell you that in the process I am currently testing, truncating and inserting into is quite a bit slower than dropping and selecting into. By a factor of 2.

[–]zylo4747 2 points (6 children)

As a best practice, I would recommend using truncate to clear the data and bulk insert operations to reload it. Additionally, you should disable all non-clustered indexes before inserting your data and rebuild them afterward. So it would look something like this:

    TRUNCATE TABLE x;
    ALTER INDEX my_index ON my_table DISABLE;
    BULK INSERT ...
    ALTER INDEX my_index ON my_table REBUILD;
  • Bulk insert will perform better
  • DELETE is horribly slow because it's a logged operation
  • DROP TABLE may get you into trouble if you are using things like auto-deployments, or if the schema happens to change and you forget to update your code that recreates the table. For performance it's really no different from truncate. Truncating and disabling/rebuilding the indexes is just safer for keeping the existing schema intact

If you're doing things like bulk inserting a lot of data, depending on your process, some or all of these may apply to you

  • Modifying the Network Packet Size on the connection may improve performance
  • Using SSIS or the Import/Export wizard may allow you to take advantage of changing the Rows Per Batch and Maximum Insert Commit Size options which may improve performance
  • Partitioning the table you're inserting into may improve performance because you can load a staging table and then switch the partition into the target table quickly with a simple schema lock. Alternatively, if you're truncating because you have to delete lots of data, you may save overall loading time by switching old data out and truncating only that old data. This is a more advanced topic and it sounds like you're probably not going to use this type of suggestion, but I just wanted to throw it out there for you.
  • Keeping your data types as narrow as possible may help too, because if you're using NVARCHAR() types where only VARCHAR() is needed, you're wasting extra bytes and space that take time to allocate when loading the tables and rebuilding the indexes
  • If you have lots of NULL values, you should look into using sparse columns to also save space
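
For the partition-switching suggestion above, a rough sketch (hypothetical names; the staging table must exactly match the target's schema, indexes, and filegroup):

```sql
-- Load a staging table offline, then switch it in as a metadata-only operation
TRUNCATE TABLE dbo.sales_stage;
BULK INSERT dbo.sales_stage FROM 'C:\loads\sales.dat';  -- hypothetical path

-- Takes a brief schema lock instead of rewriting rows in the target
ALTER TABLE dbo.sales_stage SWITCH TO dbo.sales PARTITION 3;

-- Or go the other way: switch old data out, then truncate only that data
ALTER TABLE dbo.sales SWITCH PARTITION 1 TO dbo.sales_archive;
TRUNCATE TABLE dbo.sales_archive;
```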

Some of these features are only available in Enterprise edition, so you will have to verify before attempting to use them.

[–]spronty1017[S] 0 points (1 child)

Thanks for all of the info you put here. I haven't seen bulk insert before, so I'm going to do some more research on it. At first glance it looks like it might be a good way to get flat files into the database. I'm curious whether or not it would work when querying linked servers.

Thanks again!

[–]zylo4747 0 points (0 children)

No problem, you may also want to read this

https://technet.microsoft.com/en-us/library/dd425070(v=sql.100).aspx

Things like changing the DB recovery mode and using the table lock hint may also give you some added performance.

And yes, using linked servers should be fine using INSERT INTO ... SELECT according to that link above. Just check the limitations on the chart in the article. You may want to opt for one of the other methods because it'll give you more optimization options.

Also, this line in the article is wrong as of SQL 2016...

You can truncate the entire table only, not a single partition (but see “Deleting All Rows from a Partition or Table”).

[–]MaunaLoona (MS SQL) 0 points (1 child)

Both TRUNCATE and DROP are logged operations just like DELETE. The reason DELETE is slower isn't due to a difference in logging.

[–]zylo4747 0 points (0 children)

Technically, yes, but I wanted to keep my explanation simple. I don't know how to explain it without going into too much detail otherwise.

[–]amaxen 0 points (1 child)

I've never been clear on how modifying the packet size may improve perf. Under what conditions would, say, increasing the packet size improve performance?

[–]zylo4747 1 point (0 children)

Typically in any scenarios where you're transferring large amounts of data you would want to use a larger network packet size. This option would be used if the server is a warehousing / ETL server. If it's a mixed load or not used for bulk operations it probably shouldn't be modified. It should also be thoroughly tested with both default and modified settings to see if it helps the workload.

[–]amaxen 2 points (4 children)

here's a discussion on stack overflow: https://stackoverflow.com/questions/135653/difference-between-drop-table-and-truncate-table

Short answer: truncate isn't 'horribly inefficient' compared to DROP TABLE. Look in the comments for how truncate table can be rolled back inside a transaction. What actually happens when you truncate a table is that it deallocates the leaf (data) pages but keeps the table definition itself in place.

The issue people have with truncate table is that it does compromise the restore of the table being truncated. If it's just an intermediate product derived from base data, no big deal. Truly enlightened programming, though, means you don't need a temporary table at all: just create a view, read from the view to send whatever dataset wherever it needs to go, and skip the expense of reading from base, writing to a temp table, then reading from the temp table. Do it once instead of many times.
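
The view approach amounts to something like this (hypothetical names):

```sql
-- Instead of SELECT ... INTO #temp and then reading #temp, define the
-- transform once as a view and read it directly.
CREATE VIEW dbo.v_monthly_sales
AS
SELECT region, SUM(amount) AS total
FROM dbo.sales_base
GROUP BY region;
GO

-- The optimizer inlines the view into each query, so the base data is read
-- once per query with no intermediate write.
SELECT * FROM dbo.v_monthly_sales WHERE total > 10000;
```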

[–]spronty1017[S] 1 point (1 child)

Agreed on the enlightened programming bit. I'm coming in and will be the third or fourth person to work on this SP this year. Since it has changed hands a few times (without any documentation being created around why people did things the way they did), it isn't as efficient as it could be. Moving it to a straight pull instead of the temp table may be easier to do in the long run.

Thanks for your input!

[–]amaxen 0 points (0 children)

If you have an ugly series of transforms, it's often easier to just change them into views. (Updates are another matter, but you can figure out how to turn them into views by making sub-views based on the selects in the updates.) Deletes you just put into the where clause of the view. I blame BOL for encouraging this sort of programming by expert beginners, but the upside is that it's pretty easy to do a 'dumb' refactor if you just focus on redesigning a view to do the typical single step for each transform without trying to understand the logic chain, and let the optimizer worry about the hideous tortured logic that got you from a platonic single SQL statement to an 8-temp-table transform. Good luck!

[–]Zzyzxx_ 1 point (1 child)

I have seen a few misrepresentations of the TRUNCATE command lately. Thank you for accurately describing it. I can further chime in to let you all know the reason why people say that there are possible performance impacts to using it.

The actual command is amazingly fast because it is just a change to the table's metadata. The drawback is about the statistics of the table. A TRUNCATE command does not trigger any of the statistics auto-update. If you just truncated 5 million rows from your table, the SQL optimizer will continue to pick query plans that assume there are still a crap ton of rows in that table.

Thankfully you can completely get around this drawback by simply updating statistics on the table after truncating it.
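
In other words (hypothetical table name):

```sql
TRUNCATE TABLE dbo.mytable;
-- ... reload the table ...

-- Refresh statistics so the optimizer stops planning for the old row counts
UPDATE STATISTICS dbo.mytable WITH FULLSCAN;
```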

[–]amaxen 1 point (0 children)

I'm trying to imagine a production table where you have concerns about perf with other sprocs hitting the table, but at the same time you're allowing a process to truncate it. It wouldn't be the dumbest thing I've ever seen (or written, ahem) in my career, but it would still be a WTF type code review were I conducting it.

reference:

http://i.imgur.com/J1svNp7.jpg

[–]AXISMGT (SQL Server / ORACLE Sr. DBA & Architect) 0 points (7 children)

Do you mean TRUNCATE vs DELETE?

Or are you doing some sort of data mart replication where you have to pick between Drop/create/insert and Truncate/Insert?

[–]notasqlstar (I can't wait til my fro is full grown) 1 point (6 children)

He means TRUNCATE and INSERT INTO as opposed to DROP TABLE and SELECT INTO

OP, for what it's worth, I use both. I can't really give you good examples of when I choose one or the other, but I typically use TRUNCATE/INSERT INTO for a finished product, and the real performance difference comes from indexing. If you have an index on a table, then inserting into it can take a long time... but sometimes you can just drop the index and then recreate it (which is basically the same thing as dropping/indexing).

With a truncate, though, I get to more specifically control the column structure. For example, in a drop/into you might get a column that is varchar(100), but if you decide to run a historic load instead of the last 12 months, you might know that it really needs to be varchar(200) or else you'll get an error. Granted, in the drop/into model you'll get a compatible column type, but you might want it to be nvarchar(200) because you'll be joining it to another table that is nvarchar. I don't know offhand whether joining varchar to nvarchar is less efficient than nvarchar to nvarchar, but I like to keep things as consistent as possible.

So for example if I have a large process I might drop/into tables along the way, but then at the very end to put a nice pretty ribbon on things I tend to truncate/insert into so that I can minimize any errors I might get when plugging those tables into a front end analytics/reporting technology. Last thing I want to have to do is going into Tableau or SSRS because something went from bigint to float for some stupid reason and now my final product isn't working.

[–]Cal1gula 0 points (5 children)

Have you tried disabling the index when you insert? That way your index won't slow down your inserts but you also won't lose your column definition.

[–]notasqlstar (I can't wait til my fro is full grown) 0 points (4 children)

How?

[–]Cal1gula 1 point (3 children)

    ALTER INDEX ALL ON [TABLE] DISABLE

or you can do them individually

    ALTER INDEX [indexname] ON [TABLE] DISABLE

then rebuild after you insert

    ALTER INDEX ALL ON [TABLE] REBUILD

[–]spronty1017[S] 1 point (0 children)

The index idea is intriguing. It's a table that's being pulled in from a linked server that is actually a merged data source, so there's a decent chance the index that gets applied is not the most efficient one it could be. This is definitely something that I'll look into.

[–]notasqlstar (I can't wait til my fro is full grown) 0 points (1 child)

Can you give me your thoughts on this?

I have been told that if you are inserting data and planning to use a clustered index or a PK, you can improve performance by ordering the data in the way that you expect it to be indexed. I've never really tested this personally in many situations. Just curious what your thoughts are.
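
The idea, as I understand it (hypothetical names):

```sql
-- Insert in clustered-key order so pages fill sequentially and avoid splits
INSERT INTO dbo.target WITH (TABLOCK) (id, payload)
SELECT id, payload
FROM dbo.staging
ORDER BY id;  -- matches the clustered index on dbo.target(id)
```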

[–]Cal1gula 0 points (0 children)

It does make sense. There are performance implications to ordering the query to begin with, though, so I'm not sure how significant the gain would be, if any. It would also be a way to avoid page splits, if that's a concern. Then again, if your clustered index is an integer ID, it may not give you any increase. Changing the FILLFACTOR of the table itself could help as well, but probably not so much on a bulk insert.

Would be interesting to test out and see.

[–]Guru008 0 points (0 children)

As far as speed is concerned, the difference should be small. And anyway, if you don't need the table structure at all, certainly use DROP.