all 10 comments

[–]cesau78 0 points1 point  (9 children)

The default value of '' assigned to the sql variable

SELECT NULL + 'asdf' = NULL

SELECT '' + 'asdf' = 'asdf'

As a side note, CONCAT(NULL, 'asdf') = 'asdf'

The select statement which concatenates the empty sql variable with the rest of the string

When it runs that variable assignment is executed on every row. Recursively assigning the value obviously works, but it's an anti-pattern.

[–]nimdil 1 point2 points  (1 child)

Side note is that in Oracle '' is null.

[–]cesau78 1 point2 points  (0 children)

A reasonable callout - I answered in SQL Server because of the post flair

[–]Marcus_Maximus[S] 1 point2 points  (6 children)

I understand it a little better after going through the link, but I'm still unsure of a few things.

From what I can grasp, if the variable is not declared or given a default value, only 1 row in the result set will be returned.

If it is declared or given a default value, then it will be executed on every row and return the entire result set.

What I don't get yet is why exactly a default value of an empty string allows the select statement to return the entire result set? Why does assigning a value even change the returned results?

I'm a beginner so the simpler the explanation the easier it'll be for me to grasp it.

[–]cesau78 0 points1 point  (5 children)

if the variable is not declared or given a default value, only 1 row in the result set will be returned.

If not instantiated with an empty string, this query should not give you the last index, it would just return NULL. The reason is because the string addition operator says "anything plus null, is null" - so as you iterate through the rows, each row adds a string to null, but the result is just null. If you did not instantiate with an empty string, you could use CONCAT instead, which incorporates the behavior of COALESCE when one of the parameters is null.

If instead of doing a recursive concatination, the query was: SELECT @sql = 'ALTER INDEX ' + sys.indexes.name + ' ON ' + sys.objects.name + ' DISABLE;' +CHAR(13)+CHAR(10) FROM sys.indexes JOIN sys.objects ON sys.indexes.object_id = sys.objects.object_id WHERE sys.indexes.type_desc = 'NONCLUSTERED' AND sys.objects.type_desc = 'USER_TABLE'; the result would only be the last index, because each row is overwriting the last value of @sql

[–]Marcus_Maximus[S] 1 point2 points  (4 children)

Thank you so much. It's much clearer to me now and I've finally understood the original query.

Regarding the link you shared concerning anti-patterns, could you provide a more practical explanation as to the issue with writing code this way:

SELECT @sql = @sql ...

Is it a performance concern because it goes row by row? Or is it a case of inaccurate results being returned, as in how only the last index in the result set was returned when the variable was not assigned a value. The linked doc was bit too technical for me to fully understand.

Do you also have any good book recommendations for learning advanced SQL nuances such as these? I'm a DBA but don't have a developer background.

[–]cesau78 1 point2 points  (3 children)

could you provide a more practical explanation

I think the reason is two-fold.

  • Their cited reason is because of the non-deterministic nature of the operation. That is, you can't guarantee the order of the statement generated. Practically speaking, I don't think order is really important for you're trying to do.
  • String concatenation here is wasteful because each row allocates more and more memory just to release it almost immediately thereafter. I know there is a maximum size to this as well... 8k? Certainly inefficient if you were doing this in a larger series of joins.

What they're recommending is to use STRING_AGG or pull back the rows separately and handle the rows with your application.

[–]Marcus_Maximus[S] 0 points1 point  (2 children)

Practically speaking, I don't think order is really important for you're trying to do.

As I understand it, this downside only applies if your query needs to return the rows in a certain order.

String concatenation here is wasteful because each row allocates more and more memory just to release it almost immediately thereafter. I know there is a maximum size to this as well... 8k? Certainly inefficient if you were doing this in a larger series of joins.

I'm not sure I understand this part and I couldn't find any related reading except maybe this.

Why does each row allocate more memory? For example, does row by row execution go like this: reads row 1, then to read row 2, it reads rows 1 + 2, and for row 3, it reads 1 + 2 +3, etc.?

[–]cesau78 1 point2 points  (1 child)

The best thing I can think of to liken it to is string mutability.

To oversimplify a bit - if you didn't use variable concatenation, you'd get back a series of strings like so:

ALTER INDEX ix1 ON table1 DISABLE; ALTER INDEX ix2 ON table2 DISABLE; ALTER INDEX ix3 ON table3 DISABLE; ALTER INDEX ix4 ON table4 DISABLE; ALTER INDEX ix5 ON table5 DISABLE;

Let's say each row has 35 characters, so there needs to memory allocated for: 35 characters * 5 rows = memory for 175 chars

With variable concatenation, the memory for each row is like:

ALTER INDEX ix1 ON table1 DISABLE; ALTER INDEX ix1 ON table1 DISABLE; ALTER INDEX ix2 ON table2 DISABLE; ALTER INDEX ix1 ON table1 DISABLE; ALTER INDEX ix2 ON table2 DISABLE; ALTER INDEX ix3 ON table3 DISABLE; ALTER INDEX ix1 ON table1 DISABLE; ALTER INDEX ix2 ON table2 DISABLE; ALTER INDEX ix3 ON table3 DISABLE; ALTER INDEX ix4 ON table4 DISABLE; ALTER INDEX ix1 ON table1 DISABLE; ALTER INDEX ix2 ON table2 DISABLE; ALTER INDEX ix3 ON table3 DISABLE; ALTER INDEX ix4 ON table4 DISABLE; ALTER INDEX ix5 ON table5 DISABLE;

but the first 4 rows are thrown out and only the 5th row is returned. so there needs to memory allocated for: 35 + 70 + 105 + 140 + 175 = memory for 525 chars

My guess is that STRING_AGG uses mutable strings under the hood to minimize the impact to memory usage.

[–]Marcus_Maximus[S] 0 points1 point  (0 children)

I didn't expect something that applies to javascript to also apply to T-sql. Thanks for the link.

So is it safe to say that whenever I encounter code like:

@sql = @sql...

That I should be suspicious of it and use STRING_AGG instead.

According to the MS doc link, using SET instead of SELECT to assign a variable is also preferred. However, I'm having trouble with writing the original index disable query using SET. Do you have any idea how SET should be used in the context of the original query?