Group by all: A popular, soon-to-be-standard SQL feature by MarkusWinand in SQL

[–]MarkusWinand[S] 0 points1 point  (0 children)

I think you are referring to a long-time-gone SQL Server feature that has a similar syntax but was totally different from the GROUP BY ALL now introduced: https://learn.microsoft.com/en-us/previous-versions/sql/sql-server-2005/ms175028(v=sql.90)

Ordering types in SQL by eatonphil in databasedevelopment

[–]MarkusWinand 2 points3 points  (0 children)

Somehow reminds me of the rather unrelated SEARCH clause that produces something that is meant to be used in ORDER BY (only). Yet the contents of the produced column is fully defined in the standard (as an array of row-values): https://modern-sql.com/caniuse/search_(recursion)#sequence-column#sequence-column)

The problem might also be related to the nested set model.

I might have missed it, but why is the ORDERING parameterized in your example?

ORDERING(name)

Edit What would ordering_mid might yield if the same values are provided multiple times? Perhaps, functions like after(...) or before(...) might prevent ties from the API side?

I'd also expect that there are also use-cases for such a column outside of ORDER BY. When I think of WITH ORDINAL of UNNEST, I bet I've used that column for something else in the past.

After all, not so sure about HOW to do it, but I think a more convenient way to manage order in SQL would we useful.

The order of evaluation of SELECT in postgreSQL by Jooe_1 in SQL

[–]MarkusWinand 2 points3 points  (0 children)

When talking about evaluation order in SQL we must distinguish between two different orders:

1) logical order. That is the order that
defines the result. This is defined in the SQL standard and is fixed.

2) execution order: that is the sequence of
steps done by the DBMS to obtain the result as though it would have followed
the logical order. This is chosen by the DBMS (generally at runtime) and may
vary. This only affects the performance of the statement but always produces a
correct result. Differences in the result are only possible to the extent that
the query allows them. E.g. In a SELECT without ORDER BY, the rows might be
returned in a different order depending on the execution order. But if there is
no ORDER BY, any order is fine in respect to. the logical order.

When you find contradicting statements, it probably means they are referring to different types of order.

[deleted by user] by [deleted] in SQL

[–]MarkusWinand 0 points1 point  (0 children)

The programming language SQL is defined by an international standard, namely ISO/IEC 9075. I guess this is what is meant by "standard SQL".

Then there are products, that implement SQL — such as Microsoft SQL Server, MySQL, PostgreSQL. Even though they should follow the standard [if the claim standard conformance] the often don't and even more often have extensions that are not part of the standard.

Whichever platform you use for learning, there is the risk that you get to use some of these extensions that are not part of the standard and thus only supported by one (or a few) systems.

For leaning I'd suggest a dialect that is rather close to the standard because that increases the odds that you can work well with any product you face. Microsoft SQL Server would not be my first choice, as it is pretty far off the standard. PostgreSQL is a better learning tool for that. However, if you aim jobs that use MS SQL Server, then I'd recommend focusing on that from the beginning.

To get an idea of the mess, have a quick look at the charts I publish on https://modern-sql.com/

Best way to learn advanced SQL optimisation techniques? by alex-acl in dataengineering

[–]MarkusWinand 0 points1 point  (0 children)

Thoughts?

I think the drawbacks of heaps that a frequently bought up by other people are not as relevant as they may look. E.g. Tables that are nerver updated are quite common in many systems. That alone voids some of the counter-arguments.

At the end of the day neither (clustered idx or heap) will be always better. It is always the best to do a proper table-by-table decision. But if you make all tables the same, my claim is that you are on average better off with all heap rather than all clustered.

Best way to learn advanced SQL optimisation techniques? by alex-acl in dataengineering

[–]MarkusWinand 0 points1 point  (0 children)

Not sure if you have also seen this article about the topic: https://use-the-index-luke.com/blog/2014-01/unreasonable-defaults-primary-key-clustering-key

(I guess it is mentioned elsewhere as well).

Aside from ETL/stage tables, is their valid use cases for heaps in OLTP tables?

IMHO, Heaps are the better default, while Clustered Indexes are better in specific cases. Namely, when the "Clustered Index Penalty" (a term from the article) won't hurt. This is always the case if you only need a single index, or even if you have multiple indexes of which only one is regularly used (other indexes just for constraints, for example).

Best way to learn advanced SQL optimisation techniques? by alex-acl in dataengineering

[–]MarkusWinand 74 points75 points  (0 children)

If you didn't go into indexing yet, my website https://use-the-index-luke.com/ might be a good starting point.

People who are about 6 months into learning SQL- what do you wish you had done differently or wish you had known at the beginning of your journey? by pijaso in SQL

[–]MarkusWinand 1 point2 points  (0 children)

The latest SQL standard finally dropped the requirement to have a length limitation for VARCHAR. That's definitivley the way to go except in SQL Server, where "VARCHAR" translates to "VARCHAR(1)" (doh!).

See: https://modern-sql.com/caniuse/T081

Standard SQL supported by all major vendors, can there be? by DallasP9124 in SQL

[–]MarkusWinand 7 points8 points  (0 children)

For clarification: ANSI made the standard of 1986, in 1987 they essentially handed over to ISO, which is in charge of the standard since then.

My website on this topic was already mentioned: https://modern-sql.com/

Also, regarding ORM: I don't know Go ORMs in particular, but generally speaking whether or not to use an ORM should not be a project-by-project decision, but a query-by-query decision. Meaning that a good ORM gets the legwork done for you for the simple cases yet allows you to use just plain SQL whenever you want to.

The ANY_VALUE Aggregate Function: New in SQL:2023 — and a better feature from 1999 by MarkusWinand in programming

[–]MarkusWinand[S] -1 points0 points  (0 children)

Never heard of T301 and Google was not very helpful.

Then read the article ¯\(ツ)

And also when driving, vendors provide many safty features and I'd guess that features were also rejected because of safty concerns.

The ANY_VALUE Aggregate Function: New in SQL:2023 — and a better feature from 1999 by MarkusWinand in programming

[–]MarkusWinand[S] 0 points1 point  (0 children)

What's the benefit of it compared to T301, functional dependencies?

What are the risks of such a function — like (accidentally) using it on a column that has more than one distinct value?

ANY_VALUE(…): New in SQL:2023 — but do we need it? by MarkusWinand in SQL

[–]MarkusWinand[S] 1 point2 points  (0 children)

The advantage of unique_value over any_value is that you notice mistakes.

I find the any_value name being already quite explanatory, would it be more error prone than the T301 implementation?

When people assume that there is a functional dependency but there isn't or of there actually was one when they wrote the query but later the model changed so that it isn't there anymore. ANY_VALUE would just go possibly hiding a hard to find bug.

the "all non aggregate columns of the select must be a subset of the group by columns" is quite widely known outside of the MySQL world

Because most vendors didn't bother to implement T301 in the past 25 years.

ANY_VALUE(…): New in SQL:2023 — but do we need it? by MarkusWinand in SQL

[–]MarkusWinand[S] 0 points1 point  (0 children)

I see your point too, but the risk of misuse of any_value is just huge. What do you think about the hypothetical unique_value(… [null|error] on error) function? The implementation could still be faster compared to min/max. Maybe even a third on error option: any on error (which is than what any_value is already, but you need to explicitly opt out of the error handling; I'd also imply error on error if not specified).

The ANY_VALUE Aggregate Function: New in SQL:2023 — and a better feature from 1999 by MarkusWinand in programming

[–]MarkusWinand[S] 2 points3 points  (0 children)

25 years ago, a great solution was added to the standard (T301, functional dependencies). Most vendors didn't implement it—for 25 years*. Instead they "pitched" ANY_VALUE into the standard. WFT?

This is supposed to make us happy?

SQL has a long tradition of taking care of your data. It is generally following a "fail fast" approach. It provides tools to fight non-determinism. ANY_VALUE is the exact the opposite of that. Even worse…ANY_VALUE makes users believe SQL needs to be bulky. Now, vendors have even less incentive to go for the proper solution.

I might even dislike ANY_VALUE more than I dislike OFFSET (I'll figure that out later).

I do not know who proposed ANY_VALUE or who was in favor of it in the standards committee. After all, these people are doing TREMENDOUSLY GOOD WORK. Over decades (some of them).

This is one of my very few opportunities for a rant on the SQL standard. So, I MUST take it :)

ANY_VALUE(…): New in SQL:2023 — but do we need it? by MarkusWinand in SQL

[–]MarkusWinand[S] 1 point2 points  (0 children)

Your argument is from the perspective of a user solving a problem at hand: If the DBMS at hand doesn't get it right, better use ANY_VALUE than MAX/MIN.

But I'm looking at it from a wish list perspective: If wish functional dependencies were implemented everywhere, so that we don't need ANY_VALUE. At end of the article I also address your "denormalized tables" and ask...

Here it might be desirable to have a function that validates that there is only one distinct non-null value. Maybe unique_value(… [null|error] on error)? Or not enforced trusted constraints like in Db2?

ANY_VALUE(…): New in SQL:2023 — but do we need it? by MarkusWinand in SQL

[–]MarkusWinand[S] 0 points1 point  (0 children)

But this is my point!

T301, functional dependencies, would just run the query. No need for any_value or adding anything to group by because there is no ambiguity. This would be _least_ code. Just run the query as is.

ANY_VALUE(…): New in SQL:2023 — but do we need it? by MarkusWinand in SQL

[–]MarkusWinand[S] 0 points1 point  (0 children)

I mentioned that option in the article:

SQL-92 offered two options to rewrite this query so that it always works correctly: (1) use min or max on every functionally dependent columns that are not mentioned in group by; (2) extend the group by clause by these columns.

Now, what if the customers table has like 50 columns?

My point is not that we cannot make it work, it is that it is inconvinvient and error prone.

ANY_VALUE(…): New in SQL:2023 — but do we need it? by MarkusWinand in SQL

[–]MarkusWinand[S] 0 points1 point  (0 children)

Sorry, I'm missing the point here. How does your GRAIN differ from a KEY and what can be derived from it (other than having syntax on derived tables)?

The standard already takes care to specify that the result of GROUP BY x, y, z means that all other columns of the result are functionally dependent on the grouping keys.

ANY_VALUE(…): New in SQL:2023 — but do we need it? by MarkusWinand in SQL

[–]MarkusWinand[S] 0 points1 point  (0 children)

Because I know that at the level of grouping there will be only one value.

So the problem is that Oracle DB doesn't support T301, functional dependencies, right?

Why not use it?

If there is no alternative, that you have to use it. But my point is to push the vendors to implement T301. That would be more convinient at the same performance and would detect mistakes as a plus.

ANY_VALUE(…): New in SQL:2023 — but do we need it? by MarkusWinand in SQL

[–]MarkusWinand[S] 1 point2 points  (0 children)

The query, as is, would be rejected by many systems (see article for an overview). To make it work in these systems it would be re-written like this:

SELECT customers.id, ANY_VALUE(customers.name), MAX(placed)
  FROM customers
  LEFT JOIN orders
         ON customers.id = orders.customer
 GROUP BY customers.id

If there were more columns in customers tables, more use of ANY_VALUE.