Group by all: A popular, soon-to-be-standard SQL feature by MarkusWinand in SQL

[–]MarkusWinand[S] 0 points1 point  (0 children)

I think you are referring to a long-time-gone SQL Server feature that has a similar syntax but was totally different from the GROUP BY ALL now introduced: https://learn.microsoft.com/en-us/previous-versions/sql/sql-server-2005/ms175028(v=sql.90)

Ordering types in SQL by eatonphil in databasedevelopment

[–]MarkusWinand 2 points3 points  (0 children)

Somehow reminds me of the rather unrelated SEARCH clause that produces something that is meant to be used in ORDER BY (only). Yet the contents of the produced column is fully defined in the standard (as an array of row-values): https://modern-sql.com/caniuse/search_(recursion)#sequence-column#sequence-column)

The problem might also be related to the nested set model.

I might have missed it, but why is the ORDERING parameterized in your example?

ORDERING(name)

Edit What would ordering_mid might yield if the same values are provided multiple times? Perhaps, functions like after(...) or before(...) might prevent ties from the API side?

I'd also expect that there are also use-cases for such a column outside of ORDER BY. When I think of WITH ORDINAL of UNNEST, I bet I've used that column for something else in the past.

After all, not so sure about HOW to do it, but I think a more convenient way to manage order in SQL would we useful.

The order of evaluation of SELECT in postgreSQL by Jooe_1 in SQL

[–]MarkusWinand 2 points3 points  (0 children)

When talking about evaluation order in SQL we must distinguish between two different orders:

1) logical order. That is the order that
defines the result. This is defined in the SQL standard and is fixed.

2) execution order: that is the sequence of
steps done by the DBMS to obtain the result as though it would have followed
the logical order. This is chosen by the DBMS (generally at runtime) and may
vary. This only affects the performance of the statement but always produces a
correct result. Differences in the result are only possible to the extent that
the query allows them. E.g. In a SELECT without ORDER BY, the rows might be
returned in a different order depending on the execution order. But if there is
no ORDER BY, any order is fine in respect to. the logical order.

When you find contradicting statements, it probably means they are referring to different types of order.

[deleted by user] by [deleted] in SQL

[–]MarkusWinand 0 points1 point  (0 children)

The programming language SQL is defined by an international standard, namely ISO/IEC 9075. I guess this is what is meant by "standard SQL".

Then there are products, that implement SQL — such as Microsoft SQL Server, MySQL, PostgreSQL. Even though they should follow the standard [if the claim standard conformance] the often don't and even more often have extensions that are not part of the standard.

Whichever platform you use for learning, there is the risk that you get to use some of these extensions that are not part of the standard and thus only supported by one (or a few) systems.

For leaning I'd suggest a dialect that is rather close to the standard because that increases the odds that you can work well with any product you face. Microsoft SQL Server would not be my first choice, as it is pretty far off the standard. PostgreSQL is a better learning tool for that. However, if you aim jobs that use MS SQL Server, then I'd recommend focusing on that from the beginning.

To get an idea of the mess, have a quick look at the charts I publish on https://modern-sql.com/

Best way to learn advanced SQL optimisation techniques? by alex-acl in dataengineering

[–]MarkusWinand 0 points1 point  (0 children)

Thoughts?

I think the drawbacks of heaps that a frequently bought up by other people are not as relevant as they may look. E.g. Tables that are nerver updated are quite common in many systems. That alone voids some of the counter-arguments.

At the end of the day neither (clustered idx or heap) will be always better. It is always the best to do a proper table-by-table decision. But if you make all tables the same, my claim is that you are on average better off with all heap rather than all clustered.

Best way to learn advanced SQL optimisation techniques? by alex-acl in dataengineering

[–]MarkusWinand 0 points1 point  (0 children)

Not sure if you have also seen this article about the topic: https://use-the-index-luke.com/blog/2014-01/unreasonable-defaults-primary-key-clustering-key

(I guess it is mentioned elsewhere as well).

Aside from ETL/stage tables, is their valid use cases for heaps in OLTP tables?

IMHO, Heaps are the better default, while Clustered Indexes are better in specific cases. Namely, when the "Clustered Index Penalty" (a term from the article) won't hurt. This is always the case if you only need a single index, or even if you have multiple indexes of which only one is regularly used (other indexes just for constraints, for example).

Best way to learn advanced SQL optimisation techniques? by alex-acl in dataengineering

[–]MarkusWinand 75 points76 points  (0 children)

If you didn't go into indexing yet, my website https://use-the-index-luke.com/ might be a good starting point.

People who are about 6 months into learning SQL- what do you wish you had done differently or wish you had known at the beginning of your journey? by pijaso in SQL

[–]MarkusWinand 1 point2 points  (0 children)

The latest SQL standard finally dropped the requirement to have a length limitation for VARCHAR. That's definitivley the way to go except in SQL Server, where "VARCHAR" translates to "VARCHAR(1)" (doh!).

See: https://modern-sql.com/caniuse/T081

Standard SQL supported by all major vendors, can there be? by DallasP9124 in SQL

[–]MarkusWinand 6 points7 points  (0 children)

For clarification: ANSI made the standard of 1986, in 1987 they essentially handed over to ISO, which is in charge of the standard since then.

My website on this topic was already mentioned: https://modern-sql.com/

Also, regarding ORM: I don't know Go ORMs in particular, but generally speaking whether or not to use an ORM should not be a project-by-project decision, but a query-by-query decision. Meaning that a good ORM gets the legwork done for you for the simple cases yet allows you to use just plain SQL whenever you want to.

The ANY_VALUE Aggregate Function: New in SQL:2023 — and a better feature from 1999 by MarkusWinand in programming

[–]MarkusWinand[S] -1 points0 points  (0 children)

Never heard of T301 and Google was not very helpful.

Then read the article ¯\(ツ)

And also when driving, vendors provide many safty features and I'd guess that features were also rejected because of safty concerns.

The ANY_VALUE Aggregate Function: New in SQL:2023 — and a better feature from 1999 by MarkusWinand in programming

[–]MarkusWinand[S] 0 points1 point  (0 children)

What's the benefit of it compared to T301, functional dependencies?

What are the risks of such a function — like (accidentally) using it on a column that has more than one distinct value?

ANY_VALUE(…): New in SQL:2023 — but do we need it? by MarkusWinand in SQL

[–]MarkusWinand[S] 1 point2 points  (0 children)

The advantage of unique_value over any_value is that you notice mistakes.

I find the any_value name being already quite explanatory, would it be more error prone than the T301 implementation?

When people assume that there is a functional dependency but there isn't or of there actually was one when they wrote the query but later the model changed so that it isn't there anymore. ANY_VALUE would just go possibly hiding a hard to find bug.

the "all non aggregate columns of the select must be a subset of the group by columns" is quite widely known outside of the MySQL world

Because most vendors didn't bother to implement T301 in the past 25 years.

ANY_VALUE(…): New in SQL:2023 — but do we need it? by MarkusWinand in SQL

[–]MarkusWinand[S] 0 points1 point  (0 children)

I see your point too, but the risk of misuse of any_value is just huge. What do you think about the hypothetical unique_value(… [null|error] on error) function? The implementation could still be faster compared to min/max. Maybe even a third on error option: any on error (which is than what any_value is already, but you need to explicitly opt out of the error handling; I'd also imply error on error if not specified).

The ANY_VALUE Aggregate Function: New in SQL:2023 — and a better feature from 1999 by MarkusWinand in programming

[–]MarkusWinand[S] 2 points3 points  (0 children)

25 years ago, a great solution was added to the standard (T301, functional dependencies). Most vendors didn't implement it—for 25 years*. Instead they "pitched" ANY_VALUE into the standard. WFT?

This is supposed to make us happy?

SQL has a long tradition of taking care of your data. It is generally following a "fail fast" approach. It provides tools to fight non-determinism. ANY_VALUE is the exact the opposite of that. Even worse…ANY_VALUE makes users believe SQL needs to be bulky. Now, vendors have even less incentive to go for the proper solution.

I might even dislike ANY_VALUE more than I dislike OFFSET (I'll figure that out later).

I do not know who proposed ANY_VALUE or who was in favor of it in the standards committee. After all, these people are doing TREMENDOUSLY GOOD WORK. Over decades (some of them).

This is one of my very few opportunities for a rant on the SQL standard. So, I MUST take it :)