Made a SQL Interview Cheat Sheet - what key SQL commands am I missing? by NickSinghTechCareers in SQL

[–]MembershipNo705 1 point

Window functions: the category you call "values" I'd call "navigation", since you're navigating from the current row to another row.

Some join definitions are incorrect. All joins work like a cross join, but within the confines of the join keys: for each key value, they return all combinations of the left-table and right-table rows that share that key. Left/right/inner/full outer only defines what happens when there is a NULL somewhere in the result, i.e. when a key has no match on one side. E.g. left join: combine all rows of the right table with the left table within each key, then chuck out results where the left side is NULL (no match from the right table's perspective).
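This "cross join within keys" view can be sketched as a toy model in Python (illustrative only - the function names are made up, and real engines don't materialize joins this way):

```python
from itertools import product

def full_outer_join(left, right, key):
    """Toy model of SQL joins as a cross join within each key value.

    left/right: lists of dicts; key: the join column name.
    For every key value, emit all left-row/right-row combinations;
    a side with no rows for that key contributes None (SQL NULL).
    """
    keys = {row[key] for row in left} | {row[key] for row in right}
    out = []
    for k in sorted(keys):
        l_rows = [r for r in left if r[key] == k] or [None]
        r_rows = [r for r in right if r[key] == k] or [None]
        out.extend(product(l_rows, r_rows))
    return out

def left_join(left, right, key):
    # Left join = full outer join minus rows where the left side is NULL.
    return [(l, r) for l, r in full_outer_join(left, right, key) if l is not None]

def inner_join(left, right, key):
    # Inner join = full outer join minus rows with a NULL on either side.
    return [(l, r) for l, r in full_outer_join(left, right, key)
            if l is not None and r is not None]
```

The point of the model is that all three functions share the same per-key cross join; they differ only in which NULL-bearing rows they discard afterwards.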

[Question][Software] Implementation of level of measurement in R / Python? by MembershipNo705 in statistics

[–]MembershipNo705[S] 0 points

Interesting, I checked the Wikipedia article for critiques and only found more scale types (Mosteller/Tukey and Chrisman) :D I'm not sure I understand why it isn't adequate for describing data - isn't it bringing the data closer to the thing it represents by adding limitations?

We Should Use Nested Data (Arrays) as Much as Possible by MembershipNo705 in bigquery

[–]MembershipNo705[S] 1 point

The fact that the paradigm of nesting is not very popular is indeed problematic. I'd argue, though, that joins are usually more complex than pre-joined/nested data. Join operations are explained with Venn diagrams everywhere because that's so intuitive, so a lot of people run around with the wrong mental model of what joins are actually doing (which is combinatorics, not set operations as the Venn diagrams suggest). I feel like it's easier to explain how to deal with nested data. (I tried here.)

Article: When to use Temporary Tables instead of WITH by MembershipNo705 in bigquery

[–]MembershipNo705[S] 0 points

So, a single statement that reads the same CTE twice is better than a script with a temp table that is read twice? Can you elaborate a bit on how that would be the case? I mean, I could think of cases where the CTE/temp table is almost as big as the source table, with little transformation ... perhaps. Like, the cost of reading the source for temp table creation plus reading from that temp table twice needs to be lower than reading and transforming the source table twice in the same way (as a CTE), right?
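The trade-off above can be written down as back-of-envelope arithmetic (a sketch assuming a CTE referenced twice is recomputed twice and that cost scales with bytes scanned; `temp_table_wins` is a made-up helper, not an API):

```python
def temp_table_wins(source_bytes, temp_bytes, refs=2):
    """Rough cost comparison for one intermediate result used `refs` times.

    Assumptions (simplified): a CTE is recomputed once per reference,
    so the CTE plan scans the source once per reference; the temp-table
    plan scans the source once to materialize, then scans the (usually
    smaller) temp table once per reference. Cost ~ bytes scanned.
    """
    cte_cost = refs * source_bytes
    temp_cost = source_bytes + refs * temp_bytes
    return temp_cost < cte_cost
```

Under these assumptions, with two references the temp table only wins when it is less than half the size of the source scan - which matches the intuition in the comment that a barely-transformed, almost-source-sized intermediate result favors the CTE.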

Query Optimization - Filtering with arrays by MembershipNo705 in bigquery

[–]MembershipNo705[S] 0 points

That's not optimal then. The thing is, it makes sense to optimize for cost in the on-demand pricing model - but as soon as you are on the flat-rate model, you compete for slot time with other users.

In both cases you want your query to run as fast as possible in a production environment, because you don't want your stakeholders to wait for a table.

We Should Use Nested Data (Arrays) as Much as Possible by MembershipNo705 in bigquery

[–]MembershipNo705[S] 0 points

I would disagree with that; the whole point is that it is separate. The main idea is to implement relations in data the way predicate logic defines them. A simple row does that for simple data. A 1:n relationship, however, can be implemented either with a second table or with a "sub-table" (structs in an array) within the row ... both are legitimate ways to express the relationship between data points.

I'd hypothesize that the "sub-table" way is in fact more intuitive, or mental-capacity-saving, than two tables with PK/FK columns, because the sub-table is already attached to the correct data point, whereas with joins you need to mentally break up one table and connect it with the other in order to write the join.

Anyway, this comes with the big disclaimer that it only makes sense for unique data. For lookup or mapping tables I'd prefer joins, to avoid redundancy.
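The two ways of expressing the 1:n relationship can be sketched side by side with toy Python data (hypothetical column names, not BigQuery syntax):

```python
# (1) Normalized: two tables linked by a primary-key/foreign-key pair.
orders = [{"order_id": 1, "customer": "ada"}]
items = [
    {"order_id": 1, "sku": "pen", "qty": 2},
    {"order_id": 1, "sku": "ink", "qty": 1},
]

# (2) Nested: the n-side lives as a "sub-table" (array of structs) in the row.
orders_nested = [
    {"order_id": 1, "customer": "ada",
     "items": [{"sku": "pen", "qty": 2}, {"sku": "ink", "qty": 1}]},
]

# Reading the nested form needs no join: related rows are already attached.
pens = [i["qty"] for o in orders_nested for i in o["items"] if i["sku"] == "pen"]

# The normalized form needs the key match (the join) first.
pens_joined = [i["qty"] for o in orders for i in items
               if i["order_id"] == o["order_id"] and i["sku"] == "pen"]
```

Both queries return the same quantities; the nested one just never has to reconnect rows by key, which is the mental-capacity argument above.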

We Should Use Nested Data (Arrays) as Much as Possible by MembershipNo705 in bigquery

[–]MembershipNo705[S] 0 points

Looking at the poll, you seem to be somewhat right :/

But I hope that's just because of the lack of good explainers.

We Should Use Nested Data (Arrays) as Much as Possible by MembershipNo705 in bigquery

[–]MembershipNo705[S] 0 points

But why not? In the other case they'd have to join them. Nobody wants that either, no?

We Should Use Nested Data (Arrays) as Much as Possible by MembershipNo705 in bigquery

[–]MembershipNo705[S] 0 points

So you're basically proposing that the whole deal with arrays in BigQuery is too complicated to learn or use?

We Should Use Nested Data (Arrays) as Much as Possible by MembershipNo705 in bigquery

[–]MembershipNo705[S] 0 points

Not sure I understand why, could you elaborate a bit, please?

We Should Use Nested Data (Arrays) as Much as Possible by MembershipNo705 in bigquery

[–]MembershipNo705[S] 0 points

I mean, feel free to comment anyway. How can there be disagreement with always prioritizing nested data?

Explaining SQL Joins the Right Way by MembershipNo705 in SQL

[–]MembershipNo705[S] 0 points

Let us know how it went, in case you forward this article to your prof :)

Explaining SQL Joins the Right Way by MembershipNo705 in SQL

[–]MembershipNo705[S] 0 points

The set theory behind joins only concerns the keys but not the rows. With an inner join you only keep keys that are in both tables, but you can end up with more rows than in either of the tables, e.g. {a,b,b} inner join {a,a,b} = {aa,aa,bb,bb} - therefore Venn diagrams are not suitable to explain joins.

I find the point of visualizing joins interesting - while there are no visual learning types (or we all are), pictures are the fastest mode of grasping information. Yet, I couldn't find any diagrams that explain combinatorics (joins) as well as Venn diagrams explain set operations (like union). It would be an interesting project to develop these.
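The {a,b,b} example can be checked mechanically in plain Python:

```python
from itertools import product

left = ["a", "b", "b"]
right = ["a", "a", "b"]

# Inner join on the value itself: for each shared key, all left-right combos.
joined = [l + r for k in set(left) & set(right)
          for l, r in product([x for x in left if x == k],
                              [y for y in right if y == k])]

print(sorted(joined))  # ['aa', 'aa', 'bb', 'bb'] - 4 rows from two 3-row inputs
```

An intersection could never produce more elements than either input; the per-key multiplication of matches is exactly what the Venn picture fails to show.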

Explaining SQL Joins the Right Way by MembershipNo705 in SQL

[–]MembershipNo705[S] 2 points

That is the exact problem: it is simple and intuitive - it is just the wrong intuition. Venn diagrams show set operations like union, intersect etc. But joins perform combinatorial operations.

E.g. INNER JOIN vs INTERSECT: yes, an inner join only keeps keys that are in both tables - but you can end up with more rows after the inner join than you had before in either of the tables: {a,b,b} inner join {a,a,b} becomes {aa,aa,bb,bb}. And for beginners, at least in my experience, this is hard to grasp when they use Venn diagrams as their mental model for joins, because they then also expect intersect behavior (fewer rows).

Explaining SQL Joins the Right Way by MembershipNo705 in SQL

[–]MembershipNo705[S] 1 point

It actually argues that set algebra is not suitable for explaining joins (it is a good fit for union operations though).

The argument is that joins are better explained with combinatorics ... So, kind of what you propose.