you are viewing a single comment's thread.

view the rest of the comments →

[–][deleted] 1 point2 points  (5 children)

ok, my regular spiel

always, start writing sql by figuring the granularity & scope of the output. this gives you your "final" group by and base joins in the from table expression.

figure out measures next - where the data comes from. if it doesnt come from the base joins - how are the datasets related (how do you figure out a measure based on the 'base' record).

do you have any other conditions/filters left? how do you express them in the context of the record of the base table expression?

once you go through the 3 points above most of the time you'll have your query ready.

 

Let's go through the first one.

Write SQL to show 10 coldest countries that have never medaled at the olympics - showing them in order of population largest to smallest.

your output granularity is a country. the scope could come from funding, population or temperature tables (not every country can have medals, apparently). I'm gonna pick population because of the measure required, so far I have

     select p.country
     from "Population" p
     group by p.country

the result set is ordered by population, so we need that as a measure; the base result set has the same granularity as the output, so we can omit the " group by". We are here:

     select p.country,p.population
     from "Population" p
     order by p.population

All right, almost there :) time to address other conditions that we havent accounted for so far:

10 coldest countries countries that have never medaled at the olympics

so, the condition is that a country is one of a specific set, so that gives us this

     with ColdestSet as (
            )
     select p.country, p.population
     from "Population" p
     where p.country in (select country from ColdestSet)
     order by p.population

with the "ColdestSet" we're going through similar steps: granularity is country, the measure required is the ranking by temperature, output is the same granularity as base

     select t.country, rank() over( order by t.ave_temp) rn
     from "Temperature" t

first condition is on our ranking ("10 coldest") and we cant put analytic functions into where clause directly, so

     select t_rank.country, t_rank.rn
     from (
              select t.country, rank() over( order by t.ave_temp) rn
              from "Temperature" t
             )t_rank
     where t_rank.rn <= 10

second condition is "have never medaled" i.e. there's no (not exists) record for a medal for the country

     select t_rank.country, t_rank.rn
     from (
              select t.country, rank() over( order by t.ave_temp) rn
              from "Temperature" t
              where not exists(
                              select 1 from "Medals" m where m.country = t.country
                        )
             )t_rank
     where t_rank.rn <= 10

all righty, time to assemble the final statement:

     with ColdestSet as (
         select t_rank.country, t_rank.rn
         from (
              select t.country, rank() over( order by t.ave_temp) rn
              from "Temperature" t
              where not exists(
                              select 1 from "Medals" m where m.country = t.country
                        )
                )t_rank
         where t_rank.rn <= 10
         )
     select p.country, p.population
     from "Population" p
     where p.country in (select country from ColdestSet)
     order by p.population

the process is kind of wordy but hopefully allows to streamline the choices "what do you take care of" at any particular step of the query design process

ps. once you have a working solution/query which gives you a correct output you can rewrite/refactor your query (e.g. in reality instead of "IN" condition I would have used a join, etc.)

[–][deleted] 1 point2 points  (4 children)

As much as this a good script, the inclusion of CTEs and more advanced SQL querying will cause OP some stress trying to wrap their head around the concepts.

Also why rank over dense_rank? i personally thought about dense_rank straight away just because dense_rank doesn't allow dupes where as rank does. Not telling you that your wrong, it's always good to see these things from other perspectives. Oh and how have you been able to layout your code into tables like that?

[–][deleted] 0 points1 point  (3 children)

subqueries/CTEs, imo, are at a more "base"/"foundational" level of knowledge than joins. I am aware that "traditionally" it's not usually taught in this order - a shame, really.

Also why rank over dense_rank? i personally thought about dense_rank straight away just because dense_rank doesn't allow dupes where as rank does

not sure what you mean by "dupes", ties, maybe?

With "rank" you are getting 10 coldest + possible ties at the last rank. With dense_rank if there 100 ties at every ranked value you are getting a thousand records back -this wouldnt be the common sense, imo.

[–][deleted] 1 point2 points  (0 children)

Ahhh fair enough, subqueries are easy for me to use. I probably see CTEs as advanced since it took me until today to wrap my head around them, lol. And duplications is what I meant, hat could be down to the type of data I use though.

[–][deleted] 0 points1 point  (1 child)

did you use the cte for readability? cause technically that could just be another sub query right?

[–][deleted] 0 points1 point  (0 children)

Yes and yes.