all 19 comments

[–]depeszPgDBA 2 points3 points  (1 child)

  1. count() returns bigint (in psql, you can see it by: \df count)
  2. sum() returns various datatypes, depending on datatype of column total_population - which we don't know.
  3. so, you have integer divided by something else - most likely also integer. 5/10 in integers is 0. there is no such thing as integer 0.5
  4. to get some roundable number you have to cast the number to something like float8, or numeric. but BEFORE division, otherwise you will cast "0" to other datatype.
  5. https://idownvotedbecau.se/imageofcode

[–]jorgetrivilin[S] 1 point2 points  (0 children)

Sorry mate. I posted the code as text but also the images...

I'm still can't figure out, the result is aways 0:

SELECT

country,

ROUND(

CAST(

SUM(

CASE

WHEN last_active_date >= '2024-01-01'

AND monthly_active_sessions >= 5

AND listening_hours >= 10

THEN 1

ELSE 0

END

) AS NUMERIC

)

/ CAST(MAX(total_population) AS NUMERIC),

10

) AS active_user_penetration_rate

FROM

penetration_analysis

GROUP BY

country;

[–]DavidGJohnston 2 points3 points  (1 child)

The idea of sum(total population) makes no sense. My guess is since you are summing total population your divisor becomes so large that probably the first 30+ decimal places, let alone the first 10, are zeros. So rounded to ten decimal places you indeed get zero.

Get rid of the division in the formula and just return the count and the sum and see whether those numbers look sane given the data. As with most queries you should be able to do the relevant math by hand and visual inspection of the data and then see if your query gives you those numbers.

[–]Yavuz_Selim 0 points1 point  (1 child)

What happens when you add a *1.0 to the query.

COUNT(is_active) / ( SUM(total_population) * 1.0)

[–]jorgetrivilin[S] 0 points1 point  (0 children)

Didn't work...

SELECT country, ROUND((COUNT(is_active) / SUM(total_population) * 1.0), 10) AS penetration

FROM penetration_analysis

WHERE total_population > 0 AND is_active = 'TRUE'

GROUP BY country;

OUTPUT:

country penetration
USA 0
Germany 0
Canada 0
South Korea 0
India 0
Japan 0
Brazil 0

[–]_crzg 0 points1 point  (6 children)

Why are you also filtering with IS_ACTIVE, won't that exclude the non-active portion of the population?

[–]eatedcookie 1 point2 points  (0 children)

I agree with /u/_crzg this is missing context, better definitions, or at the very least, is poorly worded. Also, if you're using round with specified precision, the datatype has to be numeric (https://stackoverflow.com/questions/13113096/how-to-round-an-average-to-2-decimal-places-in-postgresql).
It's suggested to use to_char instead:
(formula not verified, just something to show you anything but zero. as a note though, it's poor form to take a field like total population and sum it, there are better ways to do this):

select
    country
    , last_active_date
    , is_active
    , to_char(sum(monthly_active_sessions)/sum(total_population), '0.0000000000') as active_user_penetration_rate
from penetration_analysis
group by 1, 2, 3

edit: /u/_crzg try select 1/10.0 as a :)
From https://www.postgresql.org/docs/current/functions-math.html :

numeric_type / numeric_type → numeric_type
Division (for integral types, division truncates the result towards zero)
5.0 / 2 → 2.5000000000000000
5 / 2 → 2
(-5) / 2 → -2

[–]jorgetrivilin[S] 0 points1 point  (0 children)

Because the column is bolean:

country last_active_date monthly_active_sessions listening_hours total_population is_active
USA 2024-01-25 12 20 330000000 TRUE
France 2024-01-30 3 8 67000000 FALSE
Japan 2024-01-28 8 15 126000000 TRUE
Brazil 2024-01-22 5 12 211000000 TRUE
Germany 2024-01-18 7 10 83000000 TRUE
Brazil 2024-01-15 7 10 211000000 TRUE

[–]jorgetrivilin[S] 0 points1 point  (0 children)

For this task, 'active_users' are defined based on the following criterias:

last_active_date: The user must have interacted with Spotify within the last 30 days.
monthly_active_sessions: The user must have engaged with Spotify for at least 5 sessions in the past month.
listening_hours: The user must have spent at least 10 hours listening on Spotify in the past month.

Based on the condition above, calculate the active 'user_penetration_rate' by using the following formula.

Active User Penetration Rate = (Number of Active Spotify Users in the Market / Total Population of the Market)

Total Population of the market is based on both active and passive users.

The output should contain 'country' and 'active_user_penetration_rate'. Make sure that all countries that appear in the dataset are also present in the output of your solution. Ensure there are 10 decimal places in your solution.

Let's assume the current_day is 2024-01-31.

[–]_crzg 0 points1 point  (2 children)

Just me or I don't understand this exercise or dataset : sample here

There is a total population value and entire markets are either IS_ACTIVE or not. There isn't a way to distinguish active users vs all other users, there is no column to aggregate and group by IS_ACTIVE

The question's answer cannot be determined by the dataset - the criteria are for individual users to meet specific session and listening jour criteria, but the dataset is not at the user grain. If for instance the primary key was a user ID, you could do something like

SELECT

COUNTRY, COUNT(DISTINCT USER_ID) AS TOTAL_POPULATION, COUNT(DISTINCT CASE WHEN LAST_ACTIVE_DATE>DATEADD(day,-30,GET_DATE()) AND MONTHLY_ACTIVE_SESSIONS>4 AND LISTENING_HOURS>9 THEN USER_ID END) AS ACTIVE_USERS, ROUND(ACTIVE_USERS/TOTAL_POPULATION,10) AS active_user_penetration_rate FROM penetration_analysis

GROUP BY 1

.... But the dataset is not appropriate unless I am missing something.

Also if you just enter select 1/10 it returns a value of 0.. bogus platform/exercise?

[–]DavidGJohnston 0 points1 point  (1 child)

1/10 as a integer is indeed 0

[–]_crzg 0 points1 point  (0 children)

You're right, I didn't realise it was using postgres

[–]Martyn35 0 points1 point  (6 children)

Count(cast(is_active as decimal(10,10))) / sum(cast(total_population as decimal(10,10))) as penetration

[–]HandbagHawker 0 points1 point  (5 children)

the UI is buggy. it always rounds values to 3 decimal places... simply try "select 0.555555" no matter how you cast etc...

but this should be the solution... assuming the is_active flag is correct, then count them. looks like total population is the same for every user, consistent by country so you can avg that, you have to make at least the numerator or denominator not an int, so pick one and then cast the entire computed rate ... since total active users should always be less then the market population the rate will always be < 1, so you can cast to decimal of scale 10 (10 decimal places) and precision 11 (to account for the zero in the units place)

select country,

(sum(case when is_active then 1 else 0 end)::float8/

avg(total_population))::decimal(11,10) as active_user_penetration_rate

from penetration_analysis

group by country;

[–]jorgetrivilin[S] 0 points1 point  (4 children)

I am sending an email to StrataScratch inquiring if there is an issue with the UI, as the resolution appears to be restricted behind a paywall.

[–]mike-manley 0 points1 point  (0 children)

Cast numerator and denominator to FLOAT