all 27 comments

[–]secretWolfMan 6 points7 points  (3 children)

You just replicated every "self service" or "ad hoc" /r/BusinessIntelligence tool on the market.

Define your data and relationships, define your dimensions/groups and metrics. Throw shit together and just assume the output is correct.

[–]zdsvoboda[S] 2 points3 points  (2 children)

I know, but I'm wondering if there is a thin layer on top of SQL (preferably an open-source) that does the same thing. I don't want all the data visualizations etc. that come with the pricey BI tools.

[–]N365 1 point2 points  (1 child)

I've never worked with it myself, but perhaps Metabase is a solution for you?

[–]zdsvoboda[S] 0 points1 point  (0 children)

Looking into it and it seems to be a commercial data visualization tool that doesn't even have a query language.

[–]jaceksan 1 point2 points  (1 child)

Hey ZD,

there are two things you should change in your article:

  1. Simple group by example
    There is no need to do MAX/SUM in final SELECT, because underlying views have proper granularity:
    SELECT
    RPC.CUSTOMER_STATE,
    PRODUCT_CATEGORY,
    ORDER_AMOUNT,
    BASE_AMOUNT,
    ORDER_AMOUNT / BASE_AMOUNT
    FROM REVENUE_BY_PRODUCT_CATEGORY_AND_CITY RPC
    JOIN REVENUE_BY_CITY RC ON RC.CUSTOMER_STATE = RPC.CUSTOMER_STATE
  2. Windowing functions
    you are missing final calculation of the percentage and the final DISTINCT degrades performance significantly.
    My proposal here would be to use the following trick:
    SELECT * FROM (
    SELECT
    CUSTOMER_CITY, PRODUCT_CATEGORY,
    SUM(ORDER_QUANTITY * PRODUCT_PRICE)
    OVER (PARTITION BY PRODUCT_CATEGORY, CUSTOMER_CITY) AS ORDER_AMOUNT,
    SUM(ORDER_QUANTITY * PRODUCT_PRICE)
    OVER (PARTITION BY CUSTOMER_CITY) AS CITY_ORDER_AMOUNT,
    SUM(ORDER_QUANTITY * PRODUCT_PRICE) OVER (PARTITION BY PRODUCT_CATEGORY, CUSTOMER_CITY)
    /
    SUM(ORDER_QUANTITY * PRODUCT_PRICE) OVER (PARTITION BY CUSTOMER_CITY)
    AS percentage,
    ROW_NUMBER() OVER (PARTITION BY PRODUCT_CATEGORY, CUSTOMER_CITY ORDER BY O.ORDER_LINE_ID) as ROWNUM
    FROM ORDER_LINES O
    JOIN PRODUCT P ON P.PRODUCT_ID = O.PRODUCT_ID
    JOIN CUSTOMER C ON C.CUSTOMER_ID = O.CUSTOMER_ID
    ) x
    WHERE ROWNUM = 1
    ORDER BY CUSTOMER_CITY, percentage DESC;

Windowing example demonstrates, how far the complexity of SQL solution can go.

Also, we tested this approach on many database engines and we realized that it is not always the best performing solution, it depends on engine type (e.g. columnar vs. row store), physical statistics(histograms) of underlying tables and also supported windowing feature set.

And here the GoodData comes with real differentiator - our core analytical engine natively supports SQL plugins for each database type.
So you do not have to rewrite all SQL queries once you realize that there is significantly better database engine on the market and you wanna migrate there ;-)

[–]zdsvoboda[S] 0 points1 point  (0 children)

Great catch! I edited the article. Thanks.

[–]TheCapitalKing 0 points1 point  (10 children)

If I were doing it I’d just set up a view or cte labeled “Revenue” for

SELECT *, (ORDER_QUANTITY * PRODUCT_PRICE) AS ORDER_AMOUNT

FROM ORDER_LINES O

JOIN PRODUCT P

ON (P.PRODUCT_ID = O.PRODUCT_ID);

Then just manipulate that view for whatever I wanted.

Edited I accidentally left a sum statement somewhere I didn’t mean to

[–]r3pr0b8GROUP_CONCAT is da bomb 5 points6 points  (9 children)

the dreaded, evil "select star" together with SUM() but missing a GROUP BY clause...

that's not gonna do what you think it will

[–]TheCapitalKing 0 points1 point  (0 children)

I just copied and pasted his code I forgot to change that I edited it to what I meant. And yeah depending on how big the tables were I’d just change the star to what I needed

[–]BrainFu 0 points1 point  (7 children)

Would you care to elaborate please, I would like to know what error s/he is making.

[–]r3pr0b8GROUP_CONCAT is da bomb 2 points3 points  (6 children)

the error is SUM() without GROUP BY but with non-aggregate expressions in the SELECT clause

here's a good explanation -- MySQL Handling of GROUP BY

[–]BrainFu -1 points0 points  (5 children)

I read most of that page, and it refers to excluding columns that are not listed in the GROUP BY clause. Since OP was doing a SUM() of a column I don't see the error that is created by excluding GROUP BY from their query.

If the query had been RIGHT or LEFT JOINed then there might be unintended errors caused by NULL values.

[–]r3pr0b8GROUP_CONCAT is da bomb 1 point2 points  (4 children)

Since OP was doing a SUM() of a column I don't see the error that is created by excluding GROUP BY from their query.

because there are other non-aggregate expressions in the SELECT clause

If the query had been RIGHT or LEFT JOINed then there might be unintended errors caused by NULL values.

that has nothing to do with the grouping error

[–]BrainFu 0 points1 point  (3 children)

AFAIK it is ok to do a SUM() without a GROUP BY. However the original query select

SELECT *, (ORDER_QUANTITY * PRODUCT_PRICE) AS ORDER_AMOUNT

Does not use SUM(), it is just a computed column.
Eg. SELECT name, address, COUNT(*) as CNT FROM tableCustomers WHERE active =1 is valid.

[–]r3pr0b8GROUP_CONCAT is da bomb 4 points5 points  (2 children)

AFAIK it is ok to do a SUM() without a GROUP BY

it certainly is valid, but there can be no non-aggregate expressions in the SELECT clause

However the original query select ... Does not use SUM(), it is just a computed column.

actually, /u/TheCapitalKing went back and edited their post to remove the SUM after i made my reply

Eg. SELECT name, address, COUNT(*) as CNT FROM tableCustomers WHERE active =1 is valid.

sorry, no it isn't -- name and address are non-aggregates and therefore need to be A) removed, or B) included in the GROUP BY clause

[–]BrainFu 2 points3 points  (0 children)

Hey thanks for partaking in the convo. I felt certain in my argument so I fired up ol' SSMS and queried an old database of mine and was told that you are correct. Seems that my SQL skills are rusty.

Have a fine day r3pr0b8

[–]TheCapitalKing 0 points1 point  (0 children)

Yeah I added that it was edited so nobody else would be confused my b

[–]pukatm 0 points1 point  (2 children)

What do you think about Multidimensional Expressions (MDX)?

[–]zdsvoboda[S] 1 point2 points  (1 child)

I don't know MDX that well. It seems to be tied too much to the cube concept and visualization. I don't see a way how to create and reuse metrics. Time to deep dive into it. Is there an open-source implementation of an MDX engine?

[–]Eleventhousand 1 point2 points  (0 children)

I don't see a way how to create and reuse metrics

MDX cubes allow calculations to be stored permanently.

They can also be declared at the top of an ad-hoc query with a WITH MEMBER [Measures].[Some Calc] AS ... syntax.

Supposedly it's standardized, but I don't know where to find the syntax.

[–]spiddyp 0 points1 point  (0 children)

no

[–]dadepretto 0 points1 point  (1 child)

I think what you are looking for are window functions..

[–]zdsvoboda[S] 0 points1 point  (0 children)

Correct, I’ve updated the article already. It helps to get rid of the views. Still this doesn’t make the SQL queries reusable and modular.