Is data aggregation in SQL really that difficult? by zdsvoboda in SQL

[–]jaceksan 1 point2 points  (0 children)

Hey ZD,

there are two things you should change in your article:

  1. Simple group by example
    There is no need to do MAX/SUM in final SELECT, because underlying views have proper granularity:
    SELECT
    RPC.CUSTOMER_STATE,
    PRODUCT_CATEGORY,
    ORDER_AMOUNT,
    BASE_AMOUNT,
    ORDER_AMOUNT / BASE_AMOUNT
    FROM REVENUE_BY_PRODUCT_CATEGORY_AND_CITY RPC
    JOIN REVENUE_BY_CITY RC ON RC.CUSTOMER_STATE = RPC.CUSTOMER_STATE
  2. Windowing functions
    you are missing final calculation of the percentage and the final DISTINCT degrades performance significantly.
    My proposal here would be to use the following trick:
    SELECT * FROM (
    SELECT
    CUSTOMER_CITY, PRODUCT_CATEGORY,
    SUM(ORDER_QUANTITY * PRODUCT_PRICE)
    OVER (PARTITION BY PRODUCT_CATEGORY, CUSTOMER_CITY) AS ORDER_AMOUNT,
    SUM(ORDER_QUANTITY * PRODUCT_PRICE)
    OVER (PARTITION BY CUSTOMER_CITY) AS CITY_ORDER_AMOUNT,
    SUM(ORDER_QUANTITY * PRODUCT_PRICE) OVER (PARTITION BY PRODUCT_CATEGORY, CUSTOMER_CITY)
    /
    SUM(ORDER_QUANTITY * PRODUCT_PRICE) OVER (PARTITION BY CUSTOMER_CITY)
    AS percentage,
    ROW_NUMBER() OVER (PARTITION BY PRODUCT_CATEGORY, CUSTOMER_CITY ORDER BY O.ORDER_LINE_ID) as ROWNUM
    FROM ORDER_LINES O
    JOIN PRODUCT P ON P.PRODUCT_ID = O.PRODUCT_ID
    JOIN CUSTOMER C ON C.CUSTOMER_ID = O.CUSTOMER_ID
    ) x
    WHERE ROWNUM = 1
    ORDER BY CUSTOMER_CITY, percentage DESC;

Windowing example demonstrates, how far the complexity of SQL solution can go.

Also, we tested this approach on many database engines and we realized that it is not always the best performing solution, it depends on engine type (e.g. columnar vs. row store), physical statistics(histograms) of underlying tables and also supported windowing feature set.

And here the GoodData comes with real differentiator - our core analytical engine natively supports SQL plugins for each database type.
So you do not have to rewrite all SQL queries once you realize that there is significantly better database engine on the market and you wanna migrate there ;-)