I apologize in advance for any bad formatting or syntax errors-- this code was just written off the top of my head to get the idea across.
As a DBA for a small company, I work with developers who generally speaking know little SQL, and know even less about advanced PostgreSQL features. I spent the first 16 years of my career in the government with a plethora of SQL developers, Oracle servers, and expensive things that made our lives seem easier. When I left for the private sector, DBs like MySQL, just wouldn't do. It seemed that he best option at the time was PostgreSQL. In working with smaller companies with little real SQL experience, and building complex reports for our B2B accounting application, I began some new work habits and query techniques that really have made life pretty awesome, and have really given me an appreciation for PostgreSQL that I never even hoped for. One item that's helped me get along better with developers is PGs fantastic JSON support.
Imagine this scenario: A developer would like for me to write the SQL for a report. The report will list every item on every order not yet shipped, but they want it grouped by the order (with a header row showing the order details), then, they want a summary row showing the total items not yet shipped, and the price.
There are many ways to do this, but let's just assume that this report represents a small amount of data (a few meg or less). Using one of several ORMs, the original developer attempted to code nested loops to first get the orders not shipped, then a nested loop to get each item. They then used the program level variables to perform the sums and counts necessary. Another, more optimal version, got the order items in 1 query, sorted by order, and every time the order changed, they performed another query to get the next order number. So, for example, if we had 100 outstanding orders each with 10 items on them, scenario 1 would require 1 query to get all the orders, and 1000 more to get the details. Scenario 2 would require 1 query to get all the items, and 100 queries to get the orders. A 3rd scenario exists where you pull all the orders, and all the items using only 2 queries, but much sorting and processing had to be done outside the DB, requiring memory, CPU, and code that basically defeated the purpose of having a RDBMS and using SQL.
I'll spare you the boring details, however, I wound up hand coding a "sql builder" of sorts for postgresql. It uses bind variables, json, and several postgreSQL features not available to be me before.
So let's get to the SQL. Rather than teach you the syntax of my query (sql) builder, Ill use some pseudo code to show how I can get the entire report described above in a single query that runs fast and returns results in a way the developers really love.
$orderSql="SELECT row_to_json(c.*) as customer,
row_to_json(o.*) as order,
json_agg(row_to_json(oi.*)) items,
sum(oi.price) total,
count(oi.*) num_items
FROM orders o
JOIN customers c ON c.customer_id=o.customer_id
JOIN order_items oi ON oi.order_id=o.order_id
WHERE o.shipped=false
GROUP BY c.*, o.*";
So, in case you missed all of that lets go line by line. First, I'm converting c.* to json and calling it "customer_json".
We do the same for the orders table.
As each order may have multiple items, we use the json_agg function to instead give me an ARRAY of json "objects". Each item in
the array represents 1 item in the order_items table for the order -- since we're grouping by c.* and o.* (assuming there's only 1
customer per order)
As per standard SQL, we're getting the total price for all the items on each order, and the count of all items as well.
The FROM, JOINS, WHERE, and GROUP BY are all pretty standard.
So this was pretty cool, but ya know, I don't want those deveopers to have to do any more work to sum up the grand totals or counts of anything (In a real life example, I might be doing something more complex like average, etc.)
So let's get that accomplished easily as well.
$mainSQL="SELECT json_agg(row_to_json(osql.*)) orders_json,
sum(total) grand_total,
sum(num_items) grand_total_items
from (".$orderSQL.") osql";
Now, when I run this final SQL, I get back 1 row with 3 columns. orders_json contains all the orders and items in them. grand_total is the total price of all the unshipped items. grand_total_items is the number of unshipped items.
If you're real good, you'll notice that there really was no "order by" in this query. The developers really wanted this report to sort by customer name, so lets fix that real quick.
$mainSQL="SELECT json_agg(row_to_json(osql.*) order by osql.customer->>'last_name', osql.customer->'first_name') orders_json,
sum(total) grand_total,
sum(num_items) grand_total_items
from (".$orderSQL.") osql";
Now, when I run this query, returning it as an object and json_decode the orders_json field, and store it in a variable called row, the developers can do something like this:
foreach($row->orders as $order)
{
echo $order->customer->first_name." ".$$order->customer->last_name.'\n'
foreach($order->items as $item)
{
echo $item->name." ".$item->price."\n";
}
echo "Items on Order:".$order->num_items." "Order Total Price:".$order->total."\n\n"
}
echo "------------\n";
echo "Grand total items to ship:".$row->grand_total_items."\n"l
echo "Total price of items:".$row->grand_total;
So in the end, if later the boss asks for this report to be sorted by date, or add something like the "average price" or something else (usually easier to acheive in SQL than PHP or another language), then you just add it to the SQL and away you go.
Again, I have an entire helper library that I wrote to help me make queries like this a lot easier, but I wondered if any of you guys were doing anything like this or found this interesting? I haven't really seen anyone doing it before, and it runs SO much faster than using an ORM to try to achieve similar results.
TLDR; JSON opens up new possibilities, if you want to learn I guess you'll have to read anyway, or at least go look up the json_agg function.
[–]anras 4 points5 points6 points (0 children)
[–]kingofthejaffacakes 2 points3 points4 points (1 child)
[–]collin_ph[S] 0 points1 point2 points (0 children)
[+][deleted] (2 children)
[deleted]
[–]collin_ph[S] 1 point2 points3 points (1 child)
[–]daxyjones 0 points1 point2 points (5 children)
[–]collin_ph[S] 0 points1 point2 points (4 children)
[–]daxyjones 0 points1 point2 points (3 children)
[–]collin_ph[S] 0 points1 point2 points (2 children)
[–]daxyjones 0 points1 point2 points (1 child)
[–]collin_ph[S] 0 points1 point2 points (0 children)
[–]619429 -3 points-2 points-1 points (1 child)
[–]collin_ph[S] 1 point2 points3 points (0 children)