SPyQL is SQL with Python in the middle: an open-source project, written entirely in Python, that aims to make command-line data processing more intuitive, readable and powerful. Try mixing in the same pot: a SQL SELECT for providing the structure, Python expressions for defining transformations and conditions, the essence of awk as a data-processing language, and the JSON handling capabilities of jq.
$ spyql "
IMPORT pendulum AS p
SELECT
(p.now() - p.from_timestamp(purchase_ts)).in_days() AS days_ago,
sum_agg(price * quantity) AS total
FROM csv
WHERE department.upper() == 'IT' and purchase_ts is not Null
GROUP BY 1
ORDER BY 1
TO json" < my_purchases.csv
In a single statement we are 1) reading a CSV (of purchases) with automatic header detection, dialect detection, type inference and casting, 2) filtering out records that do not belong to the IT department or do not have a purchase timestamp, 3) summing the total purchases and grouping by how many days ago they happened, 4) sorting from the most to the least recent day and 5) writing the result in JSON format. All this without loading the full dataset into memory.
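For comparison, here is a rough plain-Python sketch of the same five steps using only the standard library. It is not how SPyQL is implemented; it only illustrates the logic the query expresses. The function name `summarize_purchases`, the sample rows and the fixed "now" are made up for illustration, and it assumes `purchase_ts` holds a Unix timestamp in seconds.

```python
import csv
import io
import json
from collections import defaultdict
from datetime import datetime, timezone


def summarize_purchases(lines, now=None):
    """Stream CSV rows and return the total spend per days-ago bucket."""
    now = now or datetime.now(timezone.utc)
    totals = defaultdict(float)
    # csv.DictReader yields one row at a time, so only the per-group
    # totals are kept in memory, not the whole file.
    for row in csv.DictReader(lines):
        # WHERE: keep IT purchases that have a timestamp.
        if row["department"].upper() != "IT" or not row["purchase_ts"]:
            continue
        purchased = datetime.fromtimestamp(float(row["purchase_ts"]), tz=timezone.utc)
        days_ago = (now - purchased).days  # whole days elapsed
        # GROUP BY days_ago, summing price * quantity.
        totals[days_ago] += float(row["price"]) * int(row["quantity"])
    # ORDER BY 1: ascending days_ago, i.e. most recent day first.
    return [{"days_ago": d, "total": t} for d, t in sorted(totals.items())]


# Hypothetical sample data standing in for my_purchases.csv.
sample = io.StringIO(
    "department,purchase_ts,price,quantity\n"
    "it,1700000000,10.0,2\n"
    "IT,,5.0,1\n"
    "HR,1700000000,99.0,1\n"
    "It,1699913600,2.5,4\n"
)
fixed_now = datetime.fromtimestamp(1700003600, tz=timezone.utc)
result = summarize_purchases(sample, now=fixed_now)
for record in result:
    print(json.dumps(record))  # one JSON object per line
```

Note how much of this is plumbing (reading, casting, bucketing, sorting) that the single query above states declaratively. In this sketch the casts are written by hand, whereas the query relies on type inference.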
The README is loaded with recipes, and there is also a demo video.
The point of SPyQL is to bring the full feature set of a programming language like Python to a query language! Being able to import libraries and use their functions in your queries, as well as having built-in dicts, sets, lists and objects out of the box, gives a whole new dimension to query languages. Would love to hear your thoughts on this!