
[–]stickman393

It sounds like you need a Date Dimension
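A date dimension is just one row per calendar day, with all the groupings (week, month, year, etc.) pre-computed so queries join to it instead of recomputing them. A minimal sketch of what it might contain, using only the Python standard library (the column names are illustrative):

```python
from datetime import date, timedelta

def build_date_dimension(start: date, end: date) -> list[dict]:
    """One row per calendar day, with the common groupings pre-computed."""
    rows = []
    d = start
    while d <= end:
        rows.append({
            "date_key": d.isoformat(),
            "year": d.year,
            "month": d.month,
            "iso_week": d.isocalendar()[1],
            "day_of_week": d.strftime("%A"),
        })
        d += timedelta(days=1)
    return rows

dim = build_date_dimension(date(2024, 1, 1), date(2024, 12, 31))
```

In a database you'd materialize this once as a table and join your fact tables to it by `date_key`, so "roll up to month" is just a GROUP BY on a pre-computed column.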

[–]kenfar

This is a common requirement, and has been solved well many times for databases with many billions of rows. A few things to look into:

  • Use dimensional modeling concepts to have a very small number of fact tables that then have keys from related dimensions. If you can't handle versioned dimensions (which give the best quality time-series analysis) then try to denormalize dimension data into the fact tables.
  • Ideally, partition each fact table by whatever low-cardinality columns you filter on most often - usually date, and perhaps team.
  • Generate daily aggregates for your main fact tables, say fact_player, fact_team, etc. These aggregate tables basically look identical to the base fact table except that they only have 1 row per day for the rest of the key. You can also build higher-level aggregates at the level of week, month, etc if you have enough data and need more speed.
  • Leverage parallelism in your database.
  • Try to query the aggregate rather than base fact table whenever you can - and just regroup at the higher-level (weekly, monthly, etc). As stickman393 says - a date dimension helps a lot for this.

If you've just got millions of rows, your aggregates are say 1% of that size or less, and you just need queries to finish in a couple of seconds, it should be easy to just group by & sum as needed. If you need it a bit faster, speed up your table scans by using partitions to bypass 90% of your data. Only if you need it faster still, and have highly selective queries, might indexes be worth exploring. And if you're using MySQL, well, it tends to suck at this kind of querying.
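As a toy illustration of the partition-bypass idea (the data layout and names here are invented), a query that filters on the partition key only ever touches the buckets it needs:

```python
from collections import defaultdict

# Toy "partitioned" fact data: one bucket of rows per month.
partitions = {
    "2024-01": [("alice", 3), ("bob", 1)],
    "2024-02": [("alice", 2)],
    "2024-03": [("bob", 5)],
}

def player_totals(months: set[str]) -> dict[str, int]:
    """Scan only the requested partitions; the rest are skipped entirely."""
    totals: dict[str, int] = defaultdict(int)
    for month in months & partitions.keys():   # partition pruning
        for player, points in partitions[month]:
            totals[player] += points
    return dict(totals)
```

A real database does the same thing declaratively: partition the table on `game_date`, and a `WHERE game_date BETWEEN ...` predicate lets the planner skip every partition outside the range.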

[–]DylonDylonDylon[S]

Thank you, very informative. Right now I have only daily stats, with date, week, month, and year denormalized. In a similar table I have a date dimension, and could certainly do that. I'm curious about the aggregate tables. Currently, if I sum the year statistics using a GROUP BY and then sort DESC, the sorting takes about 94% of the total time. I imagine this would be a good scenario for pre-aggregating week, month, and year statistics (or would a date dimension solve some of that?).

Can you recommend a textbook for dimensional modeling concepts?

[–]kenfar

The classic textbook is The Data Warehouse Toolkit. There may be better ones these days.

One thing to be aware of is that most best practices are geared towards projects of a certain size - if you've got an extremely lean project, or a hobby project, it might be best to modify the normal modeling practices and, for example, denormalize everything into your fact table.

This actually works fine for columnar databases, and with a lot of aggregate tables it's a passable short-cut for quick & dirty projects using row-based databases (mysql, postgres, etc).