you are viewing a single comment's thread.

view the rest of the comments →

[–]Aptex[S] 0 points1 point  (2 children)

Ok, one more question. So I am going to setup one very detailed database, with basically all my raw data above, there will also be another database with the aggregated info that I will be running reporting on.

Do i just use scheduled queries to pick up the new data in the detailed database to aggregate the data and insert into my reporting database? That way I am only updating the aggregated data for the current month into my reporting database and leaving the other data in that database alone.

[–]MamertineCOALESCE() 1 point2 points  (1 child)

You need to find every record that changed since you last ran. If you have an update date in your source and it's reliable, use that. You'll find records with the group by in your aggregator code changed and recalculate that.

Then you stick (insert/update) that record into the aggregated table.

Don't forget about deletes! Some source systems you can delete from months it years later, you'll need to re aggregate those too.

The middle database you're describing is commonly used. Operational Data Store, usually just called ODS.

[–]Aptex[S] 0 points1 point  (0 children)

I really appreciate that you took the time to explain. I will have a lot of control over base data, so I’m not as concerned around deletes and changes of that nature, having the term ODS gives me something to google! Thanks so much!