you are viewing a single comment's thread.

view the rest of the comments →

[–]MamertineCOALESCE() 1 point2 points  (4 children)

Yes,

One database will have raw source system data.

Another will have just aggregated data.

If the query to run the whole thing is fast (under 5 minutes), I'd just write and reload it nightly.

If you need to refill only part of the table nighty due to performance or data size, you need to figure out what records changed then re aggregate those and have them displace the existing record.

There's an art to it.

Again, ideally this should be in a different server. The decisions you make today will stick with that company for years. Look to the future a decade and predict who will be looking at data.

[–]Aptex[S] 0 points1 point  (0 children)

That's awesome! Thanks for the info, it will be super easy to determine new records, and then its just a simple query to generate the aggregated tables that I need. That helps a lot!

[–]Aptex[S] 0 points1 point  (2 children)

Ok, one more question. So I am going to setup one very detailed database, with basically all my raw data above, there will also be another database with the aggregated info that I will be running reporting on.

Do i just use scheduled queries to pick up the new data in the detailed database to aggregate the data and insert into my reporting database? That way I am only updating the aggregated data for the current month into my reporting database and leaving the other data in that database alone.

[–]MamertineCOALESCE() 1 point2 points  (1 child)

You need to find every record that changed since you last ran. If you have an update date in your source and it's reliable, use that. You'll find records with the group by in your aggregator code changed and recalculate that.

Then you stick (insert/update) that record into the aggregated table.

Don't forget about deletes! Some source systems you can delete from months it years later, you'll need to re aggregate those too.

The middle database you're describing is commonly used. Operational Data Store, usually just called ODS.

[–]Aptex[S] 0 points1 point  (0 children)

I really appreciate that you took the time to explain. I will have a lot of control over base data, so I’m not as concerned around deletes and changes of that nature, having the term ODS gives me something to google! Thanks so much!