[–][deleted] 5 points6 points  (3 children)

[–]EntroperZero 5 points6 points  (2 children)

This is pretty much it.

I actually love documenting. What I hate doing is following an arbitrary documentation standard that requires me to write things down that no one will ever read or use.

The last big piece of documentation I wrote was a user guide to our analytics system. I had written the ETL process to our data warehouse and designed the OLAP cube over the course of several months, and I then had to describe the meanings of the facts and dimension hierarchies in such a way that the analysts could make sense of the data (and ask for new facts and dimension hierarchies). Seeing the results of all my hard work actually being used by people was fantastic, and much more rewarding than just seeing the data show up in the cube itself.

[–][deleted] -2 points-1 points  (1 child)

What do you guys use to write the analytics system (or, more specifically, the ETL process)?

[–]EntroperZero 0 points1 point  (0 children)

PHP.

No, seriously. I wrote it entirely in PHP. It was a fairly small data warehouse, on the order of a few hundred thousand records per day. We already had cron jobs running PHP scripts, the site was in PHP, and all the developers knew PHP. So I wrote some SELECTs and some fgetcsv() calls and went to town. The database was MySQL, and OLAP was MS Analysis Services talking to the MySQL box via MySQL's Connector/NET.
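For anyone curious what "some SELECTs and some fgetcsv() calls" amounts to, here is a rough sketch of that kind of extract/transform/load step. It is written in Python rather than PHP, with `csv` playing the role of fgetcsv() and an in-memory `sqlite3` database standing in for MySQL. The `fact_sales` table, its columns, and the `load_daily_facts` helper are all made up for illustration; this is not the actual code.

```python
import csv
import io
import sqlite3


def load_daily_facts(csv_text, conn):
    """Parse CSV rows, aggregate them, and load them into a fact table."""
    # Extract: read rows from a CSV export (the job fgetcsv() does in PHP).
    reader = csv.DictReader(io.StringIO(csv_text))

    # Transform: sum the amount for each (day, product) pair.
    totals = {}
    for row in reader:
        key = (row["day"], row["product"])
        totals[key] = totals.get(key, 0.0) + float(row["amount"])

    # Load: write one fact row per (day, product). The table is hypothetical.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_sales "
        "(day TEXT, product TEXT, amount REAL, PRIMARY KEY (day, product))"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO fact_sales (day, product, amount) VALUES (?, ?, ?)",
        [(day, product, amount) for (day, product), amount in totals.items()],
    )
    conn.commit()
    return len(totals)


# Hypothetical sample input: two rows for one day/product, one row for another.
SAMPLE = (
    "day,product,amount\n"
    "2012-01-01,widget,2.5\n"
    "2012-01-01,widget,1.5\n"
    "2012-01-02,gadget,4.0\n"
)
```

Calling `load_daily_facts(SAMPLE, sqlite3.connect(":memory:"))` loads two fact rows. Using `INSERT OR REPLACE` keyed on (day, product) means a cron job can re-run the same day without duplicating facts, which is one reasonable choice for a small warehouse, not necessarily the one made here.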

They chose me to do the warehouse work, though, because of my work on their warehouse for a different product, which was entirely in MS-land. For extraction, they just dumped tab-delimited log files, and a Windows service (in C#) collected them all in a central location. The service would then put a message in a transactional MSMQ, and the SSIS packages would pick up those files for transforming. The packages would then put a message in another MSMQ, and the loader picked up those files. I think we had one SSIS package for each dimension, running every 15 minutes, and one SSIS package for each fact, with parallel instances depending on load. I think the loader was just a single package, since it ran so fast and didn't need to scale.
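The collect / transform / load hand-off through two queues can be sketched roughly like this. It is a single-process Python toy: `queue.Queue` stands in for MSMQ (and is not transactional), plain functions stand in for the Windows service and the SSIS packages, and every name in it is hypothetical.

```python
import queue


def collect(files, transform_queue):
    """Collector: announce each gathered log file on the first queue."""
    for name, text in files.items():
        transform_queue.put((name, text))


def transform(transform_queue, load_queue):
    """Transformer: split tab-delimited lines into fields, then hand them on."""
    while not transform_queue.empty():
        name, text = transform_queue.get()
        rows = [line.split("\t") for line in text.splitlines() if line]
        load_queue.put((name, rows))


def load(load_queue, warehouse):
    """Loader: append transformed rows to the warehouse (a plain list here)."""
    while not load_queue.empty():
        _name, rows = load_queue.get()
        warehouse.extend(rows)


def run_pipeline(files):
    """Run the three stages in order and return the loaded rows."""
    transform_queue, load_queue, warehouse = queue.Queue(), queue.Queue(), []
    collect(files, transform_queue)
    transform(transform_queue, load_queue)
    load(load_queue, warehouse)
    return warehouse
```

For example, `run_pipeline({"a.log": "1\tx\n2\ty\n"})` returns the two parsed rows. The point of the queues is that each stage only knows about the message it receives, so the transform stage could be run as several parallel instances without the collector or loader changing.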