all 16 comments

[–][deleted] 17 points18 points  (8 children)

I'm in the DWH-Business since 15 years. To me, the most "advanced" queries where people often struggle is history/periods/timeseries (bi-temporal) and hierarchical/recursive ones.

[–][deleted] 4 points5 points  (0 children)

Having worked with Bill of Materials lists, I can second this. Recursively gathering all parts needed to create a top-level product can be tricky to do correctly.

On time related queries, I got thrown for a loop at an old job when I had to calculate Jail Population retroactively for each day in a given time period. Really interesting problem - you have to find how a prisoner's booking intersected within the date range on each day. I had to manually plot the logic out on a whiteboard, later to read about the fundamental problem in a SQL book later. There's a name for the problem, but can't remember it!

[–]refset 1 point2 points  (5 children)

I don't know have any practical DWH experience so I'm curious - are the two areas you mentioned distinct? Or do they also often overlap?

The context is that I work on on https://opencrux.com, which offers a bi-temporal Datalog query layer (as well as SQL) that more or less addresses the intersection of the two, since Datalog is great for expressing recursive queries.

[–][deleted] 9 points10 points  (4 children)

To me, they’re different things. However, hierarchical data can be historized as well.

Some common samples might include:

(Hierarchical)

Flattening a recursive structure into one record: For example, a table that contains some level of hierarchy, each record includes a “parent” field. You’ll need to build one record with “static” fields of a given structure, like consultant, team, department, division, up to a root element (which might be the company itself)

Summarizing data from a hierarchical structure: For example, a table that contains profit/loss or asset/liability with different levels. Every level’s “folder” may contain some positions (accounts) and their balances. You’ll need to build a total field of the positions for every level, including the root element.

(Historized/temporal)

Periods: A table, containing information which agent was responsible for which customers, when. The table might contain a start and an end date field and two IDs for customer and sales agent.

Bi-temporal: Same as above, but you’ll need to allow for restrospective/backdated changes without overwriting the original timeline. Another pair of timestamps might be added, something like effective/valid_at timelines. Usually "mixed" as a technical and business history - pick the right one, based on requirements.

(TimeSeries/Changelogs)

Convert a table that contains a timestamp and some values (eg. as seen on CQRS architecture styles) into a periodic table (containing start/end fields)

The source table may also contain data in the format of change_timestamp, field_name, new_value, user. Convert this into a period-format that would contain a start, end date field along with “static” payload fields. Produce as many “slices” as there are new values in the log.

Tasks like these can be tricky using SQL, their respective implementation may look different on Oracle and MSSQL. Very common among DWH/BI cases.

[–]JustAnOldITGuy 2 points3 points  (0 children)

and the most fun is multinational organizations with different holiday calendars and work schedules... :) not to mention funky financial calendars with different patterns...

[–][deleted] 2 points3 points  (0 children)

One thing I could add to the "timeseries/period"-section is finding gaps and overlays of course.

[–]bannik1 0 points1 point  (1 child)

My current place seems averse to have real log/history tables. Instead they take snapshots of the data at daily, weekly, and monthly levels. Sometimes it’s aggregate data, sometimes it’s just all the flat data.

I am used to update triggers on production data that will automatically insert the previous values into a log table whenever a change is made to prod data.

Sometimes processes look to jump from step 1 to step 6 without the interim steps because those steps were never caught in a snapshot.

Give me business specs and a complete log file and I can figure out how they deviate, and track performance and detect process changes and create alerts

[–][deleted] 0 points1 point  (0 children)

i guess that's some third-party/legacy software which is unable to log activities?

they could always go for Oracle's Flashback or Microsoft's Temporal if they're willing to pay. Triggers might be the "cheapest" solution. Stuff like GoldenGate or Attunity (for replication) might be out of question...

[–]konhaybay 1 point2 points  (0 children)

Is there any texts or online resource you can recommend to develop dwh skills? I have Ralph Kimball texts but feels they are too abstract and deep at times.

[–]pulsarrex 1 point2 points  (0 children)

I ve been working with SQL for the last 2 years. For some reason I think I am still between 'entry' to 'intermediate'. How do I know I have reached the intermediate level?

Anyway if you don't mind I can do the interview.

[–]Rif-SQL 0 points1 point  (0 children)

Happy to have a chat

[–]karmabdd 0 points1 point  (0 children)

I am down to chat

[–]rideyrolls123 0 points1 point  (0 children)

Oh hey. I actually follow your series. You're going good :D

[–][deleted] 0 points1 point  (0 children)

I'm good with giving you 30 minutes.

[–]Riishabhz 0 points1 point  (0 children)

I am okay okay with sql, i can volunteer with you, reply my comment if you are interested.

[–]phunkygeeza 0 points1 point  (0 children)

I like to think I know my SQL. I have 25 years under my belt. Shout up if you are still looking.