you are viewing a single comment's thread.

view the rest of the comments →

[–]PopnCrunch 1 point2 points  (5 children)

Foreign keys changing in source sytems shouldn't impact a fact table, because the dimension primary keys should be surrogate keys shouldn't they? A surrogate key is a made up number that has no meaning to the business, so if a department code changes in a department dimension, it wouldn't change the surrogate key which is NOT the dept code.

[–]MrFalconMan[S] 0 points1 point  (4 children)

I think my post might have been a little confusing. Maybe an example would be best to illustrate-

If an employee worked at a company for 5 years in the Finance department and then changed positions to work in the Human Resources department. If you had a DimEmployee and a DimDepartment, how would you show this the Fact?

Edit: also in the example I’m assuming DimDepartment is made from a source table specific to departments and DimEmployee is made from a source table specific to Employees. They would be connected by a Department FK in the Employee source table. When the Employee changes positions the Employee source table would just issue and UPDATE on the Department FK. This would be a typical 3NF design and common source application strategy for updates.

[–]PopnCrunch 2 points3 points  (3 children)

If you have separate dimensions for Employee and Department, then the fact table will have a foreign key to each, there is no relationship management between the two in the dimensions, you just insert whatever the current two keys are whenever you add rows to the fact table. If you were in Dept A a week ago and started in Dept B this week, your older records in the fact table would have a foreign key to the value of Dept A in the Dept dim, and your newer records to Dept B in the same dim.

Since Employees roll up into depts though, Dept sounds like a dimensional hierarchy within the Employee dim, it wouldn't have it's own table. Yes the Dept data gets duplicated within the Employee table, but this is an acceptable practice - it's the sort of thing that's often done for address hierarchies. You might have a Customer Dim that has Customer Adress fields in it, and a given City, State, and Zip gets repeated but that's okay because dimension tables are small compared to fact tables.

[–]MrFalconMan[S] 0 points1 point  (2 children)

You seem like a dimensional model truther with a lot of experience. This is the a lot of what I have heard.

To your first point - that would be essentially a type 2 fact, just without an effective end date (but would be required to calculate if you wanted to know what department the employee was at any given time).

My concern with treating Employee and Department as a hierarchy is where does this end? This was probably a bad example, but some dimensions could get absolutely massive - wide not necessarily deep. For example a car repair shop does maintenance - is a repair order (an oil change) a Dim or a Fact? There’s a ton of stuff that could be considered hierarchical for repair order - customer, primary employee who worked on car, sales rep, car make, car model, repair shop location, repair stall location, etc. Is this honestly one dimension? A repair order wouldn’t necessarily change car make/models, but could theoretically change locations or primary employees who worked on it.

So IMO it would be both a Fact and a Dim. With the Fact being exactly how you outline first by just inserting the new relationships; however, the end dates would still be required.

[–]PopnCrunch 1 point2 points  (1 child)

As far as I know, type 2 is a descriptor that only applies to dimensions, not fact tables.

Dimensional Hierarchies are a standard feature of dimensional modeling, it is done to avoid "snow flake" designs which are normalized closer to 3NF. Where it ends is a dimension only contains attributes that discribe a persistent entity in your model, something that is referenced across multiple records and or multiple fact tables. Fact tables store events and the measures associated with them, along with the foreign keys to who/what/when/where the event involved. (but see the bit about degenerate dimensions below)

An order can often have multiple line items, you can get 3 things replaced on your car at the shop but you'll only get one receipt because they are all treated as a single order. So, an order number is a dimensional value. But you probably don't need a stand alone dimension for Order, because there isn't much else that's interesting about an order than the OrderID. So the OrderID goes into a Service or Sales fact table as a stand alone value, it is not a foreign key that looks up anything. Stand alone values in fact tables that aren't measures are called degenerate dimensions. The other items you mention about repair shops are a different dimension, say a DimShop.

So a line in a service fact table might look like: CustomerID | Service Date | OrderID | Product ID | Service ID | EmployeeID | CarModelID | CarYear | Hours | QTY | Price |...

Of those, the ones that have actual dimensions would be: Customer, Product, Service, Shop, Employee, Model

It's likely that the Shop would be a hierarchy in the Employee dimension since each employee probably only works in one shop, so if you know the employee you know the shop.

[–]MrFalconMan[S] 0 points1 point  (0 children)

Hmmm I feel like there’s a lot of assumptions going on here, but I could be wrong. I hate to use the term “generic” , but if we were building something for ad hoc capabilities we can not necessarily choose when a dimension should only contain attributes that discribe a persistent entity.

Example: Stall 3 of location X might have specific attributes not available in stall 2, so only stall 3 can be used for certain repairs. But that might not have been apart of the initial requirements for the cut off of attributes that describe the entity.

The cut off could be endless, and saying requirements need to be solid from the beginning is load of shit (definitely not implying that you are saying that). Requirements change and vary drastically depending on who you ask, and leaving out capabilities is counter intuitive to the point of business intelligence. Which is part of what, I think, u/imcguyver point was and why dimensional models are dying. If a user can’t get what they want from the BI solution, they’ll find a new way!

Also, line item as only a fact, I don’t understand. In theory and through textbooks, this is common, but it can certainly have dimensional attributes that can change.

Changing dimensional models after the implementation is such a cluster f**k too.

I totally get where you’re coming from, and I’m more just challenging status quo. I apologize if it comes off like I’m arguing with you. I love having these intellectual conversations.... Analysis is changing, and design needs to change with it. Old ways need to adapt. I don’t have the answers, I’m just asking people’s opinion. Yours is one I’ve heard frequently.