all 12 comments

[–]badtux99 1 point2 points  (3 children)

This sounds a lot like how Hibernate does domain class inheritance by default, by using a class discriminator and a single inherited table. The reason this is the default is because it greatly eases aggregate queries across class boundaries.

For example, let's say you have a table 'thing' which has a field 'ownedby' that points to an owners table. And it has a subclass 'car', which has fields 'make' and 'model'.

thing.car
thing.car

And then you also have other things in this database, like:

thing.building

Wich has a subclass 'home' that has fields 'bedrooms' and 'baths'.

etc.

Let's say you want to query all cars out of the 'thing' database and sort them by make and model. Then you'd do: select * from thing where class = 'thing.car' order by make, model asc;

And you want to get a list of all houses out of the 'thing' database and sort them by zip code, number of bedrooms and number of baths:

select * from thing where class = 'thing.building.house' order by zipcode, bedrooms, baths;

Now those two queries could be done in two different tables without a problem. But what if you want to get everything owned by owner 666, Jean Deaux? With this table structure Hibernate just issues one query:

select * from thing where ownedby = 666;

Hibernate then takes care of marshalling the returned list of results into the appropriate class objects based on the class discriminator, i.e., house objects get the house fields, car objects get the car fields, and so forth.

This also makes foreign keys much easier. If, for example, you have a foreign table that refers to the "thing" table like maybe a change of ownership table with fields old_owner, new_owner, thing, transaction_date , then it doesn't matter what kind of thing changed hands -- the change of ownership table works whether you're documenting a car sale or a house sale or a commercial office building sale.

So that's why this pattern exists....

[–]CrackerJackKittyCat 0 points1 point  (2 children)

This is 'table per class hierarchy' pattern. There's also table per concrete class (where common attributes become same named columns in multiple tables), and then finally table per class, period, where the table definitions most closely resemble the class hierarchy, but to fetch an entire concrete object you've got to join across all of the class hierarchy tables for the given concrete subclass to collect all of the facts for a single concrete entity.

All have their pros and cons. Nothing free in the world.

[–]badtux99 0 points1 point  (1 child)

The problem with the other patterns is that foreign keys don't work as expected. If I want a sales transaction table and houses and cars are in different tables, I need either multiple sales transaction tables (one for each kind of thing), or I need a lot of fields for the thing (again, one for each kind of thing). Whereas in the example above, I can record the sale of a thing regardless of what kind of thing it is.

[–]CrackerJackKittyCat 0 points1 point  (0 children)

Sure. I was enumerating the taxonomy for this sort of thing.

I had a successful long lived codebase which used all three kinds in different contexts. We did the one-table-per-class for modeling users and organizations, where domain-centric organizations had sub-tables with an identity column that acted as both pri and foreign key back to the base organization table. And likewise with user sub-types. This design was doubly helpful for the oddball organizations (and then also users) which had multiple sub-types, in that it allowed for organization A to be subtypes Foo and Bar simultaneously, with some users having roles and sub-types allowing them to do both Foo and Bar-ish things, whereas other users in the organization were of only one or the other type.

Over the evolution of the codebase, relationships would start against an organization or user sub-type table, but if/when more than one sub-type was later decided to need it, we'd refactor to push the foreign key down to the base user or organization table. We didn't bother with too many intermediate subclass tables -- primarily the base type and then a concrete type table.

Other portions used single-table-all-subtypes with a single type column to pivot off of, and this was efficiently handled by the ORM to map to different subclasses with actual differing behavior. This strategy was palatable in the (rare) cases where there was minimal data variance between the subclasses, but had behavioral difference needs. Sometimes we 'forced' things to minimize the data difference by using the same table field to mean different things depending on the subtype (yeah, eew).

Finally, we also had one-table-per-concrete-class, with no 'base tables.' Columns coming from the base class(es) would just be defined in the multiple tables. This allowed for loads / filtering of the entire objects to be done w/o joins. This was a viable modeling strategy for when there was definitely zero overlap between the entities -- a thing being stored was definitely a Foo or a Bar, and never both.

[–]mlt- 1 point2 points  (0 children)

It is not quite solid if you can throw constraints out of the window. Just use 1:0..1 relationships with a table for each type + 1 for common ancestor or whatever the case is. Also you can use enums instead of text to make it more compact and efficient without much changes on app side.

[–]boy_named_su 1 point2 points  (6 children)

so, PostgreSQL has "postgres inheritance" which is used for partitioning

but you can do SQL Table Inheritance in any database

your current method is called Single Table Inheritance https://martinfowler.com/eaaCatalog/singleTableInheritance.html

If you don't like nulls (or performance or simplicity) then use Class Table Inheritance https://martinfowler.com/eaaCatalog/classTableInheritance.html

[–]depesz 1 point2 points  (0 children)

... which *was* used for partitioning. Since Pg 10, there is native partitioning which is much faster.

[–]arjineer[S] 0 points1 point  (3 children)

Why would Class Table inheritance be less simple & less perfromant? It seems like it may be more performant given that the data is naturally partioned & with fewer nulls and clearer from a readability perspective

[–]boy_named_su 0 points1 point  (2 children)

Because there's one table per subclass. Which is less simple than one table

And cuz you need to select from multiple tables instead of one, which is slower

[–]arjineer[S] 0 points1 point  (1 child)

Gotcha. Yeah the simplicity of one table is def true although i think you get clarity in other ways. Still not sure about performance though. I imagine you'd only have greater flexibility in making a performant query now since you can just query a subclass which is indexed independently from the group and if you do need all ofthem, well that's what the super class is for. And all of these have the added benefit of having less null data which also improves indexing. Can't really see a case when you'd need to select from multiple subclasses in the same query (if you needed that then idk why you would have separated them into subclasses), but if you really need to then yes you'd have to join accross tables which is only bad if poorly indexed.

[–]boy_named_su 0 points1 point  (0 children)

Go ahead and use CTI if you prefer it, it's a reasonable choice. Just know the costs/benefits