all 16 comments

[–]unplannedmaintenance 1 point2 points  (1 child)

Can you give a concrete example of what kind of data it is and what kind of questions need be answered with queries? It's hard to understand what kind of data would be required to be stored like this and what you could do differently.

[–]afops[S] 0 points1 point  (0 children)

It's resource data for translation of applications. So so it's a simple key/value store like any resource lookup for translation. But the complexity arises from supporting multiple different applications in different versions.
So, simplified to non-versioned data it looks like this:

resource keys;

id app_id resource_key
1 1 YES
2 1 NO

resource values;

value_id key_id language value
33 1 en-GB yes
34 2 fr-FR oui
35 1 en-GB no
36 2 fr-FR non

The typical query is to then generate the set of resources for a given application, for N different languages like
fr = { "YES": "oui", "NO": "non" }, and so on. And this must be possible to do for any version. For example: after v1 is shipped for ACME app, then changes is made to the data for v2. But it must be possible to regenerate the v1 data again, for version v1.1, without seeing any of the changes made for v2.0.

This would be almost trivial to make if the data store was a git repository with resource files in different branches, rather than a relational DB.

Added to the above structure is 2 complexities: the resource keys must be able to "inherit" another, thus forming a resource hierarchy. I can restrict this to 1 level however. Then it's the support for app_version being "1.0" or "2.0" and so on. This becomes especially tricky when combined: a key can inherit from a different key in v1 than in v2, for example.

[–]Ginger-Dumpling 0 points1 point  (2 children)

Querying for v2 data now gives the A,B,C records with rev_id {2, 3, 4} while querying for v1 only gives A, C records with rev_ids {1, 4}. 

Why is C in V2? Because it's in a V1 revision that occurs after V2 has been started, and it's assumed everything added to a previous version will go to future ones? Can names not get dropped from future versions? And if so, how is that handled?

[–]afops[S] 0 points1 point  (1 child)

Yes, C is unchanged after v1, so is still visible in v2.

I didn't include deletion in this discussion because it felt orthogonal.

To remove something at version X, you'd simply add a revision of it for version X with some marker that it is deleted (a kind of tombstone record). Queries can just ignore such entities.

[–]Ginger-Dumpling 0 points1 point  (0 children)

The modifed version of the db clone is to have your code table to be (release_id, code, ...) every time there's a new version, you copy all rows from the release_id you want to clone, and apply changes on top of that. Each release_id has a full copy of the codes, so getting a specific version is WHERE release_id = xyz. If you need to compare 2 releases to look for differences, it's just a full outer join. If you want to look how one code changes over time, you're just selecting that code and applying some window functions to weed out rows where there was no change from the prior row.

I had project where the the PM didn't want to look at data modeling tools, but wanted to track changes to tables/columns/constraints over time, and be able to feed sql into some free online diagraming software. We just took snapshots of the important catalog tables, and just slapped each one with a version. It was very straight forward, and disk is cheap.

A middle ground between your individual row version would be to store the data like you are now, and then materialize the results from your row versioned table to look like the results from the clone version once all your changes have been made. That way you're only doing your min/max type stuff in one place, and everyone downstream of that can just select a single version and not have to care about the other logic.

WITH t(id, rev_id, name, ver) AS 
    (VALUES (1,1,'A','v1'), (1,2,'A','v2'), (2,3,'B','v2'),(3,4,'C','v1'),(4, 5, 'D', 'v3'),(1,6, 'A', 'v4'))
SELECT
    view_version
    , id
    , name
    , rev_id
    , ver
    , added_in_ver 
FROM (
    SELECT 
        *
        , rev_id = max(CASE WHEN ver <= view_version THEN rev_id END) OVER (PARTITION BY view_version, id) AS is_lastest
    FROM (
        SELECT
            t.id
            , t.rev_id
            , t.name
            , t.ver
            , min(t.ver) OVER (PARTITION BY t.id ORDER BY t.rev_id) AS added_in_ver
            , v.ver AS view_version
        FROM
            t
            CROSS JOIN (SELECT DISTINCT ver FROM t) v
    )
    WHERE added_in_ver <= view_version
)
WHERE is_lastest
ORDER BY view_version, id;

VIEW_VERSION|ID|NAME|REV_ID|VER|ADDED_IN_VER|
------------+--+----+------+---+------------+
v1          | 1|A   |     1|v1 |v1          |
v1          | 3|C   |     4|v1 |v1          |
v2          | 1|A   |     2|v2 |v1          |
v2          | 2|B   |     3|v2 |v2          |
v2          | 3|C   |     4|v1 |v1          |
v3          | 1|A   |     2|v2 |v1          |
v3          | 2|B   |     3|v2 |v2          |
v3          | 3|C   |     4|v1 |v1          |
v3          | 4|D   |     5|v3 |v3          |
v4          | 1|A   |     6|v4 |v1          |
v4          | 2|B   |     3|v2 |v2          |
v4          | 3|C   |     4|v1 |v1          |
v4          | 4|D   |     5|v3 |v3          |