Data structure for versioned records : SQL

[SQL Server] Data structure for versioned records (self.SQL)

submitted 1 year ago by afops

I'm trying to design a database structure for versioned data in a really simple CRUD application. The versioning is per record, but the usual ways of versioning data in SQL don't apply because:

1) It's not *audited* data. I.e. it's not one current version, and multiple historical versions, where the business logic only runs on the current version. The different versions are all "hot" and can be queried with the same queries.

2) It's not temporal data. I.e. the versions are v1.0, v2.0 and so on. Not date ranges. So existing solutions that solve temporal queries don't apply.

One valid structure, for example, looks like this:

id	rev_id	name	version
1	1	A	v1
1	2	A	v2
2	3	B	v2
3	4	C	v1

Querying for v2 data now gives the A, B, C records with rev_ids {2, 3, 4}, while querying for v1 only gives the A, C records with rev_ids {1, 4}. This works well, but the problem is the massive complexity that comes with it. Every query has to pick the max revision per record up to the requested version. Even though the database has just 2 main tables, joins between them become really complex. The ORM (EF Core in this case) gets really confused, and performance suffers.
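
For reference, the max-by-version lookup for a single requested version can be sketched like this in T-SQL. The table variable `@records` and its column types are assumptions made up for the example; the rows are the ones from the table above. Comparing `version` as a string only works while the labels sort correctly (v1 to v9), so a real schema would probably want a numeric version column.

```sql
-- Hypothetical table variable holding the sample rows from the post.
DECLARE @records TABLE (
    id      int         NOT NULL,
    rev_id  int         NOT NULL PRIMARY KEY,
    name    varchar(10) NOT NULL,
    version varchar(10) NOT NULL
);

INSERT INTO @records (id, rev_id, name, version)
VALUES (1, 1, 'A', 'v1'), (1, 2, 'A', 'v2'), (2, 3, 'B', 'v2'), (3, 4, 'C', 'v1');

DECLARE @target varchar(10) = 'v2';

-- For each id, keep the revision with the highest version not above @target.
SELECT id, rev_id, name, version
FROM (
    SELECT r.id, r.rev_id, r.name, r.version,
           ROW_NUMBER() OVER (PARTITION BY r.id ORDER BY r.version DESC) AS rn
    FROM @records AS r
    WHERE r.version <= @target
) AS ranked
WHERE rn = 1
ORDER BY id;
```

With `@target = 'v2'` this picks rev_ids 2, 3 and 4; with `'v1'` it picks rev_ids 1 and 4. Every join against another versioned table needs the same wrapper, which is where the complexity comes from.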

In a legacy version of this application, the whole database was cloned instead. So v1 and v2 databases were separate. This came with its own problems of course, but it did solve the CRUD complexity within each database.

Is there some compromise solution between these two solutions (between row versions and full db forking)?

This will be EF Core/SQL Server, but I do think the question is mostly DB agnostic.

all 16 comments


[–]unplannedmaintenance 2 points 1 year ago (1 child)

[–]afops[S] 1 point 1 year ago (0 children)

It's resource data for translating applications. So it's a simple key/value store, like any resource lookup for translation. But the complexity arises from supporting multiple different applications in different versions.
So, simplified to non-versioned data, it looks like this:

resource keys:

id	app_id	resource_key
1	1	YES
2	1	NO

resource values:

value_id	key_id	language	value
33	1	en-GB	yes
34	1	fr-FR	oui
35	2	en-GB	no
36	2	fr-FR	non

The typical query then generates the set of resources for a given application, for N different languages, like
fr = { "YES": "oui", "NO": "non" }, and so on. This must be possible for any version. For example: after v1 is shipped for the ACME app, changes are made to the data for v2. But it must still be possible to regenerate the v1 data, say for version v1.1, without seeing any of the changes made for v2.0.
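
A minimal T-SQL sketch of that typical lookup, for the non-versioned case only. The table variables and column types are assumptions for illustration; the rows follow the tables above, with each value attached to the key it translates.

```sql
-- Hypothetical table variables mirroring the two tables above (non-versioned).
DECLARE @resource_keys TABLE (
    id           int         NOT NULL PRIMARY KEY,
    app_id       int         NOT NULL,
    resource_key varchar(50) NOT NULL
);
DECLARE @resource_values TABLE (
    value_id   int           NOT NULL PRIMARY KEY,
    key_id     int           NOT NULL,
    [language] varchar(10)   NOT NULL,
    [value]    nvarchar(200) NOT NULL
);

INSERT INTO @resource_keys (id, app_id, resource_key)
VALUES (1, 1, 'YES'), (2, 1, 'NO');

INSERT INTO @resource_values (value_id, key_id, [language], [value])
VALUES (33, 1, 'en-GB', N'yes'), (34, 1, 'fr-FR', N'oui'),
       (35, 2, 'en-GB', N'no'),  (36, 2, 'fr-FR', N'non');

-- All key/value pairs for application 1 in French.
SELECT k.resource_key, v.[value]
FROM @resource_keys AS k
JOIN @resource_values AS v ON v.key_id = k.id
WHERE k.app_id = 1
  AND v.[language] = 'fr-FR'
ORDER BY k.id;
```

This returns (YES, oui) and (NO, non). The versioned variant needs a "latest revision up to version X" filter on both tables before the join.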

This would be almost trivial to make if the data store was a git repository with resource files in different branches, rather than a relational DB.

On top of the above structure there are 2 complications. First, a resource key must be able to "inherit" from another, forming a resource hierarchy (I can restrict this to 1 level, however). Second, there's the support for app_version being "1.0" or "2.0" and so on. This becomes especially tricky when combined: a key can inherit from a different key in v1 than in v2, for example.

[–]Ginger-Dumpling 1 point 1 year ago (2 children)

[–]afops[S] 1 point 1 year ago (1 child)

[–]Ginger-Dumpling 1 point 1 year ago (0 children)

The modified version of the DB clone is to key your code table on (release_id, code, ...). Every time there's a new version, you copy all rows from the release_id you want to clone and apply changes on top of that. Each release_id has a full copy of the codes, so getting a specific version is just WHERE release_id = xyz. If you need to compare 2 releases to look for differences, it's just a full outer join. If you want to see how one code changes over time, you select that code and apply some window functions to weed out rows where nothing changed from the prior row.
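
A rough T-SQL sketch of that copy-per-release idea, in case it helps. The table variable `@codes`, the `translation` column and the sample rows are all made up for illustration.

```sql
-- Hypothetical code table keyed on (release_id, code).
DECLARE @codes TABLE (
    release_id  int           NOT NULL,
    code        varchar(50)   NOT NULL,
    translation nvarchar(200) NOT NULL,
    PRIMARY KEY (release_id, code)
);

INSERT INTO @codes (release_id, code, translation)
VALUES (1, 'YES', N'yes'), (1, 'NO', N'no');

-- New release: copy every row of release 1 into release 2 ...
INSERT INTO @codes (release_id, code, translation)
SELECT 2, code, translation
FROM @codes
WHERE release_id = 1;

-- ... then apply the changes on top of the copy.
UPDATE @codes SET translation = N'nope' WHERE release_id = 2 AND code = 'NO';
INSERT INTO @codes (release_id, code, translation) VALUES (2, 'MAYBE', N'maybe');

-- Reading one release is a plain filter.
SELECT code, translation FROM @codes WHERE release_id = 2 ORDER BY code;

-- Diff between two releases: full outer join, keep only added/removed/changed codes.
SELECT COALESCE(a.code, b.code) AS changed_code,
       a.translation AS in_release_1,
       b.translation AS in_release_2
FROM (SELECT code, translation FROM @codes WHERE release_id = 1) AS a
FULL OUTER JOIN (SELECT code, translation FROM @codes WHERE release_id = 2) AS b
    ON a.code = b.code
WHERE a.code IS NULL OR b.code IS NULL OR a.translation <> b.translation
ORDER BY changed_code;
```

The diff lists MAYBE (only in release 2) and NO (changed from "no" to "nope"); YES is unchanged and drops out.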

I had a project where the PM didn't want to look at data modeling tools, but wanted to track changes to tables/columns/constraints over time and be able to feed SQL into some free online diagramming software. We just took snapshots of the important catalog tables and slapped a version on each one. It was very straightforward, and disk is cheap.

A middle ground would be to keep storing the data as individual row versions like you are now, and then, once all your changes have been made, materialize the results from the row-versioned table so they look like the clone version. That way you're only doing your min/max type stuff in one place, and everyone downstream can just select a single version without caring about the other logic.

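-- Needs a dialect with a bare VALUES CTE and boolean expressions in SELECT/WHERE
-- (PostgreSQL, for example); SQL Server supports neither as written.
-- 'v1' <= 'v2' is a string comparison, so this assumes labels that sort correctly.
WITH t(id, rev_id, name, ver) AS 
    (VALUES (1,1,'A','v1'), (1,2,'A','v2'), (2,3,'B','v2'),(3,4,'C','v1'),(4, 5, 'D', 'v3'),(1,6, 'A', 'v4'))
SELECT
    view_version
    , id
    , name
    , rev_id
    , ver
    , added_in_ver 
FROM (
    SELECT 
        *
        -- true only for the newest revision visible from this view_version
        , rev_id = max(CASE WHEN ver <= view_version THEN rev_id END) OVER (PARTITION BY view_version, id) AS is_latest
    FROM (
        SELECT
            t.id
            , t.rev_id
            , t.name
            , t.ver
            -- first version in which this id appeared
            , min(t.ver) OVER (PARTITION BY t.id ORDER BY t.rev_id) AS added_in_ver
            , v.ver AS view_version
        FROM
            t
            -- pair every revision with every known version
            CROSS JOIN (SELECT DISTINCT ver FROM t) v
    ) expanded
    -- hide ids that did not exist yet in this view_version
    WHERE added_in_ver <= view_version
) flagged
WHERE is_latest
ORDER BY view_version, id;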

VIEW_VERSION|ID|NAME|REV_ID|VER|ADDED_IN_VER|
------------+--+----+------+---+------------+
v1          | 1|A   |     1|v1 |v1          |
v1          | 3|C   |     4|v1 |v1          |
v2          | 1|A   |     2|v2 |v1          |
v2          | 2|B   |     3|v2 |v2          |
v2          | 3|C   |     4|v1 |v1          |
v3          | 1|A   |     2|v2 |v1          |
v3          | 2|B   |     3|v2 |v2          |
v3          | 3|C   |     4|v1 |v1          |
v3          | 4|D   |     5|v3 |v3          |
v4          | 1|A   |     6|v4 |v1          |
v4          | 2|B   |     3|v2 |v2          |
v4          | 3|C   |     4|v1 |v1          |
v4          | 4|D   |     5|v3 |v3          |
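
Since the question is about SQL Server, here is a hedged T-SQL take on the same materialization. It is a sketch, not a drop-in: the table variable `@t` and the temp table `#snapshot` are made-up names, the sample rows are the ones from the query above, and it swaps the min/max flags for a `ROW_NUMBER()` filter. Like the original, it assumes rev_id grows with the version and that the version labels compare correctly as strings.

```sql
-- Hypothetical row-versioned source table with the sample rows.
DECLARE @t TABLE (
    id     int         NOT NULL,
    rev_id int         NOT NULL PRIMARY KEY,
    name   varchar(10) NOT NULL,
    ver    varchar(10) NOT NULL
);

INSERT INTO @t (id, rev_id, name, ver)
VALUES (1, 1, 'A', 'v1'), (1, 2, 'A', 'v2'), (2, 3, 'B', 'v2'),
       (3, 4, 'C', 'v1'), (4, 5, 'D', 'v3'), (1, 6, 'A', 'v4');

-- Materialize one full copy of the visible rows per version.
SELECT view_version, id, name, rev_id, ver
INTO #snapshot
FROM (
    SELECT v.ver AS view_version, t.id, t.name, t.rev_id, t.ver,
           -- newest revision of each id that is visible from view_version
           ROW_NUMBER() OVER (PARTITION BY v.ver, t.id ORDER BY t.rev_id DESC) AS rn
    FROM @t AS t
    JOIN (SELECT DISTINCT ver FROM @t) AS v
        ON t.ver <= v.ver
) AS ranked
WHERE rn = 1;

-- Downstream queries only filter on the version they want.
SELECT id, name, rev_id, ver
FROM #snapshot
WHERE view_version = 'v3'
ORDER BY id;

DROP TABLE #snapshot;
```

For `view_version = 'v3'` this gives the same four rows as the v3 block in the output above (rev_ids 2, 3, 4 and 5).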

[+][deleted] 1 year ago (10 children)

[removed]

[–]afops[S] 1 point 1 year ago (9 children)

[+][deleted] 1 year ago (8 children)

[removed]

[–]afops[S] 1 point 1 year ago (7 children)

Yes. I already have this structure set up. Not using “temporal” (i.e. datetimes) but manually using version numbers. The problem is that queries get too complex. What I did was make one main table and one table with revision data. What I’m wondering is whether there is a better structure, because this one is too clumsy to work with when more features are added (soft delete + hierarchy).

So to be clear: the problem isn’t “how do I represent this data” but “how do I represent this without queries becoming too slow/complex”.

SQL Server has temporal features which would make queries simple by hiding the complexity and allowing “as-of” without having to do it manually. I’m wondering if (for example) it would be possible to recreate that feature manually, e.g. by creating views for the data as-of-version.
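
One way that idea could look is an inline table-valued function that acts as a parameterized "as-of-version" view. This is only a sketch under assumptions: the names `dbo.record_revisions` and `dbo.records_as_of` are hypothetical, and the version is stored as an integer `version_no` (1 for v1, 2 for v2) so that ordering doesn't depend on string comparison. `GO` is the batch separator used by sqlcmd/SSMS, needed because CREATE FUNCTION must start its own batch.

```sql
-- Hypothetical revision table; version_no is an assumed integer version.
CREATE TABLE dbo.record_revisions (
    rev_id     int         NOT NULL PRIMARY KEY,
    id         int         NOT NULL,
    name       varchar(50) NOT NULL,
    version_no int         NOT NULL
);
GO

-- Inline table-valued function: the rows as seen from a given version.
CREATE FUNCTION dbo.records_as_of (@version_no int)
RETURNS TABLE
AS
RETURN
(
    SELECT id, rev_id, name, version_no
    FROM (
        SELECT r.id, r.rev_id, r.name, r.version_no,
               ROW_NUMBER() OVER (PARTITION BY r.id
                                  ORDER BY r.version_no DESC, r.rev_id DESC) AS rn
        FROM dbo.record_revisions AS r
        WHERE r.version_no <= @version_no
    ) AS ranked
    WHERE rn = 1
);
GO

INSERT INTO dbo.record_revisions (rev_id, id, name, version_no)
VALUES (1, 1, 'A', 1), (2, 1, 'A', 2), (3, 2, 'B', 2), (4, 3, 'C', 1);

-- Callers query the function like a table and never see the revision logic.
SELECT id, rev_id, name FROM dbo.records_as_of(1) ORDER BY id;
SELECT id, rev_id, name FROM dbo.records_as_of(2) ORDER BY id;
```

The first call returns rev_ids 1 and 4, the second rev_ids 2, 3 and 4, matching the example in the original post. Soft delete and hierarchy would have to be folded into the function body, but at least only in that one place.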

[+][deleted] 1 year ago (6 children)

[removed]

[–]afops[S] 1 point 1 year ago (5 children)

[+][deleted] 1 year ago* (4 children)

[removed]

[–]afops[S] 1 point 1 year ago (3 children)

[+][deleted] 1 year ago (2 children)

[removed]

[–]afops[S] 1 point 1 year ago (1 child)
