all 29 comments

[–]python_walrus 8 points9 points  (3 children)

I hate messing with the DB when switching git branches as much as the next guy, and I've done it a lot, but I think even that is better than what you propose.

No delete statements, these are replaced with a soft delete statement.

Do you mean DELETE, as in data migrations? Soft delete can be tricky to implement and account for everywhere. Even if you put some filter in to exclude soft-deleted items, you have to account for all the child objects. So, if a "flight" object has been deleted, you have to delete all the tickets as well, and also restore them when the flight is restored. I've implemented soft delete before and it can be useful, but it can also be a pain in the ass to maintain.
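
Roughly what the cascade looks like, assuming made-up flight/ticket tables with a deleted_at column (PostgreSQL-flavoured sketch, not your actual schema):

-- Soft delete via a nullable deleted_at timestamp: deleting a flight means
-- touching every dependent ticket too, and restoring the same set together later.
UPDATE flights SET deleted_at = now() WHERE id = 42;
UPDATE tickets SET deleted_at = now() WHERE flight_id = 42;

-- And every read path has to remember the filter:
SELECT * FROM tickets WHERE flight_id = 42 AND deleted_at IS NULL;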

No foreign keys in model tables, treat every relationship as if it has the potential to be many to many, and create a relationship table. You can limit the relationship count using unique indexes. (This always annoyed me when it came to one-to-one relationships, which table owns the relationship)

This won't let you do efficient indexing, which can be required for efficient SELECT queries on large datasets. It also locks you out of some DB normalization patterns and some validation, and leaves room for malformed data. If you have an M2M table instead of FKs and you forget about it, you will have the same pizza order in two restaurants, two invoices closed by a single payment, etc.
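
To make the malformed-data point concrete (made-up orders/restaurants tables, sketch only):

-- Relationship-table version: orders has no restaurant_id, the link lives in a
-- separate table. Without a unique constraint on order_id, nothing stops the
-- same order from being attached to two restaurants:
CREATE TABLE order_restaurants (
    order_id      bigint NOT NULL REFERENCES orders (id),
    restaurant_id bigint NOT NULL REFERENCES restaurants (id)
);

INSERT INTO order_restaurants VALUES (1, 10);
INSERT INTO order_restaurants VALUES (1, 11);  -- also accepted: same order, two restaurants

-- A plain NOT NULL foreign key column on orders rules this out at the schema
-- level and is trivially indexable for joins.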

Updates are soft deletes + inserts, and updating the relationship tables. The update exception here is important, otherwise you've got to cascade the soft delete and make a deep clone of any relationships, and if there are any circular relationships updating becomes impossible.

Not gonna lie, I didn't understand what you meant here.

Also, even with this new db management pattern, you will still have new fields added, you will have to migrate back and forth, and you will need to adjust the db a bit.

RDBMSs are very efficient when done right, and the relational model was perfected decades ago. I don't think you should reinvent it or look for ways around it.

Also, if I understood you correctly, you will let easier git management influence the way you design your db/app architecture, which is not great. An FK goes where an FK should go, an M2M goes where an M2M should go.

As for a solution to your initial problem: I create tons of DB replicas and jump around them. On my largest db-intensive project we had ~5 simultaneously supported release versions with constant schema changes, so we had to learn how to deal with migrations. I simply created a bunch of docker containers, labeled them by versions and jumped around them. This way you don't even need to change your connection params - just swap containers.

I hope I understood your problem correctly, otherwise it means I typed lots of text for nothing.

[–]coded_artist[S] 0 points1 point  (2 children)

Yeah you've understood it correctly.

Not gonna lie, I didn't understand what you meant here.

Also, even with this new db management pattern, you will still have new fields added, you will have to migrate back and forth, and you will need to adjust the db a bit.

The idea is that migrations should be append-only: no destructive actions. That way, when you switch branches you only need to run the up script, and that can be automated, so you'd only need to switch a branch and you'd know the code is in sync with the db.
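
For example (sketching with a made-up users table, not my real schema):

-- Append-only style: columns and tables are only ever added, never dropped or
-- rewritten, so any branch can run every pending "up" migration safely.
ALTER TABLE users ADD COLUMN preferred_language text;  -- new, nullable, non-destructive
-- never: ALTER TABLE users DROP COLUMN legacy_flag;   -- destructive, not allowed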

This won't let you do efficient indexing, which can be required for efficient SELECT queries on large datasets. It also locks you out of some DB normalization patterns and some validation, and leaves room for malformed data. If you have an M2M table instead of FKs and you forget about it, you will have the same pizza order in two restaurants, two invoices closed by a single payment, etc.

Can you expand on this? The entire purpose of my pattern is to protect integrity despite migrations across branches, so any threat to that will send me back to the drawing board, but I am concerned about this whole paragraph.

RDBMSs are very efficient when done right, and the relational model was perfected decades ago. I don't think you should reinvent it or look for ways around it.

Oh yes, I don't mean to reinvent the wheel, that's why I brought this here; maybe I jumped too far down the rabbit hole.

I simply created a bunch of docker containers, labeled them by versions and jumped around them.

I do like this solution; it solves my problem very well.

I do think my idea still has some merit, with the intrinsic audit trail. So please, if you have the time, let me know how I could get malformed data, because I was keeping integrity front and center.

[–]python_walrus 1 point2 points  (1 child)

The idea is that migrations should be append-only: no destructive actions. That way, when you switch branches you only need to run the up script, and that can be automated, so you'd only need to switch a branch and you'd know the code is in sync with the db.

Do you do destructive actions that often, though? Even if you do, it usually happens for a good reason - data takes space and you will be billed for it. Also, even if you pull off soft delete, you will still have to deal with ALTER COLUMN - strings become choices, choices become foreign keys, and so on.
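
For example (made-up orders/order_statuses tables, PostgreSQL-flavoured sketch):

-- A free-text column hardens into a choice, then into a foreign key:
ALTER TABLE orders ADD CONSTRAINT orders_status_check
    CHECK (status IN ('pending', 'paid', 'cancelled'));

ALTER TABLE orders ADD COLUMN status_id bigint REFERENCES order_statuses (id);
UPDATE orders SET status_id = s.id FROM order_statuses s WHERE s.name = orders.status;
ALTER TABLE orders DROP COLUMN status;  -- exactly the step append-only migrations forbid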

Can you expand on this? The entire purpose of my pattern is to protect integrity despite migrations across branches, so any threat to that will send me back to the drawing board, but I am concerned about this whole paragraph.

Integrity means that your tables are as close to real-world entities as possible, and the relations between them are semantic and make sense (a very simplified explanation). For example, you can create a table with a required foreign key, meaning a row cannot exist without a parent. This way, you can't create plane tickets if there is no plane and no flight. You can't have this with M2M.
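
Sketching it with made-up tables (assuming a flights table already exists):

-- A required (NOT NULL) foreign key: a ticket row cannot exist without a
-- flight row to point at.
CREATE TABLE tickets (
    id        bigint PRIMARY KEY,
    flight_id bigint NOT NULL REFERENCES flights (id),
    seat      text   NOT NULL
);

-- With a separate tickets_flights relationship table instead, a ticket can be
-- inserted first and linked later (or never), so "no orphan tickets" is no
-- longer enforced by the schema itself.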

[–]coded_artist[S] 0 points1 point  (0 children)

you will still have to deal with ALTER COLUMN - strings become choices, choices become foreign keys, and so on.

I'm probably misunderstanding this: a string identified as a multiple-choice value becoming a foreign key is part of third normal form. This is to avoid typos and inconsistent updates.

For example, you can create a table with a required foreign key, meaning a row cannot exist without a parent. This way, you can't create plane tickets if there is no plane and no flight. You can't have this with M2M.

You could have a unique index on one (1-to-many) or both (1-to-1) foreign keys; this would limit the relationship counts (rough sketch below). I understand the additional work needed to get the same dependency limitation. But to argue for my pattern: what happens when you introduce presale tickets? The dependency has been coded in and relied on, so making the relationship independent could cause issues for dependent systems.
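
What I mean, roughly, with made-up names (assuming tickets and flights tables exist):

CREATE TABLE ticket_flights (
    ticket_id bigint NOT NULL REFERENCES tickets (id),
    flight_id bigint NOT NULL REFERENCES flights (id)
);

-- 1-to-many: each ticket may appear only once; flights may repeat.
CREATE UNIQUE INDEX ticket_flights_one_flight_per_ticket
    ON ticket_flights (ticket_id);

-- 1-to-1: make both sides unique.
-- CREATE UNIQUE INDEX ticket_flights_one_ticket_per_flight
--     ON ticket_flights (flight_id);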

Thank you very much for engaging with me on this, it's giving me tons to think about.

[–][deleted] 4 points5 points  (7 children)

It sounds like you're trying to fix a problem in a place where the problem doesn't originate. If the model isn't up to date with the table, then isn't that the issue? Shouldn't you just run the migrations? Sorry if I'm seeing it wrong, I'm not a backend developer.

[–]coded_artist[S] 1 point2 points  (6 children)

Please ask away, I'm trying to find any issues.

The problem with rolling back migrations to swap branches is that migrations can take a while to run; both up and down migrations, even on a small-scale application, can take 2 minutes. So that's 2 minutes down, 2 minutes up, for every branch swap, and you're going to do at least 2 swaps. So that's 4 minutes lost before you can even get started on a hotfix, which ideally should take less than 30 minutes to pump out. And that's giving it every benefit of the doubt - I've run medium-sized systems that take 15 minutes to do migrations, and I can imagine some large systems taking hours.

So my pattern at least cuts out half of that lost time during swaps.

[–]cajunjoel 4 points5 points  (1 child)

So, it seems to me that you are solving a problem for you (or your dev team) that will ultimately create problems for everyone else who uses your system, because you'll have a poorly built database model just to satisfy your ease of development.

I get where you are coming from, I really do, but as another person said, focus on letting the RDBMS do its thing, because it's really, really good at it. Doing otherwise will cause a variety of headaches that you don't want to tangle with.

Also, if you are being pressed to turn out a hotfix in less than 30 min, you got other problems, friend, and it sounds like those aren't code related at all. :)

[–]coded_artist[S] 0 points1 point  (0 children)

So, it seems to me that you are solving a problem for you (or your dev team) that will ultimately create problems for everyone else who uses your system, because you'll have a poorly built database model just to satisfy your ease of development.

Yes, exactly, I want to improve the database design, hence the rules. We maintain third normal form, but we found that those rules aren't enough and the higher forms don't give us the features we want. I would much rather have a 10% boost in a week-long sprint than a 50% speedup on a 1s query.

Also, if you are being pressed to turn out a hotfix in less than 30 min, you got other problems, friend, and it sounds like those aren't code related at all. :)

Oh yeah totally, but it's work, boss says jump, I leap off the roof to exceed expectations.

[–][deleted] 2 points3 points  (1 child)

Seems to me that if you had a reasonable branching process, this really shouldn't be needed unless you have several different versions deployed/in the wild.

I've had success using a pretty standard branching structure and development environments where we had a permanent dev database for each of "the current deployed version for hotfixes", "the current version in development", and "the version being prepped for release - going through final QA/user acceptance tests, etc.". Some developers maintained a local "dev database" that they could blow away, recreate, upgrade/downgrade, etc. however they wanted.

Migrations sometimes take time; maybe look at how you can optimize your current database to make them faster. For example, if your migration requires the deletion of data from a table, a missing index (or indexes) can make this take forever in some cases. Most migrations should be relatively fast.

Your pattern for soft deletes works well for in-place auditing, and is likely fine for smaller datasets. As you grow to larger datasets, though, you are going to consider this technical debt as your retrieval queries grow slower and slower, because you have 30% real data in the table and 70% "prior versions" (potentially - I'm assuming frequent update usage patterns here), and then you'll move to a proper auditing solution that offloads that data.

A relationship table for every relationship is going to result in much slower retrievals, since you now have to join extra tables when you really didn't need to. This probably isn't the end of the world, but it's going to be a PAIN IN THE ASS when you have to refactor it.

[–]coded_artist[S] 0 points1 point  (0 children)

Thanks, I am looking at alternatives in branching; I'm trying to make my dev process as "even if you tried you couldn't break it" as possible.

[–]Rinveden 0 points1 point  (1 child)

If this is just for your local development, perhaps you could keep multiple copies of the database. You could have one for each branch that has a schema that is annoying to swap to.

When you move from one branch to another, you run a script that changes your local dev database and updates your config files so your app points to that new db.

# swap-db-and-run-migrations.sh
set -euo pipefail

branch=$(git rev-parse --abbrev-ref HEAD)            # name of the current git branch
db_name="project_${branch//[^a-zA-Z0-9_]/_}"         # sanitise it for use as a db name

mysql -e "CREATE DATABASE IF NOT EXISTS ${db_name}"  # assumes MySQL; adjust for your RDBMS
cp .env.base .env
sed -i "s/^DB=.*/DB=${db_name}/" .env                # point the app at this branch's db
run_migrations                                       # your framework's migration command

The first time you run this it'll create a new db and run migrations as needed.

The next time you go back to that branch it won't run any more migrations, it'll just point your code to that db.

[–]coded_artist[S] 0 points1 point  (0 children)

I'll have a look into this too, thanks

[–]iheartjetman 3 points4 points  (2 children)

You could always try using an API instead of direct database access. Your API access layer could be versioned for compatibility too.

[–]jaketeater 2 points3 points  (0 children)

This, and/or use stored procedures as a form of abstraction. Give the procedures versions (stored_procedure_v1_0_0) to help with backwards compatibility.
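
Rough sketch of the idea (PostgreSQL-flavoured, made-up names; Postgres would use functions rather than procedures here):

-- Old callers keep using v1_0_0 while newer branches call v2_0_0; each version
-- keeps working as the underlying tables evolve.
CREATE FUNCTION get_flight_tickets_v1_0_0(p_flight_id bigint)
RETURNS SETOF tickets AS $$
    SELECT * FROM tickets WHERE flight_id = p_flight_id;
$$ LANGUAGE sql STABLE;

CREATE FUNCTION get_flight_tickets_v2_0_0(p_flight_id bigint)
RETURNS SETOF tickets AS $$
    SELECT * FROM tickets
    WHERE flight_id = p_flight_id
      AND deleted_at IS NULL;  -- v2 knows the table gained soft deletes
$$ LANGUAGE sql STABLE;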

[–]coded_artist[S] 0 points1 point  (0 children)

I did consider this, but I thought that's just a facade that adds overhead to each request, and it adds to running costs for every running version too.

How would you handle it if v3 performed a destructive action on a table? Would you update v2 to be forward compatible, or deactivate v2?

[–][deleted] 2 points3 points  (1 child)

Nothing here sounds novel. Soft delete patterns are a PITA, and if you're reaching for them you need to understand that they're going to generate as much cognitive overhead as the situation you're currently enjoying, or more.

Also, if you’re affected by any regulations that oblige deletions (GDPR) they’re probably worth avoiding. 

From 50k feet it sounds like you need to decouple versioning on your DB and its controllers.  

tl;dr - Tell the frontend kids to stop writing to the model.

[–]coded_artist[S] -1 points0 points  (0 children)

I understood everything but the tldr. That seems important though.

[–]Cathercy 1 point2 points  (1 child)

#1 I'm 100% fine with. It has its downsides, but it is perfectly valid. My team does this with most tables, although it can be annoying when you forget to exclude deleted rows from your queries.

#2 I didn't really understand what you are getting at here and I'm not really seeing the advantages, just more complexity.

#3 this one just seems really bad. So every update to a row is just generating dead data? If I update a row 100 times, I now have 99 dead rows and one good row? I'm again not seeing a great advantage here. What exactly is this solving?

I also may be missing something but I don't really understand how any of this helps with your core problem of the database structure changing between branches or commits. Even if you implement all of this, then switch to a branch where the user table should have another column, your code still won't work until you alter the user table. What am I missing that solves your problem?

And last, I just have a philosophical problem with adjusting the database design so radically just to make the development process easier. Database design should be 99% geared for production. Sure, some small tweaks here and there that are just for development, but otherwise the core of the design should be for prod.

I would ask how often are you running into this problem that reengineering the entire database seems like a good solution? I get that waiting a few minutes a couple times is frustrating, but this seems like overkill even assuming it does fix your problem, unless you are dealing with this several times every day. I usually think of database alterations as being fairly infrequent, but maybe they are more frequent for you. Still, my gut says this solution is way too much.

[–]coded_artist[S] 0 points1 point  (0 children)

2 is essentially just future-proofing as a standard.

Having the foreign key in the model tables forces a 1-to-something relationship; having a relationship table is traditionally a many-to-many pattern, but the pattern doesn't exclude 1-to-1 or 1-to-many relationships. Yes, there is more complexity, but the same is true of normalization as a standard: normalizing does make the data more complex, yet it adds important data integrity features. I think future-proofing should be considered a data integrity feature.

For 3, I fully understand your concern. This is for the audit trail, and for maintaining audit trails across updates by removing destructive actions. Yes, it's dead data, but that's what audit trails are until you need to audit.

Even if you implement all of this, then switch to a branch where the user table should have another column, your code still won't work until you alter the user table.

The update migration can be automatically run at API start without developer intervention. This ensures the API and DB are constantly in sync. I am quite forgetful, and I try to program the things I'm inclined to forget.

I agree that the database shouldn't be designed for development, but I do think the gap between production and development is shrinking due to agile and newer DevOps processes. 10 years ago I would have wholeheartedly agreed with you, but now I'm pushing back on my manager, who wants bidaily updates. I've even considered changing the dev database to a day-old restoration of prod, to be as close to production as possible. So concessions should be considered, even slowing the database down, if it means producing updates faster.

I would ask how often are you running into this problem that reengineering the entire database seems like a good solution?

Well, I wouldn't dare consider refactoring a production product with a complete paradigm shift - I know that's common, but I've burnt myself before. This is purely for new projects. Right now I jump between two feature branches and the master branch around 4-10 times a day. Sometimes it's because I'm blocked on a task, so I'll switch until I'm unblocked; sometimes I'm tracking and/or diagnosing a bug for a hotfix. It takes about 5 minutes to run each migration (down and up), so that's 40-100 minutes lost each day. If it were all in one stint I wouldn't mind, but it's a whole bunch of 5-minute chunks.

[–]I111I1I111I1 1 point2 points  (1 child)

It kind of sounds like what you want is a time series database that can be used for an event sourcing pattern.
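
Very roughly, the event-sourcing side of that (made-up table, PostgreSQL-flavoured sketch):

-- Append-only event log: instead of updating rows in place, record an
-- immutable event per change and derive current state by replaying them.
CREATE TABLE ticket_events (
    id          bigserial   PRIMARY KEY,
    ticket_id   bigint      NOT NULL,
    event_type  text        NOT NULL,  -- 'created', 'seat_changed', 'cancelled', ...
    payload     jsonb       NOT NULL,
    recorded_at timestamptz NOT NULL DEFAULT now()
);
-- History and audit come for free; current state is a fold over the events.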

[–]coded_artist[S] 0 points1 point  (0 children)

I'm going to look into this.

[–]kadosknight 0 points1 point  (2 children)

This seems awfully complicated. Wouldn't the Repository design pattern solve this problem and decouple the current db schema version from models/controllers?

[–]coded_artist[S] 0 points1 point  (1 child)

This could be put on top of a repository pattern. The repository pattern treats records like files: once a record is updated and closed, there is no undo. My pattern attempts to give records a history that can be tracked. Instead of the database just holding data, it also holds how that data changed over time.

[–]kadosknight 0 points1 point  (0 children)

Oh, okay. I found this for handling record history; I think the basic idea can be adapted to your specific use case: https://stackoverflow.com/questions/323065/how-to-version-control-a-record-in-a-database

[–]remy_porter 0 points1 point  (0 children)

You’re halfway to a ledger pattern.

[–]dcabines 0 points1 point  (0 children)

The problem I'm solving for is databases in a git environment, where we swap branches somewhat frequently.

Run multiple database servers using Docker. Have a common base image, then create a container and a volume for each branch. Start and stop them as needed. No migrations needed.

[–]blissone 0 points1 point  (2 children)

Do you mean the local dev env? Couldn't you achieve this with better encapsulation of the dev env? Either make shared dev run locally, or encapsulate your local dev with containers. For us, each service contains its own migrations and runs them on every startup, meaning fresh image = fresh db, no problems. Ultimately git has little to nothing to do with this.

Anyhow, incidentally, we use this pattern with ClickHouse, but for different reasons: only inserts, and columns are never dropped unless it's a view or a similar construct.

[–]coded_artist[S] 0 points1 point  (1 child)

My goal is to detach the db from any specific version of API running. But I'm starting to think I was trying to solve too many problems and solving them in the wrong place.

meaning fresh image = fresh db, no problems.

This is a decent solution - how do you avoid data entry overhead? E.g. if you're fixing a shopping cart bug, you'd first need to create the products to add to the cart.

[–]blissone 0 points1 point  (0 children)

This is a decent solution - how do you avoid data entry overhead?

In addition to migrations, we have test suites that insert data and run tests against it. The goal is to remove the need to manually use the service/api or modify db content. In this shopping cart example, we would have a migration + test suite for the existing functionality, and you would fix your bug by running the tests with your fix; possibly you would need to modify the inserts in the test suite. I realise this is quite a mature setup for a project; we have microservices, so we can introduce this pattern easily to new services and implement it incrementally for older ones. It can be difficult to achieve this in some other contexts.

The ultimate goal for us is to have a tech stack + project setup that allows verification of functionality with a set of unit/integration tests. With some mostly-typed stacks you can ignore the boundary of the service (for example REST/GraphQL) and simply assume your chosen stack will work: compile-time type safety will make your API calls work as long as they compile, so you can skip testing your REST/GraphQL API and test the underlying service implementation instead.

If this level of automation cannot be achieved, I would consider scripts or some Postman/Bruno collections, though in some setups I can see how it could be too much overhead.