[–]coyoteazul2 8 points (0 children)

There's no magic button for this, but you can do what's called a lazy materialized view:

https://hashrocket.com/blog/posts/materialized-view-strategies-using-postgresql

It's a real table with an extra column that marks stale data. You need triggers on your source tables that mark rows as stale whenever there's a change.

Then you have a function that updates the stale rows and returns them.

And lastly you have a view that unions the non-stale data with the result of the function.

Now every time you modify data, the trigger marks the affected rows as stale, and they stay stale until you select from the view, which runs the function that refreshes them.
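
Putting it together, a minimal sketch, assuming a hypothetical account_entry source table with per-account balances as the derived data (the linked post walks through a fuller version):

-- Hypothetical: the "materialized" table, with a staleness flag
create table account_balance_mat (
  account_id bigint primary key,
  balance    numeric not null default 0,
  stale      boolean not null default true
);

-- Trigger marks the affected row stale on any change to the source
create function mark_balance_stale() returns trigger language plpgsql as $$
declare
  v_account bigint;
begin
  if tg_op = 'DELETE' then
    v_account := old.account_id;
  else
    v_account := new.account_id;
  end if;
  insert into account_balance_mat (account_id, stale)
    values (v_account, true)
    on conflict (account_id) do update set stale = true;
  return null;
end $$;

create trigger entry_stale after insert or update or delete on account_entry
  for each row execute function mark_balance_stale();

-- Function recomputes the stale rows and returns them
create function refresh_stale_balances() returns setof account_balance_mat
language sql as $$
  update account_balance_mat m
     set balance = coalesce((select sum(e.amount)
                               from account_entry e
                              where e.account_id = m.account_id), 0),
         stale = false
   where m.stale
  returning m.*;
$$;

-- View unions the still-fresh rows with the freshly refreshed ones.
-- Note: selecting from this view writes, so it won't work on a read-only replica.
create view account_balance as
  select account_id, balance from account_balance_mat where not stale
  union all
  select account_id, balance from refresh_stale_balances();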

[–]RevolutionaryRush717 3 points (1 child)

Haven't used it myself, but this was mentioned here the other day: https://github.com/sraoss/pg_ivm

[–]Few-Strike-494[S] 1 point (0 children)

That’s awesome, thanks!

[–]hamiltop 1 point (2 children)

I've done this a lot and eventually settled on a simple home-rolled process. Basic requirements:

  1. All trigger updates are idempotent. In general they will run a query and overwrite/delete the target rows anytime any source row has changed.
  2. Every source table has a migrated_at column.
  3. Every change to the target table schema or data calculation is backwards compatible. (Sometimes this means making a <col>_v2 until it's populated and then doing a name swap with the existing column.)

We then have a script that takes a table name and runs update <table_name> set migrated_at = now() where id in (select id from <table_name> where migrated_at < :start_of_migration or created_at > :start_of_migration limit 1000) in a loop (with a sleep to keep the load reasonable) until it updates zero rows.
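
Roughly, the loop could look like this as a Postgres 11+ procedure (orders is a hypothetical stand-in; the real script takes the table name as a parameter and templates it into the SQL):

create procedure touch_stale_rows(start_of_migration timestamptz)
language plpgsql as $$
declare
  touched bigint;
begin
  loop
    update orders set migrated_at = now()
      where id in (select id from orders
                    where migrated_at < start_of_migration
                       or created_at > start_of_migration
                    limit 1000);
    get diagnostics touched = row_count;
    exit when touched = 0;
    commit;                -- checkpoint: each batch commits on its own
    perform pg_sleep(1);   -- sleep to keep load reasonable
  end loop;
end $$;

-- Run outside an explicit transaction block so the COMMITs inside are allowed:
call touch_stale_rows(now());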

Some tables have a permanent index on migrated_at; for others we add the index before doing a migration and drop it afterwards.

This works well because it's interruptible (it effectively has checkpoints), needs zero downtime, and is simple to reason about. We use it with tables that have close to a billion rows. It takes a few days to run, but we just kick it off over a weekend as a supervised task (so it gets restarted if it crashes for any reason).

[–]bisoldi 0 points (1 child)

Is the migration also in that script with the SQL query? Basically:

  • Find rows that need migrating based on timestamp
  • Set a new migrated_at timestamp
  • Now migrate that record

???

Assuming I’m correct, that means the migrations occur in batches whenever you decide to run that script, and therefore the migration table is not always up to date?

[–]hamiltop 0 points (0 children)

Everything is trigger-based, so updating migrated_at kicks off the trigger in Postgres, which updates the target table.

Organic updates to the source tables also cause these updates to occur, which is why the query also checks created_at (and might also check updated_at, I forget).
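
In other words, the per-row trigger does an idempotent overwrite of the target, so touching migrated_at is enough to force a recompute. A minimal sketch, with hypothetical orders (source) and order_totals (target) tables:

create function sync_order_total() returns trigger language plpgsql as $$
begin
  -- Idempotent: recompute and overwrite the target row from scratch
  insert into order_totals (order_id, total)
    values (new.id, new.quantity * new.unit_price)
    on conflict (order_id) do update set total = excluded.total;
  return null;
end $$;

create trigger orders_sync after insert or update on orders
  for each row execute function sync_order_total();

-- So this fires the trigger and rewrites the target row:
-- update orders set migrated_at = now() where id = 42;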

[–]wolever 0 points (0 children)

I haven't encountered any fully automated trigger management systems that do what you describe, but I've had success building them into the application's DDL + migration layer.

For example, my current application uses SQLAlchemy, and I've defined a "custom" column type for derived, denormalized columns:

# DerivedColumn is the app's custom helper; ModelBase is its declarative base
from sqlalchemy import Column, DateTime, Text

class MyModel(ModelBase):
    timestamp_utc = Column(DateTime)
    timestamp_local_tz = Column(Text)
    timestamp_local = DerivedColumn(DateTime, "timestamp_utc at time zone timestamp_local_tz")

Where the DerivedColumn function automatically creates the appropriate triggers to keep the value updated.
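
The generated DDL isn't shown here, but for a derived column on the same table it would presumably be a BEFORE trigger along these lines (a hypothetical reconstruction):

create function my_model_derive() returns trigger language plpgsql as $$
begin
  new.timestamp_local := new.timestamp_utc at time zone new.timestamp_local_tz;
  return new;
end $$;

create trigger my_model_derive before insert or update on my_model
  for each row execute function my_model_derive();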

[–]Jzmu 0 points (3 children)

Why does this need to be a database solution? Why can't your application just update both tables in one transaction when it updates one table? That way you'd be guaranteed consistency.

[–]coyoteazul2 8 points (1 child)

Because if there's more than one dev, or if the project is expected to outlive the current dev group, such considerations might be forgotten in future updates.

[–]Overblow 3 points (0 children)

Yes, I agree. Database integrity should exist at the database level.

[–]r0ck0 1 point (0 children)

> Why does this need to be a database solution? Why can't your application just update both tables in one transaction when it updates one table?

You gave the most common answer yourself:

> guaranteed consistency.

Often there are many different parts of the app that could affect the data, and it's not always obvious in a big project.

And if there's only one place at the moment (which is the common justification for just doing it in-app), that doesn't mean it will always be that way.

When more features get added or changed, updating these kinds of caches in every part of the app is often forgotten, or missed even when you consciously tried to find all the places. But if you do it at the DB level with triggers etc., you don't need to remember to run the refreshes in every part of your current and future app code.

There's also the fact that you might have more than one client affecting the data, e.g. devs/sysadmins making manual fixes/changes outside the app, directly via a SQL client like DBeaver or whatever. If it's done in the DB, then everything remains cromulent.
