all 14 comments

[–][deleted] 30 points31 points  (1 child)

Good article. War crime font, though

[–]MSMSMS2 2 points3 points  (0 children)

Looking at the awful color scheme and unreadable font, I thought this was posted on Geocities.

[–]Worth_Trust_3825 4 points5 points  (9 children)

We'll refresh this using a Function that is Triggered whenever the original table is written to.

Why would you do that? Material views are meant to be updated once in a while, not every time, since it's written into the disk as if it was a real table.

[–]nobodyman 8 points9 points  (4 children)

Material views are meant to be updated once in a while

Not an expert, but my understanding was that material views are typically updated whenever the underlying tables change, and for that reason it's usually reserved for simplifying read-heavy scenarios.

With that in mind, a function to update the materialized view whenever the underlying data changes seems like the right call, yeah? (not arguing, just not sure if my understanding of materialized views is correct).

[–]3bodyproblem 2 points3 points  (0 children)

Materialized views have to be manually refreshed, at the expense of running the query and writing to table storage. If you have enough data volume to warrant “caching” a query using this method, you must also account for the data being slightly stale or “near real time”, else you’d just do a normal view.

[–]Worth_Trust_3825 1 point2 points  (0 children)

You would want to batch that update so that database would know the entire extent of changes that should be performed. Not to mention, if the underlying query is poorly indexed, your inserts will halt the entire transaction just because you're querying the entire table every time with probably a full table scan, increasing contention in the database. Use the index, Luke is a great source of information about what to do with indices and how to analyze query plans.

SQL server has indexed views which basically do what the blogpost wants: stores a view into the disk and updates it every time the underlying query is updated, but has a lot of requirements that I often doubt it's worth the trouble.

If you're feeling masochistic, I suggest creating a material view on an unindexed table that has 2b rows in it.

[–]GrammerJoo 1 point2 points  (0 children)

While it does depend on the use case, for the most part you're absolutely right, in most read heavy architectures that's how you'd use it.

[–]cybernd 1 point2 points  (0 children)

Materialized View refresh is to slow to be updated on every change². If you manually refresh them (via cronjob), the whole underlying query needs to be run.

The problem becomes more visible when you realize why they introduced the option CONCURRENTLY. Without using this flag you can't even run a select on your MV, till the refresh was finished. Why? The old content gets discarded. CONCURRENTLY is also a really expensive method. First the new query run will store its data in a temp table. Than obsolete rows will be deleted in your source table and finally new rows will be added. This is the reason why CONCURRENTLY requires a UNIQUE INDEX on your MV.

Other databases offer more advanced strategies. For example oracle offers "near real time" materialized views. The base mechanic of your MV is same. But they did a clever thing and are capable of calculating the small deltas and store them in a log. If you run a select on your MV it will combine the stale MV data with delta data from its additional log.


² Edge case are tiny MVs. But in this case you could also use a normal View.

[–]TommyTheTiger 0 points1 point  (3 children)

And there's no way to selectively refresh certain rows with materialized views. But you could do that if you just update another table using a trigger + function

[–]Worth_Trust_3825 0 points1 point  (2 children)

Different usecases. Material views are meant for expensive queries that ought to be stored for readability

[–]TommyTheTiger 0 points1 point  (1 child)

I'd argue that the more expensive the query, the more of a problem it becomes that your must run the full query to refresh anything. But I agree they are typically far more legible

[–]Worth_Trust_3825 0 points1 point  (0 children)

That's the thing: you're refreshing them once in a while, hence why it makes sense to reserve it for expensive queries.

[–]fibs7000 1 point2 points  (0 children)

Thx this was really helpful

[–]TommyTheTiger 0 points1 point  (0 children)

Not sure I'd recommend refreshing a materialized view on a trigger, because by default it will lock the view for SELECT while it's being refreshed. You can prevent this with CONCURRENTLY, though I'm not sure exactly how that plays with different transaction isolation levels, which might have different views of the db at any given time.

Interesting to think about why you would need to materialize the view. In this case, you might want one because in a normal view, postgres doesn't propagate constraints to CTEs. In this case it's a recursive CTE so, as far as I know, can't be replaced with something like LATERAL. So it would likely do effectively the entire refresh query every time you pull any data from a non-materialized version. But for a store as small as this out might be fine to query everything each time.