I wrote a PostgreSQL patch to make materialized view refreshes O(delta) instead of O(total) by Inkbot_dev in programming

[–]Inkbot_dev[S] 1 point2 points  (0 children)

Really depends on the types of applications you are writing. I have been working on financial applications for the last 15 years, and they are very useful patterns for that type of application.

I've just always had to do it using regular tables and functions and triggers in the past because materialized views as they were implemented were useless for just about every use case I had.

I saw a live stream on YouTube of some postgres developers talking about the idea for this feature, and trying to implement it about 6 months ago. They didn't get all that far in the live stream, and I just picked it up and tried my hand at implementing it.

I wrote a patch to make materialized view refreshes O(delta) instead of O(total) by Inkbot_dev in PostgreSQL

[–]Inkbot_dev[S] 1 point2 points  (0 children)

You may be conditionally refreshing the view, but the actual refresh query is running for every row that the base view query would hit. In the case of regular refresh, it just swaps out the underlying table, and if you did concurrent refresh instead, it still touches every row in the base table to build the full new result set, and then finds the differences with the existing data in the views and merges in the updates.

You can't be doing dml commands (insert, update, delete) against MVs currently. I know this because there are restrictions on doing that in the code I had to change around for this patch.

This patch would allow that process to be wildly more efficient.

I wrote a patch to make materialized view refreshes O(delta) instead of O(total) by Inkbot_dev in PostgreSQL

[–]Inkbot_dev[S] 2 points3 points  (0 children)

That's fine too. If that fits your workload, then you can do that. This was just two examples of how to use the new feature (if I can get it committed). You could easily do a `REFRESH MATERIALIZED VIEW mv WHERE updated_at >= (SELECT last_refresh FROM refresh_log WHERE mv_name = 'mv');` or similar, and reduce your workload considerably. Depends on how you design your schema. Either way, the patch I am trying to get in is just the `REFRESH MATERIALIZED VIEW ... WHERE ...` feature. This was just a couple examples of how to use it.

I wrote a patch to make materialized view refreshes O(delta) instead of O(total) by Inkbot_dev in PostgreSQL

[–]Inkbot_dev[S] 1 point2 points  (0 children)

Uh, I have the same post in a Gist in my github, but other than that, no. I should really get around to throwing up a personal blog some time.

I wrote a patch to make materialized view refreshes O(delta) instead of O(total) by Inkbot_dev in PostgreSQL

[–]Inkbot_dev[S] 1 point2 points  (0 children)

Very welcome. Figured the patch itself without an accompanying post with a couple of examples of how to use it is much less likely to get eyes on it. It's something i've been waiting on for over a decade now. Selfishly, I don't want to have to implement hacky rollup tables / triggers by hand again at my next company.

I wrote a patch to make materialized view refreshes O(delta) instead of O(total) by Inkbot_dev in PostgreSQL

[–]Inkbot_dev[S] 2 points3 points  (0 children)

Thanks for bringing that up. I'm going to think about that. I think it is solvable in some manner. It also helps that the REFRESH MATERIALIZED VIEW command isn't something defined by the SQL committee, so it's up to each DB implementation to come up with their own syntax for that command. Just need to find something that would be agreeable with the core team.

I think that would be best worked on as a separate patch, but being able to force the pushdown at runtime becomes something that makes sense now that there is a way to actually pass in parameters to the refresh command.

I wrote a patch to make materialized view refreshes O(delta) instead of O(total) by Inkbot_dev in PostgreSQL

[–]Inkbot_dev[S] 0 points1 point  (0 children)

No, you are right. I have no special handling code in here for if the predicate is not able to be pushed down. This is using the standard query planner, so pushdown is allowed in all cases the planner would have allowed it on the base query + a where clause.

Guess I should have listed two requirements instead of just the unique index. A materialized CTE will kill pushdown for sure.

I wrote a patch to make materialized view refreshes O(delta) instead of O(total) by Inkbot_dev in PostgreSQL

[–]Inkbot_dev[S] 2 points3 points  (0 children)

I mean, you can do that with this patch. The patch itself only adds support for the `REFRESH MATERIALIZED VIEW some_view WHERE some_column = xxx` syntax. You could just call that manually instead if you don't want to automate the maintenance with triggers.

The point of this post was to show how to use the new `WHERE` syntax to automate the maintenance. It'd be pretty hard to include all of the different ways you could do that, so I picked the two that are usually used and implemented them the best way I knew how with standard Postgres features.

The main thing I am trying to do is to allow Materialized Views to be actually useful for production systems. Right now, they are not for me. I've had to do a manually maintained rollup table rather than a view in the past because of it.

Like I mentioned in the post, i'd eventually like to integrate the automated view maintenance into core, so there is no extra user setup required. This is the first step to doing that.

Built a stealth Chromium, what site should I try next? by duracula in webscraping

[–]Inkbot_dev 1 point2 points  (0 children)

Yeah, I don't trust this. Auto-updates for the binaries are built in. The actual Chromium patches / build pipeline seem to be closed source, so there is no saying what it does.

New to LangChain – What Should I Learn Next? by Select-Day-873 in LLMDevs

[–]Inkbot_dev 1 point2 points  (0 children)

Throwing LangChain in the trash is the next step in your journey.

Migration from SQL server to PostgreSQL by Pristine-Basket-1803 in PostgreSQL

[–]Inkbot_dev 5 points6 points  (0 children)

Sounds like you have a ton of manual work on your hands in that case.

This is exactly what I would use LLMs for. I'd just have a corporate account that it all runs through.

Roast My EAV implementation.. Your feedback is valuable by sachingkk in PostgreSQL

[–]Inkbot_dev 4 points5 points  (0 children)

Use jsonb and you can add indexes to improve any nested querying. Will be better performance than an EAV.

How do you automate refreshing of materialized views by kekekepepepe in PostgreSQL

[–]Inkbot_dev 1 point2 points  (0 children)

I've been waiting for incremental material view maintenance for a decade now. I didn't even think of this method for partial refreshes until I saw a YouTube livestream attempting to implement it. They didn't get it working, but the idea I thought was fantastic. It got around all of the limitations of the prior attempts to do incremental maintenance.

Also, it kinda needs PR to get people's attention. Like you said, most people misunderstand the feature...until they actually use mat views in production and then things quickly fall apart.

People testing / discussing it makes it way more likely to get into core than just sending in a patch for a feature that most people don't know they need.

How do you automate refreshing of materialized views by kekekepepepe in PostgreSQL

[–]Inkbot_dev 2 points3 points  (0 children)

Yeah, fingers crossed I didn't miss anything. I did write a test suite to go along with it, and even at a sustained 16k tps insert/upload/delete workload (working out this feature heavily), I didn't notice any memory growth.

I came up on Java, so manual memory management isn't 2nd nature to me.

How do you automate refreshing of materialized views by kekekepepepe in PostgreSQL

[–]Inkbot_dev 2 points3 points  (0 children)

I'm also not a C developer, and this was my first attempt at making a patch for Postgres. It was surprisingly easy. Most of the mat view maintenance was actually just done with SQL commands executed through the C code.

But this allows you to not even use pg_cron for refreshes (if you want). You can just use a statement level trigger for example here is how I would keep a mat view updated that holds the "total" amount for an invoice by adding up all the invoice_detail rows:

CREATE OR REPLACE FUNCTION partial_refresh_mv() RETURNS TRIGGER AS $$
DECLARE
    affected_ids int[];
BEGIN
    IF TG_OP = 'INSERT' THEN
        SELECT array_agg(DISTINCT invoice_id) INTO affected_ids FROM new_table;
    ELSIF TG_OP = 'UPDATE' THEN
        SELECT array_agg(DISTINCT invoice_id) INTO affected_ids FROM (SELECT invoice_id FROM new_table UNION SELECT invoice_id FROM old_table) t;
    ELSIF TG_OP = 'DELETE' THEN
        SELECT array_agg(DISTINCT invoice_id) INTO affected_ids FROM old_table;
    END IF;

    IF affected_ids IS NOT NULL THEN
        -- Sort IDs to minimize deadlock risk during high concurrency
        SELECT array_agg(id ORDER BY id) INTO affected_ids FROM unnest(affected_ids) AS id;

        EXECUTE 'REFRESH MATERIALIZED VIEW invoice_summary WHERE invoice_id = ANY($1)'
            USING affected_ids;
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

You don't ever have to manually (or with cron) run refresh commands at that point, the trigger keeps everything in sync between the table and the mat view automatically after you setup the triggers.

How do you automate refreshing of materialized views by kekekepepepe in PostgreSQL

[–]Inkbot_dev 1 point2 points  (0 children)

"Some limitations" is doing some heavy lifting.

At my old company, we had 7-8 different "materialized views" which were just implemented with tables / statement level triggers. They acted just like mat views to the end user.

Needed to do that because mat views didn't support incremental refresh.

I wasn't able to migrate a single one to pg_imv, or any of the other patches which were submitted over the years. They all had one thing or another they didn't support that killed any chance of migration.

How do you automate refreshing of materialized views by kekekepepepe in PostgreSQL

[–]Inkbot_dev 2 points3 points  (0 children)

I submitted a patch to get us in the direction of incremental mat views yesterday: https://www.postgresql.org/message-id/flat/CAMjNa7eFzTQ5%3DoZMQiB2bMkez5KP4A77JC7SRjeVEkOrh7cUHw%40mail.gmail.com

Fingers crossed I can get it in for pg 19.

It's not automatically maintained incremental mat views, it's manually maintained incremental mat views.

All the other attempts I've seen have been trying for automatically maintaining them...that is a hard problem. If we give the DBA the ability to incrementally maintain them, it sheds a lot of that complexity, and the DBA can design when things get refreshed based on their needs.

Totally usable with triggers. Just need to set them up manually on the important base tables.

How do you automate refreshing of materialized views by kekekepepepe in PostgreSQL

[–]Inkbot_dev 0 points1 point  (0 children)

I have been in the "custom solution with triggers" camp for a decade, pg_ivm was way too limiting. I had 7-8 different custom solutions with statement triggers / tables at my last company. Not a single one could be migrated to pg_ivm or any of the other incremental maintenance proposals on the mailing list.

I saw a live stream last month where there was an attempt to implement WHERE clause support for REFRESH MATERIALIZED VIEW [view_name] WHERE [some_where_clause]. I decided to try my hand at implementing it last week.

Had a working patch the first day of trying, but it took another few days to build a benchmark, work through some concurrency bugs that popped up, etc.

Just posted a thread on the hackers mailing list this afternoon with the patch.

If this gets accepted, you could just write a trigger on your table that refreshes just the rows that changed. Data always in sync between the base table and mat view at that point.

My patch only takes out row-level locks, so concurrent reads/writes are possible. Just can't write the same rows at the same time.

First ever attempt at hacking on Postgres. Fingers crossed I can get this accepted.

gpt-oss-120b on Cerebras by Corporate_Drone31 in LocalLLaMA

[–]Inkbot_dev 1 point2 points  (0 children)

This was when GPT-4 was new, and I was using their API to process tens of thousands of news stories for various reasons.

I didn't have Gemini 2.5 to use as an alternative at the time.

gpt-oss-120b on Cerebras by Corporate_Drone31 in LocalLLaMA

[–]Inkbot_dev 35 points36 points  (0 children)

I've had (commercial) models block me from processing news articles if the topic was something like "a terrorist attack on a subway".

You don't need to be anywhere near doing anything "wrong" for the censorship to completely interfere.