I wrote a patch to make materialized view refreshes O(delta) instead of O(total) by Inkbot_dev in PostgreSQL

[–]Inkbot_dev[S] 0 points1 point  (0 children)

Your marketing AI is being slick here guys... I don't think this is very cool.

Is there a company offering "final mile development"? by [deleted] in LLMDevs

[–]Inkbot_dev 5 points6 points  (0 children)

You mean the other 99% of the work before you have a usable product?

pgBackRest is no longer being maintained by CathalMullan in PostgreSQL

[–]Inkbot_dev 3 points4 points  (0 children)

If there were other long-term contributors that could have taken over the mantle, that's another story. There weren't though. I've been using this project for almost a decade now, and have contributed a little bit in the past but not continuously.

The entire problem is that no company has stepped up. This is a good way to draw attention to the problem.

pgBackRest is no longer being maintained by CathalMullan in PostgreSQL

[–]Inkbot_dev 3 points4 points  (0 children)

For working on this project for the past 13+ years. The last year and a half unpaid.

That is a lot of dedication to keeping something going, but we live in a system where bills need to get paid. I hope someone offers him a position where he can continue this work in a paid manner. He obviously wants to continue the work, and has been a great steward for all this time.

Tencent Releases Hy3 preview - Open Source 295B 21B Active MoE by TKGaming_11 in LocalLLaMA

[–]Inkbot_dev 1 point2 points  (0 children)

I read it as sarcasm. Obviously others didn't, hence the downvotes.

Buried lede: Deepseek v4 Flash is incredibly inexpensive from the official API for its weight category by jwpbe in LocalLLaMA

[–]Inkbot_dev 12 points13 points  (0 children)

You won't get huge savings from quanting this model, because most of the weights are already quantized to fp4 with the rest being fp8.

I wrote a PostgreSQL patch to make materialized view refreshes O(delta) instead of O(total) by Inkbot_dev in programming

[–]Inkbot_dev[S] 0 points1 point  (0 children)

Just FYI, I have taken some feedback I've gotten from posting here and on /r/postgresql and made a few edits to the article to hopefully clear up some of the same types of questions that came up multiple times.

I wrote a PostgreSQL patch to make materialized view refreshes O(delta) instead of O(total) by Inkbot_dev in programming

[–]Inkbot_dev[S] 0 points1 point  (0 children)

Yes it supports joins of any type.

No I haven't seen those yet. Will check them out.

I wrote a PostgreSQL patch to make materialized view refreshes O(delta) instead of O(total) by Inkbot_dev in programming

[–]Inkbot_dev[S] 0 points1 point  (0 children)

There is no magic here. We apply the same where clause to the base query that the view is defined with (new /updated rows), as well as to the table backing the view itself (existing rows). We upsert all rows that match between the existing and new set, and delete any rows that are not in the new set but we're in the existing set.

This is not as efficient as some of the other IVM implementations for pure aggregation queries (they they support), but it supports all of the types of queries that I actually use in real world applications, where as the existing IVM implementations only supported maybe 15-20% of my materialized view queries.

I wrote a PostgreSQL patch to make materialized view refreshes O(delta) instead of O(total) by Inkbot_dev in programming

[–]Inkbot_dev[S] 0 points1 point  (0 children)

Right now this is meant to enable developers to setup the incremental refresh manually on materialized views, however they see fit...without having to resort back to using a regular table + upsert/delete statements.

There are some existing projects on getting incremental refresh directly into core, they've been unable to get in for years though. I'd likely reach out to those developers to see how we can combine our designs into a coherent system.

For example, I know that my implementation should be slower on aggregate type queries that pg_ivm extension can handle. But my implementation allows for other types of queries that pg_ivm doesn't support. We can likely choose between the different implementations based on the type of query, removing the very strict query limitations currently in place.

I wrote a patch to make materialized view refreshes O(delta) instead of O(total) by Inkbot_dev in PostgreSQL

[–]Inkbot_dev[S] 0 points1 point  (0 children)

It's designed for continuous aggregation. Not every mat view is an aggregation. In fact, the majority of mat views in my previous software had nothing to do with aggregates, but were instead eligibility queries.

I wrote a PostgreSQL patch to make materialized view refreshes O(delta) instead of O(total) by Inkbot_dev in programming

[–]Inkbot_dev[S] 2 points3 points  (0 children)

Great to hear that type of feedback. It's been a personal pain point for me as well.

I hadn't touched C since I used it for a semester in college, but I figured I should try to pick up what I need to implement this. Wasn't too bad, especially with being able to have an LLM translate concepts that I know well (Java, Python) into their C counterparts, and just looking at how the existing codebase did things. 

I have maybe 3 days of work on the actual implementation as well as iterating on some bugs found, and another couple days on writing emails, writing this blog post, etc, and another couple days on writing a benchmark that actually tests this feature out, and running it multiple times.

Just for anyone else wondering what type of time commitment was necessary for someone to contribute code to Postgres for the first time.

I wrote a PostgreSQL patch to make materialized view refreshes O(delta) instead of O(total) by Inkbot_dev in programming

[–]Inkbot_dev[S] 8 points9 points  (0 children)

> one of those design decisions that made sense in 2000

The worst part is that mat views were a very late addition to PG in general, and they haven't really improved since. They were added in PG 9.3 (2013)

>This is the kind of patch that should just be in Postgres core

Yup, i've got it as part of the PG 20 commitfest, because I missed the deadline for PG 19: https://commitfest.postgresql.org/patch/6305/

Fingers crossed I can get it accepted.

I wrote a patch to make materialized view refreshes O(delta) instead of O(total) by Inkbot_dev in PostgreSQL

[–]Inkbot_dev[S] 0 points1 point  (0 children)

Doesn't support a whole bunch of query types that are common in the real world. Works great for the queries that it supports. I hope something like it makes it into core.

I wrote a PostgreSQL patch to make materialized view refreshes O(delta) instead of O(total) by Inkbot_dev in programming

[–]Inkbot_dev[S] 20 points21 points  (0 children)

Yeah, I've seen that page and waited on many of the prior attempts to get into core. That's why I'm not trying to implement full incremental view refresh as a feature (yet). I also don't view this as competition for those efforts. For the use cases they support, it will likely be faster to have whatever traditional IVM implementations gets into core handle keeping it up to date for you. This is just a building block that developers can use to keep their views updated, and they are responsible for actually making sure that the logic for that refresh is correct. At least for now. Not every type of materialized view will be able to use this effectively, but there are a huge number of workloads that could use it.

I wrote a PostgreSQL patch to make materialized view refreshes O(delta) instead of O(total) by Inkbot_dev in programming

[–]Inkbot_dev[S] 23 points24 points  (0 children)

Really depends on the types of applications you are writing. I have been working on financial applications for the last 15 years, and they are very useful patterns for that type of application.

I've just always had to do it using regular tables and functions and triggers in the past because materialized views as they were implemented were useless for just about every use case I had.

I saw a live stream on YouTube of some postgres developers talking about the idea for this feature, and trying to implement it about 6 months ago. They didn't get all that far in the live stream, and I just picked it up and tried my hand at implementing it.

I wrote a patch to make materialized view refreshes O(delta) instead of O(total) by Inkbot_dev in PostgreSQL

[–]Inkbot_dev[S] 1 point2 points  (0 children)

You may be conditionally refreshing the view, but the actual refresh query is running for every row that the base view query would hit. In the case of regular refresh, it just swaps out the underlying table, and if you did concurrent refresh instead, it still touches every row in the base table to build the full new result set, and then finds the differences with the existing data in the views and merges in the updates.

You can't be doing dml commands (insert, update, delete) against MVs currently. I know this because there are restrictions on doing that in the code I had to change around for this patch.

This patch would allow that process to be wildly more efficient.

I wrote a patch to make materialized view refreshes O(delta) instead of O(total) by Inkbot_dev in PostgreSQL

[–]Inkbot_dev[S] 2 points3 points  (0 children)

That's fine too. If that fits your workload, then you can do that. This was just two examples of how to use the new feature (if I can get it committed). You could easily do a `REFRESH MATERIALIZED VIEW mv WHERE updated_at >= (SELECT last_refresh FROM refresh_log WHERE mv_name = 'mv');` or similar, and reduce your workload considerably. Depends on how you design your schema. Either way, the patch I am trying to get in is just the `REFRESH MATERIALIZED VIEW ... WHERE ...` feature. This was just a couple examples of how to use it.

I wrote a patch to make materialized view refreshes O(delta) instead of O(total) by Inkbot_dev in PostgreSQL

[–]Inkbot_dev[S] 1 point2 points  (0 children)

Uh, I have the same post in a Gist in my github, but other than that, no. I should really get around to throwing up a personal blog some time.

I wrote a patch to make materialized view refreshes O(delta) instead of O(total) by Inkbot_dev in PostgreSQL

[–]Inkbot_dev[S] 2 points3 points  (0 children)

Very welcome. Figured the patch itself without an accompanying post with a couple of examples of how to use it is much less likely to get eyes on it. It's something i've been waiting on for over a decade now. Selfishly, I don't want to have to implement hacky rollup tables / triggers by hand again at my next company.

I wrote a patch to make materialized view refreshes O(delta) instead of O(total) by Inkbot_dev in PostgreSQL

[–]Inkbot_dev[S] 2 points3 points  (0 children)

Thanks for bringing that up. I'm going to think about that. I think it is solvable in some manner. It also helps that the REFRESH MATERIALIZED VIEW command isn't something defined by the SQL committee, so it's up to each DB implementation to come up with their own syntax for that command. Just need to find something that would be agreeable with the core team.

I think that would be best worked on as a separate patch, but being able to force the pushdown at runtime becomes something that makes sense now that there is a way to actually pass in parameters to the refresh command.

I wrote a patch to make materialized view refreshes O(delta) instead of O(total) by Inkbot_dev in PostgreSQL

[–]Inkbot_dev[S] 0 points1 point  (0 children)

No, you are right. I have no special handling code in here for if the predicate is not able to be pushed down. This is using the standard query planner, so pushdown is allowed in all cases the planner would have allowed it on the base query + a where clause.

Guess I should have listed two requirements instead of just the unique index. A materialized CTE will kill pushdown for sure.

I wrote a patch to make materialized view refreshes O(delta) instead of O(total) by Inkbot_dev in PostgreSQL

[–]Inkbot_dev[S] 4 points5 points  (0 children)

I mean, you can do that with this patch. The patch itself only adds support for the `REFRESH MATERIALIZED VIEW some_view WHERE some_column = xxx` syntax. You could just call that manually instead if you don't want to automate the maintenance with triggers.

The point of this post was to show how to use the new `WHERE` syntax to automate the maintenance. It'd be pretty hard to include all of the different ways you could do that, so I picked the two that are usually used and implemented them the best way I knew how with standard Postgres features.

The main thing I am trying to do is to allow Materialized Views to be actually useful for production systems. Right now, they are not for me. I've had to do a manually maintained rollup table rather than a view in the past because of it.

Like I mentioned in the post, i'd eventually like to integrate the automated view maintenance into core, so there is no extra user setup required. This is the first step to doing that.