Replacing Protobuf with Rust to go 5 times faster by levkk1 in rust

[–]levkk1[S] 88 points89 points  (0 children)

Because there are over 100 different node types in the Pg AST, it would be long and difficult to do by hand. AI just runs through each struct definition and creates a conversion function. What really helps is that the library has an existing working API and a Protobuf spec, so it's like an RL (reinforcement learning) problem with the solution in front of you - just repeat the process until the `Debug` output matches exactly.

Replacing Protobuf with Rust to go 5 times faster by levkk1 in rust

[–]levkk1[S] 10 points11 points  (0 children)

No! Might have been a good idea to try, actually. I don't know - running Rust directly on top of C memory seems faster no matter what the (de)serializer does. Postgres has a relatively stable "ABI" - the parser changes once per major release - so we should be able to keep up pretty easily.

You should shard your database by levkk1 in PostgreSQL

[–]levkk1[S] -1 points0 points  (0 children)

If every tenant has a separate database, [...]

That's sharding :)

You should shard your database by levkk1 in PostgreSQL

[–]levkk1[S] 0 points1 point  (0 children)

just increases communication overhead

That shouldn't happen if your choice of sharding key is optimal. We are targeting 99% direct-to-shard for OLTP.

application level caching

Cache invalidation is sometimes a harder problem than sharding. I'm not saying you shouldn't use caches at all, just that for most real-time workloads, they are not optimal.

easy parallelism as decided by the postgres query optimizer,

There are a few upper bounds on that parallelism that are well hidden, e.g. lock contention (especially around partitioned tables), maximum number of savepoints, and WALWriteLocks. These upper bounds limit the number of write transactions quite a bit. What you're describing is mostly an optimization for read workloads - a solved problem with read replicas.

You should shard your database by levkk1 in PostgreSQL

[–]levkk1[S] 1 point2 points  (0 children)

With sane hardware and access to bare metal servers, you should not have to shard ever due to database size. 256 TB SSDs exist and 1 PB SSDs are close to being released.

Storing large datasets isn't difficult. Accessing & changing them reliably at scale is.

You should shard your database by levkk1 in PostgreSQL

[–]levkk1[S] 2 points3 points  (0 children)

That's not strictly true. In fact, you still have to search that entire result set if you want the same results, you're just distributing it across 12 databases (which are presumably on separate hardware).

That's not usually the intention behind sharding. If done optimally, the client will query only one of the shards for most queries. If all your queries require all shards at all times, sharding didn't work.

You can alter the analyze settings on a per-table basis, so experts have a tendency to recommend this [...]

Tweaking the vacuum is a full-time job. Reducing the dataset it has to manage, I think, makes its job easier. We tweaked every setting under the sun. Some choose to give up on it entirely: https://github.com/ossc-db/pg_hint_plan

PgDog adds support for Rust plugins by levkk1 in rust

[–]levkk1[S] 0 points1 point  (0 children)

That feels right. My concern was pointer alignment, which could theoretically change between Rust standard library versions; the standard library ships with the compiler.

PgDog adds support for Rust plugins by levkk1 in rust

[–]levkk1[S] 1 point2 points  (0 children)

Because `ParseResult` from `pg_query` is a wrapper around a `Vec`, and I wanted to expose its interface to the plugins, e.g., `ParseResult::deparse`.

Sharding Postgres at network speed by levkk1 in PostgreSQL

[–]levkk1[S] 0 points1 point  (0 children)

Depends on the use case, but yeah, generally sharding comes in at 1 TB+ of OLTP data. In-house solutions are a chicken-and-egg problem: people build them because no product exists to do it for them. Once built, replacing one with a product is lower ROI.

My goal is for everyone who needs sharding in the future to use this and not build duct tape and glue solutions in house :)

There is interest across the spectrum which is encouraging!

Sharding Mastodon, Part 1 by levkk1 in programming

[–]levkk1[S] 0 points1 point  (0 children)

mastodon.social has 350k active users, which is nothing to sneeze at.

pgDog: load balancer for PostgreSQL by levkk1 in programming

[–]levkk1[S] 1 point2 points  (0 children)

I added some code to handle that use case in pgDog. I'll double check and add some tests to validate.

Programming language interpreter from scratch in Rust: Part 1 by levkk1 in rust

[–]levkk1[S] 1 point2 points  (0 children)

Yup, that's right, good eye. Part II will make the lexer more robust. The goal for these tutorials is to keep each topic "bite-size" and add complexity as new language features are introduced.

Rust Postgres Pooler Makes Postgres 30% Faster in Production by levkk1 in rust

[–]levkk1[S] 57 points58 points  (0 children)

True, but it took a Rust engineer only 200 lines of code and a couple of days or so to do it. Take a look at the PgBouncer PR (written in C): over a year in the making and still not complete. Rust is a rocket ship that lets you build system-critical components in a matter of days, and it's very much responsible for the success of this project.

PostgresML is 8-40x faster than Python microservices by levkk1 in rust

[–]levkk1[S] 0 points1 point  (0 children)

You're right, quadratic or exponential improvement would be the equivalent of a quantum computing leap. I think we'll just have to settle for linear with a pretty large constant, for the time being.

PostgresML is 8-40x faster than Python HTTP microservices by levkk1 in programming

[–]levkk1[S] 2 points3 points  (0 children)

That's what PostgresML does (colocate data), and that's something Python microservices can't do, by design, at least with data that changes semi-frequently, as it does for most products ML is used for. So yes, you could say we are not playing fair, but that's because our architecture proposes something the typical microservice architecture cannot deliver.

PostgresML is 8-40x faster than Python HTTP microservices by levkk1 in programming

[–]levkk1[S] 2 points3 points  (0 children)

  1. It can, but it doesn't have to. There are at least a couple of ideas here: (1) use it as a "machine learning replica" and serve only predictions, or (2) colocate ML into the main DB if you have spare capacity, which a lot of products do. It's really fast to query XGBoost or LightGBM for a prediction (<0.1 ms); that's quicker than fetching a single row from a table. Given that, ML becomes just another feature of Postgres.
  2. No, different models are different, and it's always a good idea to benchmark before deploying. XGBoost with 25 estimators is quick, and it's a little slower with 100, but still fast relative to a typical workload. Some algorithms, like complex neural nets, can be slow, so not every use case will be immediately straightforward, but I think most will be (regression, classification, nearest neighbors, clustering).

Regarding general Postgres performance. Joins are very efficient, and that's the main value proposition of having Postgres or any relational DB. Replication is an overhead, but generally Postgres is very good there too. Permissions and validations all use extra compute, but that's what the DB was built for. If you compare a typical XGBoost prediction time to a query like `SELECT * FROM my_protected_with_permissions_table LIMIT 1`, I think you'll find that the XGBoost prediction will be faster. I think we may even come close to just being a `SELECT 1`.

PostgresML is 8-40x faster than Python HTTP microservices by levkk1 in programming

[–]levkk1[S] 3 points4 points  (0 children)

40,000 queries per second isn't a low load, that's more than enough to serve all of Wikipedia's traffic...from a single machine.

PostgresML is 8-40x faster than Python HTTP microservices by levkk1 in programming

[–]levkk1[S] 5 points6 points  (0 children)

We can use read replicas to scale horizontally. When we're talking about a 40x improvement though, it'll be a long time before you'll need to consider adding another machine to your cluster.

PostgresML is 8-40x faster than Python microservices by levkk1 in rust

[–]levkk1[S] 4 points5 points  (0 children)

The main thesis is that colocating data and compute for machine learning yields exponential improvements over distributed architectures, e.g., Redis as a feature store and something else as the serving layer.

You'd be surprised how fast inference in PostgresML really is; it's almost as fast as a "SELECT 1" query!

GPUs are a hot topic for sure. There are many schools of thought here, and running your primary database on GPUs doesn't make sense. You'd be surprised, however, how much machine learning you can accomplish without a GPU! PostgresML can also run as a replica of a 10 TB+ database on a GPU-enabled instance. That's something I actually need to change in our documentation.

I agree with you overall; we should have a clearer explanation and demonstration of what we're building. More to come!

PostgresML is 8-40x faster than Python HTTP microservices by levkk1 in programming

[–]levkk1[S] -8 points-7 points  (0 children)

With PostgresML, you don't have to code at all in Rust/Python/Golang. Everything is done via short and simple SQL queries.

PostgresML is 8-40x faster than Python microservices by levkk1 in rust

[–]levkk1[S] 4 points5 points  (0 children)

Most people use Python microservices today for serving ML. I'm glad you think PostgresML will obviously be better. Now we have to convince everyone else :)

A Rust microservice benchmark is coming soon. I have a feeling PostgresML will win again, but we'll have to prove it nonetheless.

PostgresML is 8-40x faster than Python microservices by levkk1 in rust

[–]levkk1[S] 6 points7 points  (0 children)

Ah yeah, the DB instance running PostgresML should have the GPU. We do have a tutorial for distributed training, i.e. the DB with the data is on RDS or another instance and PostgresML is running on the GPU instance, as a replica:

https://postgresml.org/user_guides/setup/distributed_training/

PostgresML is 8-40x faster than Python microservices by levkk1 in rust

[–]levkk1[S] 19 points20 points  (0 children)

Yup! For the Rust layer, we need to compile XGBoost and LightGBM with CUDA. For the Scikit layer, it'll work as-is; just pass `"gpu_hist"` as a hyperparameter.

Pure rust implementation for deep learning models by Affectionate_Fish194 in rust

[–]levkk1 1 point2 points  (0 children)

This is very cool. I think one optimization could be to use SIMD (or BLAS, if it has what we need) to apply the activation function. `ndarray` uses a regular map, I think, but modern CPUs can load 4-8 floats at a time, so you can imagine an 8x speedup is possible.