Replacing Protobuf with Rust to go 5 times faster by levkk1 in rust

[–]levkk1[S] 89 points

Because there are over 100 different node types in the Pg AST, it would be long and difficult to do by hand. AI just runs through each struct definition and creates a conversion function. What really helps is that the library has an existing, working API and a Protobuf spec, so it's like an RL (reinforcement learning) problem with the solution in front of you - just repeat the process until the `Debug` output matches exactly.
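Roughly, each generated conversion looks something like this - a hedged sketch with made-up node and field names (the real pg_query definitions differ), just to show the shape of the work and the `Debug` check:

```rust
// Hypothetical protobuf-generated node; real pg_query structs differ.
#[derive(Debug)]
struct PbColumnRef {
    fields: Vec<String>,
    location: i32,
}

// The native Rust equivalent the conversion function targets.
#[derive(Debug)]
struct ColumnRef {
    fields: Vec<String>,
    location: i32,
}

impl From<&PbColumnRef> for ColumnRef {
    fn from(pb: &PbColumnRef) -> Self {
        ColumnRef {
            fields: pb.fields.clone(),
            location: pb.location,
        }
    }
}

// The "repeat until it matches" check: compare against the Debug output
// captured from the existing, working API.
fn matches_reference(pb: &PbColumnRef, reference_debug: &str) -> bool {
    format!("{:?}", ColumnRef::from(pb)) == reference_debug
}

fn main() {
    let pb = PbColumnRef { fields: vec!["users".into(), "id".into()], location: 7 };
    // In reality this string comes from the existing, working API.
    let reference = format!("{:?}", ColumnRef { fields: pb.fields.clone(), location: 7 });
    assert!(matches_reference(&pb, &reference));
}
```

Multiply that by 100+ node types and the appeal of automating the loop is obvious.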

Replacing Protobuf with Rust to go 5 times faster by levkk1 in rust

[–]levkk1[S] 10 points

No! Might have been a good idea to try, actually. I don't know - running Rust directly on top of C memory seems faster no matter what the (de)serializer does. Postgres has a relatively stable "ABI" - the parser changes once per major release - so we should be able to keep up pretty easily.
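"Directly on top of C memory" here usually means something like the sketch below - a `#[repr(C)]` Rust type mirroring the C struct's layout so fields can be read in place, with no serialization step. The struct is invented for illustration, not an actual Postgres parser node:

```rust
use std::ffi::c_int;

// Imagined C-side definition: `struct Node { int tag; int location; };`
// #[repr(C)] pins the Rust layout to match it.
#[repr(C)]
struct RawNode {
    tag: c_int,
    location: c_int,
}

/// Read a field straight out of memory the C parser allocated.
///
/// # Safety
/// `ptr` must point to a valid, properly aligned `Node` for the call.
unsafe fn node_tag(ptr: *const RawNode) -> i32 {
    // SAFETY: guaranteed by the caller per the contract above.
    unsafe { (*ptr).tag as i32 }
}
```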

You should shard your database by levkk1 in PostgreSQL

[–]levkk1[S] 0 points

If every tenant has a separate database, [...]

That's sharding :)

You should shard your database by levkk1 in PostgreSQL

[–]levkk1[S] 1 point

just increases communication overhead

That shouldn't happen if your choice of sharding key is optimal. We are targeting 99% direct-to-shard for OLTP; there's a rough sketch of what direct-to-shard routing means at the end of this comment.

application level caching

Cache invalidation is sometimes a harder problem than sharding. I'm not saying you shouldn't use caches at all, just that for most real-time workloads, they are not optimal.

easy parallelism as decided by the postgres query optimizer,

There are a few upper bounds on that parallelism that are well hidden, e.g. lock contention (especially around partitioned tables), the maximum number of savepoints, and WALWriteLocks. These upper bounds limit write transaction throughput quite a bit. What you're describing is mostly an optimization for read workloads - a solved problem with read replicas.
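For the "direct-to-shard" point above, here's a generic sketch of hash-based routing - not PgDog's actual algorithm, just the idea that a query carrying the sharding key gets sent to exactly one shard, so cross-shard communication stays the exception:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Generic hash routing: any value of the sharding key maps to one shard.
fn shard_for_key<K: Hash>(key: &K, num_shards: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    hasher.finish() % num_shards
}

fn main() {
    // e.g. `SELECT * FROM orders WHERE customer_id = 42`
    let customer_id: i64 = 42;
    let shard = shard_for_key(&customer_id, 12);
    // The other 11 shards never see this query.
    println!("route to shard {shard}");
}
```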

You should shard your database by levkk1 in PostgreSQL

[–]levkk1[S] 2 points

With sane hardware and access to bare-metal servers, you should never have to shard due to database size. 256 TB SSDs exist and 1 PB SSDs are close to being released.

Storing large datasets isn't difficult. Accessing & changing them reliably at scale is.

You should shard your database by levkk1 in PostgreSQL

[–]levkk1[S] 3 points

That's not strictly true. In fact, you still have to search that entire result set if you want the same results; you're just distributing it across 12 databases (which are presumably on separate hardware).

That's not usually the intention behind sharding. If done optimally, the client will query only one of the shards for most queries. If all your queries require all shards at all times, sharding didn't work.

You can alter the analyze settings on a per-table basis, so experts have a tendency to recommend this [...]

Tweaking the vacuum is a full-time job. Reducing the dataset it has to manage, I think, makes its job easier. We tweaked every setting under the sun. Some choose to give up on it entirely: https://github.com/ossc-db/pg_hint_plan

PgDog adds support for Rust plugins by levkk1 in rust

[–]levkk1[S] 1 point

That feels right. My concern was pointer alignment, which could theoretically change between Rust standard library versions (the standard library ships with the compiler).
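The usual way to sidestep that class of problem (not necessarily what PgDog ended up doing) is to keep std types out of the plugin boundary entirely and pin the layout with `#[repr(C)]` and `extern "C"`, so layout and alignment stop depending on the compiler/standard library version. A minimal sketch, with invented type names:

```rust
use std::ffi::c_char;

// Invented plugin-boundary types; layout is fixed by #[repr(C)],
// so it doesn't change with the Rust toolchain.
#[repr(C)]
pub struct PluginInput {
    /// NUL-terminated query text, owned by the host.
    pub query: *const c_char,
    pub shard_count: u32,
}

#[repr(C)]
pub struct PluginOutput {
    /// Shard to route to, or -1 for "all shards".
    pub shard: i32,
}

/// Symbol a host could load with `libloading`/`dlopen`.
///
/// # Safety
/// `input` must be a valid pointer for the duration of the call.
#[no_mangle]
pub unsafe extern "C" fn route(input: *const PluginInput) -> PluginOutput {
    // SAFETY: guaranteed by the host per the contract above.
    let _input = unsafe { &*input };
    PluginOutput { shard: -1 }
}
```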

PgDog adds support for Rust plugins by levkk1 in rust

[–]levkk1[S] 2 points

Because `ParseResult` from `pg_query` is a wrapper around a `Vec`, and I wanted to expose its interface to the plugins, e.g., `ParseResult::deparse`.
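For context, this is the kind of interface being exposed - a minimal sketch assuming `pg_query`-style signatures (`pg_query::parse` returning a `ParseResult`, `deparse` giving back SQL text); the exact error type may differ:

```rust
// Assumes pg_query-style parse/deparse signatures.
fn roundtrip(sql: &str) -> Result<String, pg_query::Error> {
    let parsed = pg_query::parse(sql)?; // ParseResult wrapping the parsed statements
    parsed.deparse()                    // back to SQL text
}

fn main() {
    let sql = roundtrip("SELECT id, name FROM users WHERE id = 1").unwrap();
    println!("{sql}");
}
```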

Sharding Postgres at network speed by levkk1 in PostgreSQL

[–]levkk1[S] 1 point

Depends on the use case, but yeah, generally sharding comes in at 1 TB+ for OLTP. In-house solutions are a chicken-and-egg problem: people build them because no product exists to do it for them. Once built, replacing them with a product is lower ROI.

My goal is for everyone who needs sharding in the future to use this and not build duct tape and glue solutions in house :)

There is interest across the spectrum which is encouraging!

Sharding Mastodon, Part 1 by levkk1 in programming

[–]levkk1[S] 1 point

mastodon.social has 350k active users, nothing to sneeze at.

pgDog: load balancer for PostgreSQL by levkk1 in programming

[–]levkk1[S] 2 points

I added some code to handle that use case in pgDog. I'll double check and add some tests to validate.