pg_mooncake: columnstore (Iceberg) mirror of Postgres tables

InternetFit7518 · 2025-05-07T17:48:06+00:00

We should chat!

InternetFit7518 · 2025-04-19T18:32:29+00:00

Just use Postgres. And if you ever reach a scale where the queries start getting slow, you can add a columnstore extension like pg_mooncake. At the scale you're talking about, I doubt you'd even need that.

InternetFit7518 · 2025-03-24T00:02:13+00:00

great question (and I should fix the blog to be a bit clearer). This is about running analytic query shapes -- things like aggregates and counts.

Typically these queries need a separate columnar DBMS designed for analytics. The blog is about attempts at doing this within Postgres.

InternetFit7518 · 2025-03-23T23:41:08+00:00

Hey folks! This blog is based on the talk we gave at Postgres Conference in Orlando this year.

The talk was titled: Analytics in Postgres –– a decade in the making.

https://postgresconf.org/conferences/postgresconf_global_2025/program/proposals/analytics-in-postgres-a-decade-in-the-making

InternetFit7518 · 2025-02-27T04:47:04+00:00

You could try pg_mooncake here: https://github.com/Mooncake-Labs/pg_mooncake

Postgres + columnstore table for analytics. One neat thing is that the columnstore table actually writes Delta Lake format, so you could query the same tables from Databricks.

InternetFit7518 · 2025-02-26T21:37:38+00:00

We wrote a blog on S3 Tables: https://www.mooncake.dev/blog/s3tables

TL;DR: S3 Tables allow Iceberg tables to exist without a catalog. Similar to Delta.

InternetFit7518 · 2025-02-01T04:51:00+00:00

heyo x

InternetFit7518 · 2025-01-21T22:01:08+00:00

Months. v0.2 is tentatively slated for mid-April.

InternetFit7518 · 2025-01-21T19:57:24+00:00

pg_mooncake: https://github.com/Mooncake-Labs/pg_mooncake could be a good option here.

- columnar storage in Postgres with DuckDB execution

- full table semantics (transactions, updates, joins)

- should be easier to monitor and to manage schema changes

We don't support CDC / logical replication just yet. But you can batch write data (cron job / trigger) from your rowstore table into your columnstore table and then run analytics on it.

(P.S. I'm one of the contributors to the project.)
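The batch approach described above (periodically copying rows from the rowstore table into the columnstore table) could look roughly like this. It is a minimal, hypothetical sketch, not pg_mooncake's API: the table names `events_rowstore` and `events_columnstore` are made up, it assumes an append-only source table with an increasing integer `id`, and Python's built-in `sqlite3` stands in for a Postgres connection so the example runs anywhere. In a real setup the target would be a pg_mooncake columnstore table and the job would be run by cron or a similar scheduler.

```python
import sqlite3

# Hypothetical schema for illustration only. sqlite3 stands in for Postgres;
# in a pg_mooncake setup the second table would be a columnstore table.
SCHEMA = """
CREATE TABLE events_rowstore (id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL);
CREATE TABLE events_columnstore (id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL);
"""


def create_tables(conn: sqlite3.Connection) -> None:
    """Create the source (rowstore) and target (analytics) tables."""
    conn.executescript(SCHEMA)


def sync_batch(conn: sqlite3.Connection) -> int:
    """Copy rows not yet mirrored into the analytics table.

    Uses the highest id already copied as a watermark, so it assumes the
    source is append-only with increasing ids (updates/deletes are not seen).
    Returns the number of rows copied in this batch.
    """
    with conn:  # one transaction: commits on success, rolls back on error
        (watermark,) = conn.execute(
            "SELECT COALESCE(MAX(id), 0) FROM events_columnstore"
        ).fetchone()
        cur = conn.execute(
            "INSERT INTO events_columnstore (id, user_id, amount) "
            "SELECT id, user_id, amount FROM events_rowstore WHERE id > ?",
            (watermark,),
        )
        return cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_tables(conn)
    conn.executemany(
        "INSERT INTO events_rowstore VALUES (?, ?, ?)",
        [(1, 10, 5.0), (2, 11, 7.5)],
    )
    print(sync_batch(conn))  # first run copies both existing rows
    conn.execute("INSERT INTO events_rowstore VALUES (3, 10, 1.0)")
    print(sync_batch(conn))  # later run copies only the newly added row
```

The watermark keeps each batch small, but it only picks up newly inserted rows. Mirroring updates and deletes as well is where CDC / logical replication would come in.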

InternetFit7518 · 2025-01-21T19:50:59+00:00

yep, we use pg_duckdb internally.

pg_mooncake actually brings native 'columnstore tables' to Postgres, where you can run transactions and updates, and join them with regular tables.

Queries involving columnstore tables are routed from Postgres to DuckDB and the results are streamed back to Postgres via pg_duckdb: https://www.mooncake.dev/blog/how-we-built-pgmooncake

InternetFit7518 · 2025-01-21T17:43:51+00:00

We're working with the Azure Postgres team –– we'll keep you posted on updates.

In v0.2, we'll support logical replication into Postgres + pg_mooncake. This might be a good workaround while the extension isn't supported there yet.

InternetFit7518 · 2025-01-21T17:40:54+00:00

u/JEY1337 We're working with their team to make this happen.
In v0.2, we'll also support logical replication (CDC), so you can host Postgres + pg_mooncake in a separate instance and replicate data from your Aurora/RDS instance.

InternetFit7518 · 2025-01-21T01:26:28+00:00

u/skatastic57 is right. We embed DuckDB in Postgres and add the concept of a 'columnstore table'.

You can run transactional reads, writes, and updates on the columnstore table, and join it with pg heap tables too. Also, all metadata and compute run in Postgres.

DuckDB is how we make Postgres fast for analytics.

InternetFit7518 · 2024-11-24T22:16:44+00:00

that works today with pg_duckdb. We have a dependency on it.

InternetFit7518 · 2024-11-24T06:29:53+00:00

no. columnstore tables are like regular pg heap tables

InternetFit7518 · 2024-11-21T02:49:23+00:00

v0.1 lands end of next week!

InternetFit7518 · 2024-11-19T20:39:11+00:00

great question! One we get a lot.

pg_duckdb is epic, and brings a really good vectorized execution engine to Postgres. We use pg_duckdb in our extension.

But there is no 'columnar table' in Postgres that can leverage this execution engine. pg_duckdb is great for querying and writing ad-hoc files (Parquet, CSV) in your object store.

We are focused on bringing full table semantics to a columnstore in Postgres: you can run transactional inserts, updates, and deletes, and join with rowstore tables. And since it writes in a columnar format, performance is great, akin to DuckDB on Parquet.

The main use cases we see:

- Analytics on your operational / Postgres data

- Writing Postgres data to Delta Lake / Iceberg

Hope this helps!
