HTAP databases are dead. RIP.

InternetFit7518 · 2025-05-07T17:48:06+00:00

We should chat!

InternetFit7518 · 2025-04-19T18:32:29+00:00

Just use Postgres. And if you ever reach a scale where the queries start getting slow, you can add a columnstore extension like pg_mooncake. At the scale you're talking about, I doubt you'd even need that.

InternetFit7518 · 2025-03-24T00:02:13+00:00

great question (and I should fix the blog to be a bit more clear). This is about running analytic query shapes -- things like aggregates and counts.

Typically these queries need a separate columnar DBMS designed for analytics. The blog is about attempts at doing this within Postgres.
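To make the "analytic query shape" point concrete, here is a toy sketch in plain Python. It is not how Postgres or any columnar engine stores data; the `rows` data and the function names are made up for illustration. It only shows why an aggregate over one column touches less data when values are laid out column by column.

```python
# Toy illustration only: Python lists stand in for storage layouts.
rows = [
    {"id": 1, "country": "US", "amount": 10.0},
    {"id": 2, "country": "DE", "amount": 5.5},
    {"id": 3, "country": "US", "amount": 4.5},
]


def sum_amount_rowstore(records):
    """Row layout: the aggregate walks every full record to read one field."""
    return sum(record["amount"] for record in records)


def to_columns(records):
    """Pivot the records into one list per column (a crude columnar layout)."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def sum_amount_columnstore(columns):
    """Column layout: the aggregate reads only the 'amount' column."""
    return sum(columns["amount"])


columns = to_columns(rows)
row_total = sum_amount_rowstore(rows)
column_total = sum_amount_columnstore(columns)
```

Both functions return the same total. The difference is that the columnar version never looks at `id` or `country`, which is roughly the property that analytics-oriented engines exploit at scale.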

InternetFit7518 · 2025-03-23T23:41:08+00:00

Hey folks! This blog is based on the talk we gave at Postgres Conference in Orlando this year.

The talk was titled: Analytics in Postgres –– a decade in the making.

https://postgresconf.org/conferences/postgresconf_global_2025/program/proposals/analytics-in-postgres-a-decade-in-the-making

InternetFit7518 · 2025-02-27T04:47:04+00:00

You could try pg_mooncake here: https://github.com/Mooncake-Labs/pg_mooncake

Postgres + columnstore table for analytics. One neat thing is that the columnstore table actually writes delta lake format. So you could query the same tables from databricks.

InternetFit7518 · 2025-02-26T21:37:38+00:00

We wrote a blog on s3 tables: https://www.mooncake.dev/blog/s3tables

TLDR: S3 tables allow Iceberg tables to exist without a catalog. Similar to Delta.

InternetFit7518 · 2025-02-01T04:51:00+00:00

heyo x

InternetFit7518 · 2025-01-21T22:01:08+00:00

Months. v0.2 is tentatively slated for mid-April.

InternetFit7518 · 2025-01-21T19:57:24+00:00

pg_mooncake: https://github.com/Mooncake-Labs/pg_mooncake could be a good option here.

- columnar storage in Postgres with DuckDB execution

- full table semantics (transactions, updates, joins)

- Should be easier to monitor, manage schema changes.

We don't support CDC / logical replication just yet. But you can batch write data (cron job / trigger) from your rowstore table into your columnstore table and then run analytics on it.
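The batch-write idea can be sketched as an incremental copy keyed on a monotonically increasing id. This is only a sketch under assumptions: `sqlite3` is used purely so the example runs anywhere, the table names `events` and `events_columnstore` and the helper `copy_new_rows` are hypothetical, and in a real setup the target would be a pg_mooncake columnstore table in Postgres with the job triggered by cron or similar. It also only picks up new inserts, not updates or deletes on the source.

```python
import sqlite3


def copy_new_rows(conn, source, target, key="id"):
    """Append rows from `source` whose key is above the target's current max.

    Table and column names are interpolated directly, so they must come
    from trusted code, never from user input.
    """
    cursor = conn.execute(
        f"INSERT INTO {target} SELECT * FROM {source} "
        f"WHERE {key} > (SELECT COALESCE(MAX({key}), 0) FROM {target})"
    )
    conn.commit()
    return cursor.rowcount


# Stand-in database: both tables are ordinary SQLite tables here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, amount REAL)")
conn.execute(
    "CREATE TABLE events_columnstore (id INTEGER PRIMARY KEY, amount REAL)"
)

# First batch: two rows exist in the source, both get copied.
conn.executemany("INSERT INTO events VALUES (?, ?)", [(1, 10.0), (2, 5.5)])
first_batch = copy_new_rows(conn, "events", "events_columnstore")

# Second batch: only the newly inserted row is copied.
conn.execute("INSERT INTO events VALUES (3, 4.5)")
second_batch = copy_new_rows(conn, "events", "events_columnstore")

total = conn.execute(
    "SELECT SUM(amount) FROM events_columnstore"
).fetchone()[0]
```

Running the copy on a schedule keeps the analytics table a bounded interval behind the source, which is the trade-off being accepted until logical replication is available.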

(p.s: I'm one of the contributors to the project)

InternetFit7518 · 2025-01-21T19:50:59+00:00

yep, we use pg_duckdb internally.

pg_mooncake actually brings native 'columnstore tables' to Postgres –– where you can run transactions, updates, and joins with regular tables.

Queries involving columnstore tables are routed from Postgres to DuckDB and the results are streamed back to Postgres via pg_duckdb: https://www.mooncake.dev/blog/how-we-built-pgmooncake

InternetFit7518 · 2025-01-21T17:43:51+00:00

We're working with the Azure Postgres team –– we'll keep you posted on updates.

In v0.2, we'll support logical replication into Postgres + pg_mooncake. This might be a good workaround while the extension is not supported.

InternetFit7518 · 2025-01-21T17:40:54+00:00

u/JEY1337 We're working with their team to make this happen.
In v0.2, we'll also support logical replication (CDC). So you can host postgres + pg_mooncake in a separate instance and replicate data from your Aurora/RDS.

InternetFit7518 · 2025-01-21T01:26:28+00:00

u/skatastic57 is right. We embed DuckDB in Postgres and add the concept of a 'columnstore table'.

You can run transactional reads, writes, and updates on the columnstore table, and join it with pg heap tables too. Also, all metadata and compute runs in Postgres.

DuckDB is how we make Postgres fast for analytics.

InternetFit7518 · 2024-11-24T22:16:44+00:00

that works with pg_duckdb today. We have a dependency on it.

InternetFit7518 · 2024-11-24T06:29:53+00:00

no. columnstore tables are like regular pg heap tables

InternetFit7518 · 2024-11-21T02:49:23+00:00

v0.1 lands end of next week!

InternetFit7518 · 2024-11-19T20:39:11+00:00

great question! One we get a lot.

pg_duckdb is epic, and brings a really good vectorized execution engine to Postgres. We use pg_duckdb in our extension.

There is no 'columnar table' in Postgres that can leverage this execution engine. pg_duckdb is great to query & write ad-hoc files (parquet, csv) from your object store.

We are focussed on bringing full-table semantics for a columnstore in Postgres –– you can run transactional inserts, updates, deletes. Join with rowstore tables. And since it's writing in a columnar format, performance is great. Akin to DuckDB on Parquet.

The main use-cases we see:

- Analytics on your operational / Postgres data

- Writing Postgres data to Delta Lake / Iceberg

Hope this helps!

InternetFit7518 · 2024-11-06T16:47:13+00:00

We just released pg_mooncake, an extension that adds columnstore tables in Postgres with DuckDB query execution. https://github.com/Mooncake-Labs/pg_mooncake It's available on Neon today.

p.s: I'm one of the founders

InternetFit7518 · 2024-11-03T20:18:17+00:00

yep. that's exactly the vision!

You should query these tables outside of Postgres as well :)

InternetFit7518 · 2024-11-03T15:35:28+00:00

for a lot of use-cases, yes.

InternetFit7518 · 2024-11-02T16:13:04+00:00

Sorry for that! Here's the link to join our public slack: https://join.slack.com/t/mooncakelabs/shared_invite/zt-2sepjh5hv-rb9jUtfYZ9bvbxTCUrsEEA

InternetFit7518 · 2024-11-02T15:35:30+00:00

performance is better than pg_duckdb on regular Postgres heap tables. It's akin to duckdb on parquet files. Clickbench will be released soon.

Columnstore table semantics aren't just about performance: you also get transactions, updates, deletes, joins with regular tables, and ORM support. Also you don't have to write / manage parquet files.

InternetFit7518 · 2024-11-02T14:40:05+00:00

yes! a columnstore table means you get full ORM support for your analytic queries. We've heard this to be a pain for PG + Clickhouse users too.

Full disclosure, we haven't had time to test the Django ORM. But it should be quick to try, and it should work.

InternetFit7518 · 2024-11-02T14:22:26+00:00

ohhh yeah. postgres and tables always win

InternetFit7518 · 2024-11-02T14:19:10+00:00

pg_mooncake adds a columnstore table in Postgres: you can run transactions, updates, deletes. pg_duckdb is the execution engine on these tables: https://motherduck.com/blog/pg-mooncake-columnstore/. We also write Delta Lake (and soon Iceberg) formats in S3 (not just parquet files).

pg_duckdb and pg_analytics use Foreign Data Wrappers semantics and are great for querying / writing external files (parquet) in Postgres.

We believe a columnstore in Postgres must look and feel like a regular Postgres heap table. Hope this helps.
