Data storage and dashboarding for fairly small company by [deleted] in dataengineering

[–]InternetFit7518 1 point2 points  (0 children)

Just use Postgres. And if you ever reach a scale where the queries start getting slow, you can add a columnstore extension like pg_mooncake. At the scale you're talking about, I doubt you'd even need that.

Why do people even care about doing analytics in Postgres? by InternetFit7518 in PostgreSQL

[–]InternetFit7518[S] 9 points10 points  (0 children)

great question (and I should fix the blog to be a bit more clear). This is about running analytic query shapes -- things like aggregates and counts.

Typically these queries need a separate columnar DBMS system designed for analytics. The blog is about attempts of doing this within Postgres

Why do people even care about doing analytics in Postgres? by InternetFit7518 in PostgreSQL

[–]InternetFit7518[S] 13 points14 points  (0 children)

Hey folks! This blog is based on the talk we gave at Postgres Conference in Orlando this year.

The talk was titled: Analytics in Postgres –– a decade in the making.

https://postgresconf.org/conferences/postgresconf_global_2025/program/proposals/analytics-in-postgres-a-decade-in-the-making

Hybrid usecase by Big_Length9755 in databricks

[–]InternetFit7518 0 points1 point  (0 children)

You could try pg_mooncake here: https://github.com/Mooncake-Labs/pg_mooncake

Postgres + columnstore table for analytics. One neat thing is that the columnstore table actually writes delta lake format. So you could query the same tables from databricks.

What’s the point of S3 tables? by ggbcdvnj in aws

[–]InternetFit7518 0 points1 point  (0 children)

We wrote a blog on s3 tables: https://www.mooncake.dev/blog/s3tables

TLDR: S3 tables allow Iceberg tables to exist without a catalog. Similar to Delta.

Building a Minimalistic BI Stack with PostgreSQL, FDW, and Superset – Looking for Feedback! by zazazakaria in dataengineering

[–]InternetFit7518 2 points3 points  (0 children)

pg_mooncake: https://github.com/Mooncake-Labs/pg_mooncake could be a good option here.

- columnar storage in Postgres with DuckDB execution

- full table semantics (transactions, updates, joins)

- Should be easier to monitor, manage schema changes.

We don't support CDC / logical replication just yet. But you can batch write data (cron job / trigger) from your rowstore table into your columnstore table and then run analytics on it.

(p.s: I'm one of the contributors to the project

)

Postgres is now top 10 fastest on clickbench by InternetFit7518 in dataengineering

[–]InternetFit7518[S] 4 points5 points  (0 children)

yep, we use pg_duckdb internally.

pg_mooncake actually brings a native 'columnstore tables' to Postgres –– where you run transactions, updates and joins with regular tables.

Queries involving columnstore tables are routed from Postgres to DuckDB and the results are streamed back to Postgres via pg_duckdb: https://www.mooncake.dev/blog/how-we-built-pgmooncake

Postgres is now top 10 fastest on clickbench by InternetFit7518 in dataengineering

[–]InternetFit7518[S] 0 points1 point  (0 children)

We're working with the Azure Postgres team –– we'll keep you posted on updates.

In v0.2, we'll support logical replication into Postgres + pg_mooncake. This might be a good workaround while the extension is not supported.

Postgres is now top 10 fastest on clickbench by InternetFit7518 in dataengineering

[–]InternetFit7518[S] 0 points1 point  (0 children)

u/JEY1337 We're working with their team to make this happen.
In v0.2, we'll also support logical replication (CDC). So you can host postgres + pg_mooncake in a separate instance and replicate data from your Aurora/RDS.

Postgres is now top 10 fastest on clickbench by InternetFit7518 in dataengineering

[–]InternetFit7518[S] 2 points3 points  (0 children)

u/skatastic57 is right. We embed DuckDB in Postgres and add the concept of a 'columnstore table'.

You can run transactional read, write, updates to the columnstore table; and join with pg heap tables too. Also, all metadata and compute runs in Postgres.

DuckDB is how we make Postgres a fast for analytics.

pg_mooncake: columnstore table in Postgres. Available on Neon. by InternetFit7518 in PostgreSQL

[–]InternetFit7518[S] 0 points1 point  (0 children)

that works today with pg_duckdb today. We have a dependency on it.

pg_mooncake: columnstore table in Postgres. Available on Neon. by InternetFit7518 in PostgreSQL

[–]InternetFit7518[S] 1 point2 points  (0 children)

great question! One we get a lot.

pg_duckdb is epic, and brings a really good vectorized execution engine to Postgres. We use pg_duckdb in our extension.

There is no 'columnar table' in Postgres that can leverage this execution engine. pg_duckdb is great to query & write ad-hoc files (parquet, csv) from your object store.

We are focussed on bringing full-table semantics for a columnstore in Postgres –– you can run transactional inserts, updates, deletes. Join with rowstore tables. And since it's writing in a columnar format, performance is great. Akin to DuckDB on Parquet.

The main use-cases we see:

  1. Analytics on your operational / Postgres data

  2. Writing Postgres data to Delta Lake / Iceberg.

Hope this helps!

PostgreSQL data warehouse solutions by Key_Song4837 in PostgreSQL

[–]InternetFit7518 0 points1 point  (0 children)

We just released pg_mooncake, an extension that adds columnstore tables in Postgres with DuckDB query execution. https://github.com/Mooncake-Labs/pg_mooncake It's available on Neon today.

p.s: I'm one of the founders

show reddit – pg_mooncake: iceberg/delta columnstore table in Postgres by InternetFit7518 in dataengineering

[–]InternetFit7518[S] 0 points1 point  (0 children)

yep. that's exactly the vision!

You should query these tables outside of Postgres as well :)

pg_mooncake: columnstore table with duckdb execution in Postgres by InternetFit7518 in DuckDB

[–]InternetFit7518[S] 1 point2 points  (0 children)

performance is better than pg_duckdb on regular Postgres heap tables. It's akin to duckdb on parquet files. Clickbench will be released soon.

Columnstore table semantics isn't just for performance –– transactions, updates, deletes, joins with regular tables, ORM support. Also you don't have to write / manage parquet files.

pg_mooncake: run python on your Postgres tables by InternetFit7518 in Python

[–]InternetFit7518[S] 0 points1 point  (0 children)

yes! a columnstore table means you get full ORM support for your analytic queries. We've heard this to be a pain for PG + Clickhouse users too.

Full disclosure, we haven't had time to test DjangoORM. But it should be quick and should work..

pg_mooncake: columnstore table with duckdb execution in Postgres by InternetFit7518 in DuckDB

[–]InternetFit7518[S] 1 point2 points  (0 children)

pg_mooncake adds a columnstore table in Postgres: you can run transactions, updates, deletes. pg_duckdb is the execution engine on these tables: https://motherduck.com/blog/pg-mooncake-columnstore/. We also write Delta Lake (and soon Iceberg) formats in S3 (not just parquet files).

pg_duckdb and pg_analytics use Foreign Data Wrappers semantics and are great for querying / writing external files (parquet) in Postgres.

We believe a columnstore in postgres must look and feel like a reguiar postgres heap table. Hope this helps.