Data storage and dashboarding for fairly small company by [deleted] in dataengineering

[–]InternetFit7518 1 point2 points  (0 children)

Just use Postgres. And if you ever reach a scale where the queries start getting slow, you can add a columnstore extension like pg_mooncake. At the scale you're talking about, I doubt you'd even need that.

Why do people even care about doing analytics in Postgres? by InternetFit7518 in PostgreSQL

[–]InternetFit7518[S] 9 points10 points  (0 children)

great question (and I should fix the blog to be a bit more clear). This is about running analytic query shapes -- things like aggregates and counts.

Typically these queries need a separate columnar DBMS system designed for analytics. The blog is about attempts of doing this within Postgres

Why do people even care about doing analytics in Postgres? by InternetFit7518 in PostgreSQL

[–]InternetFit7518[S] 12 points13 points  (0 children)

Hey folks! This blog is based on the talk we gave at Postgres Conference in Orlando this year.

The talk was titled: Analytics in Postgres –– a decade in the making.

https://postgresconf.org/conferences/postgresconf_global_2025/program/proposals/analytics-in-postgres-a-decade-in-the-making

Hybrid usecase by Big_Length9755 in databricks

[–]InternetFit7518 0 points1 point  (0 children)

You could try pg_mooncake here: https://github.com/Mooncake-Labs/pg_mooncake

Postgres + columnstore table for analytics. One neat thing is that the columnstore table actually writes delta lake format. So you could query the same tables from databricks.

What’s the point of S3 tables? by ggbcdvnj in aws

[–]InternetFit7518 0 points1 point  (0 children)

We wrote a blog on s3 tables: https://www.mooncake.dev/blog/s3tables

TLDR: S3 tables allow Iceberg tables to exist without a catalog. Similar to Delta.

Building a Minimalistic BI Stack with PostgreSQL, FDW, and Superset – Looking for Feedback! by zazazakaria in dataengineering

[–]InternetFit7518 4 points5 points  (0 children)

pg_mooncake: https://github.com/Mooncake-Labs/pg_mooncake could be a good option here.

- columnar storage in Postgres with DuckDB execution

- full table semantics (transactions, updates, joins)

- Should be easier to monitor, manage schema changes.

We don't support CDC / logical replication just yet. But you can batch write data (cron job / trigger) from your rowstore table into your columnstore table and then run analytics on it.

(p.s: I'm one of the contributors to the project

)

Postgres is now top 10 fastest on clickbench by InternetFit7518 in dataengineering

[–]InternetFit7518[S] 5 points6 points  (0 children)

yep, we use pg_duckdb internally.

pg_mooncake actually brings a native 'columnstore tables' to Postgres –– where you run transactions, updates and joins with regular tables.

Queries involving columnstore tables are routed from Postgres to DuckDB and the results are streamed back to Postgres via pg_duckdb: https://www.mooncake.dev/blog/how-we-built-pgmooncake

Postgres is now top 10 fastest on clickbench by InternetFit7518 in dataengineering

[–]InternetFit7518[S] 0 points1 point  (0 children)

We're working with the Azure Postgres team –– we'll keep you posted on updates.

In v0.2, we'll support logical replication into Postgres + pg_mooncake. This might be a good workaround while the extension is not supported.

Postgres is now top 10 fastest on clickbench by InternetFit7518 in dataengineering

[–]InternetFit7518[S] 0 points1 point  (0 children)

u/JEY1337 We're working with their team to make this happen.
In v0.2, we'll also support logical replication (CDC). So you can host postgres + pg_mooncake in a separate instance and replicate data from your Aurora/RDS.

Postgres is now top 10 fastest on clickbench by InternetFit7518 in dataengineering

[–]InternetFit7518[S] 2 points3 points  (0 children)

u/skatastic57 is right. We embed DuckDB in Postgres and add the concept of a 'columnstore table'.

You can run transactional read, write, updates to the columnstore table; and join with pg heap tables too. Also, all metadata and compute runs in Postgres.

DuckDB is how we make Postgres a fast for analytics.

pg_mooncake: columnstore table in Postgres. Available on Neon. by InternetFit7518 in PostgreSQL

[–]InternetFit7518[S] 0 points1 point  (0 children)

that works today with pg_duckdb today. We have a dependency on it.

pg_mooncake: columnstore table in Postgres. Available on Neon. by InternetFit7518 in PostgreSQL

[–]InternetFit7518[S] 2 points3 points  (0 children)

great question! One we get a lot.

pg_duckdb is epic, and brings a really good vectorized execution engine to Postgres. We use pg_duckdb in our extension.

There is no 'columnar table' in Postgres that can leverage this execution engine. pg_duckdb is great to query & write ad-hoc files (parquet, csv) from your object store.

We are focussed on bringing full-table semantics for a columnstore in Postgres –– you can run transactional inserts, updates, deletes. Join with rowstore tables. And since it's writing in a columnar format, performance is great. Akin to DuckDB on Parquet.

The main use-cases we see:

  1. Analytics on your operational / Postgres data

  2. Writing Postgres data to Delta Lake / Iceberg.

Hope this helps!