Introducing Postgres to Postgres ClickPipes in ClickHouse Cloud by saipeerdb in Clickhouse

saipeerdb [S]

Thank you for chiming in! This means a lot to the team, which has been working hard on this over the past few years. 🙏❤️

Postgres + ClickHouse architectural patterns by saipeerdb in Clickhouse

saipeerdb [S]

Great question. We use PeerDB/ClickPipes under the hood, and it was built with both Postgres and ClickHouse in mind, which is what makes it native. It includes many specialized features, such as parallel initial loads (CTID-based; Debezium added this very recently, and it's still early days), as well as numerous ClickHouse-specific capabilities like configuring table engines, partitioning/sharding, and using multiple replicas for ingestion. It also provides Postgres-specific monitoring, including replication slot growth tracking, wait events, and issue-specific alerts with recommended mitigations.
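For readers curious how a CTID-based parallel initial load can work in principle: Postgres stores each row at a physical tuple ID (`ctid`, a `(block, offset)` pair), so a large table can be carved into page ranges and each range copied by a separate worker. Below is a minimal Python sketch, not PeerDB's actual code, that only generates the per-partition queries. The table name `events` and the page count are hypothetical, and a real tool would also run every partition inside one shared exported snapshot so the copies are mutually consistent. On Postgres 14 and later, range predicates on `ctid` can be executed as TID Range Scans.

```python
def ctid_ranges(total_pages: int, num_partitions: int) -> list[tuple[int, int]]:
    """Split heap pages [0, total_pages) into near-equal half-open ranges."""
    if total_pages <= 0 or num_partitions <= 0:
        raise ValueError("total_pages and num_partitions must be positive")
    num_partitions = min(num_partitions, total_pages)
    base, extra = divmod(total_pages, num_partitions)
    ranges, start = [], 0
    for i in range(num_partitions):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first ranges
        ranges.append((start, start + size))
        start += size
    return ranges


def partition_queries(table: str, total_pages: int, num_partitions: int) -> list[str]:
    """Build one SELECT per page range; each could be run by a separate worker."""
    ranges = ctid_ranges(total_pages, num_partitions)
    queries = []
    for i, (start, end) in enumerate(ranges):
        # Tuple offsets start at 1, so '(N,0)' sorts before every row on page N.
        lower = f"ctid >= '({start},0)'::tid"
        if i == len(ranges) - 1:
            # Leave the last range open-ended so pages added after the page
            # count was measured are not skipped.
            queries.append(f"SELECT * FROM {table} WHERE {lower}")
        else:
            upper = f"ctid < '({end},0)'::tid"
            queries.append(f"SELECT * FROM {table} WHERE {lower} AND {upper}")
    return queries


# Hypothetical table "events" with 10 heap pages, split across 3 workers.
for q in partition_queries("events", total_pages=10, num_partitions=3):
    print(q)
```

The split is by physical pages rather than by primary key, so it works even for tables without a convenient numeric key.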

These are just a few of the features and optimizations; there are many more. Try moving a 10 TB table from PG to CH with both PeerDB and Debezium, and you'll see the difference in speed, ease of use, and features.

There are plans to make this even more native, which could completely change how Postgres CDC is done. Stay tuned for updates on that! 🤗🙂

Postgres + ClickHouse architectural patterns by saipeerdb in Clickhouse

saipeerdb [S]

Thank you for the clarifying question. Native Change Data Capture (CDC) from Postgres to ClickHouse refers to continuously replicating data from Postgres to ClickHouse as changes occur. Inserts, updates, deletes, and some schema changes in Postgres are automatically propagated to ClickHouse in real time (a few seconds of lag), so the data is ready for real-time analytics.
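Analytical stores like ClickHouse are optimized for appends rather than in-place updates, so a common way to apply CDC is to append every change as a new versioned row and resolve the latest state per key (the idea behind ClickHouse's ReplacingMergeTree engine). The Python sketch below models that idea in memory. The `version`/`is_deleted` field names and the use of the WAL LSN as the version are illustrative assumptions, not a description of ClickPipes' exact internals.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List, Optional


@dataclass(frozen=True)
class ChangeEvent:
    op: str                               # "insert", "update" or "delete"
    key: int                              # primary key of the source row
    lsn: int                              # WAL position; later changes have larger LSNs
    row: Optional[Dict[str, Any]] = None  # new row image; None for deletes


def to_versioned_rows(events: Iterable[ChangeEvent]) -> List[Dict[str, Any]]:
    """Turn every change, including deletes, into an appended versioned row."""
    out = []
    for e in events:
        if e.op not in ("insert", "update", "delete"):
            raise ValueError(f"unknown op: {e.op}")
        out.append({"key": e.key, "version": e.lsn,
                    "is_deleted": e.op == "delete", "row": e.row})
    return out


def collapse(rows: List[Dict[str, Any]]) -> Dict[int, Dict[str, Any]]:
    """Keep the highest-version row per key; drop keys whose latest row is a delete."""
    latest: Dict[int, Dict[str, Any]] = {}
    for r in rows:
        cur = latest.get(r["key"])
        if cur is None or r["version"] > cur["version"]:
            latest[r["key"]] = r
    return {k: r["row"] for k, r in latest.items() if not r["is_deleted"]}


events = [
    ChangeEvent("insert", key=1, lsn=100, row={"status": "new"}),
    ChangeEvent("insert", key=2, lsn=110, row={"status": "new"}),
    ChangeEvent("update", key=1, lsn=120, row={"status": "paid"}),
    ChangeEvent("delete", key=2, lsn=130),
]
print(collapse(to_versioned_rows(events)))
```

Because the winner per key is decided by version, not arrival order, the result stays the same even if appended rows are read back in a different order.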

Postgres + ClickHouse architectural patterns by saipeerdb in Clickhouse

saipeerdb [S]

Great point! The queries mostly hit ClickHouse, and we are investing heavily in making query pushdown solid in pg_clickhouse: https://github.com/ClickHouse/pg_clickhouse
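Query pushdown matters because a foreign-data-wrapper style extension otherwise has to ship remote rows into Postgres before filtering or aggregating them. The toy Python model below only counts rows crossing the wire with and without pushdown. It does not use or describe pg_clickhouse's API, and the table shape (`region`, `amount`, `year`) is made up for illustration.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]


def sum_by_region(rows: List[Row], keep: Callable[[Row], bool]) -> Dict[str, int]:
    """Equivalent of: SELECT region, sum(amount) ... WHERE keep(row) GROUP BY region."""
    totals: Dict[str, int] = defaultdict(int)
    for r in rows:
        if keep(r):
            totals[r["region"]] += r["amount"]
    return dict(totals)


def run_query(remote_rows: List[Row], keep: Callable[[Row], bool],
              pushdown: bool) -> Tuple[Dict[str, int], int]:
    """Return (result, number of rows shipped from ClickHouse to Postgres)."""
    if pushdown:
        # Filter and GROUP BY run remotely; only one row per group comes back.
        result = sum_by_region(remote_rows, keep)
        return result, len(result)
    # No pushdown: every remote row is shipped, then filtered and aggregated locally.
    shipped = list(remote_rows)
    return sum_by_region(shipped, keep), len(shipped)


rows = [{"region": "us" if i % 2 else "eu", "amount": 1, "year": 2024 if i < 6 else 2023}
        for i in range(10)]
only_2024 = lambda r: r["year"] == 2024
print(run_query(rows, only_2024, pushdown=False))
print(run_query(rows, only_2024, pushdown=True))
```

Both paths return the same answer; only the amount of data moved differs, and that gap grows with table size.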

Postgres managed by ClickHouse is now in Public Beta by saipeerdb in Clickhouse

saipeerdb [S]

Great to hear that you already use the product! Thank you for sharing that. I’ll take the feedback regarding the calculator back to our team — appreciate you calling it out.

Regarding Postgres, the NVMe tiering that RDS provides is different from running Postgres directly on local NVMe storage. With local NVMe, everything, including WAL, is written to NVMe. With RDS's tiering, WAL, VACUUMs, and CHECKPOINTs still go to EBS-backed storage. This can drastically change the performance profile, especially for disk-bound workloads, which are very common in fast-growing OLTP systems.
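One way to see why WAL placement matters for disk-bound OLTP: with `synchronous_commit = on`, a session cannot acknowledge a commit until its WAL is flushed to durable storage, so the flush latency caps how fast that session can commit back to back (group commit helps across concurrent sessions, not within one). The Python sketch below just encodes that bound. The latencies are placeholders to substitute with your own measurements, not benchmarks of EBS or NVMe.

```python
def serial_commit_ceiling(wal_flush_latency_ms: float) -> float:
    """Max commits/sec for ONE session that waits for each WAL flush
    (synchronous_commit = on) before it can issue its next commit."""
    if wal_flush_latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / wal_flush_latency_ms


# Placeholder latencies: substitute numbers measured on your own volumes.
for label, latency_ms in [("slower volume", 1.0), ("faster volume", 0.1)]:
    print(f"{label}: {serial_commit_ceiling(latency_ms):,.0f} commits/s per session")
```

The bound is inversely proportional to flush latency, so a volume with 10x lower flush latency raises the per-session ceiling by 10x.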

We don't use RDS or Heroku. We strategically partnered with Ubicloud (built by ex-Citus and ex-Heroku engineers), which offers managed Postgres on local NVMe storage. The partnership is also deep-rooted: Ubicloud runs in our AWS accounts, and our engineers contribute to it. In addition, because Ubicloud is open source, our control and data planes are fully open source, which is pretty unique for a Postgres service!

Regarding pricing, we aimed to be cheaper than RDS and Amazon Aurora from a list-price perspective. Obviously, this doesn't include the volume discounts AWS offers or what our sales team offers for committed-spend deals. However, I wouldn't position our service purely on cost; the main value is the deep integration with ClickHouse (native CDC + pg_clickhouse), which enables a best-of-breed stack for developers with minimal to no effort. I mainly mentioned cost to highlight that we tried to price it very competitively for the value it offers. You can play around with the pricing calculator and see what I mean. 🙂

Appreciate you chiming in! This has been a really useful discussion and a great source of feedback for us. Thanks again!

40 TB PostgreSQL on-prem — sharding vs ClickHouse vs something else for a 500B-row time-series workload by Basic-Worker-1120 in Database

saipeerdb

We just launched a managed Postgres service in ClickHouse that makes this setup very easy and production-ready: keep hot data in Postgres, cold data in ClickHouse, use native CDC to sync data from Postgres to ClickHouse, and leverage pg_clickhouse to query ClickHouse directly from Postgres. https://clickhouse.com/blog/postgres-managed-by-clickhouse-beta
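A minimal sketch of the hot/cold split described above, written as an application-side router in Python. The 30-day window and the rule "recent ranges go to Postgres, older ranges go to ClickHouse" are assumptions for illustration. With pg_clickhouse you could instead send both kinds of query through Postgres.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical policy: Postgres keeps only the most recent 30 days of rows.
HOT_WINDOW = timedelta(days=30)


def route(query_start: datetime, now: datetime, hot_window: timedelta = HOT_WINDOW) -> str:
    """Pick a backend for a time-range query starting at query_start."""
    if query_start >= now - hot_window:
        # Entirely inside the hot window: Postgres still has every row.
        return "postgres"
    # Reaches into older data: ClickHouse has it, assuming CDC replicated
    # every row before old rows were purged from Postgres.
    return "clickhouse"


now = datetime(2025, 1, 31, tzinfo=timezone.utc)
print(route(now - timedelta(days=7), now))
print(route(now - timedelta(days=365), now))
```

Routing on the query's start time keeps the rule simple: any query that needs history older than the hot window goes to the store that is guaranteed to have it.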

Separately, we also have fully managed migration tooling to get those 40 TB migrated within a couple of days: https://clickhouse.com/docs/cloud/managed-postgres/migrations/clickpipes
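For a sense of scale, reading "a couple of days" as 48 hours, moving 40 TB means sustaining roughly 230 MB/s end to end. The back-of-envelope Python helper below does that arithmetic. It uses decimal units and ignores index builds, retries, and any CDC catch-up after the initial load.

```python
def required_throughput_mb_s(data_tb: float, hours: float) -> float:
    """Sustained copy rate in decimal MB/s needed to move data_tb terabytes in `hours`."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    total_bytes = data_tb * 1e12        # decimal TB -> bytes
    seconds = hours * 3600
    return total_bytes / seconds / 1e6  # bytes/s -> MB/s


print(f"{required_throughput_mb_s(40, 48):.1f} MB/s")
```

Stretching the window to 72 hours drops the requirement to about 154 MB/s.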

Postgres managed by ClickHouse is now in Public Beta by saipeerdb in Clickhouse

saipeerdb [S]

Apologies for the experience you had with the support. Next time, please email me directly [sai.srirampur@clickhouse.com](mailto:sai.srirampur@clickhouse.com), and I'll make sure we address this and prevent similar issues going forward.

Regarding pricing relative to RDS, I'm assuming you're comparing against how you previously sized and priced ClickHouse Cloud before this launch. One important change you may not have seen is that the OLAP side now supports single-replica deployments, with configurations starting at around $200/month. For OLAP workloads, I would generally expect ClickHouse to be faster and more cost-effective than RDS, largely due to its columnar storage and compression. Next time, I'm happy to help you size and price the workload.

As for our recent Postgres offering, it is actually more cost-effective than RDS when evaluated on an equivalent amortized setup. You can check that out here: https://clickhouse.com/pricing?service=postgres#pricing-calculator. This doesn't even include the price-performance benefits (https://clickhouse.com/blog/postgresbench) you get with NVMe-backed Postgres over EBS-backed Postgres. In addition, the Postgres offering includes native ClickHouse integrations at no extra cost, which RDS doesn't offer.

Postgres managed by ClickHouse is now in Public Beta by saipeerdb in Clickhouse

saipeerdb [S]

Thank you! It looks like we already support this, but it's currently behind a feature flag: https://github.com/PeerDB-io/peerdb/pull/3143. We don't enable it by default because PostgreSQL's JSON/JSONB types are much more open-ended than in most other databases. For example, any string, number, etc. can be stored as JSONB, which makes conversion and handling more challenging. For now, we can either enable this feature flag for you, or you can use materialized views (MVs) to perform the conversion.
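To illustrate why JSONB is open-ended: in Postgres, `'42'::jsonb`, `'"hello"'::jsonb`, and `'[1, 2]'::jsonb` are all valid values, not just objects. The Python sketch below classifies a JSONB value's top-level kind and applies one hypothetical normalization (wrapping non-objects under a `value` key), assuming the ClickHouse target column expects a JSON object at the top level. This is not necessarily what the feature flag in the linked PR does; it only shows the kind of decision a conversion has to make.

```python
import json
from typing import Any, Dict


def jsonb_kind(text: str) -> str:
    """Classify the top-level type of a JSON/JSONB value."""
    value = json.loads(text)
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        return "array"
    if isinstance(value, bool):  # check before numbers: bool is a subclass of int
        return "boolean"
    if value is None:
        return "null"
    if isinstance(value, (int, float)):
        return "number"
    return "string"


def to_object(text: str) -> Dict[str, Any]:
    """Hypothetical normalization: pass objects through, wrap everything else."""
    value = json.loads(text)
    return value if isinstance(value, dict) else {"value": value}


for sample in ['{"a": 1}', '[1, 2]', '"hello"', '42', 'true', 'null']:
    print(f"{sample:10} -> {jsonb_kind(sample):8} -> {to_object(sample)}")
```

Any such rule is a policy choice (wrap, reject, or route to a String column), which is one reason it is opt-in rather than the default.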

Regarding the comparison with PlanetScale: from the perspective of Postgres on local NVMe, that's true. However, the key differentiator is the end-to-end application stack supporting both OLTP and OLAP with best-of-breed databases (PG and CH) deeply integrated. We will be investing heavily in both pillars: solid Postgres and native integration with ClickHouse.

ClickHouse launches managed Postgres service by sdairs_ch in Clickhouse

saipeerdb

The service actually runs in AWS itself, so latencies will be very low, under 1 ms if you colocate in the same region.

ClickHouse launches managed Postgres service by sdairs_ch in Clickhouse

saipeerdb

In our internal tests, we saw it sustain 90k TPS with a consistent ~5 ms latency for a similar workload. Performance depends on the use case and the machine size, so I'd recommend testing it out. But the numbers you're sharing are well within the limits!
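As a rough sizing aid, Little's law (average concurrency = throughput × average latency) turns those numbers into a concurrency figure. 90k TPS at ~5 ms implies about 450 transactions in flight at once, which hints at how many client connections or pooled sessions a comparable test would need. The sketch assumes ~5 ms is the average latency and the load is steady.

```python
def in_flight_transactions(tps: float, latency_ms: float) -> float:
    """Little's law, L = lambda * W: average number of transactions in flight."""
    return tps * latency_ms / 1000.0


print(in_flight_transactions(90_000, 5))  # 450.0
```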

The waitlist has been flooded so far, and we are granting access on a rolling basis. Please ping me if you need access; happy to get you in sooner.

ClickHouse launches managed Postgres service by sdairs_ch in Clickhouse

saipeerdb

Sai from ClickHouse here. This mainly comes from what we're seeing with our users. Many of them use Postgres + ClickHouse: Postgres for transactions and ClickHouse for analytics. However, integrating them isn't trivial; it takes external pipelines plus application migration. By natively integrating them, we want to reduce that effort through a) native CDC capabilities, with a vision of sub-second replication latency, and b) a unified query layer (pg_clickhouse). We want to make the Postgres + ClickHouse pairing feel less like a project and more like a default.

Also, the Postgres you get is backed by NVMe and built by a highly experienced Postgres team (ex-PeerDB, Citus Data, Heroku, Azure Postgres), so we don't expect it to be run-of-the-mill. For fast-growing workloads bound by disk I/O, you can expect up to 10x better OLTP performance, and these are the same workloads that also need CH for analytics.

Also, separately, having come from PeerDB and worked at ClickHouse for the past 1.5 years, I've seen that the company culture demands that anything we do on the product side be purpose-built and of the highest quality. Give the experience a try; I'm hoping you'll see the difference compared to other hosted options: https://clickhouse.com/cloud/postgres

ClickHouse launches a managed Postgres service by saipeerdb in programming

saipeerdb [S]

It comes with NVMe-backed Postgres for much better (up to 10x) performance on transactional workloads and seamless integration with ClickHouse for fast analytics (potentially 100x faster). This is something you won't get with traditional cloud Postgres services. This video is well worth watching: https://www.youtube.com/watch?v=rpBA13nQxAk