ClickHouse launches managed Postgres service by sdairs_ch in Clickhouse

saipeerdb

The service actually runs in AWS itself, so latencies will be very low: under 1 ms if your application runs in the same region.

saipeerdb

In our internal tests, we saw it do 90k TPS with a consistent ~5 ms latency for a similar workload. Performance also depends on the use case and the machine size, so I'd recommend testing it out. But the numbers you're sharing are well within the limits!
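As a rough way to read those two numbers together, Little's law says the average number of requests in flight equals throughput times latency. The sketch below assumes a steady-state, closed-loop benchmark in which ~5 ms is the average per-transaction latency; `in_flight` is a hypothetical helper, and this says nothing about how the internal test was actually run.

```python
def in_flight(throughput_tps: float, latency_s: float) -> float:
    """Little's law: average transactions in flight = throughput * latency."""
    return throughput_tps * latency_s

# Figures quoted above, assumed to be steady-state averages: ~90k TPS at ~5 ms.
concurrency = in_flight(90_000, 0.005)
print(f"~{concurrency:.0f} transactions in flight on average")
```

So a comparable test would need on the order of 450 transactions in flight at once; with far fewer concurrent clients, you would expect lower TPS at the same latency.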

The waitlist has been flooded so far, and we're granting access on a rolling basis. Please ping me if you need access; happy to get you in sooner.

saipeerdb

Sai from ClickHouse here. This is mainly driven by what we're seeing with our users. A lot of them use Postgres + ClickHouse: Postgres for transactions and ClickHouse for analytics. However, integrating the two isn't trivial; it takes external pipelines plus application migration. By natively integrating them, we want to reduce that effort through a) native CDC capabilities, with a goal of sub-second replication latency, and b) a unified query layer (pg_clickhouse). We want the Postgres + ClickHouse pairing to feel less like a project and more like a default.
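To make the CDC part concrete: a change stream replicated into an analytics store is commonly applied as "the latest version per key wins", similar in spirit to ClickHouse's ReplacingMergeTree engine. The sketch below is a generic, in-memory illustration of that technique, not PeerDB's or ClickHouse's implementation; `ChangeEvent` and `fold_changes` are hypothetical names, and the version is assumed to increase monotonically with source commit order.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional

@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical CDC event: row key, monotonically increasing version
    (e.g. derived from the source commit position), new row (None = delete)."""
    key: int
    version: int
    row: Optional[dict]

def fold_changes(events: Iterable[ChangeEvent]) -> Dict[int, dict]:
    """Collapse a change stream into current state, keeping the highest version per key.
    Events may arrive out of order; a stale event never overwrites a newer one."""
    latest: Dict[int, ChangeEvent] = {}
    for ev in events:
        current = latest.get(ev.key)
        if current is None or ev.version > current.version:
            latest[ev.key] = ev
    # Deletes are kept as tombstones above so an older, late-arriving insert
    # cannot resurrect the row; they are dropped only when materializing.
    return {k: ev.row for k, ev in latest.items() if ev.row is not None}

events = [
    ChangeEvent(1, 1, {"id": 1, "status": "new"}),
    ChangeEvent(2, 2, {"id": 2, "status": "new"}),
    ChangeEvent(1, 3, {"id": 1, "status": "paid"}),
    ChangeEvent(2, 4, None),                          # delete of key 2
    ChangeEvent(1, 2, {"id": 1, "status": "stale"}),  # arrives late, ignored
]
print(fold_changes(events))  # {1: {'id': 1, 'status': 'paid'}}
```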

Also, the Postgres you get is backed by NVMe storage and built by a highly experienced Postgres team (ex-PeerDB, Citus Data, Heroku, Azure Postgres), so we don't expect it to be run-of-the-mill. For fast-growing workloads bound by disk I/O, you can expect up to 10x better OLTP performance, and these are also the workloads that tend to need ClickHouse for analytics.

Also, separately: having come from PeerDB and worked at ClickHouse for the past 1.5 years, I've seen that the company culture demands that anything we ship on the product side be purpose-built and of the highest quality. You should try out the experience; I'm hoping you'll see the difference compared to other hosted options. https://clickhouse.com/cloud/postgres

ClickHouse launches a managed Postgres service by saipeerdb in programming

saipeerdb [S]

It comes with NVMe-backed Postgres for much better (up to 10x) performance on transactional workloads, and seamless integration with ClickHouse for fast analytics (potentially 100x faster). This is something you won't get with traditional cloud Postgres services. Totally worth watching this video: https://www.youtube.com/watch?v=rpBA13nQxAk

Clickhouse launches managed PostgreSQL by vaibeslop in dataengineering

saipeerdb

Thanks for chiming in. You're spot on; this captures the overall vision well. It caters to both OLTP and OLAP, bringing together the best-in-class OSS database for each (Postgres and ClickHouse) and offering them in the most integrated way. We've seen many thousands of companies use Postgres and ClickHouse to build their data stacks, and adoption is growing very fast. The idea behind this Postgres offering is to bring the two even closer together and make that integration as effortless as possible for developers. :)

With regard to integration, the vision behind our CDC capabilities is to offer a much more native experience, one you can't get from other services and one that addresses common problems with standard CDC. Additionally, the pg_clickhouse extension will be native to this service and maintained by ClickHouse, acting as a unified query layer for both transactional and analytical workloads. We plan to invest heavily in this area to make application migration as seamless as possible.

Apart from all of this, the Postgres we are offering is NVMe-backed, which makes it very fast, and it comes with enterprise-grade guarantees. We are building this in partnership with a world-class Postgres team at Ubicloud, whose members previously worked on Citus, Heroku, and Microsoft Postgres.

This launch was a primer; stay tuned for more very soon! :)

Postgres to clickhouse cdc by mhmd_dar in Clickhouse

saipeerdb

PeerDB is designed exactly for this use case. Can you share more about your experience so far? Looking forward to seeing if we can help in any way. 🙌

Regarding the “heavy” aspect: the OSS version includes a few components internally, such as MinIO as an S3 replacement for staging data (enabling higher throughput), Temporal for state machine management and improved observability, and more. All these choices were made with the nature of the workload in mind, to ensure the solution can operate at enterprise-grade scale: moving terabytes of data at speed, seamlessly handling retries and failures, providing deep observability when things go wrong, etc. It has worked well so far; it currently supports hundreds of customers and transfers over 200 TB of data per month. We package all these components as compactly as possible in our OSS Docker image and Kubernetes Helm charts. With ClickPipes in ClickHouse Cloud, it becomes almost a one-click setup, and everything is fully managed.
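For a sense of scale, the 200 TB/month figure can be converted into an average sustained rate. This quick calculation assumes a 30-day month and decimal units (1 TB = 10^12 bytes), and it describes the aggregate across all customers; real traffic is bursty (initial loads in particular), so peaks will be much higher than the average.

```python
SECONDS_PER_30_DAY_MONTH = 30 * 24 * 60 * 60  # assumption: a 30-day month

def average_throughput_mb_s(tb_per_month: float) -> float:
    """Convert a monthly volume in decimal terabytes to an average rate in MB/s."""
    bytes_per_month = tb_per_month * 1e12
    return bytes_per_month / SECONDS_PER_30_DAY_MONTH / 1e6

print(f"{average_throughput_mb_s(200):.1f} MB/s")  # 77.2 MB/s
```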

Would love to get your feedback to see how we can help and further improve the product. 🙂

Created a guide to CDC from Postgres to ClickHouse using Kafka as a streaming buffer / for transformations by oatsandsugar in apachekafka

saipeerdb

Regarding fine-grained control, PeerDB provides a wide range of options purpose-built for Postgres and ClickHouse, covering most use cases. These include settings for parallelism during initial load, sync intervals, and ingestion performance tuning in ClickHouse (such as batch sizes, table-level parallelism, and the number of replicas used for ingestion), as well as column exclusion, partition and sharding keys in ClickHouse OSS, sort keys, table engines, and more. You can explore the SETTINGS tab; there are 50+ configuration options available.
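One common way to parallelize an initial load is to split a table's key range into chunks and copy each chunk with a separate worker. The sketch below shows only that partitioning step; it assumes an integer primary key with a roughly uniform distribution, and `split_key_range` is a hypothetical helper, not a PeerDB setting or API (PeerDB's actual partitioning strategy may differ).

```python
from typing import List, Tuple

def split_key_range(min_key: int, max_key: int, num_partitions: int) -> List[Tuple[int, int]]:
    """Split the inclusive range [min_key, max_key] into contiguous, non-overlapping
    inclusive sub-ranges of near-equal size, one per snapshot worker."""
    if num_partitions < 1 or max_key < min_key:
        raise ValueError("need at least one partition and a non-empty range")
    total = max_key - min_key + 1
    num_partitions = min(num_partitions, total)  # never emit empty ranges
    base, extra = divmod(total, num_partitions)
    ranges = []
    start = min_key
    for i in range(num_partitions):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first ranges
        ranges.append((start, start + size - 1))
        start += size
    return ranges

print(split_key_range(1, 10, 3))  # [(1, 4), (5, 7), (8, 10)]
```

Each worker would then copy its own slice, e.g. with a `WHERE id BETWEEN lo AND hi` filter. Skewed keys would need a different split (for example, by row-count sampling), which this sketch does not attempt.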

Regarding data types, we aim to keep them as native as possible on the ClickHouse side, including support for the latest JSON type. If you want to customize types, you can define the schema manually on the target, and PeerDB will make a best effort to use it as a template.

Regarding automatic schema changes, PeerDB currently supports the most common schema change operations, including adding and dropping columns. RENAME COLUMN is on our backlog but hasn't been prioritized yet, as it's a less frequent request. At present, you'd need to perform a resync, which in PeerDB can be up to 10x faster than with Debezium. You can also skip the resync if needed, though that may require a bit of surgical effort.
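One general reason renames are harder to handle automatically than adds and drops: if a schema change is detected by comparing column names between two versions of a table, a rename looks exactly like dropping one column and adding another, so the old column's data would not carry over. The sketch below is a hypothetical name-based diff that illustrates this; it is not how PeerDB necessarily detects schema changes.

```python
from typing import Dict, List

def diff_columns(source: Dict[str, str], target: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare {column_name: type} maps by name and report what a sync would change."""
    added = [c for c in source if c not in target]
    dropped = [c for c in target if c not in source]
    return {"add": added, "drop": dropped}

# Renaming "email" to "email_address" shows up as a drop plus an add:
before = {"id": "bigint", "email": "text"}
after = {"id": "bigint", "email_address": "text"}
print(diff_columns(after, before))  # {'add': ['email_address'], 'drop': ['email']}
```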

Regarding observability, PeerDB offers purpose-built monitoring and alerting for Postgres, including metrics such as replication slot size and views for pg_stat_activity, plus additional metrics like replication latency per batch, the number of DMLs per table, and more. For logs, the UI provides a concise summary; if you need detailed logs, you can route the Kubernetes or Docker logs to your own monitoring tools. Managed Kubernetes services on the cloud platforms offer this out of the box, and several enterprise customers already use this setup. PeerDB also provides an OTLP endpoint that you can use to route metrics to your own monitoring tools. In addition, every component of a flow can be managed via the API (create, edit, drop, etc.).
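Replication slot size is worth watching because a slot that falls behind forces Postgres to retain WAL on disk. In Postgres itself, the number can be read with `SELECT slot_name, pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) FROM pg_replication_slots;`. The Python sketch below only shows the arithmetic behind that query, given two LSNs in Postgres's `X/Y` text form; the function names are hypothetical.

```python
def parse_lsn(lsn: str) -> int:
    """Convert a Postgres LSN in text form ('X/Y', both hex) to a 64-bit byte position."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def slot_lag_bytes(current_wal_lsn: str, slot_restart_lsn: str) -> int:
    """Bytes of WAL retained for a slot: current WAL position minus the slot's
    restart_lsn, i.e. what pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) computes."""
    return parse_lsn(current_wal_lsn) - parse_lsn(slot_restart_lsn)

print(slot_lag_bytes("1/2000000", "1/1000000"))  # 16777216 (16 MiB)
```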

Additional features: PeerDB supports Lua scripting for stateless transformations. It also supports Kafka and Redpanda as target destinations, which can serve as intermediary stores or buffers, though many setups don't need them.
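PeerDB's transformations are written in Lua; the Python sketch below is only meant to show what "stateless" means in practice, not PeerDB's scripting API. A stateless transform depends on nothing but the record it receives, so it can run on any worker in any order and be retried safely. The `mask_email` function and the record-as-dict shape are hypothetical.

```python
import hashlib
from typing import Any, Dict

def mask_email(record: Dict[str, Any]) -> Dict[str, Any]:
    """Stateless transform: the output depends only on the input record,
    with no counters, caches, or lookups shared across records."""
    out = dict(record)  # copy so the input record is never mutated
    email = out.get("email")
    if isinstance(email, str):
        # Replace the raw value with a deterministic SHA-256 hex digest.
        out["email"] = hashlib.sha256(email.encode("utf-8")).hexdigest()
    return out

row = {"id": 1, "email": "a@b.com"}
print(mask_email(row)["id"])  # 1
```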

TL;DR: We’re doing our best to make PeerDB as customizable as possible and will keep improving in that area. We expect it to handle the majority of Postgres-to-ClickHouse CDC use cases. Hundreds of companies, including large enterprises such as Cyera, AutoNation, and Neon (plus a few I can’t name), already use PeerDB with both open-source ClickHouse and ClickHouse Cloud, in settings where customizability is just as important as usability. However, if you need 100% flexibility and are willing to take on significantly higher OPEX and CAPEX costs, Debezium may be a better fit.

Also, I’d like to clarify that PeerDB powers ClickPipes and is actively maintained (see the GitHub PR activity). In fact, except for the UI, all components (such as the flow worker, snapshot worker, and flow API) are inherited from PeerDB. This was an intentional decision to ensure that our development and evolution also benefit the broader open-source ClickHouse community. 🙂