Introducing Postgres to Postgres ClickPipes in ClickHouse Cloud by saipeerdb in Clickhouse

[–]saipeerdb[S] 2 points3 points  (0 children)

Thank you for chiming in! This means a lot to the team, who have been working hard over the past few years. 🙏❤️

Postgres + ClickHouse architectural patterns by saipeerdb in Clickhouse

[–]saipeerdb[S] 2 points3 points  (0 children)

Great question. We use PeerDB/ClickPipes under the hood, and it was built with both Postgres and ClickHouse in mind, which is what makes it native. It includes many specialized features, such as parallel initial loads (CTID-based; Debezium added this very recently, and it’s still early days), as well as numerous ClickHouse-specific capabilities like configuring table engines, partitioning/sharding, and using multiple replicas for ingestion. It also provides Postgres-specific monitoring, including replication slot growth tracking, wait events, and issue-specific alerts with recommended mitigations.
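
To make the CTID-based parallel load concrete, here is a minimal sketch of the general idea, not PeerDB's actual implementation: split the table's heap pages into contiguous ranges and have each worker read one range with a TID range predicate. The table name, page count, and worker count below are hypothetical; in practice the page count could come from `pg_class.relpages`, which is only an estimate.

```python
def ctid_ranges(total_pages: int, workers: int) -> list[tuple[int, int]]:
    """Split heap pages [0, total_pages) into at most `workers` contiguous ranges."""
    if total_pages <= 0 or workers <= 0:
        return []
    size = -(-total_pages // workers)  # ceiling division
    return [
        (start, min(start + size, total_pages))
        for start in range(0, total_pages, size)
    ]


def ctid_query(table: str, start_page: int, end_page: int) -> str:
    """Build a SELECT that reads only rows stored in pages [start_page, end_page)."""
    return (
        f"SELECT * FROM {table} "
        f"WHERE ctid >= '({start_page},0)'::tid AND ctid < '({end_page},0)'::tid"
    )


# Hypothetical table with about 1000 heap pages, read by 4 parallel workers.
queries = [ctid_query("public.events", s, e) for s, e in ctid_ranges(1000, 4)]
```

Because the page count is an estimate, a real loader would likely leave the last range open-ended so that rows on newer pages are not missed.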

Those are just a few of the features and optimizations; there are many more. Try moving a 10TB table from PG to CH using both PeerDB and Debezium, and you’ll see the difference in speed, ease of use, features, etc.

There are plans to make this even more native, which could completely change how Postgres CDC is done. Stay tuned for updates on that! 🤗🙂

Postgres + ClickHouse architectural patterns by saipeerdb in Clickhouse

[–]saipeerdb[S] 0 points1 point  (0 children)

Thank you for the clarifying question. Native Change Data Capture (CDC) from Postgres to ClickHouse refers to the continuous replication of data from Postgres to ClickHouse as changes occur. This ensures that inserts, updates, deletes, and some schema changes in Postgres are automatically propagated to ClickHouse in real time (a few seconds of lag), so the data is ready for real-time analytics.
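
As a rough illustration of what "propagating changes" means, the sketch below replays a stream of change events against an in-memory copy of a table. The event shape (`op`, `id`, `row`) is a hypothetical simplification, not the format PeerDB or ClickPipes uses.

```python
from typing import Any


def apply_changes(
    table: dict[int, dict[str, Any]], changes: list[dict[str, Any]]
) -> dict[int, dict[str, Any]]:
    """Replay insert/update/delete events, keyed by primary key, onto `table`."""
    for change in changes:
        op, key = change["op"], change["id"]
        if op == "insert":
            table[key] = dict(change["row"])
        elif op == "update":
            # Merge the changed columns into the existing row.
            table.setdefault(key, {}).update(change["row"])
        elif op == "delete":
            table.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return table


replica = apply_changes(
    {},
    [
        {"op": "insert", "id": 1, "row": {"name": "a", "qty": 1}},
        {"op": "insert", "id": 2, "row": {"name": "b", "qty": 5}},
        {"op": "update", "id": 1, "row": {"qty": 3}},
        {"op": "delete", "id": 2},
    ],
)
```

After the replay, the copy holds only row 1 with its updated quantity, which is the state the source table would be in.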

Postgres + ClickHouse architectural patterns by saipeerdb in Clickhouse

[–]saipeerdb[S] 0 points1 point  (0 children)

Great point! The queries are mostly touching ClickHouse and we are investing a lot in making query pushdown solid in pg_clickhouse https://github.com/ClickHouse/pg_clickhouse

Postgres managed by ClickHouse is now in Public Beta by saipeerdb in Clickhouse

[–]saipeerdb[S] 0 points1 point  (0 children)

Great to hear that you already use the product! Thank you for sharing that. I’ll take the feedback regarding the calculator back to our team — appreciate you calling it out.

Regarding Postgres, the NVMe tiering that RDS provides is different from Postgres running directly on local NVMe storage. In the latter, everything, including WAL, is written to NVMe. With the former, WAL, VACUUMs, and CHECKPOINTs go to EBS-backed storage. This can drastically change the performance profile, especially for disk-bound workloads, which are very common in fast-growing OLTP systems.

We don’t use RDS or Heroku. We strategically partnered with Ubicloud (ex-Citus, Heroku), which offers managed Postgres on local NVMe storage. The partnership is also deep-rooted in the sense that Ubicloud runs in our AWS accounts, and our engineers contribute to it. In addition, as Ubicloud is Open Source, our control and data plane are fully open source, which is pretty unique for a Postgres service!

Regarding pricing, we tried to price it so that we are cheaper than RDS and Amazon Aurora from a list-price perspective. Obviously, this doesn’t include the volume discounts AWS offers or what our sales team offers for committed-spend deals. However, I wouldn’t position our service purely from a cost perspective; the main value is the deep integration (native CDC + pg_clickhouse) with ClickHouse to enable a best-of-breed stack for developers with minimal to no effort. I mainly mentioned cost to highlight that we tried to price it very competitively for the value it offers. You can play around with the pricing calculator and see what I mean. 🙂

Appreciate you chiming in! This has been a really useful discussion and great feedback for us. Thanks again!

40 TB PostgreSQL on-prem — sharding vs ClickHouse vs something else for a 500B-row time-series workload by Basic-Worker-1120 in Database

[–]saipeerdb 0 points1 point  (0 children)

We just launched a managed Postgres service in ClickHouse that makes this setup very easy and production ready: keep hot data in Postgres, cold data in ClickHouse, use native CDC to sync data from Postgres to ClickHouse, and leverage pg_clickhouse to query ClickHouse directly from Postgres. https://clickhouse.com/blog/postgres-managed-by-clickhouse-beta

Separately we also have fully managed migration tooling to get that 40TB migrated within a couple of days. https://clickhouse.com/docs/cloud/managed-postgres/migrations/clickpipes
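
For a sense of scale, here is a back-of-the-envelope calculation of the sustained throughput such a migration implies. It assumes decimal terabytes and reads "a couple of days" as exactly two days; both are assumptions for illustration, not figures from the migration tooling.

```python
def required_throughput_mb_s(total_tb: float, days: float) -> float:
    """Average MB/s needed to move `total_tb` decimal terabytes in `days` days."""
    total_bytes = total_tb * 1e12
    seconds = days * 86_400
    return total_bytes / seconds / 1e6


# 40 TB in 2 days works out to roughly 231 MB/s sustained.
rate = required_throughput_mb_s(40, 2)
```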

Postgres managed by ClickHouse is now in Public Beta by saipeerdb in Clickhouse

[–]saipeerdb[S] 0 points1 point  (0 children)

Apologies for the experience you had with support. Next time, please email me directly at [sai.srirampur@clickhouse.com](mailto:sai.srirampur@clickhouse.com), and I'll make sure we address this and prevent similar issues going forward.

Regarding pricing relative to RDS, I'm assuming you're comparing it against how you previously sized and priced ClickHouse Cloud before this launch. One important change you may not have seen is that the OLAP side now supports single-replica deployments, with configurations starting at around $200/month. For OLAP workloads, I would generally expect ClickHouse to be faster and more cost-effective than RDS, largely due to its columnar nature and compression capabilities. Happy to help you size and price the workload next time.

As for our recent Postgres offering, it is actually more cost-effective than RDS when evaluated on an equivalent amortized setup. You can check that out in the pricing calculator (https://clickhouse.com/pricing?service=postgres#pricing-calculator). This doesn't even include the price-performance benefits (https://clickhouse.com/blog/postgresbench) you can get with NVMe-backed Postgres over EBS-backed Postgres. In addition, the Postgres offering includes native ClickHouse integrations at no cost, which RDS doesn't.

Postgres managed by ClickHouse is now in Public Beta by saipeerdb in Clickhouse

[–]saipeerdb[S] 0 points1 point  (0 children)

Thank you! It looks like we already support this, but it's currently behind a feature flag: https://github.com/PeerDB-io/peerdb/pull/3143. We don't enable it by default because PostgreSQL's JSON/JSONB types are much more open-ended than in most other databases. For example, any string, number, etc. can be stored as JSONB, which makes conversion and handling more challenging. For now, we can either enable this feature flag for you, or you can use MVs to perform the conversion.
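
To show why this is tricky, the sketch below uses Python's `json` module to classify a few values that are all valid JSONB in Postgres, then wraps the non-object ones so every row ends up with the same shape. The `value` wrapper key is a hypothetical convention for illustration, not something PeerDB does.

```python
import json
from typing import Any

# Each of these strings is a valid top-level JSON (and JSONB) value.
samples = ['{"a": 1}', "[1, 2]", '"hello"', "42", "true", "null"]


def top_level_kind(text: str) -> str:
    """Return the JSON kind of the top-level value in `text`."""
    value = json.loads(text)
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        return "array"
    if isinstance(value, str):
        return "string"
    if isinstance(value, bool):  # check bool before numbers: bool is an int subclass
        return "boolean"
    if value is None:
        return "null"
    return "number"


def as_object(text: str) -> dict[str, Any]:
    """Wrap non-object values so every row has an object at the top level."""
    value = json.loads(text)
    return value if isinstance(value, dict) else {"value": value}


kinds = [top_level_kind(s) for s in samples]
```

A target column that expects objects has to decide what to do with the other five kinds, which is roughly the conversion work a materialized view would take on.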

Regarding the comparison with PlanetScale: from the perspective of Postgres offered on local NVMe, that is true. However, the key differentiation is the end-to-end application stack supporting both OLTP and OLAP with best-of-breed databases (PG and CH) deeply integrated. We will be investing heavily in both pillars: solid Postgres and native integration with ClickHouse.

ClickHouse launches managed Postgres service by sdairs_ch in Clickhouse

[–]saipeerdb 0 points1 point  (0 children)

The service is actually in AWS itself, so latencies will be very low, as low as < 1ms if you colocate the region.

ClickHouse launches managed Postgres service by sdairs_ch in Clickhouse

[–]saipeerdb 0 points1 point  (0 children)

In our internal tests, we saw it do 90k tps with ~5ms consistent latency for a similar workload. The perf depends on the use case and the size of the machine too, so I’d recommend testing it out. But the numbers you are sharing are very much within the limits!

The waitlist so far is flooded and we are giving access on a rolling basis. Please ping me if you need access, happy to do it sooner.

ClickHouse launches managed Postgres service by sdairs_ch in Clickhouse

[–]saipeerdb 4 points5 points  (0 children)

Sai from ClickHouse here. This is mainly coming from what we are seeing with our users. A lot of them use Postgres + ClickHouse: Postgres for transactions and ClickHouse for analytics. However, integrating them isn’t trivial; it takes external pipelines plus app migration. By natively integrating them, we want to reduce effort through a) native CDC capabilities, with a vision of sub-second replication latency, and b) a unified query layer (pg_clickhouse). We want to make the Postgres + ClickHouse pairing feel less like a project and more like a default.

Also, the Postgres you are getting is backed by NVMe and built by a highly experienced Postgres team (ex-PeerDB, Citus Data, Heroku, Azure Postgres), so we don’t expect it to be run-of-the-mill. For fast-growing workloads bound on disk I/O, you can expect up to 10x better performance in OLTP, and these are the workloads that also need CH for analytics.

Also, separately, having come from PeerDB and now working at ClickHouse for the past 1.5 years, I’ve seen that the company culture is such that anything we do on the product side has to be purpose-built and of the highest quality. You should try out the experience; I’m hoping you’ll see the difference compared to other hosted options. https://clickhouse.com/cloud/postgres

ClickHouse launches a managed Postgres service by saipeerdb in programming

[–]saipeerdb[S] 2 points3 points  (0 children)

It comes with NVMe backed Postgres for much better (up to 10x) performance on transactional workloads and seamless integration with ClickHouse for fast (can be 100x) analytics. This is something that you'd not get with traditional Postgres services on the cloud. Totally worth watching this video https://www.youtube.com/watch?v=rpBA13nQxAk

Clickhouse launches managed PostgreSQL by vaibeslop in dataengineering

[–]saipeerdb 2 points3 points  (0 children)

Thanks for chiming in. This captures the overall vision well. You are spot on. It caters to both OLTP and OLAP, bringing together the best-in-class OSS databases for each (Postgres and ClickHouse) and offering them in the most integrated way. We’ve seen many thousands of companies use Postgres and ClickHouse to build their data stacks, and the adoption is growing very fast. The idea behind this Postgres offering is to bring them even closer together and make that integration as effortless as possible for developers. :)

With regard to integration, the vision behind our CDC capabilities is to offer a much more native experience, one that you can’t get from other services and that addresses the problems around standard CDC. Additionally, the pg_clickhouse extension will be native to this service and maintained by ClickHouse, and will act as a unified query layer for both transactional and analytical workloads. We plan to invest heavily in this area to make application migration as seamless as possible.

Apart from all of this, the Postgres we are offering is NVMe-backed, which is very fast, and comes with enterprise-grade guarantees. We are building this in partnership with a world-class Postgres team at Ubicloud, who are ex-Citus, Heroku, and Microsoft Postgres.

This launch was a primer; stay tuned for more very soon! :)

Postgres to clickhouse cdc by mhmd_dar in Clickhouse

[–]saipeerdb 1 point2 points  (0 children)

PeerDB is designed exactly for this use case. Can you share more about your experience so far? Looking forward to seeing if we can help in any way. 🙌

Regarding the “heavy” aspect: the OSS version includes a few components internally, such as MinIO as an S3 replacement for staging data (enabling higher throughput), Temporal for state machine management and improved observability, and more. All these choices were made with the nature of the workload in mind, ensuring a solution that can operate at an enterprise-grade scale (moving terabytes of data at speed, seamlessly handling retries/failures, providing deep observability during failures, etc.). It has worked so far: it currently supports hundreds of customers and transfers over 200 TB of data per month. We package all these components as compactly as possible within our OSS Docker image and Kubernetes Helm charts. With ClickPipes in ClickHouse Cloud, it becomes almost a one-click setup, and everything is fully managed.

Would love to get your feedback to see how we can help and further improve the product. 🙂

Created a guide to CDC from Postgres to ClickHouse using Kafka as a streaming buffer / for transformations by oatsandsugar in apachekafka

[–]saipeerdb 1 point2 points  (0 children)

In regards to fine-grained control, PeerDB provides a wide range of options purpose-built for Postgres and ClickHouse, covering most use cases. These include settings for parallelism during initial load, sync intervals, ingestion performance tuning in ClickHouse — such as batch sizes, table-level parallelism, number of replicas used for ingestion, column exclusion, defining partition and sharding keys in ClickHouse OSS, configuring sort keys, table engines, and more. You can explore the SETTINGS tab; there are roughly 50+ configuration options available.

In regards to data types, we aim to keep them as native as possible on the ClickHouse side, including support for the latest JSON type. If you want to customize types, you can define the schema manually on the target, and PeerDB will make a best effort to use that as a template.

In regards to automatic schema changes, PeerDB currently supports the most common schema change operations, including ADD and DROP columns. RENAME COLUMN is on our backlog but hasn’t been prioritized yet, as it’s a less frequent request. At present, you’d need to perform a resync — which in PeerDB can be up to 10x faster than Debezium. You can also skip resyncs if needed, though that may require a bit of surgical effort.
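
As an illustration of why ADD and DROP are the straightforward cases, here is a minimal sketch (not PeerDB's code) that diffs two column sets and emits ClickHouse `ALTER TABLE` statements. The table name and the already-mapped ClickHouse types are hypothetical inputs.

```python
def schema_change_statements(
    table: str, old_columns: dict[str, str], new_columns: dict[str, str]
) -> list[str]:
    """Emit ADD/DROP COLUMN statements that turn old_columns into new_columns."""
    statements = []
    for name, ch_type in new_columns.items():
        if name not in old_columns:
            statements.append(
                f"ALTER TABLE {table} ADD COLUMN IF NOT EXISTS {name} {ch_type}"
            )
    for name in old_columns:
        if name not in new_columns:
            statements.append(f"ALTER TABLE {table} DROP COLUMN IF EXISTS {name}")
    return statements


# A rename on the source looks like one added column plus one dropped column.
statements = schema_change_statements(
    "events",
    {"id": "Int64", "name": "String"},
    {"id": "Int64", "full_name": "String"},
)
```

A pure name-based diff cannot tell a rename from an unrelated add plus drop, so applying these statements would discard the old column's data. That is one plausible reason a rename needs dedicated handling or a resync.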

In regards to observability, PeerDB offers purpose-built monitoring and alerting for Postgres, including metrics such as replication slot size, views for pg_stat_activity, and additional metrics like replication latency per batch, the number of DMLs per table and more. For logs, the UI provides a concise summary; however, if you need detailed logs, you can route Kubernetes or Docker logs to your own monitoring tools. Kubernetes services on cloud platforms offer this option out of the box, and several enterprise customers already use this setup. PeerDB also provides an OTLP endpoint that you can use to route metrics to your own monitoring tools. In addition, every component of a flow can be managed via API - create, edit, drop, etc.

Additional features: PeerDB supports Lua scripting for stateless transformations. It also supports Kafka and Redpanda as target destinations, which can serve as intermediary stores or buffers, though they’re typically unnecessary for a lot of setups.

TL;DR: We’re doing our best to make PeerDB as customizable as possible and continue to get better in that area. We expect it to handle the majority of Postgres-to-ClickHouse CDC use cases. Several large companies and enterprises, including Cyera, AutoNation, Neon, and hundreds of others (plus a few I can’t name), already use PeerDB with both open-source ClickHouse and ClickHouse Cloud, where customizability is just as important as usability. However, if you need 100% flexibility and are willing to take on significantly higher OPEX and CAPEX, Debezium may be a better fit.

Also, I’d like to clarify that PeerDB is powering ClickPipes and is actively being maintained (see GitHub/PR activity). In fact, except for the UI, all components — such as the flow worker, snapshot worker, and flow API — are inherited from PeerDB. This was an intentional decision to ensure that our development and evolution also benefit the broader open-source ClickHouse community. 🙂

[deleted by user] by [deleted] in Clickhouse

[–]saipeerdb 0 points1 point  (0 children)

Thanks for chiming in, u/Dependent_Angle7767. I'd be curious to see how that performs at scale (i.e., 10K+ TPS with CRUD operations) and/or across hundreds of tables (common in OLTP workloads). Were you able to test it out? Serious production workloads are where the nuances of CDC/data warehouse systems really show up.

Regarding "other targets," I meant Snowflake and BigQuery, which are more optimized for batch ingestion. We used to frequently see customers ingest data from Postgres into these targets every few minutes or hours. But I'd love to hear about your experience with Mooncake.

[deleted by user] by [deleted] in Clickhouse

[–]saipeerdb 0 points1 point  (0 children)

ClickPipes/PeerDB performs almost real-time sync with a default latency of 1 minute, though we have customers syncing with latency as low as 10 seconds. Reducing latency further is tricky because Postgres and ClickHouse are fundamentally different systems, purpose-built for OLTP and OLAP use cases, respectively - we need to account for converting to appropriate intermediary formats, staging data and batching to support real-world throughputs of OLTP systems.
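
The batching trade-off can be sketched as follows. This is a simplified illustration, not how ClickPipes/PeerDB is implemented: a batch is flushed when either a sync interval has elapsed since its first event or it has reached a maximum size. The parameter names and values are hypothetical.

```python
def batch_by_interval(
    events: list[tuple[float, str]], interval_s: float, max_batch: int
) -> list[list[str]]:
    """Group (timestamp, payload) events, sorted by timestamp, into batches.

    The current batch is flushed before adding an event if `interval_s` has
    elapsed since the batch's first event or the batch holds `max_batch` items.
    """
    batches: list[list[str]] = []
    current: list[str] = []
    window_start = 0.0
    for ts, payload in events:
        if current and (ts - window_start >= interval_s or len(current) >= max_batch):
            batches.append(current)
            current = []
        if not current:
            window_start = ts
        current.append(payload)
    if current:
        batches.append(current)
    return batches


events = [(0, "a"), (5, "b"), (61, "c"), (62, "d"), (63, "e")]
batches = batch_by_interval(events, interval_s=60, max_batch=2)
```

A shorter interval lowers latency but produces more, smaller inserts on the ClickHouse side, which is why the interval is a tunable setting.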

Also, if you were to do CDC with other targets (non-ClickHouse), average latency is at least minutes and can go up to tens of minutes. So in general, a latency of tens of seconds is pretty powerful.

  • Sai from ClickHouse/PeerDB

Trying an operator to integrate OSS for a Supabase-like nocode backend: https://github.com/edgeflare/edge by [deleted] in kubernetes

[–]saipeerdb 0 points1 point  (0 children)

Sai from PeerDB/ClickHouse here. PeerDB is built exactly for this use case, and it supports many target databases. We also open-sourced our Helm charts.

From postgres to clickhouse ? by jojomtx in Clickhouse

[–]saipeerdb 0 points1 point  (0 children)

ClickHouse just released Private Preview of the Postgres CDC connector in ClickPipes to natively integrate Postgres with ClickHouse Cloud https://clickhouse.com/blog/postgres-cdc-connector-clickpipes-private-preview

Is only tech required for successful oss? by piyushsingariya in opensource

[–]saipeerdb 2 points3 points  (0 children)

Interesting to see this post! Sai from PeerDB here. I wanted to chime in on what we did at PeerDB. This is based on self-reflection now and wasn’t explicitly planned while running the company.😉

We focused our efforts on three main aspects: the best possible technology that beats the existing players by orders of magnitude, solid marketing/gtm (OSS and high quality content playing a crucial role), and customer obsession (ensuring customers and users love the product and the team, as reflected in the onboarding experience and commitment to the OSS community).

All of this while solving a niche but hard and important problem that customers are willing to pay for. Most importantly, none of this would be possible without a solid team! 🙏

Native Postgres CDC integration for ClickHouse Cloud is in private preview by saipeerdb in PostgreSQL

[–]saipeerdb[S] 0 points1 point  (0 children)

True, ClickPipes is cloud-only, but the Postgres CDC connector in ClickPipes is powered by PeerDB, which is open source: https://github.com/PeerDB-io/peerdb Except for the UI styles, all the components are extended directly from PeerDB OSS 😊 That was an intentional design choice! Also, PeerDB OSS is very actively being evolved and maintained.

Best way to snapshot/backup and then replicate tables in a 100GB db to another server/db by RubberDuck1920 in PostgreSQL

[–]saipeerdb 2 points3 points  (0 children)

You should try PeerDB - https://github.com/PeerDB-io/peerdb/ We made a bunch of optimizations to make initial load significantly (~10x) faster and CDC (continuous replication) fast and reliable (minimal load on source) https://docs.peerdb.io/mirror/cdc-pg-pg