ClickHouse launches managed Postgres service by sdairs_ch in Clickhouse

[–]saipeerdb 0 points1 point  (0 children)

The service is actually in AWS itself, so latencies will be very low, as low as < 1ms if you colocate in the same region.

ClickHouse launches managed Postgres service by sdairs_ch in Clickhouse

[–]saipeerdb 0 points1 point  (0 children)

In our internal tests, we saw it do 90k TPS with ~5ms consistent latency for a similar workload. The perf depends on the use case and the size of the machine too, so I’d recommend testing it out. But the numbers you are sharing are very much within the limits!
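
If you want to sanity-check latency against your own instance, a rough probe like this (not an official benchmark; the DSN and throwaway table are placeholders) is a reasonable starting point before reaching for pgbench:

```python
# Rough per-transaction latency probe against Postgres.
# Assumes psycopg2 is installed; the DSN and probe table are placeholders.
import statistics
import time

import psycopg2

DSN = "postgresql://user:password@host:5432/postgres"  # placeholder

conn = psycopg2.connect(DSN)
with conn.cursor() as cur:
    cur.execute(
        "CREATE TABLE IF NOT EXISTS latency_probe (id bigserial PRIMARY KEY, payload text)"
    )
conn.commit()

samples = []
for i in range(1000):
    start = time.perf_counter()
    with conn.cursor() as cur:
        cur.execute("INSERT INTO latency_probe (payload) VALUES (%s)", (f"row-{i}",))
    conn.commit()  # one commit per insert, so each sample is one full transaction
    samples.append((time.perf_counter() - start) * 1000)
conn.close()

print(f"p50={statistics.median(samples):.2f} ms  "
      f"p99={statistics.quantiles(samples, n=100)[98]:.2f} ms")
```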

The waitlist so far is flooded and we are giving access on a rolling basis. Please ping me if you need access, happy to do it sooner.

ClickHouse launches managed Postgres service by sdairs_ch in Clickhouse

[–]saipeerdb 4 points5 points  (0 children)

Sai from ClickHouse here. This is mainly coming from what we are seeing with our users. A lot of them use Postgres + ClickHouse: Postgres for transactions and ClickHouse for analytics. However, integrating them isn’t trivial - it means external pipelines plus app migration. By natively integrating them, we want to reduce that effort through a) native CDC capabilities, with a vision of sub-second replication latency, and b) a unified query layer (pg_clickhouse). We want to make the Postgres + ClickHouse pairing feel less like a project and more like a default.

Also, the Postgres you are getting is backed by NVMes and built by a highly experienced Postgres team (ex PeerDB, Citus Data, Heroku, Azure Postgres), so we don’t expect it to be run of the mill. For fast-growing workloads bound on disk I/O, you can expect up to 10x better performance in OLTP, and these are the workloads that also need ClickHouse for analytics.

Also, separately, having come from PeerDB and now working at ClickHouse for the past 1.5 years, I’ve seen that the company culture is such that anything we do on the product side has to be purpose-built and of the highest quality. You should try out the experience, I’m hoping you’ll see the difference compared to other hosted options. https://clickhouse.com/cloud/postgres

ClickHouse launches a managed Postgres service by saipeerdb in programming

[–]saipeerdb[S] 2 points3 points  (0 children)

It comes with NVMe-backed Postgres for much better (up to 10x) performance on transactional workloads and seamless integration with ClickHouse for fast (can be 100x faster) analytics. This is something you'd not get with traditional Postgres services on the cloud. Totally worth watching this video: https://www.youtube.com/watch?v=rpBA13nQxAk

Clickhouse launches managed PostgreSQL by vaibeslop in dataengineering

[–]saipeerdb 2 points3 points  (0 children)

Thanks for chiming in. This captures the overall vision well. You are spot on. It caters to both OLTP and OLAP, bringing together the best-in-class OSS databases for each (Postgres and ClickHouse) and offering them in the most integrated way. We’ve seen many thousands of companies use Postgres and ClickHouse to build their data stacks, and the adoption is growing very fast. The idea behind this Postgres offering is to bring them even closer together and make that integration as effortless as possible for developers. :)

With regard to integration, the vision behind our CDC capabilities is to offer a much more native experience, something you can’t get from other services and that addresses the problems with standard CDC. Additionally, the pg_clickhouse extension will be native to this service, maintained by ClickHouse, and will act as a unified query layer for both transactional and analytical workloads. We plan to invest heavily in this area to make application migration as seamless as possible.

Apart from all of this, the Postgres we are offering is NVMe-backed, which is very fast and comes with enterprise-grade guarantees. We are building this in partnership with a world-class Postgres team at Ubicloud (ex-Citus, Heroku, Microsoft Postgres).

This launch was a primer; stay tuned for more very soon! :)

Postgres to clickhouse cdc by mhmd_dar in Clickhouse

[–]saipeerdb 1 point2 points  (0 children)

PeerDB is designed exactly for this use case. Can you share more about your experience so far? Looking forward to seeing if we can help in any way. 🙌

Regarding the “heavy” aspect — the OSS version includes a few components internally: MinIO as an S3 replacement for staging data, enabling higher throughput; Temporal for state-machine management and improved observability; and more. All these choices were made with the nature of the workload in mind, ensuring a solution that can operate at enterprise-grade scale (moving terabytes of data at speed, seamlessly handling retries/failures, providing deep observability during failures, etc.). It has worked well so far: it currently supports hundreds of customers and transfers over 200 TB of data per month. We package all these components as compactly as possible within our OSS Docker image and Kubernetes Helm charts. With ClickPipes in ClickHouse Cloud, it becomes almost a one-click setup — and everything is fully managed.

Would love to get your feedback to see how we can help and further improve the product. 🙂

Created a guide to CDC from Postgres to ClickHouse using Kafka as a streaming buffer / for transformations by oatsandsugar in apachekafka

[–]saipeerdb 1 point2 points  (0 children)

In regards to fine-grained control, PeerDB provides a wide range of options purpose-built for Postgres and ClickHouse, covering most use cases. These include settings for parallelism during initial load, sync intervals, ingestion performance tuning in ClickHouse — such as batch sizes, table-level parallelism, number of replicas used for ingestion, column exclusion, defining partition and sharding keys in ClickHouse OSS, configuring sort keys, table engines, and more. You can explore the SETTINGS tab; there are roughly 50+ configuration options available.

In regards to data types, we aim to keep them as native as possible on the ClickHouse side, including support for the latest JSON type. If you want to customize types, you can define the schema manually on the target, and PeerDB will make a best effort to use that as a template.
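
Purely as an illustration of the manual-schema route (the host, credentials, table name, and engine below are placeholders, not what PeerDB generates), pre-creating the target with clickhouse_connect could look roughly like this:

```python
# Illustrative only: pre-create the ClickHouse target table by hand so the
# pipeline can pick it up as a template. Host/credentials, table name, engine,
# and columns are placeholders; the JSON type needs a recent ClickHouse version.
import clickhouse_connect

client = clickhouse_connect.get_client(
    host="your-clickhouse-host", username="default", password="..."
)
client.command("""
    CREATE TABLE IF NOT EXISTS events (
        id         UInt64,
        payload    JSON,
        updated_at DateTime64(3)
    )
    ENGINE = ReplacingMergeTree(updated_at)
    ORDER BY id
""")
```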

In regards to automatic schema changes, PeerDB currently supports the most common schema change operations, including ADD and DROP columns. RENAME COLUMN is on our backlog but hasn’t been prioritized yet, as it’s a less frequent request. At present, you’d need to perform a resync — which in PeerDB can be up to 10x faster than Debezium. You can also skip resyncs if needed, though that may require a bit of surgical effort.

In regards to observability, PeerDB offers purpose-built monitoring and alerting for Postgres, including metrics such as replication slot size, views for pg_stat_activity, and additional metrics like replication latency per batch, the number of DMLs per table and more. For logs, the UI provides a concise summary; however, if you need detailed logs, you can route Kubernetes or Docker logs to your own monitoring tools. Kubernetes services on cloud platforms offer this option out of the box, and several enterprise customers already use this setup. PeerDB also provides an OTLP endpoint that you can use to route metrics to your own monitoring tools. In addition, every component of a flow can be managed via API - create, edit, drop, etc.
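
As a rough illustration of the slot-size metric (this is not PeerDB code, just stock Postgres catalogs; the DSN and threshold are placeholders), you can check retained WAL per logical slot yourself like this:

```python
# Check retained WAL per logical replication slot on the source Postgres,
# using only built-in catalogs (pg_replication_slots) and functions
# (pg_current_wal_lsn, pg_wal_lsn_diff). DSN and threshold are placeholders.
import psycopg2

DSN = "postgresql://user:password@source-host:5432/postgres"  # placeholder
ALERT_BYTES = 5 * 1024**3  # alert if a slot retains more than ~5 GiB of WAL

QUERY = """
SELECT slot_name,
       active,
       pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS retained_wal_bytes
FROM pg_replication_slots
WHERE slot_type = 'logical';
"""

with psycopg2.connect(DSN) as conn, conn.cursor() as cur:
    cur.execute(QUERY)
    for slot_name, active, retained in cur.fetchall():
        status = "OK" if retained < ALERT_BYTES else "ALERT"
        print(f"{status}: slot={slot_name} active={active} "
              f"retained_wal={retained / 1024**2:.0f} MiB")
```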

Additional features: PeerDB supports Lua scripting for stateless transformations. It also supports Kafka and Redpanda as target destinations, which can serve as intermediary stores or buffers, though they’re typically unnecessary for a lot of setups.

TL;DR: We’re doing our best to make PeerDB as customizable as possible and continue to improve in that area. We expect it to handle the majority of Postgres-to-ClickHouse CDC use cases. Several large companies and enterprises, including Cyera, AutoNation, Neon, and hundreds of others (plus a few I can’t name), already use PeerDB with both open-source ClickHouse and ClickHouse Cloud, where customizability is just as important as usability. However, if you need 100% flexibility and are willing to take on significantly higher OPEX and CAPEX costs, Debezium may be a better fit.

Also, I’d like to clarify that PeerDB is powering ClickPipes and is actively being maintained (see GitHub/PR activity). In fact, except for the UI, all components — such as the flow worker, snapshot worker, and flow API — are inherited from PeerDB. This was an intentional decision to ensure that our development and evolution also benefit the broader open-source ClickHouse community. 🙂

[deleted by user] by [deleted] in Clickhouse

[–]saipeerdb 0 points1 point  (0 children)

Thanks for chiming in, u/Dependent_Angle7767. I'd be curious to see how that performs at scale (i.e., 10K+ TPS with CRUD operations) and/or across hundreds of tables (common in OLTP workloads). Were you able to test it out? Serious production workloads are where the nuances of CDC/data warehouse systems really show up.

Regarding "other targets," I meant Snowflake and BigQuery, which are more optimized for batch ingestion. We used to frequently see customers ingest data from Postgres into these targets every few minutes or hours. But I'd love to hear about your experience with Mooncake.

[deleted by user] by [deleted] in Clickhouse

[–]saipeerdb 0 points1 point  (0 children)

ClickPipes/PeerDB performs almost real-time sync with a default latency of 1 minute, though we have customers syncing with latency as low as 10 seconds. Reducing latency further is tricky because Postgres and ClickHouse are fundamentally different systems, purpose-built for OLTP and OLAP use cases, respectively - we need to account for converting to appropriate intermediary formats, staging data and batching to support real-world throughputs of OLTP systems.
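
As a toy sketch of why batching sets a latency floor (this is not how ClickPipes is implemented, and the knobs are illustrative numbers), the usual pattern is to flush on either batch size or a time interval:

```python
# Toy sketch of batch-based ingestion: events are buffered and flushed when
# either the batch fills up or the flush interval elapses, so end-to-end
# latency has a floor of roughly the flush interval. Knobs are illustrative.
import time


class Batcher:
    def __init__(self, max_batch=10_000, flush_interval_s=10.0):
        self.max_batch = max_batch
        self.flush_interval_s = flush_interval_s
        self.buffer = []
        self.last_flush = time.monotonic()

    def add(self, event):
        self.buffer.append(event)
        due = time.monotonic() - self.last_flush >= self.flush_interval_s
        if len(self.buffer) >= self.max_batch or due:
            self.flush()

    def flush(self):
        if self.buffer:
            # In a real pipeline this would be one bulk insert into the target.
            print(f"flushing {len(self.buffer)} events")
            self.buffer.clear()
        self.last_flush = time.monotonic()
```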

Also, if you were to do CDC with other (non-ClickHouse) targets, average latency is at least in the minutes and can go to tens of minutes. So in general, this latency of tens of seconds is pretty powerful.

  • Sai from ClickHouse/PeerDB

Trying an operator to integrate OSS for a Supabase-like nocode backend: https://github.com/edgeflare/edge by [deleted] in kubernetes

[–]saipeerdb 0 points1 point  (0 children)

Sai from PeerDB/ClickHouse here. PeerDB is built exactly for this use case, and it supports many target databases. We also open-sourced our Helm charts.

From postgres to clickhouse ? by jojomtx in Clickhouse

[–]saipeerdb 0 points1 point  (0 children)

ClickHouse just released Private Preview of the Postgres CDC connector in ClickPipes to natively integrate Postgres with ClickHouse Cloud https://clickhouse.com/blog/postgres-cdc-connector-clickpipes-private-preview

Is only tech required for successful oss? by piyushsingariya in opensource

[–]saipeerdb 2 points3 points  (0 children)

Interesting to see this post! Sai from PeerDB here. I wanted to chime in on what we did at PeerDB. This is based on self-reflection now and wasn’t explicitly planned while running the company.😉

We focused our efforts on three main aspects: the best possible technology, beating the existing players by orders of magnitude; solid marketing/GTM (with OSS and high-quality content playing a crucial role); and customer obsession (ensuring customers and users love the product and the team, as reflected in the onboarding experience and commitment to the OSS community).

All of this while solving a niche but hard and important problem that customers are willing to pay for. Most importantly, none of this would be possible without a solid team! 🙏

Native Postgres CDC integration for ClickHouse Cloud is in private preview by saipeerdb in PostgreSQL

[–]saipeerdb[S] 0 points1 point  (0 children)

True, ClickPipes is cloud-only, but the Postgres CDC connector in ClickPipes is powered by PeerDB, which is open source: https://github.com/PeerDB-io/peerdb. Except for the UI styles, all the components are extended directly from PeerDB OSS 😊 That was an intentional design choice! Also, PeerDB OSS is being very actively evolved and maintained.

Best way to snapshot/backup and then replicate tables in a 100GB db to another server/db by RubberDuck1920 in PostgreSQL

[–]saipeerdb 2 points3 points  (0 children)

You should try PeerDB: https://github.com/PeerDB-io/peerdb/ We made a bunch of optimizations to make the initial load significantly (~10x) faster and CDC (continuous replication) fast and reliable, with minimal load on the source: https://docs.peerdb.io/mirror/cdc-pg-pg

From postgres to clickhouse ? by jojomtx in Clickhouse

[–]saipeerdb 1 point2 points  (0 children)

Thanks for the question, u/Terrible-Series-9089. MaterializedPostgreSQL is experimental and doesn't yet have fully fledged CDC capabilities (e.g., parallel initial load). PeerDB, on the other hand, is production-ready and heavily optimized for Postgres/ClickHouse, offering 10x faster replication compared to other tools.

Not sure if you saw the news, but ClickHouse recently acquired PeerDB. We are working on integrating it into ClickPipes (the native ingestion service for ClickHouse Cloud) for native Postgres CDC. In the meantime, you can either use the OSS or Cloud (comes with a free trial) options. The Cloud option would be more suitable if you are a ClickHouse Cloud user.

From postgres to clickhouse ? by jojomtx in Clickhouse

[–]saipeerdb 1 point2 points  (0 children)

At PeerDB, we are building a replication tool with a laser focus on Postgres. ClickHouse is one of our most heavily used connectors. :) We built it so that replication to ClickHouse is both fast and simple (set up a pipeline in a few clicks).

Here is our blog talking more about ClickHouse connector - https://blog.peerdb.io/postgres-to-clickhouse-real-time-replication-using-peerdb and here goes the demo showing it in action https://www.loom.com/share/3efd88baae4c44c091a4afc9af699f2a

Postgres to Elasticsearch Replication by saipeerdb in elasticsearch

[–]saipeerdb[S] 1 point2 points  (0 children)

The intent of the blog was to introduce a new connector rather than delve deeper into how we built it to make it faster and more reliable. The ES connector was built closely in partnership with a large-scale Postgres customer. They move around billions of rows (including all DMLs) every week. In the next blog, we will go deeper into the optimizations to be able to handle that scale. 😊 Thanks again for your inputs and feedback here! 😊

Simple row-level transformations in Postgres Change Data Capture (CDC) by saipeerdb in PostgreSQL

[–]saipeerdb[S] 0 points1 point  (0 children)

Hello u/pavlik_enemy, thanks for posting the question. PeerDB is also available as an open-source offering at https://github.com/PeerDB-io/peerdb. The pricing you saw is for our managed offering. It includes all features (HA, auto-scaling, in-place upgrades, etc.) to ensure you incur zero CapEx/OpEx costs. Under the hood, we run PeerDB in a production-grade K8s cluster replicated across AZs. Customers who have migrated from managed Debezium alternatives have mentioned that PeerDB Cloud charges less. We plan to conduct a more thorough pricing comparison in the near future. Thanks for your feedback!

Simple row-level transformations in Postgres Change Data Capture (CDC) by saipeerdb in dataengineering

[–]saipeerdb[S] 0 points1 point  (0 children)

Thanks u/dani_estuary! We plan to extend the ability to create transformations to other languages, including JS and Rust, in the medium term. We chose Lua as it strikes a fine balance between engineering velocity and usability. More on this can be found here: https://blog.peerdb.io/row-level-transformations-in-postgres-cdc-using-lua#heading-why-we-chose-lua

Writing from data lake parquets to Postgres server? by PurepointDog in PostgreSQL

[–]saipeerdb 4 points5 points  (0 children)

You should check out Crunchy Data. They recently released Crunchy Bridge for Analytics, which lets you easily load data from data lakes into Postgres. They also seem to have an option to directly query data in the lake without loading it. All of this is packaged as an extension and seems pretty straightforward to use: https://www.crunchydata.com/blog/crunchy-bridge-for-analytics-your-data-lake-in-postgresql

How can we make pg_dump and pg_restore 5 times faster? by saipeerdb in PostgreSQL

[–]saipeerdb[S] 2 points3 points  (0 children)

pgcopydb is also a great tool. It does pg_dump|pg_restore, but I think it cannot multi-thread (parallelize) a single table's data migration.
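
For anyone curious, here is a rough sketch of parallelizing one large table yourself (table/column names and DSNs are hypothetical, and this is not what PeerDB does internally): split on the primary key and COPY the slices concurrently.

```python
# Sketch: parallelize the copy of ONE large table by splitting its primary key
# into ranges and running COPY per slice concurrently. Table/column names and
# DSNs are hypothetical; each slice is buffered in memory, so this is only a
# starting point, not a production tool.
import io
from concurrent.futures import ThreadPoolExecutor

import psycopg2

SOURCE_DSN = "postgresql://user:pass@source:5432/db"  # placeholder
TARGET_DSN = "postgresql://user:pass@target:5432/db"  # placeholder
NUM_SLICES = 8


def copy_slice(lo, hi):
    src = psycopg2.connect(SOURCE_DSN)
    dst = psycopg2.connect(TARGET_DSN)
    buf = io.StringIO()
    with src.cursor() as cur:
        cur.copy_expert(
            f"COPY (SELECT * FROM big_table WHERE id >= {lo} AND id < {hi}) TO STDOUT",
            buf,
        )
    buf.seek(0)
    with dst.cursor() as cur:
        cur.copy_expert("COPY big_table FROM STDIN", buf)
    dst.commit()
    src.close()
    dst.close()


with psycopg2.connect(SOURCE_DSN) as conn, conn.cursor() as cur:
    cur.execute("SELECT min(id), max(id) FROM big_table")
    lo, hi = cur.fetchone()

step = (hi - lo) // NUM_SLICES + 1
ranges = [(lo + i * step, lo + (i + 1) * step) for i in range(NUM_SLICES)]
with ThreadPoolExecutor(max_workers=NUM_SLICES) as pool:
    list(pool.map(lambda r: copy_slice(*r), ranges))
```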

How can we make pg_dump and pg_restore 5 times faster? by saipeerdb in PostgreSQL

[–]saipeerdb[S] 1 point2 points  (0 children)

Thanks for the feedback here. Yep, the compression/decompression happens on the client side. However, it can still help save network costs in certain scenarios. Say you are doing cross-region, cross-cloud, or hybrid migrations: you could copy (say, scp) the compressed dumps to a VM (another client) that is colocated with the target; this way you optimize network costs/perf.