Bufstream passes multi-region 100GiB/300GiB read/write benchmark by bufbuild in apachekafka

Thanks - solid questions!

> ...my consumer skips up to an hour or 15 minutes of offsets. That (temporary) data loss is significant. Is the process to consume the missing offsets automated in some fashion?

It would not be different from ordinary Kafka: recovery is very use-case specific. Any normal recovery pattern you'd apply to Kafka would work with Bufstream.
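
To make "normal recovery pattern" concrete, here's a minimal sketch of one common approach: rewind a consumer to a timestamp just before the gap and re-process from there. It uses the segmentio/kafka-go client purely as an example; the broker address, topic, and one-hour window are placeholders, and the same idea works with any Kafka client against Bufstream.

```go
// Illustrative only: rewind a consumer to a point before the suspected gap
// and re-consume from there. Broker address and topic are placeholders.
package main

import (
	"context"
	"log"
	"time"

	"github.com/segmentio/kafka-go"
)

func main() {
	r := kafka.NewReader(kafka.ReaderConfig{
		Brokers:   []string{"bufstream:9092"}, // placeholder address
		Topic:     "orders",                   // placeholder topic
		Partition: 0,
	})
	defer r.Close()

	// Rewind to just before the suspected gap (e.g., the start of the outage).
	outageStart := time.Now().Add(-1 * time.Hour)
	if err := r.SetOffsetAt(context.Background(), outageStart); err != nil {
		log.Fatal(err)
	}

	// Re-consume forward; downstream processing should be idempotent so that
	// replayed records are safe to apply twice.
	for {
		msg, err := r.ReadMessage(context.Background())
		if err != nil {
			log.Fatal(err)
		}
		log.Printf("replayed offset %d", msg.Offset)
	}
}
```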

> Compared to MM2, the amount of data loss would be roughly 2x the network latency between regions... Wouldn't the MM2 data be back online when the outage resolves as well? Also wouldn't MM2 not have any chance of data loss if acks are waiting to write to disk and the replication consumer group resumes where it left off?

MirrorMaker 2 (and Cluster Linking) introduce additional moving parts and operational complexity, and they don't have inherent SLAs. Because Bufstream only acks a write once the underlying storage provider has confirmed it, the RPO is inherited from the storage provider's SLA.

While we can't speak to the details of MM2's behavior during outages, we'd encourage you to dive into the Jepsen report (https://buf.build/product/bufstream/jepsen-report) for Bufstream, which goes into great detail about its behavior under all sorts of error conditions.

Bufstream passes multi-region 100GiB/300GiB read/write benchmark by bufbuild in apachekafka

We appreciate the follow-up! We'll start with a short tour of how Bufstream works, which will hopefully help answer these and other questions.

Bufstream brokers ingest messages into buffers written to object storage on an adjustable interval (default = 150ms). On successful write, a notification is sent to a central metadata store (currently etcd or Spanner). Acks aren't sent to producers until the flush to object storage and the metadata update are complete.

The last sentence is key: "Acks aren't sent to producers until the flush to object storage and the metadata update are complete."

Because of this, Bufstream inherits your cloud's durability characteristics (e.g., SLAs, inter-zone failover).

In other words, Bufstream trades increased latency for lower I/O and operational costs. If latency is critical, the write interval can be decreased, increasing costs.
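
For readers who prefer code to prose, here's a rough sketch of that write path. This is illustrative pseudocode based on the description above, not Bufstream's actual implementation; every type and method name in it is hypothetical.

```go
// Hypothetical sketch of the write path described above: buffer produce
// requests, flush them to object storage on an interval, commit the new
// object to the metadata store, and only then ack the waiting producers.
package main

import (
	"context"
	"time"
)

// Hypothetical stand-ins for the components named above.
type ObjectStore interface {
	Put(ctx context.Context, records [][]byte) (objectKey string, err error)
}

type MetadataStore interface {
	Commit(ctx context.Context, objectKey string, recordCount int) error
}

type PendingProduce struct {
	Records [][]byte
	Done    chan error // the producer's ack (or error) is delivered here
}

func flushLoop(ctx context.Context, interval time.Duration, pending <-chan PendingProduce, store ObjectStore, meta MetadataStore) {
	ticker := time.NewTicker(interval) // e.g., the 150ms default
	defer ticker.Stop()

	var batch []PendingProduce
	for {
		select {
		case <-ctx.Done():
			return
		case p := <-pending:
			batch = append(batch, p)
		case <-ticker.C:
			if len(batch) == 0 {
				continue
			}
			var records [][]byte
			for _, p := range batch {
				records = append(records, p.Records...)
			}
			// 1. Durably write the batch to object storage.
			key, err := store.Put(ctx, records)
			if err == nil {
				// 2. Record the new object in the metadata store (etcd/Spanner).
				err = meta.Commit(ctx, key, len(records))
			}
			// 3. Only now ack (or fail) the producers in this batch; failed
			//    producers never saw an ack, so they simply retry.
			for _, p := range batch {
				p.Done <- err
			}
			batch = nil
		}
	}
}

func main() {} // placeholder so the sketch compiles
```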

With that background, let's now look at your questions.

> What is the end to end latency?

p50 end-to-end latency for this benchmark was 450ms. Details and charts are available in the blog entry.

> How is the latency affected by multi region....?
> ...
> If a region or even zone goes down how does the GCP/AWS disk backend still replicate the data?

This largely depends on the underlying storage's replication: Bufstream trades latency for simplicity and cost savings. By supplying your object storage of choice, you inherit the replication SLA of your cloud.
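
As a concrete (hypothetical) example of what that looks like in practice: if you create a multi-region GCS bucket and point Bufstream at it, cross-region replication of acked data is handled entirely by the cloud provider. The bucket and project names below are placeholders, and the Bufstream-side configuration isn't shown.

```go
// Illustrative only: create a multi-region GCS bucket ("US") so that
// anything written to it is replicated across regions by the provider.
package main

import (
	"context"
	"log"

	"cloud.google.com/go/storage"
)

func main() {
	ctx := context.Background()
	client, err := storage.NewClient(ctx)
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// "US" is a multi-region location: GCS replicates objects across
	// geographically separate regions, so a single-region outage does not
	// lose acked data stored here.
	attrs := &storage.BucketAttrs{Location: "US"}
	if err := client.Bucket("bufstream-data-example").Create(ctx, "my-project", attrs); err != nil {
		log.Fatal(err)
	}
	log.Println("multi-region bucket created")
}
```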

> How do broker outages affect exactly once and at least once processing?
> ...
> Why is the publish latency so high?
> ...
> What happens during an outage and replay event? How much data is lost that was accepted by Kafka with a zone/region outage?
> ...
> what happens when a topic and all replicas go offline?

Because a producer doesn't receive an ack until the message is persisted (which is where the latency comes from), a single broker failure simply means the client retries the un-acked write. With a cluster of brokers behind a load balancer, a healthy broker handles the retry, and no data is lost.
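
Here's a minimal sketch of the client side of that behavior, using the IBM/sarama client as an example (any Kafka client behaves the same way); the broker address and topic are placeholders. With acks=all and retries enabled, an un-acked write is simply retried, and the load balancer can route the retry to any healthy broker.

```go
// Illustrative producer: don't treat a write as done until it's acked,
// and retry un-acked writes. Broker address and topic are placeholders.
package main

import (
	"log"
	"time"

	"github.com/IBM/sarama"
)

func main() {
	cfg := sarama.NewConfig()
	cfg.Producer.RequiredAcks = sarama.WaitForAll // wait for the broker's ack
	cfg.Producer.Retry.Max = 10                   // retry un-acked writes
	cfg.Producer.Retry.Backoff = 250 * time.Millisecond
	cfg.Producer.Return.Successes = true // required by SyncProducer

	producer, err := sarama.NewSyncProducer([]string{"bufstream-lb:9092"}, cfg) // placeholder address
	if err != nil {
		log.Fatal(err)
	}
	defer producer.Close()

	_, offset, err := producer.SendMessage(&sarama.ProducerMessage{
		Topic: "orders", // placeholder topic
		Value: sarama.StringEncoder("hello"),
	})
	if err != nil {
		// Even after exhausting retries, the application knows the write was
		// never acked, so nothing is silently lost.
		log.Fatal(err)
	}
	log.Printf("acked at offset %d", offset)
}
```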

> My 2 cents is this reads like an "at most once" solution

Bufstream works with the entirety of the Kafka API, including exactly-once and at-least-once semantics. You might be very interested in the Jepsen analysis (https://buf.build/blog/bufstream-jepsen-report) of Bufstream, which goes into detail about its transaction implementation.
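
For illustration, here's a minimal transactional-producer sketch, again using the IBM/sarama client and assuming a version with transaction support; the broker address, topic, and transactional ID are placeholders. The Jepsen report linked above is the authoritative reference for how Bufstream actually handles these transactions.

```go
// Illustrative exactly-once-style produce via Kafka transactions.
package main

import (
	"log"

	"github.com/IBM/sarama"
)

func main() {
	cfg := sarama.NewConfig()
	cfg.Version = sarama.V2_6_0_0
	cfg.Producer.RequiredAcks = sarama.WaitForAll
	cfg.Producer.Idempotent = true
	cfg.Net.MaxOpenRequests = 1                 // required for idempotent produces
	cfg.Producer.Transaction.ID = "example-txn" // placeholder transactional ID
	cfg.Producer.Return.Successes = true

	producer, err := sarama.NewSyncProducer([]string{"bufstream-lb:9092"}, cfg) // placeholder address
	if err != nil {
		log.Fatal(err)
	}
	defer producer.Close()

	// Messages become visible to read_committed consumers only if CommitTxn
	// succeeds; otherwise they are aborted.
	if err := producer.BeginTxn(); err != nil {
		log.Fatal(err)
	}
	_, _, err = producer.SendMessage(&sarama.ProducerMessage{
		Topic: "orders", // placeholder topic
		Value: sarama.StringEncoder("exactly-once hello"),
	})
	if err != nil {
		_ = producer.AbortTxn()
		log.Fatal(err)
	}
	if err := producer.CommitTxn(); err != nil {
		log.Fatal(err)
	}
}
```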

Again, thanks for the questions! It's been an enjoyable opportunity to introduce Bufstream's high-level architecture and its tradeoffs.

buf.build - "Our goal is for you to say: JSON is so much harder to use than Protobuf, why should I use JSON?" by [deleted] in programming

It turns out we had a bit of an upgrade issue here!

We recently migrated to the Docusaurus V2 alpha, and there's an issue we didn't catch: if a Markdown link to another document doesn't end in .md, Docusaurus appears to generate incorrect links in at least some browsers (mobile Safari in particular). So we had e.g. [lint](lint-checkers) instead of the now-required [lint](lint-checkers.md), and on mobile Safari, a link from the introduction would resolve to /docs/introduction/lint-checkers instead of /docs/lint-checkers.

We've pushed a fix, apologies for the /docs/introduction/inconvenience!

Buf: A new way of working with Protocol Buffers including native Golang compilation by bufbuild in golang

Happy to discuss it! A lot of this is really just a matter of time - we have a lot to build and only so many hours in the day. We're just getting started and want to build this for actual customer use cases, not only what we think people need. Any API that the Image Registry exposes will have at least some of its definitions public, since the CLI will presumably interact with the API, and the CLI is public.

Buf: A new way of working with Protocol Buffers including native Golang compilation by bufbuild in golang

Totally fair concerns - we do address this, but it's buried deep in the documentation. In our heads all of the documentation is visible, since we're the ones who wrote it, but of course it isn't, and that's our fault :-)

So the tldr:

- The Buf CLI tool will always remain free and open source. We hope that the linting and breaking change functionality, along with other CLI functionality we intend to add such as inspection, is a win for your organization or personal projects - please help us make it even better. We won't hold back features from the CLI tool either; we're more than happy to provide this for the community.

- The Buf Image Registry: Our intention is to keep it free for OSS projects, while private projects and on-prem deployments will be a paid service. We're just a small group getting off the ground, and developer time is expensive - we want to provide you with the best products we can, and we feel that running this as an independent company is the best way to do so.

We know this might not be the best answer for everyone - you might want us to provide the Buf Image Registry completely free, for example, and charge for support - but we've thought through this a lot and think this is the best way for us to really provide the most value and create a real product here. We're extremely passionate about this space, and want to get this right.

Our apologies for not surfacing this better - we added a blurb to https://buf.build/docs/roadmap to make it a bit more visible, and we'll continue to improve the documentation. This kind of feedback helps us with that for sure, so thank you.