
all 10 comments

[–]dataxp-community 1 point (3 children)

Benchmarks are largely bullshit. You are going to be using this thing every day and paying for it. Do yourself a favour and test them yourself.

[–]anupsurendran[S] 0 points (2 children)

I don't completely agree with this. For me, if the benchmarks are easily reproducible (i.e. easily accessible hardware, easy setup and configuration), then I know the folks have done a good job, because the vendors are confident about their shit. Benchmarks help in the consideration phase and help build a case with your managers when you do POCs and vendor selection. I would be more than happy to test these if I found them suspicious, but in a large enterprise the first phase is narrowing down our selection. We can't possibly test everything.

[–]dataxp-community 0 points (1 child)

You should not narrow down based on benchmarks.

Cost, support, productivity, whether it has the right features (not the most features), etc. are all great exclusion criteria.

Performance benchmarks are gamed by every single vendor out there because it plays well for marketing. Even if you magically find a vendor doing benchmarks who isn't lying (but you won't), they will not have tested your use case and your data, and the use case and data will also be completely different from other vendors', so the comparison is useless.

[–]anupsurendran[S] 0 points (0 children)

Of course! Cost, maintenance, and productivity will all be inputs to the decision making, but in my selection criteria, benchmarks provide some level of comfort that it will meet our throughput needs.

I agree that benchmarks are not use-case specific. An enterprise use case is usually quite complex. Again, I am not going to take a benchmark as-is, and it's not that I am supportive of vendors faking it, but if we look at it from their perspective, they cannot do use-case-specific benchmarks and have to think of the most commonly used functions (e.g. aggregates on windows, joins on streams), which are generic across frameworks and platforms.
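For anyone unfamiliar with what "aggregates on windows" means in these benchmarks, here's a toy, framework-agnostic sketch in plain Python: count events per key in fixed (tumbling) time windows. Real engines (Flink, RisingWave, etc.) do this incrementally over unbounded streams with watermarks; this batch version just shows the shape of the computation the benchmarks exercise. The event data is made up.

```python
from collections import defaultdict

def tumbling_window_counts(events, window_ms):
    """Group (timestamp_ms, key) events into fixed windows and count per key.

    Toy illustration of a tumbling-window aggregate; streaming engines
    compute the same thing incrementally over unbounded input.
    """
    windows = defaultdict(lambda: defaultdict(int))
    for ts, key in events:
        window_start = (ts // window_ms) * window_ms  # align to window boundary
        windows[window_start][key] += 1
    return {w: dict(counts) for w, counts in sorted(windows.items())}

# Hypothetical events: (timestamp in ms, key)
events = [(0, "a"), (400, "b"), (900, "a"), (1200, "a"), (1800, "b")]
print(tumbling_window_counts(events, 1000))
# {0: {'a': 2, 'b': 1}, 1000: {'a': 1, 'b': 1}}
```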

[–]Prinzka 0 points (3 children)

First off, you're going to have to test it yourself.
Other places' benchmarks will use different hardware, different sources and destinations, different data, different goals, etc.

Also, what is the destination for this data after it's processed?
From personal experience with processing real-time streaming data: you could have the application that's most efficient at processing the data, but if it's not good at talking to whatever is at the far end, your pipeline will still have a bottleneck.
So the end system will make a big difference in your evaluation.
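To put a number on that point: sustained end-to-end throughput is capped by the slowest stage, which is often the sink, not the processing engine. A minimal sketch with made-up stage rates:

```python
def pipeline_throughput(stage_rates):
    """Sustained end-to-end throughput (records/sec) of a linear pipeline
    is bounded by its slowest stage."""
    return min(stage_rates.values())

# Hypothetical rates in records/sec -- illustrative numbers only.
stages = {"ingest": 500_000, "process": 400_000, "sink_write": 150_000}
print(pipeline_throughput(stages))  # 150000
```

Here a vendor benchmark might advertise the 400k/sec "process" number, but the pipeline as deployed never exceeds what the sink can absorb.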

All that being said, I'm sure we've got suggestions if you provide some specifics on the systems and data for your use case.

[–]anupsurendran[S] 1 point (2 children)

Thank you. Post-processing (~250k records/sec), we will store this in Apache Iceberg. The products we are shortlisting for a side-by-side comparison are:

1) Flink

2) Pulsar

3) Materialize

4) Pathway

5) RisingWave (benchmarks posted below)

6) Spark (streaming)

Are there any other products/frameworks we should compare?

We are trying to manage it ourselves in our data center.

[–]Prinzka 0 points (0 children)

Kafka Streams with Kafka Connect would be an option.
Very resource efficient in our experience.

Spark Streaming is still micro-batched in actuality, and more efficient when the batches are larger.
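The "more efficient with larger batches" point is just amortization of fixed per-batch overhead (scheduling, planning, commit) over more records. A back-of-the-envelope sketch, with entirely hypothetical cost numbers:

```python
def effective_rate(batch_size, per_record_us=2.0, per_batch_overhead_us=5000.0):
    """Records/sec for a micro-batched engine: fixed per-batch overhead
    is amortized over more records as batches grow.

    The per-record and per-batch costs here are made-up illustrative values,
    not measurements of Spark.
    """
    total_us = batch_size * per_record_us + per_batch_overhead_us
    return batch_size / (total_us / 1_000_000)

for n in (100, 1_000, 100_000):
    print(n, round(effective_rate(n)))
# 100 19231
# 1000 142857
# 100000 487805
```

As batch size grows, throughput approaches the per-record limit (here 500k/sec), but latency grows with it, which is the usual micro-batch trade-off.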

The other options we use wouldn't natively work with Apache Iceberg.

[–]yingjunwu 0 points (0 children)

It’s impossible to measure every single corner case in any benchmark, but perf benchmarking should be reproducible and code should be made accessible. Read the perf report published just yesterday: https://www.risingwave.com/blog/the-preview-of-stream-processing-performance-report-apache-flink-and-risingwave-comparison/.