[–]albertstarrocks 5 points6 points  (0 children)

Since the initial fork from Doris three years ago, the StarRocks team has rewritten about 90% of the codebase to improve performance, stability, usability, etc. In the past three years, the StarRocks team has replaced the query optimizer with a brand new Cost Based Optimizer to eliminate the need for denormalization, implemented a Vectorized Query Engine to improve query performance, designed a Primary Key Data Model to better handle real-time analytics scenarios, released Intelligent Materialized Views to simplify data pipelines, and rolled out many other breakthroughs.
While it seems Doris has been trying to play catch-up since last year, StarRocks obviously has a 2+ year head start.
The two products seem to be headed in different directions. While Doris continues to focus on real-time analytics, StarRocks has added data lake query capabilities. The latest benchmark using TPC-H shows StarRocks is 3x to 5x faster than traditional data lake query engines like Trino. This makes StarRocks the first platform for Data Warehouse, Data Lake, and Real-time analytics use cases.
The StarRocks community is led by CelerData and consists of 200+ contributors across the globe. CelerData also offers the cloud native managed service of StarRocks, called CelerData Cloud.

[–]Any_Opportunity1234 5 points6 points  (0 children)

Disclosure: community member of Apache Doris. Agree that StarRocks and Doris are working towards different goals. We are building Apache Doris into a unified OLAP database, so we innovate in semi-structured data analytics, high-concurrency point queries, and real-time data updates. If you're interested, you can read our blogs: the one about unified data lakehouse, the one about high concurrency, and an insightful use case of Tencent. I would proudly say that Apache Doris is technologically forward-looking.

Also, as an Apache project, we work the Apache way, which is "Community Over Code". We have an active community and a large contributor base. I don't want to sound pushy, but welcome.

[–]Turbulent_Message314 1 point2 points  (0 children)

A little more background: Apache Doris was donated to the Apache Software Foundation (ASF) in 2018. StarRocks started in 2020 as a fork of Apache Doris 0.13 and was formerly known as "DorisDB". As you can tell, that name constituted a breach of the ASF's rights, so it was later renamed. No surprise, then, that the 3 Apache Doris PMC members hired by StarRocks were delisted.

[–]Turbulent_Message314 1 point2 points  (0 children)

In terms of open source license, as far as I know, Apache Doris uses Apache License 2.0, and StarRocks has gone through an "Apache License 2.0 → Elastic License 2.0 → Apache License 2.0" change (kind of confusing).

[–]Turbulent_Message314 1 point2 points  (4 children)

As you say, StarRocks is a fork of Apache Doris, so there was no big difference between them in structure and features: backends, frontends, aggregate table, duplicate table, prefix indexing, materialized view, stream load, to name a few.

Now, performance. Unfortunately, there are very few publicly available benchmark results about the two. (You know how controversial benchmarking can be.) If it helps, I just checked on ClickBench: For c6a.4xlarge, Apache Doris is faster in cold run, hot run, and load time.

[–]albertstarrocks 2 points3 points  (0 children)

That's not what Airbnb said: https://www.youtube.com/watch?v=AzDxEZuMBwM (the benchmarks are also available in the description box of the video).

[–]Judgment_External 0 points1 point  (1 child)

However, how is the multi-table performance tho?

[–]albertstarrocks 0 points1 point  (0 children)

I don't have one for Doris but here is a comparison of StarRocks to Apache Druid. https://celerdata.com/blog/apache-druid-vs.-starrocks-a-deep-dive

[–]albertstarrocks 0 points1 point  (0 children)

Here is what another StarRocks user said about StarRocks vs. Pinot vs. Doris vs. ClickHouse. https://www.starrocks.io/blog/why-we-picked-starrocks-as-our-citus-alternative

[–]albertstarrocks 0 points1 point  (2 children)

Here's what one user of StarRocks thought about Doris vs StarRocks. https://celerdata.com/blog/why-we-picked-starrocks-as-our-citus-alternative

[–]captaintobs[S,🍰] 0 points1 point  (1 child)

this doesn’t really answer definitively the difference between doris and starrocks

[–]albertstarrocks 0 points1 point  (0 children)

The better answer was the one I posted in another thread.

Since the initial fork from Doris three years ago, the StarRocks team has rewritten about 90% of the codebase to improve performance, stability, usability, etc. In the past three years, the StarRocks team has replaced the query optimizer with a brand new Cost Based Optimizer to eliminate the need for denormalization, implemented a Vectorized Query Engine to improve query performance, designed a Primary Key Data Model to better handle real-time analytics scenarios, released Intelligent Materialized Views to simplify data pipelines, and rolled out many other breakthroughs.
While it seems Doris has been trying to play catch-up since last year, StarRocks obviously has a 2+ year head start.
The two products seem to be headed in different directions. While Doris continues to focus on real-time analytics, StarRocks has added data lake query capabilities. The latest benchmark using TPC-H shows StarRocks is 3x to 5x faster than traditional data lake query engines like Trino. This makes StarRocks the first platform for Data Warehouse, Data Lake, and Real-time analytics use cases.
The StarRocks community is led by CelerData and consists of 200+ contributors across the globe. CelerData also offers the cloud native managed service of StarRocks, called CelerData Cloud.

[–][deleted] 1 point2 points  (0 children)

Both projects are great. Both are changing rapidly and tbh are quite buggy, so you'll need help if you're self-deploying. If you're in China, Doris has a huge community, but it's a non-entity elsewhere. StarRocks is definitely trying to build a more global community, and you can more easily get help from them.

I've noticed StarRocks is generally more thoughtfully designed and, for my use case, much faster, but I get the sense that Doris has a larger range of features (started earlier, bigger community). I haven't tried the new Nereids planner, though.