
Apache Doris

JOIN the data analytics race: Apache Doris vs. ClickHouse, Databricks, and Snowflake

Responding to the ClickHouse community's invitation, Apache Doris joins the coffee benchmark: Join me if you can: ClickHouse vs. Databricks & Snowflake - Part 1. In this coffee-shop-themed benchmark, we compared the performance of executing SQL queries with heavy joins on Databricks, Snowflake, ClickHouse, and VeloDB Cloud (the cloud version of Apache Doris).

Without a doubt, across all data scales from 721 million to 7.2 billion rows, Apache Doris demonstrated the fastest and most cost-effective performance.

The ClickHouse community launched a "Join me if you can" challenge. As a key player in the data analytics field, Doris had to step up and showcase the performance of its latest versions. Can we achieve the best cost-performance ratio with extreme performance? To find out, we used the same benchmark, loaded the data into VeloDB Cloud, used comparable instance sizes, and ran the original 17 queries. We made no rewrites to most queries (only minor syntax adjustments for Doris in some CREATE TABLE statements).

Next, we will walk you through how to run this benchmark step by step. We encourage you to try it yourself and will present detailed results for the three data scales (721 million, 1.4 billion, and 7.2 billion rows).

In summary: across all data scales from 721 million to 7.2 billion rows, for the vast majority of queries, Doris is twice as fast as ClickHouse under the same resources and pricing, and 5 to 10 times faster than Databricks and Snowflake. Apache Doris demonstrated the fastest and most cost-effective performance.

How to Run the Benchmark

You can find all the necessary information, including the 17 queries, scripts, and documentation, in this GitHub repository: coffeeshop-benchmark. The dataset can be downloaded from the pre-generated dataset provided by the ClickHouse community and uploaded to your own S3 bucket, or you can use our prepared private repository data. This allows you to skip the data generation step and start testing immediately.

The entire process is automated: start the VeloDB Cloud service, configure the S3 access key/secret key, and execute the script commands with your cluster specifications and per-compute-unit pricing. You can refer to the GitHub repository for the scripts: coffee-bench.
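To make the cost figures concrete, per-query cost can be derived from runtime, cluster size, and per-compute-unit pricing. Below is a minimal Python sketch of that calculation; the formula and the hourly price are illustrative assumptions for this article, not VeloDB's actual billing model:

```python
# Illustrative cost model: cost = runtime_hours * nodes * hourly_price_per_node.
# The hourly price used below is a made-up placeholder, not a real VeloDB quote.

def query_cost(runtime_seconds: float, nodes: int, hourly_price_per_node: float) -> float:
    """Compute the approximate compute cost of a single query run."""
    return runtime_seconds / 3600 * nodes * hourly_price_per_node

# Example: a 2-second query on a 4-node cluster at a hypothetical $1.50/node/hour.
cost = query_cost(2.0, 4, 1.50)
print(f"${cost:.6f}")
```

Summing this over all 17 queries gives the "total cost" bars shown in the charts below, under whatever pricing your cloud contract specifies.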

As with the original benchmark, we ran each query 5 times and selected the fastest result to reflect performance with cache hits. The benchmark uses simulated data from a national coffee chain's orders. It tests three data scales for the fact table:

| Scale factor | Total rows in fact table (Sales) |
| --- | --- |
| 500 million orders | 721 million rows |
| 1 billion orders | 1.4 billion rows |
| 5 billion orders | 7.2 billion rows |
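The best-of-5 measurement protocol described above can be sketched in a few lines of Python. Here `run_query` is a stand-in stub, since the real benchmark drives each warehouse through its own client:

```python
import time

def run_query(sql: str) -> None:
    """Stand-in for executing a query against the warehouse under test."""
    time.sleep(0.001)  # placeholder work in lieu of a real query

def best_of_n(sql: str, n: int = 5) -> float:
    """Run a query n times and keep the fastest wall-clock time,
    mirroring the benchmark's cache-hit measurement protocol."""
    timings = []
    for _ in range(n):
        start = time.perf_counter()
        run_query(sql)
        timings.append(time.perf_counter() - start)
    return min(timings)

fastest = best_of_n("SELECT 1", n=5)
```

Taking the minimum rather than the mean deliberately reflects warm-cache performance, which is what the original ClickHouse benchmark reports.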

The three scales use the same three-table structure:

  • Sales: Order fact table

  • Products: Product dimension table

  • Locations: Store/location dimension table

The benchmark includes 17 SQL queries (see Performance Test Queries), most of which join the fact table with one or two dimension tables, executed sequentially. The original benchmark's Part 1 covered the smaller 721-million-row dataset, while Part 2 included results for 1.4 billion and 7.2 billion rows. Below, we present the complete charts without additional explanations. For detailed background, refer to the original ClickHouse post. We will conclude with a clear summary.
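To give a sense of the workload shape, here is a hypothetical query in the spirit of the benchmark, joining the Sales fact table to both dimension tables. The table and column names are our assumptions for illustration; the actual 17 queries are in the linked repository:

```python
# A hypothetical fact-to-dimension join in the style of the benchmark queries;
# the schema names below are illustrative, not the benchmark's actual DDL.
SAMPLE_QUERY = """
SELECT l.state,
       p.category,
       SUM(s.quantity)  AS units_sold,
       SUM(s.net_total) AS revenue
FROM sales s
JOIN products  p ON s.product_id  = p.id
JOIN locations l ON s.location_id = l.id
GROUP BY l.state, p.category
ORDER BY revenue DESC
LIMIT 10
"""
```

Queries of this shape stress the join and aggregation paths of each engine, which is exactly what the benchmark is designed to measure.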

In the charts below, VeloDB Cloud results are labeled as follows: Doris 4n_16c_128g, where:

  • 4n = Number of compute nodes

  • 16c = CPU cores per node

  • 128g = Memory per node (GB)

All service configurations use 16 cores and 128 GB memory per node, with tests conducted on two node scales: 4 and 30 nodes.

The labels for Databricks (e.g., DBX_S) and Snowflake (e.g., SF_S_Gen2) remain unchanged, with detailed explanations in the original post.

Results:

500M Scale (721 Million Rows)

This scale generates 721 million rows from 500 million orders.

  • Total Cost

Each bar shows the total cost for all 17 queries in parentheses.

  • Total Runtime

Each bar shows the total runtime for all 17 queries in parentheses.

  • Average Cost per Query (Excluding Q10 and Q16)

  • Average Runtime per Query (Excluding Q10 and Q16)

  • Average Cost for Q10 and Q16

  • Average Runtime for Q10 and Q16

1B Scale (1.4 Billion Rows)

This scale generates 1.4 billion rows from 1 billion orders.

  • Total Cost

Each bar shows the total cost for all 17 queries in parentheses.

  • Total Runtime

Each bar shows the total runtime for all 17 queries in parentheses.

  • Average Cost per Query (Excluding Q10 and Q16)

  • Average Runtime per Query (Excluding Q10 and Q16)

  • Average Cost for Q10 and Q16

  • Average Runtime for Q10 and Q16

5B Scale (7.2 Billion Rows)

This scale generates 7.2 billion rows from 5 billion orders.

  • Total Cost

Each bar shows the total cost for all 17 queries in parentheses.

  • Total Runtime

Each bar shows the total runtime for all 17 queries in parentheses.

  • Average Cost per Query (Excluding Q10 and Q16)

  • Average Runtime per Query (Excluding Q10 and Q16)

  • Average Cost for Q10 and Q16

  • Average Runtime for Q10 and Q16

Summary

The results are not surprising: Doris delivers outstanding performance in join scenarios, achieving remarkable speed across all data scales. The benchmark included 17 queries involving joins across 2–3 tables. Since this is a CPU-intensive test, we only adjusted the query parallelism from the default of 8 to 16. We observed Doris's performance under full CPU utilization:

  • At the 500M scale, most queries completed within 1 second. The total query runtime was twice as fast as ClickHouse and consistently 6–10 times faster than the other solutions, with Doris using only half of its CPU resources.

  • At the 1B scale, Doris completed joins, aggregations, and sorting on 1.4 billion rows in seconds. With the same resource usage, Doris maintained its performance lead over all competitors, as at the 500M scale.

  • At the 5B scale, even the heaviest queries completed in seconds rather than minutes. Although Q16 significantly slowed the overall query runtime (it requires further optimization for data skew; excluding Q16, Doris still performed twice as fast as ClickHouse), Doris remains the fastest and most cost-effective choice.

We are excited to participate in the open-source community's performance race, which is both thrilling and challenging. Compared with real-world user scenarios, this benchmark with its 2–3 table joins is relatively simple. On more complex workloads closer to real-world use cases, such as TPC-H or TPC-DS, Doris continues to deliver efficient performance. We plan to conduct further performance comparisons to showcase our query optimization efforts and results.

References

[1] Databricks vs Snowflake SQL Performance Test, Day 1: 721M Rows: https://www.linkedin.com/pulse/databricks-vs-snowflake-sql-performance-test-day-1-721m-bogran-lsboe/

[2] Databricks vs Snowflake (Gen 1 & Gen 2) SQL Performance Test, Day 2: 1.4B & 7.2B Rows Test: https://www.linkedin.com/pulse/databricks-vs-snowflake-gen-1-2-sql-performance-test-day-bogran-ddmhe/

[3] Join me if you can: ClickHouse vs. Databricks & Snowflake - Part 1: https://clickhouse.com/blog/join-me-if-you-can-clickhouse-vs-databricks-snowflake-join-performance?utm_content=bufferf0a98&utm_medium=social&utm_source=twitter&utm_campaign=buffer

[4] Coffee shop benchmark: https://github.com/JosueBogran/coffeeshopdatageneratorv2
