Background
PostgreSQL is a well-known open-source database. As a piece of core digital infrastructure it plays an important role in large-scale enterprise applications, and many of its advanced features are widely used in complex scenarios.
According to the Stack Overflow 2022 Developer Survey, PostgreSQL has overtaken MySQL as developers' most loved database.
Distributed databases have been the trend in the database field in recent years. Among distributed databases built on the PostgreSQL protocol, excellent products such as CockroachDB and YugabyteDB have emerged, while TiDB, the popular Chinese distributed database, only provides the MySQL protocol. Building on the open-source TiDB, the Digital China technical team developed TiDB For PostgreSQL, which is compatible with mainstream applications. References:
- TiDB for PostgreSQL - a small test
- TiDB for PostgreSQL Study Guide
- TiDB for PostgreSQL compatible with GitLab
Both TiDB For PostgreSQL and YugabyteDB fall into the NewSQL category and implement the PostgreSQL protocol, but their concrete architectures differ, so we were very curious how their performance compares.
Test plan
We prepared three physical machines and built on them both a 3-node YugabyteDB cluster and a 3-node TiDB For PostgreSQL cluster, each deployed with default parameters and fronted by HAProxy as a load-balancing proxy. This keeps the two deployment architectures as consistent as possible and reduces experimental error.
Deployment environment:
Machine | Specs | Role |
---|---|---|
Database node 1 | Physical machine, NUC 10 x86, 12C / 64 GB / 1 TB SSD | YugabyteDB: 1 master, 1 tserver; TiDB4PG: 1 PD, 1 TiDB, 1 TiKV |
Database node 2 | Physical machine, NUC 10 x86, 12C / 64 GB / 1 TB SSD | Ditto |
Database node 3 | Physical machine, NUC 10 x86, 12C / 64 GB / 1 TB SSD | Ditto |
Sysbench | Virtual machine, x86, 16C / 32 GB | Load generator |
HAProxy | Virtual machine, x86, 8C / 16 GB | Load-balancing proxy |
The Sysbench stress test covers 7 common scenarios:
- select_random_points (random point selects)
- oltp_read_only (read-only transactions)
- oltp_write_only (write-only transactions)
- oltp_read_write (mixed read-write transactions)
- oltp_update_index (updates to an indexed column)
- oltp_update_non_index (updates to a non-indexed column)
- oltp_delete (deletes)
Each scenario is run at 5 concurrency levels, namely 10, 50, 100, 200, and 300 threads, with each stress run lasting 5 minutes.
From each run we collect QPS and 95th-percentile latency to compare the two databases.
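All of the scenarios above run against sysbench's sbtest tables. For reference, here is a sketch of their shape, based on sysbench's standard oltp_common schema for PostgreSQL (exact DDL may vary slightly by sysbench version):

```sql
-- Shape of the table each oltp_* scenario queries and modifies.
CREATE TABLE sbtest1 (
  id  SERIAL,
  k   INTEGER NOT NULL DEFAULT 0,
  c   CHAR(120) NOT NULL DEFAULT '',
  pad CHAR(60) NOT NULL DEFAULT '',
  PRIMARY KEY (id)
);
-- Secondary index on k; this is the index exercised by oltp_update_index.
CREATE INDEX k_1 ON sbtest1 (k);
```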
Test Results
1. select_random_points
Threads | TiDB4PG QPS | TiDB4PG 95% Latency (ms) | YugabyteDB QPS | YugabyteDB 95% Latency (ms) |
---|---|---|---|---|
10 | 4044 | 3.49 | 2666 | 5.47 |
50 | 8398 | 10.09 | 5086 | 15.00 |
100 | 8754 | 20 | 5429 | 29.19 |
200 | 9441 | 36.89 | 5541 | 58.92 |
300 | 9785 | 52.89 | 5531 | 92.42 |
Result analysis:
In the random point-select scenario, TiDB For PostgreSQL is ahead of YugabyteDB throughout, and the gap widens as concurrency increases.
2. oltp_read_only
Threads | TiDB4PG QPS | TiDB4PG 95% Latency (ms) | YugabyteDB QPS | YugabyteDB 95% Latency (ms) |
---|---|---|---|---|
10 | 8954 | 20.74 | 0.13 | - |
50 | 24262 | 38.94 | 0.15 | - |
100 | 25136 | 78.60 | 0.15 | - |
200 | 25267 | 158.63 | 0.14 | - |
300 | 25436 | 240.02 | 0.14 | - |
Result analysis:
The results of this scenario diverge dramatically: YugabyteDB's query performance collapses, while TiDB For PostgreSQL performs at a normal level.
Analyzing the Sysbench source shows that the read-only scenario mainly issues the following queries:
```sql
SELECT c FROM sbtest1 WHERE id BETWEEN 1 AND 1000;
SELECT SUM(k) FROM sbtest1 WHERE id BETWEEN 1 AND 1000;
SELECT c FROM sbtest1 WHERE id BETWEEN 1 AND 1000 ORDER BY c;
SELECT DISTINCT c FROM sbtest1 WHERE id BETWEEN 1 AND 1000 ORDER BY c;
```
On YugabyteDB, these four primary-key range queries took more than 400 seconds in total, a startling result. Take the first query as an example and analyze its execution plan.
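The "actual time" and "Execution Time" fields in the plan below indicate it was captured with EXPLAIN ANALYZE; a minimal reproduction, assuming the same sbtest1 table, would be:

```sql
EXPLAIN ANALYZE SELECT c FROM sbtest1 WHERE id BETWEEN 1 AND 1000;
```

YugabyteDB returns the following plan: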
```
Seq Scan on sbtest1  (cost=0.00..105.00 rows=1000 width=484) (actual time=9.968..127393.155 rows=1000 loops=1)
  Filter: ((id >= 1) AND (id <= 1000))
  Rows Removed by Filter: 9999000
Planning Time: 0.048 ms
Execution Time: 127393.919 ms
```
Past experience says this query should go straight to a primary-key index scan, yet a full table scan is used instead, which is puzzling.
To verify whether the size of the query range mattered, I narrowed the id range to 1 through 10 and found the plan unchanged. Rewriting the predicate as an IN list, however, does use the primary key index, as sketched below.
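A sketch of the two checks just described (the id values are illustrative):

```sql
-- Narrowing the range does not change the plan: still a Seq Scan.
EXPLAIN SELECT c FROM sbtest1 WHERE id BETWEEN 1 AND 10;

-- The equivalent IN list does use the primary key index.
EXPLAIN SELECT c FROM sbtest1 WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
```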
In contrast, TiDB For PostgreSQL produces the expected plan:

```
Projection_4                  500.09  1000  root
└─IndexLookUp_13              500.09  1000  root
  ├─IndexRangeScan_11(Build)  500.09  1000  cop[tikv]  table:sbtest1, index:PRIMARY(id)
  └─TableRowIDScan_12(Probe)  500.09  1000  cop[tikv]  table:sbtest1
```
Exactly why YugabyteDB's range queries perform so poorly requires further analysis.
3. oltp_write_only
Threads | TiDB4PG QPS | TiDB4PG 95% Latency (ms) | YugabyteDB QPS | YugabyteDB 95% Latency (ms) |
---|---|---|---|---|
10 | 790 | 90.78 | 4309 | 47.47 |
50 | 3540 | 104.84 | 8427 | 71.83 |
100 | 6486 | 114.72 | 8674 | 92.42 |
200 | 11414 | 132.49 | 11170 | 150.29 |
300 | 15133 | 155.80 | 12036 | 211.60 |
Result analysis:
The write-only scenario mixes four kinds of statements: indexed updates, non-indexed updates, deletes (each also tested separately below), and inserts. YugabyteDB has a clear performance advantage at low concurrency, but from 200 threads onward TiDB For PostgreSQL overtakes it, which suggests slightly better stability under load.
4. oltp_read_write
Threads | TiDB4PG QPS | TiDB4PG 95% Latency (ms) | YugabyteDB QPS | YugabyteDB 95% Latency (ms) |
---|---|---|---|---|
10 | 2238 | 104.84 | 556 | 3151.62 |
50 | 9790 | 123.28 | 1485 | 3326 |
100 | 17454 | 137.35 | 1494 | 6960 |
200 | 25119 | 189.93 | 1402 | 14302 |
300 | 25967 | 272.27 | 1835 | 22034 |
Result analysis:
The mixed read-write scenario is also dragged down by YugabyteDB's range-query problem: its overall performance is far worse than TiDB For PostgreSQL's, hitting a bottleneck at just 50 threads.
5. oltp_update_index
Threads | TiDB4PG QPS | TiDB4PG 95% Latency (ms) | YugabyteDB QPS | YugabyteDB 95% Latency (ms) |
---|---|---|---|---|
10 | 374 | 36.24 | 2227 | 11.65 |
50 | 1630 | 41.85 | 3754 | 21.89 |
100 | 2980 | 45.79 | 4266 | 36.24 |
200 | 5375 | 50.11 | 4845 | 62.19 |
300 | 7298 | 55.82 | 4964 | 92.42 |
Result analysis:
oltp_update_index updates an indexed column by primary key. YugabyteDB performs better at low concurrency, but it reaches its bottleneck sooner as concurrency grows and falls further and further behind, while TiDB For PostgreSQL scales steadily.
6. oltp_update_non_index
Threads | TiDB4PG QPS | TiDB4PG 95% Latency (ms) | YugabyteDB QPS | YugabyteDB 95% Latency (ms) |
---|---|---|---|---|
10 | 415 | 31.37 | 6847 | 2.03 |
50 | 1988 | 35.59 | 17123 | 4.65 |
100 | 3761 | 37.56 | 16246 | 9.91 |
200 | 6910 | 41.10 | 18928 | 20.37 |
300 | 9849 | 43.39 | 19896 | 29.72 |
Result analysis:
oltp_update_non_index updates a non-indexed column by primary key. In this scenario YugabyteDB leads TiDB For PostgreSQL by a wide margin and holds that lead throughout.
7. oltp_delete
Threads | TiDB4PG QPS | TiDB4PG 95% Latency (ms) | YugabyteDB QPS | YugabyteDB 95% Latency (ms) |
---|---|---|---|---|
10 | 364 | 37.56 | 2389 | 10.84 |
50 | 1568 | 43.39 | 4269 | 19.65 |
100 | 2899 | 46.63 | 4844 | 33.12 |
200 | 5380 | 50.11 | 5705 | 56.84 |
300 | 7622 | 54.83 | 6146 | 82.96 |
Result analysis:
The delete scenario removes a single row by primary key. The results resemble the indexed-update scenario: YugabyteDB performs better at first but gradually falls behind TiDB For PostgreSQL as concurrency increases.
Summary
After testing the seven scenarios above, we find that each database has its own strengths and weaknesses.
TiDB For PostgreSQL is better at read operations and YugabyteDB at write operations, but as load increases, YugabyteDB's throughput and stability under high concurrency fall short of TiDB For PostgreSQL's.
In a follow-up we will compare the two systems' architectures and underlying principles in depth to see whether we can find the real cause.
Top comments (1)
Your execution plan in Yugabyte shows a `Seq Scan`. The reason is that YugabyteDB can shard on hash or on range, and hashing on the first column of the primary key is the default, for better distribution; but that is not the right choice for this workload. You need to define the primary key as `primary key(id asc)` (see the sketch after this comment). You don't have the problem in TiDB because it has only range sharding. Here is a modified version of Sysbench that makes it aware of the distributed nature of YugabyteDB:
github.com/akopytov/sysbench/compa...
The most important points are range sharding and connecting to all nodes.
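To make the commenter's fix concrete, here is a minimal sketch of the sysbench table re-created with a range-sharded primary key (YSQL syntax; column list per the standard sysbench schema):

```sql
-- Declaring the primary key as ASC forces range sharding on id,
-- so BETWEEN predicates can use the index. YugabyteDB's default
-- for PRIMARY KEY (id) is HASH (id).
CREATE TABLE sbtest1 (
  id  SERIAL,
  k   INTEGER NOT NULL DEFAULT 0,
  c   CHAR(120) NOT NULL DEFAULT '',
  pad CHAR(60) NOT NULL DEFAULT '',
  PRIMARY KEY (id ASC)
);
```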