dcopensource

Performance comparison of TiDB For PostgreSQL and YugabyteDB on Sysbench

Background

PostgreSQL is a well-known open-source database. As a piece of digital infrastructure it plays an important role in large-scale enterprise applications, and many of its advanced features are widely used in complex scenarios.

According to the Stack Overflow 2022 Developer Survey Report, PostgreSQL has surpassed MySQL as the favorite database software for developers.


Distributed databases have been the major trend in the database field in recent years. Among distributed databases based on the PostgreSQL protocol, excellent products such as CockroachDB and YugabyteDB have emerged, while TiDB, the popular distributed database from China, only provides the MySQL protocol. Building on the open-source TiDB, the Digital China technical team has developed TiDB For PostgreSQL, which is compatible with mainstream PostgreSQL applications.

Both TiDB For PostgreSQL and YugabyteDB belong to the NewSQL category and implement the PostgreSQL protocol, but their concrete architectures differ, so we were very curious how their performance compares.

Test Plan

We prepared 3 physical machines and built both a 3-node YugabyteDB cluster and a 3-node TiDB For PostgreSQL cluster on them, each deployed with default parameters and fronted by HAProxy as a load-balancing proxy. This keeps the two deployment architectures as consistent as possible and reduces experimental error.

Deployment environment:

Machine name      Parameters                                      Remark
database node-1   physical machine, NUC 10 x86, 12C 64G, 1T SSD   YugabyteDB: 1 master, 1 tserver; TiDB4PG: 1 PD, 1 TiDB, 1 TiKV
database node-2   physical machine, NUC 10 x86, 12C 64G, 1T SSD   ditto
database node-3   physical machine, NUC 10 x86, 12C 64G, 1T SSD   ditto
Sysbench          virtual machine, x86, 16C 32G
HAProxy           virtual machine, x86, 8C 16G

The Sysbench stress test covers 7 common scenarios (representative statements for each are sketched at the end of this section):

  • select_random_points (random point selects)
  • oltp_read_only (read-only transactions)
  • oltp_write_only (write-only transactions)
  • oltp_read_write (mixed read-write transactions)
  • oltp_update_index (updates on an indexed column)
  • oltp_update_non_index (updates on a non-indexed column)
  • oltp_delete (deletes)

For each of the above scenarios we tested 5 concurrency levels, namely 10, 50, 100, 200, and 300 threads, running the stress test for 5 minutes at each level.

For each run we collected QPS and 95th-percentile latency to compare the two databases.
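
For reference, the statements behind these scenarios look roughly as follows, paraphrased from sysbench's oltp_common.lua and select_random_points.lua templates (the placeholders stand for values sysbench generates at run time, and the exact statement mix per transaction is configurable):

-- select_random_points: point lookups on the secondary index column k
SELECT id, k, c, pad FROM sbtest1 WHERE k IN (?, ?, ?, ...);

-- oltp_read_only: point selects by primary key plus the range queries quoted later in this post
SELECT c FROM sbtest1 WHERE id = ?;

-- oltp_update_index / oltp_update_non_index: updates by primary key
UPDATE sbtest1 SET k = k + 1 WHERE id = ?;
UPDATE sbtest1 SET c = ? WHERE id = ?;

-- oltp_delete, plus the delete/insert pair that oltp_write_only and oltp_read_write also run
DELETE FROM sbtest1 WHERE id = ?;
INSERT INTO sbtest1 (id, k, c, pad) VALUES (?, ?, ?, ?);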

Test Results

1. select_random_points

Threads QPS(TiDB4PG) 95% Latency/ms(TiDB4PG) QPS(YugabyteDB) 95% Latency/ms(YugabyteDB)
10 4044 3.49 2666 5.47
50 8398 10.09 5086 15.00
100 8754 20 5429 29.19
200 9441 36.89 5541 58.92
300 9785 52.89 5531 92.42


Result analysis:

In this point-query scenario, TiDB For PostgreSQL stays ahead of YugabyteDB, and the gap widens as concurrency increases.

2. oltp_read_only

Threads QPS(TiDB4PG) 95% Latency/ms(TiDB4PG) QPS(YugabyteDB) 95% Latency/ms(YugabyteDB)
10 8954 20.74 0.13 -
50 24262 38.94 0.15 -
100 25136 78.60 0.15 -
200 25267 158.63 0.14 -
300 25436 240.02 0.14 -

Result analysis:

The results of this scenario differ dramatically: YugabyteDB's query throughput collapses, while TiDB For PostgreSQL performs at its normal level.

Analyzing the Sysbench source code shows that the read-only scenario mainly involves the following range queries:

SELECT c FROM sbtest1 WHERE id BETWEEN 1 AND 1000;
SELECT sum(k) FROM sbtest1 WHERE id BETWEEN 1 AND 1000;
SELECT c FROM sbtest1 WHERE id BETWEEN 1 AND 1000 order by c;
SELECT distinct c FROM sbtest1 WHERE id BETWEEN 1 AND 1000 order by c;

The four range queries on the primary key take more than 400 seconds in total to execute, which is a very surprising result. Take the first SQL as an example and analyze its execution plan.
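
A plan like the one below can be captured with EXPLAIN ANALYZE; the exact command is not shown in the post, but presumably something along these lines:

EXPLAIN ANALYZE SELECT c FROM sbtest1 WHERE id BETWEEN 1 AND 1000;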

Seq Scan on sbtest1  (cost=0.00..105.00 rows=1000 width=484) (actual time=9.968..127393.155 rows=1000 loops=1)
  Filter: ((id >= 1) AND (id <= 1000))
  Rows Removed by Filter: 9999000
Planning Time: 0.048 ms
Execution Time: 127393.919 ms

Based on past experience, a primary key index scan should be used here, but a full table scan is used instead, which is puzzling.

To verify whether the size of the query range mattered, I limited the id range to 1 through 10 and found that the execution plan was unchanged. Rewriting the predicate as an IN list, however, does use the primary key index.
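
Those two checks can be reproduced with statements like the following (a sketch; the literal id values are arbitrary):

-- Narrowing the range: still a sequential scan on YugabyteDB
EXPLAIN ANALYZE SELECT c FROM sbtest1 WHERE id BETWEEN 1 AND 10;

-- Rewriting the predicate as an IN list: the primary key index is used
EXPLAIN ANALYZE SELECT c FROM sbtest1 WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10);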

In contrast, the execution plan of TiDB For PostgreSQL is as expected:

Projection_4    500.09  1000    root    
└─IndexLookUp_13    500.09  1000    root    
  ├─IndexRangeScan_11(Build)    500.09  1000    cop[tikv]   table:sbtest1, index:PRIMARY(id)
  └─TableRowIDScan_12(Probe)    500.09  1000    cop[tikv]   table:sbtest1

The reason why YugabyteDB's range query efficiency is so low requires further analysis.

3. oltp_write_only

Threads QPS(TiDB4PG) 95% Latency/ms(TiDB4PG) QPS(YugabyteDB) 95% Latency/ms(YugabyteDB)
10 790 90.78 4309 47.47
50 3540 104.84 8427 71.83
100 6486 114.72 8674 92.42
200 11414 132.49 11170 150.29
300 15133 155.80 12036 211.60


Result analysis:

The write-only scenario mixes 4 statement types: indexed updates, non-indexed updates, deletes, and inserts (the first three are also tested separately below). YugabyteDB has a clear performance advantage at low concurrency, but from 200 threads onward TiDB For PostgreSQL overtakes it, which reflects slightly better stability under load.

4. oltp_read_write

Threads QPS(TiDB4PG) 95% Latency/ms(TiDB4PG) QPS(YugabyteDB) 95% Latency/ms(YugabyteDB)
10 2238 104.84 556 3151.62
50 9790 123.28 1485 3326
100 17454 137.35 1494 6960
200 25119 189.93 1402 14302
300 25967 272.27 1835 22034

Result analysis:

The mixed read-write scenario is also affected by YugabyteDB's range-query problem; its overall performance is far worse than TiDB For PostgreSQL's, and it hits a bottleneck at 50 threads.

5. oltp_update_index

Threads QPS(TiDB4PG) 95% Latency/ms(TiDB4PG) QPS(YugabyteDB) 95% Latency/ms(YugabyteDB)
10 374 36.24 2227 11.65
50 1630 41.85 3754 21.89
100 2980 45.79 4266 36.24
200 5375 50.11 4845 62.19
300 7298 55.82 4964 92.42


Result analysis:

update_index updates an indexed column by primary key. In this scenario YugabyteDB performs better at low concurrency, but as concurrency increases it hits its performance ceiling sooner and falls further and further behind, while TiDB For PostgreSQL scales relatively steadily.

6. oltp_update_non_index

Threads QPS(TiDB4PG) 95% Latency/ms(TiDB4PG) QPS(YugabyteDB) 95% Latency/ms(YugabyteDB)
10 415 31.37 6847 2.03
50 1988 35.59 17123 4.65
100 3761 37.56 16246 9.91
200 6910 41.10 18928 20.37
300 9849 43.39 19896 29.72


Result analysis:

update_non_index updates a non-indexed column by primary key. In this scenario YugabyteDB leads TiDB For PostgreSQL by a large margin and maintains that lead at every concurrency level.

7. oltp_delete

Threads QPS(TiDB4PG) 95% Latency/ms(TiDB4PG) QPS(YugabyteDB) 95% Latency/ms(YugabyteDB)
10 364 37.56 2389 10.84
50 1568 43.39 4269 19.65
100 2899 46.63 4844 33.12
200 5380 50.11 5705 56.84
300 7622 54.83 6146 82.96


Result analysis:

The delete scenario deletes a single row by primary key. The results resemble the indexed-update scenario: YugabyteDB performs better at first, but gradually falls behind TiDB For PostgreSQL as concurrency increases.

Summary

After testing the above 7 scenarios, we found that the two databases each have their own strengths and weaknesses.

TiDB For PostgreSQL is better at read operations, and YugabyteDB is better at write operations, but as the load increases, YugabyteDB's throughput and stability under high concurrency fall behind TiDB For PostgreSQL's.

In a follow-up we will compare the architectures and underlying principles of the two in depth to see whether we can find the real cause.

Top comments (1)

Franck Pachot

Your execution plan in Yugabyte shows a Seq Scan. The reason is that YugabyteDB can shard on hash and range, and hash on the first column of the primary key is the default for better distribution, but that is not the right choice for this workload. You need to define the primary key as primary key(id asc). You don't have the problem in TiDB because it has only range sharding.
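
A minimal sketch of that definition, assuming the standard sysbench column layout (YugabyteDB YSQL syntax, with ASC requesting range sharding on id):

-- Range sharding on id instead of the default hash sharding
CREATE TABLE sbtest1 (
  id  INT NOT NULL,
  k   INT NOT NULL DEFAULT 0,
  c   CHAR(120) NOT NULL DEFAULT '',
  pad CHAR(60) NOT NULL DEFAULT '',
  PRIMARY KEY (id ASC)
);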

Here is a modified version of Sysbench that makes it aware of the distributed nature of YugabyteDB:
github.com/akopytov/sysbench/compa...
The most important changes are range sharding and connecting to all nodes.