dcopensource

Performance comparison of TiDB For PostgreSQL and YugabyteDB on Sysbench

Background

PostgreSQL is a well-known open-source database. As a piece of digital infrastructure it plays an important role in large-scale enterprise applications, and many of its advanced features are widely used in complex scenarios.

According to the Stack Overflow 2022 Developer Survey Report, PostgreSQL has surpassed MySQL as the favorite database software for developers.


Distributed databases have been the major trend in the database field in recent years. Among distributed databases based on the PostgreSQL protocol, excellent products such as CockroachDB and YugabyteDB have emerged, while TiDB, the popular distributed database from China, only provides the MySQL protocol. Building on the open-source TiDB, the Digital China technical team has developed TiDB For PostgreSQL, which is compatible with mainstream PostgreSQL applications.

Both TiDB For PostgreSQL and YugabyteDB belong to the NewSQL category and implement the PostgreSQL protocol, but their concrete architectures differ, so we were very curious how their performance compares.

Test Plan

We prepared 3 physical machines and built both a 3-node YugabyteDB cluster and a 3-node TiDB For PostgreSQL cluster on them, each deployed with default parameters and fronted by HAProxy as a load-balancing proxy. This keeps the two deployment architectures as consistent as possible and reduces experimental error.

Deployment environment:

Machine name      Parameters                                      Remark
database node-1   physical machine, NUC 10 x86, 12C 64G, 1T SSD   YugabyteDB: 1 master, 1 tserver; TiDB4PG: 1 PD, 1 TiDB, 1 TiKV
database node-2   physical machine, NUC 10 x86, 12C 64G, 1T SSD   ditto
database node-3   physical machine, NUC 10 x86, 12C 64G, 1T SSD   ditto
Sysbench          virtual machine, x86, 16C 32G
HAProxy           virtual machine, x86, 8C 16G

The Sysbench stress test covers 7 common scenarios (representative statements for each are sketched at the end of this section):

  • select_random_points (random point selects)
  • oltp_read_only (read-only transactions)
  • oltp_write_only (write-only transactions)
  • oltp_read_write (mixed read-write transactions)
  • oltp_update_index (updates on an indexed column)
  • oltp_update_non_index (updates on a non-indexed column)
  • oltp_delete (deletes)

For each of the above scenarios we tested 5 concurrency levels, namely 10, 50, 100, 200, and 300 threads, running the stress test for 5 minutes at each level.

For each run we collected QPS and 95th-percentile latency to compare the two databases.
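
For reference, the statements behind these scenarios look roughly as follows, paraphrased from sysbench's oltp_common.lua and select_random_points.lua templates (the placeholders stand for values sysbench generates at run time, and the exact statement mix per transaction is configurable):

-- select_random_points: point lookups on the secondary index column k
SELECT id, k, c, pad FROM sbtest1 WHERE k IN (?, ?, ?, ...);

-- oltp_read_only: point selects by primary key plus the range queries quoted later in this post
SELECT c FROM sbtest1 WHERE id = ?;

-- oltp_update_index / oltp_update_non_index: updates by primary key
UPDATE sbtest1 SET k = k + 1 WHERE id = ?;
UPDATE sbtest1 SET c = ? WHERE id = ?;

-- oltp_delete, plus the delete/insert pair that oltp_write_only and oltp_read_write also run
DELETE FROM sbtest1 WHERE id = ?;
INSERT INTO sbtest1 (id, k, c, pad) VALUES (?, ?, ?, ?);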

Test Results

1. select_random_points

Threads QPS(TiDB4PG) 95% Latency/ms(TiDB4PG) QPS(YugabyteDB) 95% Latency/ms(YugabyteDB)
10 4044 3.49 2666 5.47
50 8398 10.09 5086 15.00
100 8754 20 5429 29.19
200 9441 36.89 5541 58.92
300 9785 52.89 5531 92.42


Result analysis:

In this point-query scenario, TiDB For PostgreSQL stays ahead of YugabyteDB, and the gap widens as concurrency increases.

2. oltp_read_only

Threads QPS(TiDB4PG) 95% Latency/ms(TiDB4PG) QPS(YugabyteDB) 95% Latency/ms(YugabyteDB)
10 8954 20.74 0.13 -
50 24262 38.94 0.15 -
100 25136 78.60 0.15 -
200 25267 158.63 0.14 -
300 25436 240.02 0.14 -

Result analysis:

The results of this scenario differ dramatically: YugabyteDB's query throughput collapses, while TiDB For PostgreSQL performs at its normal level.

Analyzing the Sysbench source code shows that the read-only scenario mainly involves the following range queries:

SELECT c FROM sbtest1 WHERE id BETWEEN 1 AND 1000;
SELECT sum(k) FROM sbtest1 WHERE id BETWEEN 1 AND 1000;
SELECT c FROM sbtest1 WHERE id BETWEEN 1 AND 1000 order by c;
SELECT distinct c FROM sbtest1 WHERE id BETWEEN 1 AND 1000 order by c;

The four range queries on the primary key take more than 400 seconds in total to execute, which is a very surprising result. Take the first SQL as an example and analyze its execution plan.
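
A plan like the one below can be captured with EXPLAIN ANALYZE; the exact command is not shown in the post, but presumably something along these lines:

EXPLAIN ANALYZE SELECT c FROM sbtest1 WHERE id BETWEEN 1 AND 1000;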

Seq Scan on sbtest1  (cost=0.00..105.00 rows=1000 width=484) (actual time=9.968..127393.155 rows=1000 loops=1)
  Filter: ((id >= 1) AND (id <= 1000))
  Rows Removed by Filter: 9999000
Planning Time: 0.048 ms
Execution Time: 127393.919 ms

Based on past experience, a primary key index scan should be used here, but a full table scan is used instead, which is puzzling.

To verify whether the size of the query range mattered, I limited the id range to 1 through 10 and found that the execution plan was unchanged. Rewriting the predicate as an IN list, however, does use the primary key index.
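
Those two checks can be reproduced with statements like the following (a sketch; the literal id values are arbitrary):

-- Narrowing the range: still a sequential scan on YugabyteDB
EXPLAIN ANALYZE SELECT c FROM sbtest1 WHERE id BETWEEN 1 AND 10;

-- Rewriting the predicate as an IN list: the primary key index is used
EXPLAIN ANALYZE SELECT c FROM sbtest1 WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10);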

In contrast, the execution plan of TiDB For PostgreSQL is as expected:

Projection_4    500.09  1000    root    
└─IndexLookUp_13    500.09  1000    root    
  ├─IndexRangeScan_11(Build)    500.09  1000    cop[tikv]   table:sbtest1, index:PRIMARY(id)
  └─TableRowIDScan_12(Probe)    500.09  1000    cop[tikv]   table:sbtest1

The reason why YugabyteDB's range query efficiency is so low requires further analysis.

3. oltp_write_only

Threads QPS(TiDB4PG) 95% Latency/ms(TiDB4PG) QPS(YugabyteDB) 95% Latency/ms(YugabyteDB)
10 790 90.78 4309 47.47
50 3540 104.84 8427 71.83
100 6486 114.72 8674 92.42
200 11414 132.49 11170 150.29
300 15133 155.80 12036 211.60


Result analysis:

The write-only scenario mixes 4 statement types: indexed updates, non-indexed updates, deletes, and inserts (the first three are also tested separately below). YugabyteDB has a clear performance advantage at low concurrency, but from 200 threads onward TiDB For PostgreSQL overtakes it, which reflects slightly better stability under load.

4. oltp_read_write

Threads QPS(TiDB4PG) 95% Latency/ms(TiDB4PG) QPS(YugabyteDB) 95% Latency/ms(YugabyteDB)
10 2238 104.84 556 3151.62
50 9790 123.28 1485 3326
100 17454 137.35 1494 6960
200 25119 189.93 1402 14302
300 25967 272.27 1835 22034

Result analysis:

The mixed read-write scenario is also affected by YugabyteDB's range-query problem; its overall performance is far worse than TiDB For PostgreSQL's, and it hits a bottleneck at 50 threads.

5. oltp_update_index

Threads QPS(TiDB4PG) 95% Latency/ms(TiDB4PG) QPS(YugabyteDB) 95% Latency/ms(YugabyteDB)
10 374 36.24 2227 11.65
50 1630 41.85 3754 21.89
100 2980 45.79 4266 36.24
200 5375 50.11 4845 62.19
300 7298 55.82 4964 92.42


Result analysis:

update_index updates an indexed column by primary key. In this scenario YugabyteDB performs better at low concurrency, but as concurrency increases it hits its performance ceiling sooner and falls further and further behind, while TiDB For PostgreSQL scales relatively steadily.

6. oltp_update_non_index

Threads QPS(TiDB4PG) 95% Latency/ms(TiDB4PG) QPS(YugabyteDB) 95% Latency/ms(YugabyteDB)
10 415 31.37 6847 2.03
50 1988 35.59 17123 4.65
100 3761 37.56 16246 9.91
200 6910 41.10 18928 20.37
300 9849 43.39 19896 29.72


Result analysis:

update_non_index updates a non-indexed column by primary key. In this scenario YugabyteDB leads TiDB For PostgreSQL by a large margin and maintains that lead at every concurrency level.

7. oltp_delete

Threads QPS(TiDB4PG) 95% Latency/ms(TiDB4PG) QPS(YugabyteDB) 95% Latency/ms(YugabyteDB)
10 364 37.56 2389 10.84
50 1568 43.39 4269 19.65
100 2899 46.63 4844 33.12
200 5380 50.11 5705 56.84
300 7622 54.83 6146 82.96


Result analysis:

The delete scenario deletes a single row by primary key. The results resemble the indexed-update scenario: YugabyteDB performs better at first, but gradually falls behind TiDB For PostgreSQL as concurrency increases.

Summary

After testing the above 7 scenarios, we found that the two databases each have their own strengths and weaknesses.

TiDB For PostgreSQL is better at read operations, and YugabyteDB is better at write operations, but as the load increases, YugabyteDB's throughput and stability under high concurrency fall behind TiDB For PostgreSQL's.

In a follow-up we will compare the architectures and underlying principles of the two in depth to see whether we can find the real cause.

Top comments (1)

Franck Pachot

Your execution plan in Yugabyte shows a Seq Scan. The reason is that YugabyteDB can shard on hash and range, and hash on the first column of the primary key is the default for better distribution, but that is not the right choice for this workload. You need to define the primary key as primary key(id asc). You don't have the problem in TiDB because it has only range sharding.
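
A minimal sketch of that definition, assuming the standard sysbench column layout (YugabyteDB YSQL syntax, with ASC requesting range sharding on id):

-- Range sharding on id instead of the default hash sharding
CREATE TABLE sbtest1 (
  id  INT NOT NULL,
  k   INT NOT NULL DEFAULT 0,
  c   CHAR(120) NOT NULL DEFAULT '',
  pad CHAR(60) NOT NULL DEFAULT '',
  PRIMARY KEY (id ASC)
);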

Here is a modified version of Sysbench that makes it aware of the distributed nature of YugabyteDB:
github.com/akopytov/sysbench/compa...
The most important changes are range sharding and connecting to all nodes.