Our performance tests are based on the TPC-C benchmark, which is commonly used in both industry and academia.
Our tests used BenchmarkSQL (see MOT Sample TPC-C Benchmark), which generates the workload using interactive SQL commands rather than stored procedures.
All tests that compared the performance of openGauss MOT with disk-based tables used synchronous logging. For MOT, its optimized group-commit=on variant was used.
Finally, we performed an additional test to evaluate MOT's ability to quickly ingest massive quantities of data and to serve as an alternative to mid-tier data ingestion solutions.
All tests were performed in June 2020.
The following shows various types of MOT performance benchmarks –
· MOT Hardware
· MOT Results – Summary
· MOT High Throughput
· MOT Low Latency
· MOT RTO and Cold-Start Time
· MOT Resource Utilization
· MOT Data Ingestion Speed
MOT Hardware
The tests were performed on servers with the following configurations and 10GbE networking –
· ARM64/Kunpeng 920-based 2-socket servers, model Taishan 2280 v2 (total 128 Cores), 800GB RAM, 1TB NVMe disk. OS: openEuler
· ARM64/Kunpeng 960-based 4-socket servers, model Taishan 2480 v2 (total 256 Cores), 512GB RAM, 1TB NVMe disk. OS: openEuler
· x86-based Dell servers with 2 sockets of Intel Xeon Gold 6154 CPU @ 3GHz, 18 Cores per socket (total 72 Cores, with hyper-threading=on), 1TB RAM, 1TB SSD. OS: CentOS 7.6
· x86-based SuperMicro server with 8 sockets of Intel(R) Xeon(R) CPU E7-8890 v4 @ 2.20GHz, 24 Cores per socket (total 384 Cores, with hyper-threading=on), 1TB RAM, 1.2TB SSD (Seagate 1200 SSD 200GB, SAS 12Gb/s). OS: Ubuntu 16.04.2 LTS
· x86-based Huawei server with 4 sockets of Intel(R) Xeon(R) CPU E7-8890 v4 @ 2.2GHz (total 96 Cores, with hyper-threading=on), 512GB RAM, 2TB SSD. OS: CentOS 7.6
MOT Results – Summary
MOT provides higher performance than disk-based tables by a factor of 2.5x to 4.1x and reaches 4.8 million tpmC on ARM/Kunpeng-based servers with 256 cores. The results clearly demonstrate MOT's ability to scale up and utilize all hardware resources: performance increases as the number of CPU sockets and server cores grows.
MOT delivers up to 30,000 tpmC/core on ARM/Kunpeng-based servers and up to 40,000 tpmC/core on x86-based servers.
Due to a more efficient durability mechanism, MOT's replication overhead in a Primary/Secondary High Availability scenario is 7% on ARM/Kunpeng and 2% on x86 servers, compared with 20% on ARM/Kunpeng and 15% on x86 servers for disk-based tables.
Finally, MOT delivers 2.5x lower latency, with TPC-C transaction response times 2 to 7 times faster.
MOT High Throughput
The following shows the results of various MOT table high throughput tests.
ARM/Kunpeng 2-Socket 128 Cores
Performance
The following figure shows the results of testing the TPC-C benchmark on a Huawei ARM/Kunpeng server that has two sockets and 128 cores.
Four types of tests were performed –
· Two tests were performed on MOT tables and another two tests were performed on openGauss disk-based tables.
· Two of the tests were performed on a Single node (without high availability), meaning that no replication was performed to a secondary node. The other two tests were performed on Primary/Secondary nodes (with high availability), meaning that data written to the primary node was replicated to a secondary node.
MOT tables are represented in orange and disk-based tables are represented in blue.
Figure 1 ARM/Kunpeng 2-Socket 128 Cores – Performance Benchmarks
The results showed that:
· As expected, the performance of MOT tables is significantly greater than that of disk-based tables in all cases.
· For a Single Node – 3.8M tpmC for MOT tables versus 1.5M tpmC for disk-based tables
· For a Primary/Secondary Node – 3.5M tpmC for MOT tables versus 1.2M tpmC for disk-based tables
· For production-grade (high-availability) servers (Primary/Secondary Node) that require replication, the benefit of using MOT tables is even greater than for a Single Node (without high availability, meaning no replication).
· MOT's replication overhead in a Primary/Secondary High Availability scenario is 7% on ARM/Kunpeng and 2% on x86 servers, compared with 20% on ARM/Kunpeng and 15% on x86 servers for disk-based tables.
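As a rough check of the figures above, the sketch below recomputes the MOT-versus-disk speedup and the replication overhead from the rounded Figure 1 values. It assumes speedup is the ratio of MOT throughput to disk throughput, and that replication overhead is the fraction of single-node throughput lost when replicating to a secondary; these are conventional definitions, not necessarily the exact formulas behind the published percentages.

```python
# Rounded Figure 1 results for the 2-socket, 128-core ARM/Kunpeng server,
# in millions of tpmC. "single" = Single Node, "ha" = Primary/Secondary.
RESULTS = {
    "MOT": {"single": 3.8, "ha": 3.5},
    "disk": {"single": 1.5, "ha": 1.2},
}


def speedup(mot: float, disk: float) -> float:
    """How many times more tpmC MOT delivers than disk-based tables."""
    return mot / disk


def replication_overhead(single: float, ha: float) -> float:
    """Fraction of single-node throughput lost when replicating to a secondary."""
    return (single - ha) / single


if __name__ == "__main__":
    for mode in ("single", "ha"):
        s = speedup(RESULTS["MOT"][mode], RESULTS["disk"][mode])
        print(f"{mode}: MOT is {s:.2f}x faster than disk-based tables")
    for engine, r in RESULTS.items():
        o = replication_overhead(r["single"], r["ha"])
        print(f"{engine}: replication overhead {o:.1%}")
```

With these inputs MOT is about 2.53x faster on a single node and about 2.92x faster with replication, which is why the benefit is larger in the Primary/Secondary case. The computed MOT overhead of about 7.9% (versus the reported 7%) reflects rounding of the chart values to 0.1M tpmC; the disk-based overhead comes out at the reported 20%.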
Performance per CPU core
The following figure shows the TPC-C throughput per core for the tests performed on a Huawei ARM/Kunpeng server that has two sockets and 128 cores. The same four types of tests were performed (as described above).
Figure 2 ARM/Kunpeng 2-Socket 128 Cores – Performance per Core Benchmarks
As expected, the results show that MOT tables deliver significantly greater performance per core than disk-based tables in all cases. They also show that for production-grade (high-availability) servers (Primary/Secondary Node) that require replication, the benefit of using MOT tables is even greater than for a Single Node (without high availability, meaning no replication).
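Per-core throughput is total tpmC divided by the number of cores. The sketch below applies that to the rounded Figure 1 values and the server's 128 cores; treating every core as contributing equally is a simplifying assumption.

```python
# Per-core throughput = total tpmC / number of cores.
# Throughput values are the rounded Figure 1 results; CORES is the
# 128-core count of the 2-socket ARM/Kunpeng server.
CORES = 128
THROUGHPUT_TPMC = {
    ("MOT", "single"): 3_800_000,
    ("MOT", "ha"): 3_500_000,
    ("disk", "single"): 1_500_000,
    ("disk", "ha"): 1_200_000,
}


def per_core(tpmc: int, cores: int) -> float:
    """Average tpmC contributed by each core."""
    return tpmc / cores


if __name__ == "__main__":
    for (engine, mode), tpmc in THROUGHPUT_TPMC.items():
        print(f"{engine:4} {mode:6} {per_core(tpmc, CORES):>9,.0f} tpmC/core")
```

The MOT single-node result works out to 29,687.5 tpmC per core, consistent with the "up to 30,000 tpmC/core" figure in the summary.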
ARM/Kunpeng 4-Socket 256 Cores
The following figure demonstrates MOT's concurrency control performance by showing tpmC as a function of the number of connections.
Figure 3 ARM/Kunpeng 4-Socket 256 Cores – Performance Benchmarks
The results show that performance continues to increase significantly even with many cores, and that peak performance of 4.8M tpmC is achieved at 768 connections.
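Normalizing the peak result by core count gives a sense of per-core throughput and client concurrency on the 4-socket server. The sketch below uses only the figures stated above (4.8M tpmC, 256 cores, 768 connections).

```python
# Peak MOT result on the 4-socket, 256-core ARM/Kunpeng server (Figure 3).
PEAK_TPMC = 4_800_000
CORES = 256
CONNECTIONS_AT_PEAK = 768

tpmc_per_core = PEAK_TPMC / CORES                   # average throughput per core
connections_per_core = CONNECTIONS_AT_PEAK / CORES  # client connections per core

print(f"{tpmc_per_core:,.0f} tpmC/core at {connections_per_core:g} connections/core")
```

This yields 18,750 tpmC per core with three connections per core. Total throughput is higher than on the 2-socket server, even though the average per core is lower than the roughly 30,000 tpmC/core measured there.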
x86-based Servers
· 8-Socket 384 Cores
The following figure demonstrates MOT's concurrency control performance by comparing tpmC as a function of the number of connections for disk-based tables and MOT. This test was performed on an x86 server with eight sockets and 384 cores. Orange represents the MOT table results.
Figure 4 x86 8-Socket 384 Cores – Performance Benchmarks
The results show that MOT tables significantly outperform disk-based tables and deliver highly efficient performance per core on a 384-core server, reaching over 3M tpmC.
· 4-Socket 96 Cores
MOT achieved 3.9 million tpmC on this 4-socket, 96-core server. The following figure shows highly efficient MOT table performance per core, reaching 40,000 tpmC/core.
Figure 5 4-Socket 96 Cores – Performance Benchmarks
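The 40,000 tpmC/core figure follows directly from the total throughput and the reported core count. A minimal check, using only the numbers stated above:

```python
# MOT result on the 4-socket x86 server, as stated in the text.
TOTAL_TPMC = 3_900_000
CORES = 96  # core count as reported for this server


def tpmc_per_core(total_tpmc: int, cores: int) -> float:
    """Average throughput per core, in tpmC."""
    return total_tpmc / cores


if __name__ == "__main__":
    print(f"{tpmc_per_core(TOTAL_TPMC, CORES):,.0f} tpmC per core")
```

This gives 40,625 tpmC per core, which the text rounds to 40,000.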