Apple's M1 chips are no longer news. Most people know that these chips are fast and that macOS applications have to be adapted for the new architecture. The Tarantool development team decided to take on the same challenge.
My name is Alexey Koryakin, and I am the CTO at Tarantool, a part of the VK ecosystem. I will explain why we needed this even though macOS is not used for production servers, show how we solved the task, and share the benchmark results.
The Task and the Solution
Tarantool is a high-performance in-memory computing platform that consists of a database and an application server. Many developers install Tarantool on their office computers and write code there. In many cases, this is more convenient than using a separate server.
Some developers on our team also prefer to install Tarantool locally. Among them is our product manager, who bought a new M1-based MacBook Air at the beginning of the year. One day he asked our technical team: "Why doesn't Tarantool run natively on the M1 chip? I have recently bought a new MacBook Air, and I can launch Tarantool only via Rosetta. Native support for Apple M1 chips could become a cool feature for our community so that people switching to new Macs could efficiently develop Tarantool-based systems."
The technical team thought it over and came to two conclusions:
- Tarantool is known to be very fast. M1 is known to be very fast. We wanted to find out how much faster Tarantool could run on M1.
- Apple is actively upgrading its Mac product line, migrating it to M1 chips (and now to the M1 Max). Developers and other IT specialists across the world are actively switching to the new platform. Existing x86_64 software is launched via the Rosetta translation environment, which keeps that software (including Tarantool) from using the full power of the new Apple chips. This had to change.
This is how we ended up with a new task: support for M1 chips.
Around the same time, we were working on ARM64 support for Linux. Since M1 is essentially ARM64 with some specific features, we assumed that implementing M1 support would be easy. This proved to be not far from the truth: we completed most of the M1 work as part of ARM64 support for Linux. The main issues were related to specific features of the RISC instructions of the ARM architecture. For example, a direct transfer of control from one section of machine code to another is possible only when the offset is under 2 MB. What set M1 support apart was the Apple ABI, which differs from the ARM Linux ABI. We had to tune the Tarantool code specifically for the new chips, digging through Apple's public documentation.
All in all, support for ARM64, and M1 in particular, is a relatively simple engineering task. There were some issues, but we were able to resolve them without poring over endless specifications or pulling all-nighters. It took us about 4 months, from May until August, to complete the entire endeavor.
Performance Benchmark
M1 is famous for its outstanding performance. Even code run through Rosetta is fast. We didn't forget about performance either and decided to check how much faster Tarantool could run.
We ran the tests on macOS on readily available consumer devices with different hardware. We didn't aim to compare different operating systems or to compete with server CPUs such as Xeon. We just wanted to know how much faster Tarantool would run for a developer who switched to a new MacBook, and to assess the prospects of the new Mac platform. For the test we used several computers owned by the team or available in retail:
- MacBook Pro 16,2 (2020)
- Mac mini 8,1 (2018)
- MacBook Air 10,1 (2020), Apple M1
We wrote a simple benchmark test with three properties:
- It is Lua code, which means the application server is involved.
- It writes to the database, which means the transaction engine of the database is involved.
- It runs on M1 and doesn't crash :)
The benchmark runs in one system thread that starts 50 fibers, each of which inserts records in transactions of 100 operations. This scenario loads the CPU more than memory or storage, which is exactly what we need to test the CPU. If each transaction consisted of only one or three to four updates, the workload would shift from the CPU to RAM or storage.
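To make the shape of this workload concrete, here is a minimal Python sketch of the arithmetic only. It is not the benchmark itself, which is Lua code running inside Tarantool. The figures of 50 fibers and 100 inserts per transaction come from the text above; the even split of records between fibers and the name `plan_workload` are assumptions made for illustration.

```python
from dataclasses import dataclass

FIBERS = 50             # from the article: fibers started in one system thread
INSERTS_PER_TXN = 100   # from the article: insert operations per transaction


@dataclass(frozen=True)
class WorkloadPlan:
    total_records: int
    transactions_total: int
    transactions_per_fiber: int


def plan_workload(total_records: int,
                  fibers: int = FIBERS,
                  inserts_per_txn: int = INSERTS_PER_TXN) -> WorkloadPlan:
    """Split total_records into equal transactions across fibers.

    Assumption (not stated in the article): every fiber gets the same
    number of transactions, so the total must divide evenly.
    """
    records_per_round = fibers * inserts_per_txn
    if total_records <= 0 or total_records % records_per_round != 0:
        raise ValueError(
            f"total_records must be a positive multiple of {records_per_round}"
        )
    transactions_total = total_records // inserts_per_txn
    return WorkloadPlan(
        total_records=total_records,
        transactions_total=transactions_total,
        transactions_per_fiber=transactions_total // fibers,
    )


if __name__ == "__main__":
    # Hypothetical test sizes: one million and twenty million records.
    for records in (1_000_000, 20_000_000):
        print(plan_workload(records))
```

Under these assumptions, a one-million-record run amounts to 10,000 transactions, or 200 per fiber, so the per-transaction overhead is paid relatively rarely compared with the insert work itself.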
We ran multiple tests, inserting from 1 to 20 million records. We repeated every test 15 times and then calculated the median value. The complete results let you examine every test in detail; here, for illustrative purposes, the diagram shows the median values.
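The aggregation step can be sketched in a few lines of Python. This is only an illustration of taking the median over 15 repetitions and comparing two devices; the timings below are synthetic placeholders, not our measurements, and the helper names `median_wall_clock` and `speedup` are hypothetical.

```python
from statistics import mean, median

RUNS_PER_TEST = 15  # from the article: every test was repeated 15 times


def median_wall_clock(run_seconds):
    """Return the median wall-clock time of one test's repetitions."""
    if len(run_seconds) != RUNS_PER_TEST:
        raise ValueError(f"expected {RUNS_PER_TEST} runs, got {len(run_seconds)}")
    return median(run_seconds)


def speedup(baseline_seconds, candidate_seconds):
    """How many times faster the candidate is than the baseline."""
    if candidate_seconds <= 0:
        raise ValueError("candidate_seconds must be positive")
    return baseline_seconds / candidate_seconds


if __name__ == "__main__":
    # Synthetic placeholder timings in seconds (NOT real measurements):
    # seven runs at 10 s, one at 11 s, six at 12 s, and one 60 s outlier.
    runs = [10.0] * 7 + [11.0] + [12.0] * 6 + [60.0]
    print("median:", median_wall_clock(runs))
    print("mean:  ", mean(runs))
```

One common reason to report the median is visible in the placeholder data: a single slow run, for example one disturbed by a background process, pulls the mean up noticeably but leaves the median unchanged.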
We see that Tarantool on the M1 chip performs twice as well as on a notebook of the same year based on a different processor.
Then we tested M1 running Tarantool through the Rosetta translator. The performance turned out to be approximately the same as on the 2018 Mac mini and significantly higher than on the 2020 MacBook Pro.
Wall Clock Benchmark. The same devices plus launch via Rosetta
We understand that this doesn't mean every application will run twice as fast. Applications vary in their code, tasks, and conditions. Either way, you can expect your local Tarantool installation to run faster.
To Sum Up...
Starting from 2.10.0-beta, Tarantool can run natively on M1 chips. So far this is preliminary support: something may crash or behave unstably. We have resolved almost all the bugs we knew about, with a few minor ones left. For example, there are some issues with the JIT compiler. But this didn't prevent our product manager from installing Tarantool on his new MacBook Air and running it every day.
Later we will resolve the remaining known bugs and those that developers report to us. If you have a new M1 Mac, try the latest version of Tarantool. If something crashes, please file a bug report, and we'll help.
Try Tarantool cluster at https://try.tarantool.io. Download Tarantool at the official website, and get help in the Telegram chat.
Top comments (1)
Out of curiosity, have you done any experimenting with running any of this client-side in a web or service worker (or WASM)? I'm interested in utilizing some db processing that can be used for client/server client/client syncing operations. I'd like to know if this might be completely overblown for what I'd like to use it for.