350 Million Messages per Day
Co-Authors: Abdulsamet Ileri, Nihat Alim
In this article, you will read the story of how we decreased memory usage by ~50%, increased CPU efficiency by ~80%, and increased throughput by ~20% by migrating from Scala to Go.
Introduction
Trendyol has more than 170 million sellable and 200 million ready-to-sell products right now. At any time, there may be a change, or invalidation, as we call it, on these products via different events like promotion, stock, and price. By changes, we mean millions of them, approximately 350 million a day to be precise. As the Trendyol Indexing Team, we need to apply these changes in near real time, because any latency could cause product data to be displayed incorrectly. We don't even want to think about showing the wrong price for a product.
In the middle of all these changes is our application, Perses. In Greek mythology, Perses was the Greek Titan God of Destruction. Like the Destruction God, our application Perses destroys the old product data and replaces it with its new version.
As we can see from the daily throughput graph of Perses below, Perses performs millions of I/O operations every day. It is crucial to perform them correctly and without latency so that users always see the correct product data. To achieve our goal of a highly performant application, Perses is designed as a multi-deployment application, so every deployment can scale independently and does not block the others' invalidation processes.
From all of this, we can easily say that Perses plays an important role in Trendyol's invalidation process.
There were several reasons why we made this migration decision.
- We had previously gotten better results in terms of resource usage and performance by migrating our other, smaller consumer projects.
- For our team, learning and maintaining Go was easier than maintaining the old Scala projects.
Our Implementation Steps
We will explain our re-platforming journey in 5 steps.
1. Without monitoring, you cannot solve murders.
In Scala Perses, the consuming operation was done in batches using Akka Stream (v2.12). You can see the implementation in the code block below.
As soon as we implemented batch consuming in Go Perses (via kafka-go), we tested and compared the two, but the results shocked us. While Scala Perses was processing 13k messages per minute per pod, Go Perses could only process 4k.
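For reference, here is a minimal sketch of what batch consuming can look like on the Go side with segmentio/kafka-go: messages are fetched one by one, collected into a batch, processed, and committed together. The broker address, group ID, topic name, and batch parameters are illustrative, not our production values.

```go
package main

import (
	"context"
	"errors"
	"log"
	"time"

	"github.com/segmentio/kafka-go"
)

// readBatch collects up to batchSize messages, waiting at most maxWait,
// so downstream processing and offset commits can happen per batch
// instead of per message.
func readBatch(ctx context.Context, r *kafka.Reader, batchSize int, maxWait time.Duration) ([]kafka.Message, error) {
	batchCtx, cancel := context.WithTimeout(ctx, maxWait)
	defer cancel()

	batch := make([]kafka.Message, 0, batchSize)
	for len(batch) < batchSize {
		msg, err := r.FetchMessage(batchCtx)
		if err != nil {
			// The batch window elapsing is not an error: return what we have.
			if errors.Is(err, context.DeadlineExceeded) && ctx.Err() == nil {
				break
			}
			return batch, err
		}
		batch = append(batch, msg)
	}
	return batch, nil
}

func main() {
	reader := kafka.NewReader(kafka.ReaderConfig{
		Brokers: []string{"localhost:9092"},    // illustrative broker address
		GroupID: "perses-consumer",             // illustrative consumer group
		Topic:   "product-invalidation-events", // illustrative topic name
	})
	defer reader.Close()

	ctx := context.Background()
	for {
		batch, err := readBatch(ctx, reader, 100, time.Second) // illustrative batch parameters
		if err != nil {
			log.Fatal(err)
		}
		if len(batch) == 0 {
			continue
		}
		// process(batch) would apply the invalidations here.
		if err := reader.CommitMessages(ctx, batch...); err != nil {
			log.Fatal(err)
		}
	}
}
```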
After seeing the results, we monitored the application and saw that the bottleneck was in the Kafka producer part. When we dove into the producer's codebase in the Kafka library, we realized that the producing operation was done synchronously. This meant we were going to the broker for every single message, which was hurting performance.
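To make the cost concrete, here is a simplified sketch of the per-message, synchronous producing pattern using kafka-go's Writer. The library internals we actually hit were different, but the effect is the same: one blocking broker round trip per message.

```go
package main

import (
	"context"

	"github.com/segmentio/kafka-go"
)

// produceOneByOne makes a full, blocking produce call per message,
// so every single message pays its own round trip to the broker.
func produceOneByOne(ctx context.Context, w *kafka.Writer, msgs []kafka.Message) error {
	for _, msg := range msgs {
		if err := w.WriteMessages(ctx, msg); err != nil {
			return err
		}
	}
	return nil
}
```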
2. Caught the killer! Wait, there are more!?
We decided to change the producer implementation from synchronous to asynchronous and to produce in batches. The message queue contained messages destined for two different topics at that time.
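Below is a minimal sketch of the asynchronous, batched producer idea with kafka-go: a goroutine drains a message channel and flushes the accumulated batch when it is full or when the batch duration elapses, so one WriteMessages call covers many messages. The function and parameter names are ours, for illustration only.

```go
package main

import (
	"context"
	"log"
	"time"

	"github.com/segmentio/kafka-go"
)

// runBatchProducer drains the message channel and flushes the accumulated
// batch either when it reaches batchSize or when batchDuration elapses.
func runBatchProducer(ctx context.Context, w *kafka.Writer, messages <-chan kafka.Message, batchSize int, batchDuration time.Duration) {
	batch := make([]kafka.Message, 0, batchSize)
	ticker := time.NewTicker(batchDuration)
	defer ticker.Stop()

	flush := func() {
		if len(batch) == 0 {
			return
		}
		if err := w.WriteMessages(ctx, batch...); err != nil {
			log.Printf("producing batch failed: %v", err)
		}
		batch = batch[:0]
	}

	for {
		select {
		case msg, ok := <-messages:
			if !ok { // channel closed: flush what is left and stop
				flush()
				return
			}
			batch = append(batch, msg)
			if len(batch) >= batchSize {
				flush()
			}
		case <-ticker.C:
			flush()
		case <-ctx.Done():
			flush()
			return
		}
	}
}
```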
After completing the implementation, we immediately ran a load test, but the results were still disappointing. Go Perses processed 9k messages per minute per pod, but we still hadn't reached Scala Perses' 13k.
3. Following the breadcrumbs
We separated the message channel per topic. We only had two topics to produce to back then, so we created two message channels and one goroutine per channel.
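A sketch of that change, building on the batch producer above: each topic gets its own buffered channel and its own producing goroutine, so a slow flush on one topic cannot block the other. The channel buffer size here is a hypothetical value.

```go
package main

import (
	"context"
	"time"

	"github.com/segmentio/kafka-go"
)

// startPerTopicProducers gives each of the two topics its own buffered
// channel and its own producing goroutine (runBatchProducer from the
// previous sketch).
func startPerTopicProducers(ctx context.Context, w *kafka.Writer, batchSize int, batchDuration time.Duration) (chan<- kafka.Message, chan<- kafka.Message) {
	topicACh := make(chan kafka.Message, 1024) // hypothetical buffer size
	topicBCh := make(chan kafka.Message, 1024)

	go runBatchProducer(ctx, w, topicACh, batchSize, batchDuration)
	go runBatchProducer(ctx, w, topicBCh, batchSize, batchDuration)

	return topicACh, topicBCh
}
```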
After this change, we tried different "batch size" and "batch duration" parameters to achieve the best performance. After several tries, we found that the optimal values were 500 for the batch size and 500ms for the batch duration. When we ran another load test with these final parameters, 36 partitions, and 12 pods, we processed 1,094,800 messages in 10 minutes with 136k throughput. Scala was still more performant: with its 156k throughput, it could process the same number of messages in 8 minutes.
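Wiring it together with those tuned values might look roughly like this; the broker address, topic names, and payloads are placeholders.

```go
package main

import (
	"context"
	"time"

	"github.com/segmentio/kafka-go"
)

func main() {
	writer := &kafka.Writer{
		Addr:     kafka.TCP("localhost:9092"), // illustrative broker address
		Balancer: &kafka.LeastBytes{},
		// Topic is left empty so each message carries its own topic.
	}
	defer writer.Close()

	ctx := context.Background()

	// 500 messages per batch, flushed at least every 500ms: the values our
	// load tests converged on.
	topicACh, topicBCh := startPerTopicProducers(ctx, writer, 500, 500*time.Millisecond)

	// Hypothetical usage: push each message onto the channel of its topic.
	topicACh <- kafka.Message{Topic: "product-updates", Value: []byte(`{"id": 1}`)} // placeholder topic and payload
	topicBCh <- kafka.Message{Topic: "product-prices", Value: []byte(`{"id": 1}`)}  // placeholder topic and payload

	// Give the producers a chance to flush before this toy example exits.
	time.Sleep(time.Second)
}
```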
4. It's not a dead end
We use Uber's automaxprocs library in one of our other Go projects and saw a performance improvement there, so we wanted to try it for Go Perses as well. Unfortunately, throughput did not change, because Perses is more I/O-bound than CPU-bound.
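For completeness, adopting automaxprocs only requires a blank import in the main package; it adjusts GOMAXPROCS to the container's CPU quota when the program starts.

```go
package main

import (
	// automaxprocs sets GOMAXPROCS to match the container's CPU quota at
	// startup, so the runtime does not schedule more OS threads than the
	// pod is actually allowed to use.
	_ "go.uber.org/automaxprocs"
)

func main() {
	// Start the application as usual; no other code changes are required.
}
```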
5. The sweet smell of victory
Because we still couldn't get the desired results, we decided to focus on our architectural design to figure out what to do.
In the first part of the architecture, a goroutine continuously listens to a topic and fetches new messages into an internal message queue. We thought we could try tuning the queue size here, so we made it configurable.
In the second part of the architecture, the fetched messages arrive on a channel and goroutines process them. Here, we thought we could tune the channel buffer size and the number of worker goroutines.
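A rough sketch of how these three knobs can be exposed as configuration, assuming kafka-go (whose ReaderConfig has a QueueCapacity field for the internal fetch queue) and a simple worker pool reading from a buffered channel; the struct and function names are hypothetical.

```go
package main

import (
	"sync"

	"github.com/segmentio/kafka-go"
)

// workerConfig names are hypothetical; the point is that all three knobs
// are plain parameters that can be tuned per deployment.
type workerConfig struct {
	QueueCapacity int // size of the reader's internal fetch queue
	ChannelBuffer int // buffer of the channel feeding the workers
	WorkerCount   int // number of goroutines processing messages
}

// newReader wires the queue size into kafka-go's internal fetch queue.
func newReader(cfg workerConfig) *kafka.Reader {
	return kafka.NewReader(kafka.ReaderConfig{
		Brokers:       []string{"localhost:9092"},    // illustrative
		GroupID:       "perses-consumer",             // illustrative
		Topic:         "product-invalidation-events", // illustrative
		QueueCapacity: cfg.QueueCapacity,
	})
}

// startWorkers starts the processing goroutines behind a buffered channel.
func startWorkers(cfg workerConfig, process func(kafka.Message)) (chan<- kafka.Message, *sync.WaitGroup) {
	messages := make(chan kafka.Message, cfg.ChannelBuffer)

	var wg sync.WaitGroup
	for i := 0; i < cfg.WorkerCount; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for msg := range messages {
				process(msg)
			}
		}()
	}
	return messages, &wg
}
```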
We were surprised by the results after changing these configurations in the tuning process. You can see our trials and their results in the table below.
You can see the comparison of the most performant Scala Perses and Go Perses results below.
Conclusion
As a result of the migration process, we optimized the following resources:
- Memory usage: ~50% decrease (from 1.127 GB to 0.622 GB)
- CPU usage: ~80% decrease (from 1.72 to 0.379)
- Throughput: ~20% increase (from 156k to 189k)
We are very pleased to have made this optimization on a multi-deployment codebase that works under high load.
What did we learn?
- We realized how important monitoring is; it helped us pinpoint the Kafka producer problem.
- By diving into our project design, we realized some critical parameters still needed to be tuned to improve performance.
Thanks for reading!
We have open-sourced some of this experience in kafka-konsumer, which ships with a built-in retry manager. We are excited to get your feedback.
Thank you to Kutlu Araslı, Emre Odabas, and Mert Bulut for their support.
If youâre interested in joining our team, you can apply for the backend developer role or any of our current open positions.