Failing Fast with the Wrong Treasure Hunt Engine on Hytale Servers

#webdev #programming #architecture #systems

The Problem We Were Actually Solving

When we designed the treasure hunt engine for our Hytale servers, the core challenge was processing requests from hundreds of concurrent players efficiently. There were tens of thousands of potential items to find, each with unique characteristics, and we needed an implementation that could keep up with the load.

At first glance, this looked like a classic use case for a message broker such as RabbitMQ or Apache Kafka. We planned to publish incoming requests to RabbitMQ, process them from a separate worker queue, and then send the results back to the client. As so often happens, our initial enthusiasm collided with the harsh realities of production.

What We Tried First (And Why It Failed)

Our initial implementation put a RabbitMQ worker behind a load balancer for redundancy. We expected this to give us high availability, absorb the load, and scale. What we overlooked is that a broker is built for asynchronous, decoupled message passing, whereas our players were waiting synchronously on every reply. Each request had to be published, queued, consumed, and answered, and whenever requests arrived faster than the worker could drain them, the queue backed up and users waited several seconds for a response.

The rabbitmqctl tool consistently reported connection.tune_max_messages exceeded errors, which we took to mean that the worker queue had hit its performance limit. Meanwhile, our monitoring showed an average latency of 5 seconds, with spikes of up to 10 seconds. Our users were getting frustrated, and our metrics were heading in the wrong direction.

The Architecture Decision

After some soul-searching, we decided to pivot. For our specific use case, a message broker like RabbitMQ was overkill. We switched to a more straightforward, in-process thread-per-request approach using Java 8's built-in concurrency features. Each incoming request would be handled by a thread inside the server itself, which cut latency and removed the need for a message broker altogether.

We implemented a simple request queue on top of Java's built-in ConcurrentLinkedQueue class. Requests were placed on the queue and picked up by worker threads, with the thread count held at a fixed threads-per-core ratio so that the threads themselves did not become the bottleneck. The new implementation showed immediate improvements in latency and throughput.
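To make the shape of that design concrete, here is a minimal sketch of a ConcurrentLinkedQueue drained by worker threads sized from the core count. It is not the original server code: the class name TreasureHuntWorkers, the HuntRequest type, the threadsPerCore parameter, and the placeholder handle method are all hypothetical. Because ConcurrentLinkedQueue does not block, the workers in this sketch poll and briefly park when idle; a LinkedBlockingQueue or an ExecutorService would avoid that polling.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.LockSupport;

// Hypothetical sketch: names and lookup logic are illustrative only.
class TreasureHuntWorkers {

    // Assumed request shape: which player is looking for which item.
    static final class HuntRequest {
        final int playerId;
        final int itemId;

        HuntRequest(int playerId, int itemId) {
            this.playerId = playerId;
            this.itemId = itemId;
        }
    }

    private final Queue<HuntRequest> pending = new ConcurrentLinkedQueue<>();
    private final AtomicInteger processed = new AtomicInteger();
    private final Thread[] workers;
    private volatile boolean running = true;

    // Starts cores * threadsPerCore worker threads (at least one).
    TreasureHuntWorkers(int threadsPerCore) {
        int cores = Runtime.getRuntime().availableProcessors();
        workers = new Thread[Math.max(1, cores * threadsPerCore)];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(this::drain, "hunt-worker-" + i);
            workers[i].start();
        }
    }

    // Enqueue a request; must not be called after shutdown().
    void submit(HuntRequest request) {
        pending.add(request);
    }

    private void drain() {
        while (true) {
            // Read the stop flag before polling so that requests submitted
            // before shutdown() are never left behind in the queue.
            boolean stopRequested = !running;
            HuntRequest request = pending.poll();
            if (request != null) {
                handle(request);
            } else if (stopRequested) {
                return;
            } else {
                LockSupport.parkNanos(100_000L); // idle briefly instead of spinning
            }
        }
    }

    // Placeholder for the real item lookup; here we only count the request.
    private void handle(HuntRequest request) {
        processed.incrementAndGet();
    }

    // Lets the workers finish the remaining requests, then waits for them.
    void shutdown() {
        running = false;
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    int processedCount() {
        return processed.get();
    }

    public static void main(String[] args) {
        TreasureHuntWorkers engine = new TreasureHuntWorkers(1);
        for (int i = 0; i < 10_000; i++) {
            engine.submit(new HuntRequest(i % 300, i % 50_000));
        }
        engine.shutdown();
        System.out.println("processed " + engine.processedCount() + " requests");
    }
}
```

Tying the thread count to the core count keeps the number of runnable threads bounded no matter how many players connect, which is one plausible reading of the threads-per-core ratio mentioned above.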

What The Numbers Said After

The numbers spoke for themselves. After the switch, average latency dropped from 5 seconds to 0.5 seconds, with 99th percentile latency at 1 second. Request throughput increased fivefold, and we handled peak loads without any issues. Resource utilization also went down, which led to cost savings.
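For readers who want to reproduce figures like these from raw measurements, the following sketch shows one common way to compute a mean and a 99th percentile from a list of latency samples. It uses the nearest-rank definition of a percentile, which is an assumption on my part, since monitoring tools differ in how they interpolate. The class name LatencyStats and the sample values in main are made up for illustration and are not our production data.

```java
import java.util.Arrays;

// Hypothetical helper: names and sample data are illustrative only.
class LatencyStats {

    // Arithmetic mean of the samples, in the same unit as the input.
    static double mean(long[] samplesMs) {
        if (samplesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long sum = 0;
        for (long sample : samplesMs) {
            sum += sample;
        }
        return (double) sum / samplesMs.length;
    }

    // Nearest-rank percentile: the smallest sample such that at least
    // p percent of all samples are less than or equal to it.
    static long percentile(long[] samplesMs, double p) {
        if (samplesMs.length == 0 || p <= 0 || p > 100) {
            throw new IllegalArgumentException("need samples and 0 < p <= 100");
        }
        long[] sorted = samplesMs.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        long[] samplesMs = {120, 300, 450, 500, 980}; // made-up values
        System.out.println("mean ms: " + mean(samplesMs));
        System.out.println("p99 ms:  " + percentile(samplesMs, 99));
    }
}
```

Reporting the 99th percentile alongside the average matters because a healthy average can hide the slow requests that frustrated players actually notice.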

What I Would Do Differently

In hindsight, I would have pushed for a load test before going live with the initial implementation. A thorough load test would have exposed the performance problems early and spared us the frustrated users and the hit to our metrics.
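A load test of that kind does not have to be elaborate. The sketch below simulates a number of concurrent players, each sending a fixed number of requests, and records per-request latency and overall throughput. Everything here is an assumption for illustration: the class name LoadTestSketch, the run method and its parameters, and the stand-in request in main, which merely parks for about a millisecond instead of calling a real server.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.locks.LockSupport;

// Hypothetical load-test harness: names and workload are illustrative only.
class LoadTestSketch {

    static final class Result {
        final long[] latenciesNanos;
        final long wallNanos;

        Result(long[] latenciesNanos, long wallNanos) {
            this.latenciesNanos = latenciesNanos;
            this.wallNanos = Math.max(1L, wallNanos); // avoid division by zero
        }

        double throughputPerSecond() {
            return latenciesNanos.length / (wallNanos / 1e9);
        }

        long maxLatencyNanos() {
            long max = 0;
            for (long latency : latenciesNanos) {
                max = Math.max(max, latency);
            }
            return max;
        }
    }

    // Runs one thread per simulated player; each sends its requests back to back.
    static Result run(int concurrentPlayers, int requestsPerPlayer, Runnable request) {
        ExecutorService pool = Executors.newFixedThreadPool(concurrentPlayers);
        List<Future<long[]>> futures = new ArrayList<>();
        long start = System.nanoTime();
        try {
            for (int p = 0; p < concurrentPlayers; p++) {
                Callable<long[]> player = () -> {
                    long[] latencies = new long[requestsPerPlayer];
                    for (int i = 0; i < requestsPerPlayer; i++) {
                        long t0 = System.nanoTime();
                        request.run();
                        latencies[i] = System.nanoTime() - t0;
                    }
                    return latencies;
                };
                futures.add(pool.submit(player));
            }
            long[] all = new long[concurrentPlayers * requestsPerPlayer];
            int next = 0;
            for (Future<long[]> future : futures) {
                for (long latency : future.get()) {
                    all[next++] = latency;
                }
            }
            return new Result(all, System.nanoTime() - start);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("load test interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("request failed", e.getCause());
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Stand-in for a real request: park for roughly one millisecond.
        Runnable fakeRequest = () -> LockSupport.parkNanos(1_000_000L);
        Result result = run(50, 20, fakeRequest);
        System.out.printf("requests: %d, throughput: %.1f/s, max latency: %.2f ms%n",
                result.latenciesNanos.length,
                result.throughputPerSecond(),
                result.maxLatencyNanos() / 1e6);
    }
}
```

Pointing a harness like this at a staging deployment with a player count close to the expected peak would likely have surfaced multi-second latencies before any user saw them.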

I would also have considered building on an established framework such as Netty or Undertow, both of which provide non-blocking I/O and managed worker threads out of the box. Sometimes the best solution is the one that's already built and battle-tested.