
Lillian Dube

Hytale Servers' Treasure Hunt Engine is a Design Time Bomb Waiting to Detonate

The Problem We Were Actually Solving

Our problem turned out to be less about scaling the existing architecture and more about designing a system that could handle increasingly complex in-game events. The Treasure Hunt Engine generated and tracked items, clues, and other event-related data. It was a single-threaded service that relied heavily on a relational database for persistence. With each new node added to the cluster, the database became more of a bottleneck, slowing the entire system down.

What We Tried First (And Why It Failed)

We first tried to speed up the database queries by indexing specific columns and tuning the schema. This provided only temporary relief, because the problem lay not just in database performance but in the underlying service design. We also tried offloading some of the processing to worker nodes, but that added complexity and integration issues.
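To illustrate the kind of indexing fix described above, here is a minimal sketch using Python's built-in sqlite3 module. The `clues` table, its columns, and the index name are hypothetical and chosen for illustration only; they are not the engine's real schema, and the engine's actual database is not named in this post.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]


# Hypothetical schema: one row per clue placed in a hunt.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE clues ("
    "id INTEGER PRIMARY KEY, hunt_id INTEGER, player_id INTEGER, found INTEGER)"
)
conn.executemany(
    "INSERT INTO clues (hunt_id, player_id, found) VALUES (?, ?, ?)",
    [(i % 50, i % 1000, i % 2) for i in range(10_000)],
)

lookup = "SELECT id FROM clues WHERE hunt_id = ?"

# Without an index, the planner has to scan the whole table.
plan_before = query_plan(conn, lookup, (7,))

# With an index on the filtered column, the planner can search the index.
conn.execute("CREATE INDEX idx_clues_hunt ON clues (hunt_id)")
plan_after = query_plan(conn, lookup, (7,))

print(plan_before)
print(plan_after)
```

An index like this speeds up individual lookups, but every node still reads from and writes to the same database, which is consistent with the relief being only temporary.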

The Architecture Decision

After much deliberation, we decided to rearchitect the Treasure Hunt Engine as a distributed, event-sourced system. We replaced the relational database with an Apache Kafka topic, which gave us scalable, fault-tolerant event storage and processing. We also introduced a separate service that generates event-related data, which the Treasure Hunt Engine then consumes. With this design we could add nodes to the cluster without running into database performance issues.
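The core idea of event sourcing can be sketched in a few lines: state is never updated in place, it is rebuilt by replaying an append-only log. The sketch below uses an in-memory list as a stand-in for the topic. It is not a Kafka client, and the event names (`hunt_started`, `clue_found`) and fields are assumptions made for illustration, not the engine's real event schema.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class Event:
    kind: str            # hypothetical kinds: "hunt_started", "clue_found"
    hunt_id: str
    player_id: str = ""
    clue_id: str = ""


@dataclass
class EventLog:
    """In-memory stand-in for an append-only topic (not a Kafka client)."""
    _events: List[Event] = field(default_factory=list)

    def append(self, event: Event) -> int:
        self._events.append(event)
        return len(self._events) - 1  # offset of the event just appended

    def read_from(self, offset: int = 0) -> List[Event]:
        return self._events[offset:]


def replay(events: Iterable[Event]) -> Dict[str, Dict[str, Set[str]]]:
    """Rebuild state (hunt -> player -> found clue ids) purely from events."""
    state: Dict[str, Dict[str, Set[str]]] = {}
    for e in events:
        if e.kind == "hunt_started":
            state.setdefault(e.hunt_id, {})
        elif e.kind == "clue_found":
            players = state.setdefault(e.hunt_id, {})
            players.setdefault(e.player_id, set()).add(e.clue_id)
    return state


log = EventLog()
log.append(Event("hunt_started", "hunt-1"))
log.append(Event("clue_found", "hunt-1", "alice", "clue-a"))
log.append(Event("clue_found", "hunt-1", "alice", "clue-b"))
log.append(Event("clue_found", "hunt-1", "bob", "clue-a"))

# Any node can derive the same state by reading the log from the start.
state = replay(log.read_from(0))
```

Because every consumer derives its own state from the same log, adding a node means adding a reader rather than adding load to a shared, mutable store.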

What The Numbers Said After

After implementing the new architecture, we saw a significant improvement in server performance. Average response times decreased by 30%, and the number of complaints about slow events dropped by 90%. We were also able to scale the server to meet our growing user base without encountering the same performance issues we had faced earlier.

What I Would Do Differently

In hindsight, I would have tackled this problem in the design phase, when we were first building the Treasure Hunt Engine. That way, we could have avoided the complexity and integration issues that came with retrofitting a new architecture. I would also have explored alternative design patterns, such as CQRS (Command Query Responsibility Segregation), which could have provided an even more scalable and maintainable solution.
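For readers unfamiliar with CQRS, the pattern separates the code that changes state from the code that answers questions about it. Here is a minimal sketch of what that split might look like for a treasure hunt. The class names, the leaderboard view, and the duplicate-clue rule are all hypothetical; this is an illustration of the pattern, not something we built.

```python
from collections import Counter
from typing import List, Set, Tuple

ClueFound = Tuple[str, str]  # (player_id, clue_id), a hypothetical event shape


class HuntCommands:
    """Write side: validates commands and records the resulting events."""

    def __init__(self) -> None:
        self.events: List[ClueFound] = []
        self._seen: Set[ClueFound] = set()

    def find_clue(self, player_id: str, clue_id: str) -> bool:
        key = (player_id, clue_id)
        if key in self._seen:
            return False  # reject a clue this player has already found
        self._seen.add(key)
        self.events.append(key)
        return True


class LeaderboardQueries:
    """Read side: a denormalized view built only from recorded events."""

    def __init__(self) -> None:
        self._scores: Counter = Counter()
        self._applied = 0  # number of events already projected

    def project(self, events: List[ClueFound]) -> None:
        for player_id, _clue_id in events[self._applied:]:
            self._scores[player_id] += 1
        self._applied = len(events)

    def top(self, n: int) -> List[Tuple[str, int]]:
        return self._scores.most_common(n)


commands = HuntCommands()
commands.find_clue("alice", "clue-a")
commands.find_clue("alice", "clue-b")
commands.find_clue("bob", "clue-a")
duplicate_accepted = commands.find_clue("alice", "clue-a")

leaderboard = LeaderboardQueries()
leaderboard.project(commands.events)
```

The read side never touches the write side's validation state, so each can be scaled, stored, and optimized independently, which is where the scalability argument for CQRS comes from.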
