DEV Community

Lillian Dube

The Misguided Pursuit of Parallel Scheduling in Hytale's Treasure Hunt Engine

The Problem We Were Actually Solving

By the time I joined the project, the Treasure Hunt Engine had already been live for several months. Initially designed to facilitate coordinated server-side events, the system suffered from a pronounced bottleneck in scheduling tasks for concurrent players. The team was under pressure to improve performance, and the promise of multi-threading seemed too enticing to ignore. The objective was clear: reduce scheduling latency by taking advantage of the available CPU cores.

What We Tried First (And Why It Failed)

Our first attempt at parallel scheduling divided the player pool among multiple threads, each responsible for dispatching tasks to its assigned subset of players. In theory, this would spread the workload evenly across cores and speed up task scheduling. We built it on Java's ExecutorService interface, using its ForkJoinPool and ThreadPoolExecutor implementations to manage task distribution and execution. However, things quickly went awry.
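To make that first design concrete, here is a minimal sketch of partitioned dispatch on a fixed thread pool. It is an illustration under stated assumptions, not the engine's actual code: the class name PartitionedScheduler, the methods partition and dispatchAll, and the use of plain strings as player IDs are all hypothetical, and the per-player work is reduced to a counter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: names and structure are assumptions, not the real engine.
class PartitionedScheduler {

    /** Splits players into at most `parts` contiguous slices of near-equal size. */
    static <T> List<List<T>> partition(List<T> players, int parts) {
        List<List<T>> slices = new ArrayList<>();
        int size = (players.size() + parts - 1) / parts; // ceiling division
        for (int start = 0; start < players.size(); start += size) {
            int end = Math.min(start + size, players.size());
            slices.add(new ArrayList<>(players.subList(start, end)));
        }
        return slices;
    }

    /** Runs one worker task per slice and returns how many players were handled. */
    static int dispatchAll(List<String> players, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (List<String> slice : partition(players, threads)) {
                results.add(pool.submit(() -> {
                    int handled = 0;
                    for (String player : slice) {
                        handled++; // stand-in for the real per-player dispatch
                    }
                    return handled;
                }));
            }
            int total = 0;
            for (Future<Integer> result : results) {
                total += result.get(); // blocks until that slice has finished
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Note that in this shape a given player's tasks are handled by whichever worker owns the slice during that round, so any state shared between slices, or between rounds, has to be synchronized explicitly.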

As the system scaled, we ran into a string of problems caused by inconsistent state across threads, made worse by the inherent non-determinism of thread scheduling. The executors' dynamic thread allocation did not give us the predictable performance our application required. Scheduling times swung unpredictably between fast and slow, and context-switching overhead was excessive. The system's performance was more erratic than ever, a far cry from the improvement we had set out to achieve.

The Architecture Decision

After weeks of debugging and consulting various resources (ranging from the Java concurrency API documentation to the concurrency chapter in Java Performance: The Definitive Guide), we realized that a different approach was needed. The problem lay not in the toolset itself, but in the underlying design principle guiding our system. In essence, we had fallen prey to the seductive allure of parallel processing, neglecting the fundamental consistency models that underpin a robust event scheduling system.

In a radical departure from our previous approach, we opted for a thread-per-player scheduling model, in which each player has a dedicated thread that takes its tasks from a BlockingQueue. Because a player's tasks always run on the same thread, in the order they were queued, we got thread safety and predictable task scheduling. In our deployment it also did away with the context-switching overhead that had proven so detrimental to performance. The decision drastically improved task scheduling times and gave us a more stable and reliable system overall.
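A thread-per-player worker of this kind might look roughly like the sketch below. The class name PlayerWorker, its methods, and the "poison pill" shutdown convention are assumptions made for illustration; the post does not describe how the real engine starts or stops its threads.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: one dedicated thread draining one queue for one player.
class PlayerWorker {
    // Sentinel task that tells the worker loop to exit (an assumed convention).
    private static final Runnable STOP = () -> { };

    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    private final Thread thread;

    private PlayerWorker(String playerId) {
        thread = new Thread(this::runLoop, "player-" + playerId);
    }

    /** Creates the worker and starts its dedicated thread. */
    static PlayerWorker start(String playerId) {
        PlayerWorker worker = new PlayerWorker(playerId);
        worker.thread.start();
        return worker;
    }

    /** Queues a task; tasks run one at a time, in submission order. */
    void submit(Runnable task) {
        queue.add(task);
    }

    /** Lets already-queued tasks finish, then stops the thread and waits for it. */
    void shutdownAndWait() {
        queue.add(STOP);
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private void runLoop() {
        try {
            while (true) {
                Runnable task = queue.take(); // blocks while the queue is empty
                if (task == STOP) {
                    return;
                }
                task.run(); // an uncaught exception here would end this worker
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

A caller would create one worker per player with PlayerWorker.start(id), hand it work through submit, and call shutdownAndWait when the player leaves. The ordering guarantee comes from the single consumer thread: LinkedBlockingQueue is FIFO, and only one thread ever takes from it.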

What The Numbers Said After

Upon deployment, the system's performance rebounded significantly. Lower task scheduling latency translated into a substantial increase in player engagement and overall server throughput. Our metrics showed an average task scheduling time of 5 ms, down from a previous high of 25 ms. More importantly, the system's consistency improved dramatically, with a notable reduction in scheduling-related errors. We observed a decrease of nearly 90% in the dreaded ConcurrentModificationException errors, which are often a sign of thread interference.
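For readers unfamiliar with that exception: it comes from the fail-fast iterators of collections such as ArrayList, which throw when the collection is structurally modified during iteration. The sketch below triggers it deterministically on a single thread for illustration; in a multi-threaded scheduler the same check fires unpredictably when another thread mutates the list. The class FailFastDemo and its method are hypothetical, and CopyOnWriteArrayList is shown only as one standard alternative, not as something the engine used.

```java
import java.util.List;

// Hypothetical demo of fail-fast iteration; not taken from the engine.
class FailFastDemo {

    /**
     * Appends a retry entry for every task while iterating the same list.
     * Returns true if the iterator detected the modification and threw.
     */
    static boolean mutateWhileIterating(List<String> tasks) {
        try {
            for (String task : tasks) {
                tasks.add(task + "-retry"); // structural change during iteration
            }
            return false;
        } catch (java.util.ConcurrentModificationException e) {
            return true;
        }
    }
}
```

Passing an ArrayList makes the method return true, because the iterator notices the change on its next step. Passing a java.util.concurrent.CopyOnWriteArrayList makes it return false, because that class iterates over a snapshot taken when the loop started.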

What I Would Do Differently

In retrospect, I would advise against prematurely embracing parallel processing, especially when dealing with complex, concurrent systems. The allure of multi-threading is undeniable, but it must be tempered by a deep understanding of the consistency models that underpin such systems. Instead of diving headfirst into parallelization, I would recommend taking the time to develop a thorough understanding of the underlying system requirements, coupled with a willingness to confront the inherent trade-offs that come with each architectural decision. Only by prioritizing system stability and predictability can we hope to build truly robust and reliable software systems.
