DEV Community

Lillian Dube


Network Architecture Matters: My 6-Month Misadventure with Hytale Server Scaling

The Problem We Were Actually Solving

I was tasked with designing a Hytale server network from scratch, with the goal of supporting thousands of concurrent players across multiple servers. The requirements looked straightforward: proxy configuration, shared databases, and cross-server chat. As I soon discovered, though, the order in which we built these components had a significant impact on the performance and scalability of the whole system. My team and I chose MySQL for the shared database, NGINX as the proxy, and a custom implementation of the Hytale protocol for cross-server communication.

What We Tried First (And Why It Failed)

We started with the shared database, assuming it would be the most critical component of the system. We spent weeks designing the schema, optimizing queries, and making sure the database could handle the expected load. But when we began testing with a small number of players, we hit significant latency and packet loss. The database was not the bottleneck; the proxy configuration was. A single NGINX instance handled every incoming connection, and it was quickly overwhelmed. The error messages we saw pointed to socket exhaustion and timeouts, which made the root cause difficult to diagnose. Increasing the number of NGINX worker processes helped, but only temporarily.
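To illustrate why adding worker processes can only stretch a single proxy so far, here is a rough back-of-the-envelope sketch in Python. It assumes that each proxied connection holds two sockets (one to the player, one to the game server) and that every worker process has its own file-descriptor limit. The function name, the `reserved_fds` allowance, and the worker counts are hypothetical values chosen for illustration, not measurements from this project.

```python
import resource  # Unix-only standard library module


def estimate_proxy_capacity(fd_limit, workers, reserved_fds=64):
    """Rough upper bound on simultaneous proxied connections.

    Assumptions (illustrative, not taken from the article):
      - each proxied connection uses two file descriptors,
        one for the client side and one for the upstream side;
      - each worker process gets its own fd_limit;
      - reserved_fds covers listeners, log files, and similar overhead.
    """
    usable = max(fd_limit - reserved_fds, 0)
    return workers * (usable // 2)


if __name__ == "__main__":
    soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft_limit == resource.RLIM_INFINITY:
        print("No soft limit on open files; descriptors are not the constraint here")
    else:
        for workers in (1, 4, 8):
            capacity = estimate_proxy_capacity(soft_limit, workers)
            print(f"{workers} workers -> about {capacity} proxied connections")
```

Under these assumptions the estimate grows only linearly with the worker count, and a single machine can only run so many workers. That fits the pattern of temporary relief followed by the same errors at a higher player count.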

The Architecture Decision

After re-evaluating our approach, we decided to prioritize the proxy configuration and build a distributed proxy layer. We put multiple NGINX instances behind a load balancer, which let us scale the proxy layer horizontally and handle a much larger number of concurrent connections. We also added a caching layer using Redis to reduce the load on the database, which cut the average latency of our database queries by more than 50%. With these changes in place, we could turn our attention to optimizing database queries and implementing cross-server chat.
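As a sketch of what spreading connections across several proxy instances looks like, the Python below simulates least-connections selection. The article does not say which balancing strategy was used, so least-connections is an assumption here, and `ProxyInstance`, `LeastConnectionsBalancer`, and `simulate` are hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ProxyInstance:
    name: str        # hypothetical label, e.g. "proxy-0"
    active: int = 0  # connections currently assigned to this instance


class LeastConnectionsBalancer:
    """Sends each new connection to the instance with the fewest active ones."""

    def __init__(self, instances: Iterable[ProxyInstance]):
        self.instances = list(instances)
        if not self.instances:
            raise ValueError("at least one proxy instance is required")

    def acquire(self) -> ProxyInstance:
        # min() returns the first instance with the lowest count,
        # so ties go to the earliest instance in the list.
        target = min(self.instances, key=lambda p: p.active)
        target.active += 1
        return target

    def release(self, instance: ProxyInstance) -> None:
        # Called when a player disconnects.
        if instance.active > 0:
            instance.active -= 1


def simulate(player_count: int, proxy_count: int) -> List[int]:
    """Connect player_count players and return the per-proxy load."""
    balancer = LeastConnectionsBalancer(
        ProxyInstance(f"proxy-{i}") for i in range(proxy_count)
    )
    for _ in range(player_count):
        balancer.acquire()
    return [p.active for p in balancer.instances]


if __name__ == "__main__":
    print(simulate(1000, 1))  # everything lands on one instance
    print(simulate(1000, 4))  # load is spread evenly across four
```

With one instance, every connection lands on the same process. With four, no instance carries more than a quarter of the players, which is the horizontal scaling effect described above.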

What The Numbers Said After

Once the new architecture was in place, the system's performance improved significantly. Our average response time fell from 500 ms to 150 ms, a reduction of about 70%, and our packet loss rate dropped from 5% to less than 1%. We also handled a 300% increase in concurrent players without any issues. We monitored the system with Prometheus and Grafana, tracking request latency, error rates, and system resource utilization, which showed us how well the new architecture was working.
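For readers who want to check the percentages, this short Python sketch redoes the arithmetic from the reported figures. The helper names are made up for this example. The only inputs are the numbers quoted above, and the packet-loss calculation uses 1% as the upper bound of "less than 1%".

```python
def percent_decrease(before, after):
    """Relative drop from before to after, in percent."""
    return (before - after) * 100 / before


def scale_factor(percent_increase):
    """Turn 'an X% increase' into a multiplier on the original value."""
    return 1 + percent_increase / 100


if __name__ == "__main__":
    # Response time: 500 ms before, 150 ms after.
    print("Response time reduction (%):", percent_decrease(500, 150))
    # Packet loss: 5% before, at most 1% after.
    print("Packet loss reduction, at least (%):", percent_decrease(5, 1))
    # A 300% increase in players is a multiplier on the original count.
    print("Player count multiplier:", scale_factor(300))
```

Going from 500 ms to 150 ms is a 70% reduction, the packet loss figure corresponds to a relative drop of at least 80%, and a 300% increase means four times the original player count, not three.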

What I Would Do Differently

In retrospect, I would prioritize the proxy configuration from the outset. The shared database was an important component, but it was not on the critical path. I would also consider a more robust load-balancing solution, such as HAProxy, to improve the scalability of the proxy layer. I would invest more time in monitoring and testing early on, since that would have let us find and fix issues sooner. Building a custom implementation of the Hytale protocol for cross-server communication was also a significant undertaking; in hindsight, I would consider an existing solution to reduce development time and risk. Overall, the experience taught me to consider the entire architecture when designing a distributed system, rather than focusing on individual components in isolation.

