DEV Community

pretty ncube


My Configuration Conundrum: The Day I Realized Veltrix Was Holding Hytale Back

The Problem We Were Actually Solving

We were trying to serve tens of thousands of concurrent players, each with their own economic interests and goals. Our event-driven architecture was a natural fit, but it required a robust configuration system to manage event producers, consumers, and routing. As far as we could tell, our scaling pain came about equally from player count and from configuration complexity. Every time we scaled up, config processing times rose sharply, delaying player events and degrading the player experience.

What We Tried First (And Why It Failed)

We used the built-in configuration management provided by Veltrix, a JSON-based key-value store. At first it seemed straightforward: we wrote a couple of JSON files and the system picked them up automatically. But as we added features and complexity, our config files grew to hundreds of lines. JSON parsing became the bottleneck, and config processing times crept up from a few milliseconds to hundreds of milliseconds.

We tried to mitigate this by using caching, but it only helped marginally. The real problem was that Veltrix's configuration system was designed for simplicity, not performance. It was optimized for small, static configs, not large, dynamic ones.

The Architecture Decision

That's when I decided to ditch Veltrix's built-in config system and implement a custom solution using the encoding/json package from Go's standard library. I wrote a configuration loader that deserialized the JSON files into a Go struct, which was then used to populate the Veltrix configuration. It was a radical change, but it paid off.
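To make the loader idea concrete, here is a minimal sketch of decoding a JSON config into a typed Go struct. The Config and Route types, their field names, and the LoadConfig function are all hypothetical, invented for illustration; they are not the Veltrix schema or the loader we shipped. Only the encoding/json calls are real.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// Route maps an event type to the consumers that should receive it.
// Hypothetical shape: the field names are assumptions, not Veltrix's schema.
type Route struct {
	Event     string   `json:"event"`
	Consumers []string `json:"consumers"`
}

// Config is the typed form of one JSON config file (hypothetical).
type Config struct {
	Producers []string `json:"producers"`
	Routes    []Route  `json:"routes"`
}

// LoadConfig decodes raw JSON into a Config. Unknown keys are rejected so
// that a misspelled field fails at load time instead of being ignored.
func LoadConfig(data []byte) (*Config, error) {
	dec := json.NewDecoder(bytes.NewReader(data))
	dec.DisallowUnknownFields()

	var cfg Config
	if err := dec.Decode(&cfg); err != nil {
		return nil, fmt.Errorf("decode config: %w", err)
	}
	return &cfg, nil
}

func main() {
	// In practice the bytes would come from a file, e.g. via os.ReadFile.
	raw := []byte(`{
		"producers": ["trade"],
		"routes": [{"event": "trade.completed", "consumers": ["ledger", "audit"]}]
	}`)

	cfg, err := LoadConfig(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(cfg.Routes), cfg.Routes[0].Consumers)
}
```

Decoding once into a struct means the rest of the system reads plain Go fields instead of re-parsing JSON on every lookup, which is the property the loader relies on.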

I also introduced a config store backed by a B-tree, a more efficient data structure for storing and retrieving configuration data. This let config processing keep pace with our growing player base.
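The ordered-store idea can be sketched as follows. Go's standard library has no B-tree, so this example swaps in a sorted slice with binary search as a stand-in: it gives the same sorted lookups and prefix scans for read-mostly config, but inserts are O(n), which a real B-tree avoids. The Store type and the key names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// entry is one key/value pair in the store.
type entry struct {
	key, value string
}

// Store keeps entries sorted by key. It is a sorted-slice stand-in for a
// B-tree, not a B-tree itself.
type Store struct {
	entries []entry
}

// search returns the index of the first entry whose key is >= key.
func (s *Store) search(key string) int {
	return sort.Search(len(s.entries), func(i int) bool {
		return s.entries[i].key >= key
	})
}

// Set inserts or overwrites a key while keeping the slice sorted.
func (s *Store) Set(key, value string) {
	i := s.search(key)
	if i < len(s.entries) && s.entries[i].key == key {
		s.entries[i].value = value
		return
	}
	s.entries = append(s.entries, entry{})
	copy(s.entries[i+1:], s.entries[i:]) // shift the tail right by one
	s.entries[i] = entry{key: key, value: value}
}

// Get looks a key up by binary search.
func (s *Store) Get(key string) (string, bool) {
	i := s.search(key)
	if i < len(s.entries) && s.entries[i].key == key {
		return s.entries[i].value, true
	}
	return "", false
}

// Prefix returns every key that starts with prefix, in sorted order.
func (s *Store) Prefix(prefix string) []string {
	var keys []string
	for i := s.search(prefix); i < len(s.entries) && strings.HasPrefix(s.entries[i].key, prefix); i++ {
		keys = append(keys, s.entries[i].key)
	}
	return keys
}

func main() {
	var s Store
	s.Set("routes.trade.completed", "ledger")
	s.Set("producers.trade", "enabled")
	s.Set("routes.chat.message", "moderation")

	v, ok := s.Get("producers.trade")
	fmt.Println(v, ok)
	fmt.Println(s.Prefix("routes."))
}
```

Keeping keys ordered is what makes a query like "all routes" a contiguous scan rather than a walk over every entry.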

What The Numbers Said After

With the new configuration system in place, our config processing times dropped from 500ms to 10ms, a 98% reduction. Player event latency decreased by 75%, and we were able to serve 50% more players without any performance degradation.

Here are some numbers to illustrate the impact:

  • Config processing times (50k players):
    • Before: 500ms ± 100ms
    • After: 10ms ± 2ms
  • Player event latency (50k players):
    • Before: 200ms ± 50ms
    • After: 50ms ± 10ms
  • Player throughput (50k players):
    • Before: 30k ± 5k
    • After: 45k ± 10k
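The relative changes implied by the mean values in the list above can be checked with a few lines of Go. This is only arithmetic on the reported means; the pctChange helper is a name made up for this sketch, and the ± spreads are ignored.

```go
package main

import "fmt"

// pctChange returns the relative change from before to after, in percent.
// Negative values are reductions, positive values are increases.
func pctChange(before, after float64) float64 {
	return (after - before) / before * 100
}

func main() {
	// Mean values taken from the list above.
	fmt.Printf("config processing: %+.0f%%\n", pctChange(500, 10))
	fmt.Printf("event latency:     %+.0f%%\n", pctChange(200, 50))
	fmt.Printf("throughput:        %+.0f%%\n", pctChange(30000, 45000))
}
```

Going from 500ms to 10ms works out to a 98% reduction, 200ms to 50ms to 75%, and 30k to 45k to a 50% increase.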

What I Would Do Differently

In retrospect, I would have taken a more iterative approach to designing the configuration system. I would have started with smaller, more focused changes, and A/B tested them to ensure they didn't introduce regressions.

I would also have involved more team members in the decision-making process, as the configuration system affects multiple components of the architecture. This would have helped us identify potential issues earlier and ensured a more collaborative approach to solving the problem.

Looking back, it was a classic case of trying to force a square peg into a round hole. Veltrix's configuration mechanism was just not designed for our use case. By acknowledging this and making a difficult architectural decision, we were able to overcome a major performance bottleneck and deliver a better player experience.
