DEV Community

Lillian Dube

The Blind Alleys of Veltrix Configuration

The Problem We Were Actually Solving

As it turned out, most of our stuck operators were stuck on the event handling configuration in Veltrix. Our search data showed significantly more queries about event handling than about actual configuration mistakes. Operators were getting lost in the wild west of event configuration. The search results were littered with generic tutorials, but the real question was: how do we avoid the common pitfalls in event configuration?

What We Tried First (And Why It Failed)

We initially tried documenting the most common event configurations, which led to a flurry of 'me too' requests in our internal knowledge base. Everyone wanted similar configurations, and no one knew how to implement them without digging through the codebase. Our step-by-step documentation for these configurations grew into a single 2,000-line document that was more of a doorstop than a useful resource.

The Architecture Decision

We realized that Veltrix's event-driven architecture was both its biggest strength and its biggest weakness. We were using a request-response model, with events as a way to decouple the components of our system, but this decoupling had led to an explosion in configuration possibilities. Our operators were lost in the options, and our documentation was failing to keep pace. It was time to impose some structure on this chaos.

We changed our approach and created a set of pre-defined event configurations, each crafted to handle one of the most common use cases. We documented each one so that it showed the reasoning behind the configuration, not just the final settings. Operators could use these configurations as a starting point and customize them as needed.
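To make the idea concrete, here is a minimal sketch of what a preset with its rationale attached might look like. Everything in it is an assumption for illustration: the preset names, the keys (`retries`, `timeout_s`, `dead_letter`), and the `build_event_config` helper are hypothetical and are not Veltrix's actual schema or API.

```python
from copy import deepcopy

# Hypothetical presets. Names, keys, and values are illustrative only
# and do not reflect Veltrix's real configuration schema.
PRESETS = {
    "at_least_once": {
        "rationale": "Retry on failure; handlers must tolerate duplicates.",
        "config": {"retries": 3, "timeout_s": 30, "dead_letter": True},
    },
    "fire_and_forget": {
        "rationale": "For low-value events where occasional loss is acceptable.",
        "config": {"retries": 0, "timeout_s": 5, "dead_letter": False},
    },
}


def build_event_config(preset_name, overrides=None):
    """Return a copy of a preset's config with operator overrides applied.

    Overrides may only change keys the preset already defines, so a typo
    fails loudly instead of silently creating a new setting.
    """
    if preset_name not in PRESETS:
        raise KeyError(
            f"unknown preset {preset_name!r}; choose from {sorted(PRESETS)}"
        )
    # Copy so that customizing one config never mutates the shared preset.
    config = deepcopy(PRESETS[preset_name]["config"])
    overrides = overrides or {}
    unknown = set(overrides) - set(config)
    if unknown:
        raise ValueError(f"unknown override keys: {sorted(unknown)}")
    config.update(overrides)
    return config


if __name__ == "__main__":
    # Start from a preset, then customize a single value.
    print(PRESETS["at_least_once"]["rationale"])
    print(build_event_config("at_least_once", {"retries": 5}))
```

Keeping the rationale next to the values is the point of the sketch: an operator who overrides `retries` sees why the default was chosen before changing it.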

What The Numbers Said After

In the first week after the change, our stuck-operator reports dropped by 50%. Search volume for event handling tutorials decreased by 75%, replaced by searches for more specific topics, such as 'using event configuration in production'. Our average event configuration time decreased by 30%, and our operators reported feeling more confident in their ability to configure events. The number that really stood out, though, was a 90% decrease in requests to escalate event configuration issues to the dev team.

What I Would Do Differently

If I were doing this again, I would implement the pre-defined event configurations sooner. Our operators were clearly struggling with the complexity of event configuration, and we waited too long to provide a solution. I would also push harder to remove the option for custom configurations, since the data showed that most users were simply modifying the config files without understanding the implications. It's easy to see now, but it's harder to make that call when you're in the midst of it.
