DEV Community

pretty ncube

Why I Ditched Go for Rust in Our Real-Time Event Processing Pipeline

The Problem We Were Actually Solving

I still remember the day our event processing pipeline started falling behind the volume of incoming data. We were using Go at the time, and while it had served us well, garbage collection pauses were killing our latency numbers. Our average processing time was around 50ms, but those pauses could stretch up to 200ms, and whenever that happened events arrived faster than we could drain them, so the queue kept growing. I knew we needed a change, but I was hesitant to switch languages given the investment we had already made in Go. Profiling the application with pprof made the decision for me: we were spending over 30% of our CPU time in GC.

What We Tried First (And Why It Failed)

Before abandoning Go entirely, I tried tuning the garbage collector, hoping to find a sweet spot that would minimize pauses without sacrificing too much throughput. Go's runtime does not offer selectable GC modes; the knobs it exposes are the GOGC percentage and, since Go 1.19, the GOMEMLIMIT soft memory limit. I experimented with settings from conservative to aggressive, but nothing worked well enough. Our latency numbers would improve slightly, but our memory usage would skyrocket until the OOM killer terminated our process. I also used the sync/atomic package to replace some mutexes and reduce lock contention, but that does not touch the allocation rate that drives the GC, and we were fighting a losing battle. It became clear that we needed a more fundamental change rather than more configuration tweaks.

The Architecture Decision

After much deliberation, I decided to take the plunge and rewrite our pipeline in Rust. I knew it wouldn't be easy given Rust's steep learning curve, but I was convinced it was the right choice for our use case. I was particularly drawn to Rust's ownership model and borrow checker, which eliminate entire classes of memory-safety bugs and free memory deterministically, without a garbage collector. I also appreciated Rust's focus on performance, which aligned with our needs. The decision wasn't without risk, however: our team would need to invest significant time and effort into learning Rust.
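To make the ownership point concrete, here is a minimal sketch of borrowing versus moving an event between stages. The `Event` type, its fields, and the function names are hypothetical and chosen only for illustration; they are not taken from our pipeline.

```rust
// Hypothetical event type; the field names are illustrative only.
#[derive(Debug)]
struct Event {
    id: u64,
    payload: Vec<u8>,
}

// Borrows the event immutably: the caller keeps ownership and nothing is copied.
fn payload_len(event: &Event) -> usize {
    event.payload.len()
}

// Takes ownership of the event and hands back its payload buffer.
// The move transfers the heap allocation without copying the bytes.
fn into_payload(event: Event) -> Vec<u8> {
    event.payload
}

fn main() {
    let event = Event { id: 1, payload: vec![0u8; 1024] };

    // Shared borrow: `event` is still usable afterwards.
    let len = payload_len(&event);
    println!("event {} carries {} bytes", event.id, len);

    // Move: ownership passes to `into_payload`; `event` cannot be used again.
    let payload = into_payload(event);
    // println!("{:?}", event); // would not compile: value used after move

    println!("payload holds {} bytes", payload.len());
} // `payload` is freed here, at a known point, with no collector involved.
```

The compiler rejects the commented-out line, which is the borrow checker catching a use-after-move at build time rather than in production.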

What The Numbers Said After

The results were nothing short of astonishing. After rewriting our pipeline in Rust, our average processing time dropped to around 10ms, with a 99th percentile latency of 20ms. Our memory usage also decreased dramatically, from around 10GB to about 1GB. But what really impressed me was the stability of our system. Gone were the GC pauses and the OOM kills. Our system was now rock-solid, able to handle even the most extreme spikes in traffic without breaking a sweat. I monitored the system with perf and sysdig, and the numbers were consistently impressive. Our allocation counts dropped by a factor of 10, and our cache hit rate increased from 50% to over 90%.
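For readers who want to track the same two figures, the following is a rough sketch of computing a mean and a nearest-rank percentile from recorded latencies. The function names and the sample values are assumptions made for illustration; they are not our measurement code or our data.

```rust
use std::time::Duration;

// Mean of the samples, or None when there are no samples.
// Assumes fewer than u32::MAX samples, which is fine for a sketch.
fn mean(samples: &[Duration]) -> Option<Duration> {
    if samples.is_empty() {
        return None;
    }
    let total: Duration = samples.iter().sum();
    Some(total / samples.len() as u32)
}

// Nearest-rank percentile: the smallest sample such that at least
// `p` percent of all samples are less than or equal to it.
fn percentile(samples: &[Duration], p: f64) -> Option<Duration> {
    if samples.is_empty() || !(0.0..=100.0).contains(&p) {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort();
    // Rank is 1-based; clamp to 1 so that p = 0 maps to the smallest sample.
    let rank = (p * sorted.len() as f64 / 100.0).ceil() as usize;
    Some(sorted[rank.max(1) - 1])
}

fn main() {
    // Hypothetical samples for illustration; not measurements from the pipeline.
    let samples: Vec<Duration> = [8, 9, 10, 10, 11, 12, 20]
        .iter()
        .map(|&ms| Duration::from_millis(ms))
        .collect();

    println!("mean: {:?}", mean(&samples));
    println!("p99:  {:?}", percentile(&samples, 99.0));
}
```

Reporting a high percentile next to the mean matters here because a mean alone hides exactly the kind of occasional long pause that pushed us off Go.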

What I Would Do Differently

In hindsight, I wish I had made the switch to Rust sooner. While it was a difficult decision at the time, I now see it as a no-brainer. If I had to do it all over again, I would invest more time in learning Rust's ecosystem and tooling rather than fighting the language's quirks. I would also write idiomatic Rust from the start rather than translating my Go code into Rust. One specific mistake I made was using Rust's std::sync::mpsc channel for inter-thread communication, which led to a lot of unnecessary complexity in our design. Instead, I would start with a simpler synchronization primitive, like a mutex around shared state, and consider something like a spinlock only if measurements justified it. Despite these mistakes, I'm proud of what we achieved, and I'm excited to see where Rust will take us. Our experience has shown that Rust is a viable choice for real-time event processing, and I hope our story will inspire others to take the leap and explore systems programming in Rust.
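As a rough idea of what a mutex-based hand-off can look like, here is a minimal sketch using only the standard library: a `Mutex<VecDeque<T>>` paired with a `Condvar` so the consumer sleeps while the queue is empty. `SharedQueue` and `run` are hypothetical names, and this is one possible shape under those assumptions rather than the code we shipped. Whether it beats a channel depends on the workload, so measure both.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

// Hypothetical shared queue: a mutex-protected buffer plus a condition
// variable so the consumer can sleep while the queue is empty.
struct SharedQueue<T> {
    items: Mutex<VecDeque<T>>,
    ready: Condvar,
}

impl<T> SharedQueue<T> {
    fn new() -> Self {
        SharedQueue {
            items: Mutex::new(VecDeque::new()),
            ready: Condvar::new(),
        }
    }

    // Appends one item and wakes a waiting consumer, if any.
    fn push(&self, item: T) {
        self.items.lock().unwrap().push_back(item);
        self.ready.notify_one();
    }

    // Blocks until an item is available, then removes and returns it.
    fn pop(&self) -> T {
        let mut items = self.items.lock().unwrap();
        loop {
            if let Some(item) = items.pop_front() {
                return item;
            }
            // `wait` releases the lock while sleeping and re-acquires it on wake-up.
            items = self.ready.wait(items).unwrap();
        }
    }
}

// Spawns `producers` threads that each push `per_producer` event ids,
// drains them on the calling thread, and returns the sum of the ids received.
fn run(producers: u64, per_producer: u64) -> u64 {
    let queue = Arc::new(SharedQueue::new());
    let handles: Vec<_> = (0..producers)
        .map(|p| {
            let queue = Arc::clone(&queue);
            thread::spawn(move || {
                for i in 0..per_producer {
                    queue.push(p * per_producer + i);
                }
            })
        })
        .collect();

    let mut sum = 0;
    for _ in 0..producers * per_producer {
        sum += queue.pop();
    }
    for handle in handles {
        handle.join().unwrap();
    }
    sum
}

fn main() {
    println!("sum of received event ids: {}", run(4, 1000));
}
```

The loop around `wait` guards against spurious wake-ups, and the queue here is unbounded, so a real pipeline would also need a capacity limit to apply backpressure.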



