Let me tell you a story that starts in 2013, peaks somewhere around 2019, and ends with me staring at a $4,200 AWS bill at 11pm on a Tuesday.
The Revolution Nobody Saw Coming
March 2013. PyCon. A French developer named Solomon Hykes gets on stage and does a five-minute demo of something called Docker.
The audience is confused. Then curious. Then the applause starts.
In those five minutes, he essentially broke software deployment as everyone knew it. No more "works on my machine." No more environment hell. You take your app, you put it in a box, and that box runs anywhere. Same box. Every time.
The internet lost its mind.
Within a year, every startup was containerizing everything. Within two, enterprises were asking their CTOs why they weren't doing it yet. Within three, Docker was valued at over a billion dollars and "containerization" stopped being a niche word.
But here's the thing about revolutions — they create new problems.
Enter: The Orchestration Wars
By 2014 you had thousands of containers. Maybe tens of thousands. And now what? How do you run them? How do you restart the ones that crash? How do you roll out updates without downtime? How do you route traffic between them?
This is when it got messy. Genuinely messy.
Three players showed up:
Docker Swarm — Docker's own answer. Simple. Integrated. And honestly? Pretty good. But Docker the company was making decisions that confused everyone. They kept pivoting. Swarm never got the trust it deserved.
Apache Mesos — the enterprise option. Heavy. Complex. Battle-tested at Twitter and Airbnb scale. But you needed a PhD to configure it and the learning curve was basically a vertical wall.
Kubernetes — Google's open-source version of Borg, the internal system they'd been running for over a decade. K8s dropped in 2014 and it was ugly at first. The config was verbose, the concepts were alien, the documentation was written by people who already understood it.
But then something happened. Google poured resources into it. A foundation formed around it (CNCF — Cloud Native Computing Foundation, 2015). AWS, Azure, GCP all built managed versions of it. The ecosystem exploded.
By 2017 the wars were effectively over.
Kubernetes won. Swarm is still technically alive but nobody talks about it at conferences anymore. Mesos is mostly a ghost. K8s became the operating system of the cloud.
Go's Golden Era
Here's something worth understanding: Kubernetes is written in Go. Docker was written in Go. Traefik, Prometheus, Terraform, Consul, etcd, Helm — basically everything that runs your infrastructure today traces back to Go.
This wasn't an accident.
Go was designed at Google in 2007 by some of the most legendary people in computer science — Rob Pike, Ken Thompson (yes, the Unix guy), Robert Griesemer. They wanted a language that compiled fast, ran fast, handled concurrency natively, and didn't make you want to throw your laptop through a window.
They got it right.
Go's goroutines made writing networked, concurrent services stupid easy. The toolchain was clean. The standard library was rich. You could hire people who knew it. Everything clicked.
The cloud-native world didn't standardize on Go by committee. It was simply the right language, at the right time, for the right problems.
Between 2015 and 2020, if you were building infrastructure tooling and not using Go, people looked at you sideways.
Meanwhile, In a Mozilla Office Somewhere...
2006. A Mozilla employee named Graydon Hoare is working on something in his spare time. A new systems language. One that gives you C-level performance but makes memory corruption structurally impossible, at least outside of explicitly unsafe code.
No garbage collector. No runtime overhead. But also no dangling pointers, no buffer overflows, no use-after-free bugs — the entire class of vulnerabilities that has haunted C and C++ for fifty years.
The trick? A concept called ownership. Every value has exactly one owner. When the owner goes out of scope, the memory is freed. Not by a GC running in the background. Not "eventually." Immediately. Deterministically. And the compiler works out exactly where each free happens at compile time, before your program ever runs.
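To make "freed when the owner goes out of scope" concrete, here is a minimal sketch using Rust's standard `Drop` trait, which runs automatically when a value's owner goes out of scope or when the value is passed to `drop`. The `Tracked` type and `ownership_demo` function are hypothetical names invented for illustration; each value writes to a shared log when it is destroyed, so you can see exactly when that happens.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical type: records its own destruction in a shared log.
struct Tracked {
    name: &'static str,
    log: Rc<RefCell<Vec<String>>>,
}

impl Drop for Tracked {
    // Runs automatically when the owning binding goes out of scope
    // (or when the value is passed to `drop`).
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("dropped {}", self.name));
    }
}

// Hypothetical demo: returns the order in which things happened.
fn ownership_demo() -> Vec<String> {
    let log = Rc::new(RefCell::new(Vec::new()));

    {
        let _a = Tracked { name: "a", log: Rc::clone(&log) };
        log.borrow_mut().push("inner scope ends".to_string());
    } // `_a`'s owner goes out of scope here, so it is dropped right now.

    let b = Tracked { name: "b", log: Rc::clone(&log) };
    let c = b; // Ownership moves to `c`; `b` is no longer usable.
    // println!("{}", b.name); // Won't compile: borrow of moved value `b`.
    log.borrow_mut().push("moved b into c".to_string());

    drop(c); // Explicitly end `c`'s lifetime early.
    log.borrow_mut().push("end of function".to_string());

    // Copy the log out before `log` itself is dropped at the end of the function.
    let events = log.borrow().clone();
    events
}

fn main() {
    for event in ownership_demo() {
        println!("{event}");
    }
}
```

The commented-out line is exactly what the borrow checker exists to reject: once `b` has moved into `c`, the compiler refuses any further use of `b`, so there is no way to reach memory that has already been freed.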
Mozilla ships Rust 1.0 in May 2015. The initial reaction from the systems programming community was somewhere between skeptical and hostile. The borrow checker — the compiler mechanism that enforces ownership — felt like fighting the language more than writing it.
People tried it, got confused, went back to C++ or Go.
But a small group of people got it. And they kept building. Quietly.
The Quiet Takeover
Cloudflare has a problem.
They're running nginx. Everywhere. nginx is written in C. It's fast, it's reliable, but it's also showing its age — the architecture doesn't handle modern HTTP patterns well, extending it is painful, and the codebase has had its share of security issues because, well, it's C.
Cloudflare decides to replace it. Not patch it. Replace it.
They write Pingora in Rust. A full HTTP proxy, from scratch, handling 1 trillion requests per day in production.
The numbers they published were obscene. Compared to their old nginx-based service under the same traffic: about 70% less CPU and 67% less memory. And because memory-safety bugs can't happen in safe Rust, a whole class of vulnerabilities is ruled out by construction.
And they didn't stop there. By late 2025, Cloudflare went further — they rewrote FL, the actual "brain" of Cloudflare (the system that applies every customer's WAF rules, DDoS settings, and routing logic) in Rust, calling it FL2. The result: response time dropped by 10ms and overall performance went up 25%. They shut down FL1 entirely in early 2026.
Then AWS announced they were writing parts of their virtualization layer in Rust. Then Microsoft said Rust was the preferred language for new systems code at the company. Then the Linux kernel (the Linux kernel, which has run on C for thirty years) accepted Rust as a second language, with initial Rust support merged in kernel 6.1 and the first Rust drivers following in later releases.
The systems programming community quietly started paying attention.
And then the Go community started getting uncomfortable.
Back To My Tuesday Night
This is where I come back in.
We were a mid-size startup. Not Cloudflare. Not Amazon. But we were scaling, traffic was growing, and our Kubernetes setup was running fine.
That word again. Fine.
We were using Traefik as our ingress controller. Go-based, battle-tested, the default choice for half the K8s tutorials on the internet. Three replicas, handling maybe 800 req/s at peak.
I was not watching the AWS bill closely enough. That was my fault.
Then I did. And I saw $4,200 for one month of ingress-related compute.
I didn't say anything for thirty seconds. I just sat there. That specific feeling when a number doesn't match your mental model and your brain is trying to reconcile it.
The Panic Phase
I started Googling. You should never Google things when you're panicking. You come out two hours later having read about eBPF, service meshes, and someone's confident blog post from 2021 recommending something that got deprecated in 2023.
What I found when I actually focused: the memory profile of our ingress layer was absurd for what we were asking it to do.
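Each Traefik pod idling at ~180MB RSS. Under load? Comfortably 400MB+. That's not Traefik being badly written. It isn't. It's Go's garbage collector doing what it's designed to do. The GC is tuned for low pause times, not a small footprint: by default it lets the heap grow well past the live data before starting its next cycle, and under constant HTTP load there's always a fresh pile of garbage sitting behind the live set. On top of that, the runtime hands freed pages back to the OS gradually rather than immediately, so RSS stays high.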
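The official Go GC guide spells out how the collector decides when to start its next cycle. Since Go 1.18 it aims for a heap target computed from the live heap, the GC roots, and the `GOGC` setting (default 100):

```latex
\text{target heap} = \text{live heap} + \left(\text{live heap} + \text{GC roots}\right) \times \frac{\mathrm{GOGC}}{100}

\mathrm{GOGC} = 100,\ \text{GC roots} \ll \text{live heap} \;\Longrightarrow\; \text{target heap} \approx 2 \times \text{live heap}
```

As an illustration (not our measured numbers), 150MB of live data can legitimately sit in a ~300MB heap before the collector kicks in. That's a deliberate trade of memory for CPU, not a leak.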
We had 3 pods at ~400MB each, spread across 2 t3.large nodes (because they needed the headroom), running 24/7.
Do the math on AWS pricing. Then cry.
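"The math" here is just always-on instance count times hourly rate times hours in a month. A minimal sketch of that arithmetic follows; `monthly_node_cost` is a hypothetical helper, and the hourly rates in `main` are placeholders, not quoted AWS prices, so plug in the current on-demand rate for your instance type and region.

```rust
/// Average hours in a month (365 days * 24 h / 12 months),
/// the figure commonly used for monthly cloud estimates.
const HOURS_PER_MONTH: f64 = 730.0;

/// Monthly cost of `nodes` always-on instances at `hourly_rate` USD per hour.
fn monthly_node_cost(nodes: u32, hourly_rate: f64) -> f64 {
    f64::from(nodes) * hourly_rate * HOURS_PER_MONTH
}

fn main() {
    // Placeholder rates for illustration only: NOT real AWS prices.
    let large_rate = 0.10;
    let small_rate = 0.02;

    let before = monthly_node_cost(2, large_rate);
    let after = monthly_node_cost(1, small_rate);

    println!("before: ${before:.2}/month, after: ${after:.2}/month");
    println!("ratio: {:.1}x", before / after);
}
```

Node hours are only part of the bill: storage, load balancers, and data transfer show up as separate line items.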
Then I found the Pingora post. Read the whole thing this time. Pulled the benchmark numbers. Understood what was actually happening at the memory level in Rust vs Go.
Something clicked.
Why Rust Hits Different Here
Nobody says this clearly enough so I will:
For most application code, Go's GC is fine. Genuinely fine.
Writing a web API? Go. Building a CLI tool? Go. Microservices with moderate traffic? Go, easily.
But a proxy is a different beast. A proxy's entire job is to sit between two things and move bytes from one to the other, as fast as possible, thousands of times per second. Every request means allocations — headers, buffers, connection state. Every one of those allocations eventually needs to be freed.
In Go, "eventually" is doing a lot of work in that sentence.
In Rust, memory is freed the moment it goes out of scope. Not by a background process. Not on a schedule. Structurally, at the language level. The compiler guarantees it.
For a proxy, that's not a nice-to-have. It's the difference between flat memory usage and a profile that climbs under load and triggers GC pauses at exactly the wrong moment.
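Here's a minimal sketch of that "move bytes from one side to the other" loop, using only Rust's standard library and blocking I/O. It's a toy under obvious assumptions: real proxies like Pingora and Envoy use non-blocking I/O and connection pooling, and `std::io::copy` already does this job. But it shows the memory behaviour in question: one fixed buffer, reused for every chunk and released the moment the function returns. `relay` is a hypothetical name.

```rust
use std::io::{self, Read, Write};

/// Copies bytes from `src` to `dst` until EOF and returns how many were moved.
/// The 8 KiB buffer lives on the stack and is reused for every chunk, so the
/// loop itself performs no heap allocations; it is released when this
/// function returns, with no collector involved.
fn relay<R: Read, W: Write>(src: &mut R, dst: &mut W) -> io::Result<u64> {
    let mut buf = [0u8; 8 * 1024];
    let mut total: u64 = 0;
    loop {
        let n = match src.read(&mut buf) {
            Ok(0) => break, // EOF: the sending side is done
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue, // retry
            Err(e) => return Err(e),
        };
        dst.write_all(&buf[..n])?;
        total += n as u64;
    }
    dst.flush()?;
    Ok(total)
}

fn main() -> io::Result<()> {
    // In-memory stand-ins for a client connection and an upstream connection.
    let request = b"GET /health HTTP/1.1\r\nHost: example.internal\r\n\r\n".to_vec();
    let mut client = io::Cursor::new(request.clone());
    let mut upstream: Vec<u8> = Vec::new();

    let moved = relay(&mut client, &mut upstream)?;
    assert_eq!(upstream, request);
    println!("relayed {moved} bytes");
    Ok(())
}
```

Per-connection state behaves the same way: when the handler that owns it returns, it's freed on the spot, so memory tracks the number of in-flight requests rather than a collector's schedule.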
linkerd2-proxy — the data plane for the Linkerd service mesh — is written in Rust for exactly this reason. It's running in production service meshes at serious scale. Tokio — Rust's async runtime — is mature enough that they're running their own conference in 2026. The ecosystem isn't a science experiment anymore.
The Migration (It Was Not Smooth, I Won't Lie)
We didn't rewrite Traefik. We're not a systems programming shop.
We moved to Envoy as the core proxy (C++, but with a very different memory model than Go's runtime, and a mature extensibility story) plus a small Rust service handling our custom routing logic that Traefik was previously doing in middleware.
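We're not reproducing our actual service here, but the core of "custom routing logic" is usually small: match a request's host and path against a table and pick an upstream. The following is a hypothetical sketch (the `Route` fields, hostnames, and upstream names are invented for illustration) using a longest-prefix-wins policy.

```rust
// Hypothetical routing rule; field names are made up for this sketch.
struct Route {
    host: &'static str,
    path_prefix: &'static str,
    upstream: &'static str,
}

// Returns the upstream for the most specific matching rule
// (longest path prefix wins), or None if nothing matches.
// Assumes `host` has already had any ":port" suffix stripped.
fn pick_upstream(routes: &[Route], host: &str, path: &str) -> Option<&'static str> {
    routes
        .iter()
        .filter(|r| r.host.eq_ignore_ascii_case(host) && path.starts_with(r.path_prefix))
        .max_by_key(|r| r.path_prefix.len())
        .map(|r| r.upstream)
}

fn main() {
    let routes = [
        Route { host: "api.example.com", path_prefix: "/", upstream: "api-default" },
        Route { host: "api.example.com", path_prefix: "/v2/", upstream: "api-v2" },
    ];
    for path in ["/v1/users", "/v2/users", "/v2"] {
        println!("{path} -> {:?}", pick_upstream(&routes, "api.example.com", path));
    }
}
```

The matching policy is a choice, and it's worth making it explicit: Envoy's own route table is checked in order and the first match wins, so whichever policy your custom layer uses, write it down.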
Week one: three outages.
Not because Rust is hard (our Rust service was fine). Because Envoy's config model is completely different from Traefik's and we were copying assumptions over like fools. Traefik does a lot of magic via annotations. Envoy does nothing by magic. Everything explicit. Everything verbose. First time you see an Envoy config file you think someone is hazing you.
Week two: stable. Genuinely stable. We kept waiting for something to explode. Nothing did.
Week three: I pulled up the memory dashboards.
I called my co-engineer over. We both stared at it for a moment.
The Numbers (This Is The Part You Came For)
Before:
- 3x Traefik pods
- ~380MB average RSS each
- CPU spikes during GC under high traffic
- 2x t3.large nodes just for ingress (2 vCPU, 8GB each)
After:
- 2x Envoy pods + Rust routing layer
- ~40MB average RSS each
- CPU: completely flat. No spikes. No GC sweeps. Nothing.
- 1x t3.small node handles it without breaking a sweat
Node costs alone: from ~$340/month to ~$30/month.
Factor in traffic processing, data transfer handling, reduced overhead across the board — the full picture came out to roughly 10x cheaper.
Not 10% cheaper. Not 2x. Ten times.
The title isn't clickbait. I was as shocked as you are right now.
Should You Actually Do This?
Real answer: probably not yet, unless you're bleeding money.
Operational complexity is a real cost. This migration took two engineers about three weeks including the debugging nights and the config archaeology. If Traefik is working for you and your AWS bill isn't making you spiral, leave it alone. Boring infrastructure is good infrastructure.
But if your ingress layer has its own line item on your cost explorer — and you feel a specific kind of shame looking at it — the tooling exists, it's production-proven, and the math is not subtle.
The Rust networking ecosystem in 2026 is not the Rust networking ecosystem of 2018. Cloudflare has replaced their entire request-handling stack with Rust. Pingora is open-source and now at v0.7.0 — it's evolved from "proxy" into what people are calling programmable network infrastructure. linkerd2-proxy has been running in production service meshes for years. Tokio is the undisputed async runtime of the ecosystem. The scary part is mostly gone. What's left is just learning.
The Actual Lesson (And It's Not About Languages)
Go didn't lose. This wasn't a rivalry with a winner.
Go won the cloud infrastructure era because it was exactly right for that moment — concurrency primitives, fast compilation, readable code, a great standard library. Kubernetes exists because of Go. The cloud-native ecosystem exists because of Go.
But Rust is winning the next layer — the performance-critical substrate that the rest of the infrastructure runs on. Proxies. Kernels. Network dataplanes. Places where a GC isn't a tradeoff you can afford because memory is money and latency is user experience.
Both can be true. A hammer is the right tool for nails. A scalpel is the right tool for surgery. Neither replaces the other.
That $4,200 bill became $390.
The CFO asked me what changed.
I said I learned a new programming language.
He nodded like he understood and immediately changed the subject.
That's fine. The bill was paid. The graphs were flat. The nodes were small.
Sometimes the best infrastructure is the one nobody notices.
Find Me Across the Web
- ✍️ Medium: @syedahmershah
- 💬 DEV.to: @syedahmershah
- 🧠 Hashnode: @syedahmershah
- 💻 GitHub: @ahmershahdev
- 🔗 LinkedIn: Syed Ahmer Shah
- 🧭 All links: Beacons
- 🌐 Portfolio: ahmershah.dev