Syed Ahmer Shah

Posted on May 19

Swapping Go for Rust: 10x Cheaper K8s Ingress

#go #rust #docker #kubernetes

The engineering cost of cloud savings

Let me tell you a story that starts in 2013, peaks somewhere around 2019, and ends with me staring at a $4,200 AWS bill at 11pm on a Tuesday.

The Revolution Nobody Saw Coming

March 2013. PyCon. A French developer named Solomon Hykes gets on stage and does a five-minute demo of something called Docker.

The audience is confused. Then curious. Then the applause starts.

In those five minutes, he essentially broke software deployment as everyone knew it. No more "works on my machine." No more environment hell. You take your app, you put it in a box, and that box runs anywhere. Same box. Every time.

The internet lost its mind.

Within a year, every startup was containerizing everything. Within two, enterprises were asking their CTOs why they weren't doing it yet. Within three, Docker was valued at over a billion dollars and "containerization" stopped being a niche word.

But here's the thing about revolutions — they create new problems.

Enter: The Orchestration Wars

By 2014 you had thousands of containers. Maybe tens of thousands. And now what? How do you run them? How do you restart the ones that crash? How do you roll out updates without downtime? How do you route traffic between them?

This is when it got messy. Genuinely messy.

Three players showed up:

Docker Swarm — Docker's own answer. Simple. Integrated. And honestly? Pretty good. But Docker the company was making decisions that confused everyone. They kept pivoting. Swarm never got the trust it deserved.

Apache Mesos — the enterprise option. Heavy. Complex. Battle-tested at Twitter and Airbnb scale. But you needed a PhD to configure it and the learning curve was basically a vertical wall.

Kubernetes — Google's open-source version of Borg, the internal system they'd been running for over a decade. K8s dropped in 2014 and it was ugly at first. The config was verbose, the concepts were alien, the documentation was written by people who already understood it.

But then something happened. Google poured resources into it. A foundation formed around it (CNCF — Cloud Native Computing Foundation, 2015). AWS, Azure, GCP all built managed versions of it. The ecosystem exploded.

By 2017 the wars were effectively over.

Kubernetes won. Swarm is still technically alive but nobody talks about it at conferences anymore. Mesos is mostly a ghost. K8s became the operating system of the cloud.

Go's Golden Era

Here's something worth understanding: Kubernetes is written in Go. Docker was written in Go. Traefik, Prometheus, Terraform, Consul, etcd, Helm — basically everything that runs your infrastructure today traces back to Go.

This wasn't an accident.

Go was designed at Google in 2007 by some of the most legendary people in computer science — Rob Pike, Ken Thompson (yes, the Unix guy), Robert Griesemer. They wanted a language that compiled fast, ran fast, handled concurrency natively, and didn't make you want to throw your laptop through a window.

They got it right.

Go's goroutines made writing networked, concurrent services stupid easy. The toolchain was clean. The standard library was rich. You could hire people who knew it. Everything clicked.

The cloud-native world standardized on Go almost by accident. It was the right language, at the right time, for the right problems.

Between 2015 and 2020, if you were building infrastructure tooling and not using Go, people looked at you sideways.

Meanwhile, In a Mozilla Office Somewhere...

A Mozilla employee named Graydon Hoare is working on something in his spare time. A new systems language. One that gives you C-level performance but makes memory corruption structurally impossible in safe code.

No garbage collector. No heavyweight runtime. But also no dangling pointers, no buffer overflows, no use-after-free bugs: the entire class of vulnerabilities that has haunted C and C++ for fifty years.

The trick? A concept called ownership. Every value has exactly one owner. When the owner goes out of scope, the value is dropped and its memory is freed. Not by a GC running in the background. Not "eventually." Immediately. Deterministically. At a point the compiler works out at compile time.
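Here's a tiny sketch of what ownership looks like in practice. It's a toy, not code from our service: `Tracked` is a hypothetical helper whose `Drop` impl writes to a log, so you can see exactly when each value is freed. Moving `a` into a function hands over ownership, so it's dropped when that function returns; `_b` stays put and is dropped at the end of its block.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical helper: logs a line when it is dropped, so we can
/// observe exactly where Rust frees it.
struct Tracked {
    name: &'static str,
    log: Rc<RefCell<Vec<String>>>,
}

impl Drop for Tracked {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("drop {}", self.name));
    }
}

/// Takes ownership of `value`; it is dropped when this function returns.
fn consume(value: Tracked) {
    value.log.borrow_mut().push(format!("using {}", value.name));
}

fn demo() -> Vec<String> {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        let a = Tracked { name: "a", log: Rc::clone(&log) };
        let _b = Tracked { name: "b", log: Rc::clone(&log) };
        consume(a); // ownership of `a` moves into `consume`
        log.borrow_mut().push("end of block".to_string());
    } // `_b` is still owned here, so it is dropped now
    log.borrow_mut().push("after block".to_string());
    // Bind first so the RefCell borrow ends before `log` is dropped.
    let entries = log.borrow().clone();
    entries
}

fn main() {
    for line in demo() {
        println!("{line}");
    }
}
```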

Mozilla ships Rust 1.0 in May 2015. The initial reaction from the systems programming community was somewhere between skeptical and hostile. The borrow checker — the compiler mechanism that enforces ownership — felt like fighting the language more than writing it.

People tried it, got confused, went back to C++ or Go.

But a small group of people got it. And they kept building. Quietly.

The Quiet Takeover

Cloudflare has a problem.

They're running nginx. Everywhere. nginx is written in C. It's fast, it's reliable, but it's also showing its age — the architecture doesn't handle modern HTTP patterns well, extending it is painful, and the codebase has had its share of security issues because, well, it's C.

Cloudflare decides to replace it. Not patch it. Replace it.

They write Pingora in Rust. A full HTTP proxy, from scratch, handling 1 trillion requests per day in production.

The numbers they published were obscene. Compared to the old nginx-based service under the same traffic: 70% less CPU. 67% less memory. And memory-safety bugs, a whole class of vulnerabilities, simply cannot occur in safe Rust code.

And they didn't stop there. By late 2025, Cloudflare went further — they rewrote FL, the actual "brain" of Cloudflare (the system that applies every customer's WAF rules, DDoS settings, and routing logic) in Rust, calling it FL2. The result: response time dropped by 10ms and overall performance went up 25%. They shut down FL1 entirely in early 2026.

Then AWS announced they were writing parts of their virtualization layer in Rust. Then Microsoft said Rust was the preferred language for new systems code at the company. Then the Linux kernel (the Linux kernel, which has run on C for thirty years) accepted Rust as a second language: initial Rust support was merged in kernel 6.1, and the first Rust drivers followed in later releases.

The systems programming community quietly started paying attention.

And then the Go community started getting uncomfortable.

Back To My Tuesday Night

This is where I come back in.

We were a mid-size startup. Not Cloudflare. Not Amazon. But we were scaling, traffic was growing, and our Kubernetes setup was running fine.

That word again. Fine.

We were using Traefik as our ingress controller. Go-based, battle-tested, the default choice for half the K8s tutorials on the internet. Three replicas, handling maybe 800 req/s at peak.

I was not watching the AWS bill closely enough. That was my fault.

Then I did. And I saw $4,200 for one month of ingress-related compute.

I didn't say anything for thirty seconds. I just sat there. That specific feeling when a number doesn't match your mental model and your brain is trying to reconcile it.

The Panic Phase

I started Googling. You should never Google things when you're panicking. You come out two hours later having read about eBPF, service meshes, and someone's confident blog post from 2021 recommending something that got deprecated in 2023.

What I found when I actually focused: the memory profile of our ingress layer was absurd for what we were asking it to do.

Each Traefik pod idled at ~180MB RSS. Under load? Comfortably 400MB+. That's not Traefik being badly written; it isn't. It's Go's garbage collector doing what it's designed to do. The GC is tuned for short pauses, not a small footprint. With the default GOGC=100, it lets the heap grow to roughly twice the live data before starting the next cycle, and it hands freed memory back to the OS gradually. Under constant HTTP load, that headroom is always in use.
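To put a rough number on that: Go's pacer starts the next collection when the heap reaches about live heap × (1 + GOGC/100). The sketch below is a simplified model of that formula, written in Rust. It ignores GC roots (goroutine stacks, globals), fragmentation, and pages the runtime hasn't returned to the OS yet, all of which push RSS higher. The 150MB live heap is a hypothetical input, not a number from our cluster.

```rust
/// Simplified model of Go's GC pacing: the next cycle starts when the
/// heap reaches live_heap * (1 + GOGC / 100). Ignores GC roots,
/// fragmentation, and memory not yet returned to the OS.
fn go_heap_target_mb(live_heap_mb: f64, gogc: f64) -> f64 {
    live_heap_mb * (1.0 + gogc / 100.0)
}

fn main() {
    let live_heap_mb = 150.0; // hypothetical input, not a measurement
    for gogc in [50.0, 100.0, 200.0] {
        let target = go_heap_target_mb(live_heap_mb, gogc);
        println!("GOGC={gogc}: heap can reach ~{target} MB before the next cycle");
    }
}
```

Lowering GOGC trades CPU (more frequent cycles) for a smaller heap; since Go 1.19 there is also GOMEMLIMIT, a soft memory limit for the runtime.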

We had 3 pods at ~400MB each, spread across 2 t3.large nodes (sized for that headroom), running 24/7.

Do the math on AWS pricing. Then cry.

Then I found the Pingora post. Read the whole thing this time. Pulled the benchmark numbers. Understood what was actually happening at the memory level in Rust vs Go.

Something clicked.

Why Rust Hits Different Here

Nobody says this clearly enough so I will:

For most application code, Go's GC is fine. Genuinely fine.

Writing a web API? Go. Building a CLI tool? Go. Microservices with moderate traffic? Go, easily.

But a proxy is a different beast. A proxy's entire job is to sit between two things and move bytes from one to the other, as fast as possible, thousands of times per second. Every request means allocations — headers, buffers, connection state. Every one of those allocations eventually needs to be freed.
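Here's what the core of that job looks like, as a minimal synchronous sketch using only the standard library (a real proxy would be async, on something like Tokio). The hypothetical `relay` function moves bytes from an upstream reader to a downstream writer through one 16 KiB buffer that is reused for every read and freed the moment the function returns.

```rust
use std::io::{self, Cursor, Read, Write};

/// Hypothetical helper: copy everything from `upstream` to `downstream`
/// through a single reusable buffer. Returns the number of bytes copied.
fn relay<R: Read, W: Write>(upstream: &mut R, downstream: &mut W) -> io::Result<u64> {
    let mut buf = vec![0u8; 16 * 1024]; // allocated once per connection
    let mut total = 0u64;
    loop {
        let n = match upstream.read(&mut buf) {
            Ok(0) => break, // EOF: upstream is done
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue, // retry
            Err(e) => return Err(e),
        };
        downstream.write_all(&buf[..n])?;
        total += n as u64;
    }
    downstream.flush()?;
    // `buf` is dropped when we return, so its memory goes straight back
    // to the allocator instead of waiting for a collector.
    Ok(total)
}

fn main() -> io::Result<()> {
    let body = vec![b'x'; 40_000]; // a fake 40 KB response body
    let mut upstream = Cursor::new(body);
    let mut downstream = Vec::new();
    let copied = relay(&mut upstream, &mut downstream)?;
    println!("relayed {copied} bytes");
    Ok(())
}
```

The standard library's `std::io::copy` (and `tokio::io::copy` in async code) already does this; the hand-rolled loop just makes the buffer's lifetime visible.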

In Go, "eventually" is doing a lot of work in that sentence.

In Rust, memory is freed the moment it goes out of scope. Not by a background process. Not on a schedule. Structurally, at the language level. The compiler guarantees it.

For a proxy, that's not a nice-to-have. It's the difference between flat memory usage and a profile that climbs under load and kicks off CPU-hungry GC cycles at exactly the wrong moment.
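A quick way to see the "flat memory" claim: give per-request state a `Drop` impl and count how many instances are alive. `RequestState` and `handle_request` are hypothetical stand-ins for a real handler, and this sketch is single-threaded, so the peak is 1. In a real async proxy the peak would track concurrent requests, not the total served, which is the point: memory follows what is in flight right now.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

static LIVE: AtomicUsize = AtomicUsize::new(0);
static PEAK: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical per-request state: parsed headers plus a body buffer.
struct RequestState {
    headers: Vec<(String, String)>,
    body: Vec<u8>,
}

impl RequestState {
    fn new() -> Self {
        let live = LIVE.fetch_add(1, Ordering::SeqCst) + 1;
        PEAK.fetch_max(live, Ordering::SeqCst); // remember the high-water mark
        RequestState { headers: Vec::new(), body: Vec::with_capacity(4096) }
    }
}

impl Drop for RequestState {
    fn drop(&mut self) {
        LIVE.fetch_sub(1, Ordering::SeqCst);
    }
}

/// Hypothetical handler: builds some state, then returns a size summary.
fn handle_request(i: usize) -> usize {
    let mut state = RequestState::new();
    state.headers.push(("x-request-id".to_string(), i.to_string()));
    state.body.extend_from_slice(b"hello");
    // `state` and both Vecs inside it are freed when this function returns.
    state.headers.len() + state.body.len()
}

fn main() {
    for i in 0..10_000 {
        handle_request(i);
    }
    println!(
        "live: {}, peak: {}",
        LIVE.load(Ordering::SeqCst),
        PEAK.load(Ordering::SeqCst)
    );
}
```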

linkerd2-proxy — the data plane for the Linkerd service mesh — is written in Rust for exactly this reason. It's running in production service meshes at serious scale. Tokio — Rust's async runtime — is mature enough that they're running their own conference in 2026. The ecosystem isn't a science experiment anymore.

The Migration (It Was Not Smooth, I Won't Lie)

We didn't rewrite Traefik. We're not a systems programming shop.

We moved to Envoy as the core proxy (C++, so no garbage collector at all, plus a mature extensibility story) plus a small Rust service handling the custom routing logic that Traefik was previously doing in middleware.
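Here's a hypothetical sketch of the kind of logic a small routing service like that might hold (not our actual rules): a table of host plus path-prefix rules, resolved by longest prefix. `Router`, `add`, and `resolve` are made-up names for illustration, not an Envoy or Traefik API.

```rust
use std::collections::HashMap;

/// Hypothetical routing table: host -> list of (path prefix, upstream).
struct Router {
    routes: HashMap<String, Vec<(String, String)>>,
}

impl Router {
    fn new() -> Self {
        Router { routes: HashMap::new() }
    }

    fn add(&mut self, host: &str, prefix: &str, upstream: &str) {
        self.routes
            .entry(host.to_string())
            .or_default()
            .push((prefix.to_string(), upstream.to_string()));
    }

    /// Pick the upstream whose prefix is the longest match for `path`.
    /// Note: plain string prefixes, so "/v1" also matches "/v10".
    fn resolve(&self, host: &str, path: &str) -> Option<&str> {
        self.routes
            .get(host)?
            .iter()
            .filter(|(prefix, _)| path.starts_with(prefix.as_str()))
            .max_by_key(|(prefix, _)| prefix.len())
            .map(|(_, upstream)| upstream.as_str())
    }
}

fn main() {
    let mut router = Router::new();
    router.add("api.example.com", "/", "default-svc");
    router.add("api.example.com", "/v1", "api-v1-svc");
    router.add("api.example.com", "/v1/users", "users-svc");

    for path in ["/v1/users/42", "/v1/orders", "/health"] {
        println!("{path} -> {:?}", router.resolve("api.example.com", path));
    }
}
```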

Week one: three outages.

Not because Rust is hard (our Rust service was fine). Because Envoy's config model is completely different from Traefik's and we were copying assumptions over like fools. Traefik does a lot of magic via annotations. Envoy does nothing by magic. Everything explicit. Everything verbose. First time you see an Envoy config file you think someone is hazing you.

Week two: stable. Genuinely stable. We kept waiting for something to explode. Nothing did.

Week three: I pulled up the memory dashboards.

I called my co-engineer over. We both stared at it for a moment.

The Numbers (This Is The Part You Came For)

Before:

3x Traefik pods
~380MB average RSS each
CPU spikes during GC under high traffic
2x t3.large nodes just for ingress (2 vCPU, 8GB each)

After:

2x Envoy pods + Rust routing layer
~40MB average RSS each
CPU: completely flat. No spikes. No GC sweeps. Nothing.
1x t3.small node (2 vCPU, 2GB) handles it without breaking a sweat

Node costs alone: from ~$340/month to ~$30/month.

Factor in traffic processing, data transfer handling, and reduced overhead across the board, and the full picture came out to roughly 10x cheaper.

Not 10% cheaper. Not 2x. Ten times.

The title isn't clickbait. I was as shocked as you are right now.

Should You Actually Do This?

Real answer: probably not yet, unless you're bleeding money.

Operational complexity is a real cost. This migration took two engineers about three weeks including the debugging nights and the config archaeology. If Traefik is working for you and your AWS bill isn't making you spiral, leave it alone. Boring infrastructure is good infrastructure.
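If you want to sanity-check the ROI for your own team, the arithmetic is short. This sketch uses the bill figures from this post ($4,200 a month before, $390 after) and the effort above (two engineers, three weeks); the weekly cost per engineer is a hypothetical placeholder, so swap in your own.

```rust
/// How many times cheaper the new monthly bill is.
fn savings_factor(before: f64, after: f64) -> f64 {
    before / after
}

/// Months of savings needed to pay back the migration effort.
fn breakeven_months(
    engineers: f64,
    weeks: f64,
    weekly_cost_per_engineer: f64,
    monthly_savings: f64,
) -> f64 {
    (engineers * weeks * weekly_cost_per_engineer) / monthly_savings
}

fn main() {
    let (before, after) = (4200.0, 390.0); // monthly bill, from this post
    let weekly_cost = 3000.0; // hypothetical fully loaded cost per engineer-week
    println!("{:.1}x cheaper", savings_factor(before, after));
    println!(
        "break-even after {:.1} months",
        breakeven_months(2.0, 3.0, weekly_cost, before - after)
    );
}
```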

But if your ingress layer has its own line item on your cost explorer — and you feel a specific kind of shame looking at it — the tooling exists, it's production-proven, and the math is not subtle.

The Rust networking ecosystem in 2026 is not the Rust networking ecosystem of 2018. Cloudflare has replaced their entire request-handling stack with Rust. Pingora is open-source and now at v0.7.0 — it's evolved from "proxy" into what people are calling programmable network infrastructure. linkerd2-proxy has been running in production service meshes for years. Tokio is the undisputed async runtime of the ecosystem. The scary part is mostly gone. What's left is just learning.

The Actual Lesson (And It's Not About Languages)

Go didn't lose. This wasn't a rivalry with a winner.

Go won the cloud infrastructure era because it was exactly right for that moment — concurrency primitives, fast compilation, readable code, a great standard library. Kubernetes exists because of Go. The cloud-native ecosystem exists because of Go.

But Rust is winning the next layer — the performance-critical substrate that the rest of the infrastructure runs on. Proxies. Kernels. Network dataplanes. Places where a GC isn't a tradeoff you can afford because memory is money and latency is user experience.

Both can be true. A hammer is the right tool for nails. A scalpel is the right tool for surgery. Neither replaces the other.

That $4,200 bill became $390.

The CFO asked me what changed.

I said I learned a new programming language.

He nodded like he understood and immediately changed the subject.

That's fine. The bill was paid. The graphs were flat. The nodes were small.

Sometimes the best infrastructure is the one nobody notices.

Find Me Across the Web

✍️ Medium: @syedahmershah
💬 DEV.to: @syedahmershah
🧠 Hashnode: @syedahmershah
💻 GitHub: @ahmershahdev
🔗 LinkedIn: Syed Ahmer Shah
🧭 All links: Beacons
🌐 Portfolio: ahmershah.dev

Top comments (64)

Syed Ahmer Shah • May 19

Thank you! I really wanted to avoid the 'my language is better than yours' debate. At the end of the day, both Go and Rust are phenomenal tools—it’s all about understanding the memory lifecycle and knowing where to apply them. Glad the nuance came through!

Ronan • May 24

Your technical assessment of Go's garbage collection vs. Rust's compile-time memory management is spot on. Go is brilliant for control plane velocity, but continuous HTTP buffer allocations at high frequencies inevitably introduce an infrastructure tax. Using Rust strictly for the critical data plane substrate is the right architectural choice.

Syed Ahmer Shah • May 24

Exactly, Ronan. "Infrastructure tax" is the perfect way to phrase it. Go’s control plane velocity is unmatched for getting things out the door, but when you are pumping high-frequency HTTP buffers through an ingress, those micro-allocations scale up your cloud bill fast. Splitting the architecture to let Rust do the heavy lifting on the data plane while keeping Go where it thrives was the sweet spot. Really appreciate your sharp breakdown here!

Omar Hurain • May 24

This is one of the most balanced perspectives on the Go vs. Rust debate. It completely avoids developer tribalism by recognizing the exact boundaries where each tool shines. Acknowledging that "boring infrastructure is good infrastructure" shows great maturity—optimizing only where the high-throughput proxy bottleneck demands it.

Syed Ahmer Shah • May 24

Thanks, Omar! You nailed exactly what I was hoping to get across. It’s so easy to fall into the "this language is better than that one" trap, but engineering is just about trade-offs. Go is incredible for 90% of what we build, but when you're hitting that specific proxy bottleneck, the GC tax becomes a real financial metric. Keeping the infrastructure "boring" everywhere else is what allowed us to spend those 3 weeks focusing purely on the data plane. Appreciate you reading and bringing out that specific takeaway!

Faique • May 19

"First time you see an Envoy config file you think someone is hazing you."

I felt this in my soul. Moving away from Traefik’s magic annotations into the raw, explicit world of Envoy is a rite of passage. Kudos to you and your co-engineer for surviving those week-one outages and getting it stable!

Syed Ahmer Shah • May 19

Haha, glad I’m not the only one who felt like it was hazing! Moving from the 'magic' of annotations to the explicit configuration of Envoy is definitely a rite of passage. Those first few outages were stressful, but they were the best learning experience I’ve had in a long time. Thanks for the kind words!

Faraz • May 19

"Boring infrastructure is good infrastructure." Words to live by! It’s easy to get sucked into rewriting everything for the sake of hype, but your point about operational complexity is crucial. 3 weeks of engineering time for 2 people is a real cost, but with a 10x savings on a $4,200/month recurring bill, your ROI hit break-even almost immediately. Great execution.

Syed Ahmer Shah • May 19

Exactly! It’s easy to fall into the trap of 'resume-driven development,' but sometimes the most impressive engineering is the kind that just quietly works and saves the company money. Glad you appreciated the ROI breakdown—it's definitely satisfying to see the infrastructure pay for itself so quickly!

Vinod Oad • May 19

The timing of this is perfect. Seeing Cloudflare completely phase out FL1 for their Rust-based FL2 earlier this year really proved that this isn't just a niche optimization anymore—it's the new standard for edge and proxy layers. Go's GC is incredible for rapid development, but constant HTTP buffer allocations will always be its Achilles' heel at scale.

Syed Ahmer Shah • May 19

That’s a great observation about the Cloudflare transition. You hit the nail on the head—the GC overhead in Go is fantastic for velocity, but when you're dealing with high-frequency HTTP buffer allocations at the edge, that 'Achilles' heel' becomes impossible to ignore. Rust’s ownership model changes the game entirely for those layers.

mote • May 23

Ran into this exact problem on my drone's obstacle avoidance system — the agent kept "forgetting" low-priority sensor streams when GPU memory got tight, which is basically what you're describing with token budget pressure.

The real issue isn't just context length, it's prioritization. When you're running a local model on constrained hardware, you need to decide what stays in memory and what gets evicted. Most frameworks don't give you that control.

Curious how you'd handle multi-modal prioritization — like if you have 5 different sensor feeds but can only afford context for 3, how do you decide which ones to compress or drop?

Syed Ahmer Shah • May 23

You hit the exact core of the issue: resource constraints dictate how an intelligent system perceives reality.

For multi-modal prioritization on local hardware, three approaches work well:

Dynamic Gating: Pass a tiny, hyper-fast "meta-stream" first to detect anomalies, then dynamically promote that specific feed's priority.

Semantic Layering: Compress feeds into text/vector summaries before they hit the main model. A drone doesn't need raw frames of a wall; it just needs the string "Obstacle: 1.2m".

Weighted Eviction: Treat context like a ring-buffer where safety-critical tokens have a longer TTL (Time to Live) than low-priority streams.

Most frameworks treat context as a flat sequence rather than a dynamic memory hierarchy. How are you handling the eviction on your drone right now?

Sahil Kumar • May 19

It’s always fascinating to see Rust’s memory efficiency and predictable performance (no GC pauses) yield such massive infrastructure savings when replacing Go in high-throughput network applications. The transition from Go to Rust for an ingress controller makes a ton of sense given how critical low latency and minimal resource footprints are at that layer.

Syed Ahmer Shah • May 20

It really highlights where the language design choices show up in production, Sahil.

Go’s garbage collector is fantastic for getting applications shipped fast without worrying about memory management, but high-throughput network layers are a completely different beast. When you're processing hundreds of thousands of concurrent requests, constantly allocating and freeing network buffers creates massive heap churn that keeps the GC permanently working overtime.

Switching to Rust's resource management strategy completely flips the script. Being able to pass data through user space with zero-copy operations and zero runtime overhead means the ingress can practically flatline both CPU usage and latency. When you aren't over-provisioning clusters just to buffer against GC spikes, those massive infrastructure savings happen naturally. 👍

Amir • May 24

The ROI breakdown here is excellent. Investing 3 weeks of engineering time for two developers is a tangible upfront cost, but dropping a recurring bill from $4,200 to $390 yields immediate break-even. Framing this through financial metrics rather than just language hype makes it a highly practical case study for modern infrastructure teams.

Syed Ahmer Shah • May 24

Thank you, Amir! I really wanted to ground this in cold, hard math rather than just language fandom. At the end of the day, an engineering manager doesn't care about memory safety hype as much as they care about dropping a recurring bill from $4.2k to under $400. Factoring in the 2-developer, 3-week salary cost made the ROI argument undeniable. Glad you appreciated the financial breakdown!

Sagar Kumar • May 19

As a CFO, I love that ending, haha! "I learned a new programming language" is the ultimate engineering mic drop. Seriously though, dropping the node requirement from multiple t3.large instances down to a single t3.small while flattening the CPU spikes is a masterclass in modern cost optimization.

Syed Ahmer Shah • May 19

I’m glad a CFO perspective approves! It’s one thing to talk about performance, but when you can show a massive reduction in the AWS bill while simultaneously flattening those CPU spikes, it’s hard to argue with the results. It was definitely a fun 'mic drop' moment for the team.
