Daya Shankar

Posted on Jun 29

Kubernetes Networking for High-Performance Computing: CNI, Overlays, and Latency

#kubernetes

One of the most frustrating Kubernetes performance investigations I've worked on started with a simple assumption: the cluster needed more compute. Additional nodes were added, more powerful processors were introduced, and GPU capacity was expanded, yet application performance barely improved. At first, the numbers didn't make sense because CPU utilization wasn't saturated, memory wasn't constrained, and storage performance looked healthy. The infrastructure appeared to have everything the workload needed.

The bottleneck turned out to be neither compute nor storage. It was the networking layer connecting the workloads together.

What made the issue difficult to identify was that Kubernetes appeared to be functioning perfectly. Pods were healthy, services were reachable, and the cluster looked stable from an operational perspective. The problem only became visible when I started examining how frequently workloads were communicating with each other and how much time was being spent moving data between nodes.

I've seen variations of this problem repeatedly in high-performance computing environments. Teams spend significant time evaluating processors, GPUs, storage systems, and autoscaling policies because those resources are visible and easy to measure. Networking often receives attention much later, usually after performance improvements begin producing diminishing returns. By that stage, the workload is no longer limited by how quickly individual nodes process information. It is limited by how efficiently those nodes exchange information with each other.

That distinction is important because many high-performance workloads spend almost as much time communicating as they do computing. Once that happens, networking stops being background infrastructure and becomes part of the application's performance profile.
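A rough way to see why extra compute stopped helping is to split runtime into a part that speeds up with faster hardware and a communication part that does not. The sketch below is an illustrative model in the spirit of Amdahl's law, not a measurement from the cluster I described; the function name and the 50/50 split between compute and communication are assumptions chosen for the example.

```python
def runtime_after_compute_speedup(compute_s: float, comm_s: float, speedup: float) -> float:
    """Total runtime when only the compute portion becomes `speedup` times faster."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    # Communication time is assumed to be unaffected by faster CPUs or GPUs.
    return compute_s / speedup + comm_s


# Hypothetical job: 50 s of computing and 50 s of communicating per run.
baseline = runtime_after_compute_speedup(50.0, 50.0, 1.0)
for factor in (2, 4, 8):
    total = runtime_after_compute_speedup(50.0, 50.0, factor)
    print(f"{factor}x compute -> {total:.2f} s total, {baseline / total:.2f}x overall")
```

Under these assumptions the overall speedup can never reach 2x, no matter how much compute is added, because the communication half of the runtime stays fixed.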

Why HPC Workloads Expose Networking Problems Faster

One thing I've learned over the years is that not every Kubernetes workload experiences networking in the same way. A typical web application may tolerate a few additional milliseconds of latency without creating a meaningful impact on user experience because most requests are relatively independent. The majority of processing time is often spent inside the application itself rather than moving data between services.

High-performance computing environments operate very differently.

When I'm evaluating HPC workloads, one of the first things I look at is communication behavior because performance is often determined as much by data movement as by compute power. The workloads I see most often in this category share one characteristic: they constantly exchange information between nodes.

Common examples include:

Distributed simulation environments

Scientific computing applications

MPI clusters

Large-scale analytics workloads

GPU computing clusters

Parallel processing frameworks

What I've noticed is that these workloads spend a substantial portion of their execution time synchronizing processes, exchanging datasets, coordinating distributed operations, or moving intermediate results across the network. Whether it's an MPI simulation distributing tasks across dozens of nodes or a GPU cluster synchronizing workloads between accelerators, communication becomes a critical part of execution.

This is why networking bottlenecks tend to appear much earlier in HPC environments than in traditional Kubernetes deployments. Small inefficiencies that go unnoticed in web applications become measurable performance constraints when multiplied across thousands or millions of communications. A few microseconds here and a few milliseconds there may seem insignificant in isolation, but over time they directly affect throughput, scalability, and execution times.
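The multiplication is easy to underestimate, so here is a small arithmetic sketch. It assumes every message sits on the critical path (nothing overlaps with computation), which is a simplification, and the message counts are hypothetical rather than taken from a real workload.

```python
def added_seconds(extra_latency_us: float, messages_per_step: int, steps: int) -> float:
    """Total time added when every message pays `extra_latency_us` of extra latency."""
    total_us = extra_latency_us * messages_per_step * steps
    return total_us / 1_000_000  # microseconds -> seconds


# Hypothetical solver: 2,000 blocking messages per step, 100,000 steps.
for extra_us in (5, 50):
    seconds = added_seconds(extra_us, 2_000, 100_000)
    print(f"{extra_us} us extra per message adds {seconds:.0f} s to the run")
```

With these made-up numbers, 5 microseconds of extra latency per message adds 1,000 seconds to the run, which is the kind of cost that never shows up on a CPU or memory dashboard.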

Once I understand how heavily a workload depends on communication, the networking architecture becomes far easier to evaluate.

Why the CNI Matters More Than Most Teams Realize

When I'm troubleshooting networking performance, the Container Network Interface (CNI) is one of the first components I examine because every packet entering or leaving a pod is influenced by decisions made at this layer. Over the years, I've found that many teams choose a CNI during cluster deployment and rarely revisit the decision. That approach works perfectly well until networking performance becomes part of application performance.

The CNI determines how pods connect to the network, how traffic moves throughout the cluster, and how networking policies are enforced. In many environments, those details remain largely invisible. In HPC environments, however, they can directly influence latency and communication efficiency.

Some of the most common options include:

CNI	Common Characteristics
Calico	Routing-focused, scalable, policy-rich
Cilium	eBPF-based networking and security
Flannel	Simplicity and ease of deployment
Weave Net	Overlay-focused networking
Multus	Meta-plugin that attaches multiple network interfaces to a pod

Every CNI makes trade-offs. Some prioritize simplicity and operational ease. Others emphasize policy enforcement, observability, advanced routing capabilities, or performance optimization. The right choice depends entirely on workload requirements.

This is why I rarely think about CNI selection as a networking decision alone. In high-performance environments, it is fundamentally a workload decision because the networking model influences how efficiently applications communicate.

Where Latency Actually Comes From

One misconception I encounter frequently is the belief that latency is a single problem with a single cause. In practice, latency is usually the cumulative result of multiple infrastructure decisions working together.

When a packet moves between two Kubernetes workloads, it may pass through several layers before reaching its destination:

Overlay encapsulation

Routing decisions

Network policy enforcement

Service proxying

Node-to-node communication

Physical network infrastructure

Individually, each layer introduces only a small amount of overhead. Collectively, those delays can become significant for workloads that exchange large amounts of data or require constant synchronization.
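One way I like to reason about this is as a latency budget: add up the per-layer costs and look at which layers dominate. The sketch below only shows the arithmetic. The `latency_budget` helper is hypothetical, and the per-layer values are placeholders rather than benchmarks; they would need to be replaced with numbers measured on a real cluster.

```python
def latency_budget(layer_overhead_us: dict) -> tuple:
    """Return the total overhead and each layer's share of that total."""
    total = sum(layer_overhead_us.values())
    if total <= 0:
        raise ValueError("total overhead must be positive")
    shares = {layer: cost / total for layer, cost in layer_overhead_us.items()}
    return total, shares


# Placeholder values in microseconds, chosen only to show the arithmetic.
example_layers = {
    "overlay encapsulation": 4.0,
    "routing decisions": 1.0,
    "network policy enforcement": 2.0,
    "service proxying": 3.0,
    "node-to-node and physical network": 10.0,
}

total_us, shares = latency_budget(example_layers)
print(f"total one-way overhead: {total_us:.1f} us")
# Largest contributors first, so the dominant layer is easy to spot.
for layer, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"  {layer:<36}{share:6.1%}")
```

The value of the exercise is less the total than the ranking: it shows which layer is worth optimizing first for a given workload.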

What I've learned is that overlay networking is often where the trade-off between operational simplicity and performance becomes most visible. Overlay networks create virtual networking layers on top of the physical infrastructure, making clusters easier to deploy and manage while abstracting many networking complexities. For general-purpose Kubernetes environments, that trade-off often makes perfect sense.

Communication-heavy workloads tend to expose the cost of that abstraction more quickly.

A simplified comparison looks like this:

Networking Model	Characteristics
Overlay Network	Greater abstraction, easier deployment
Underlay Network	Direct infrastructure connectivity, lower overhead

This does not mean overlay networking is inherently wrong. I've seen many production environments operate successfully using overlay architectures. The important point is understanding the trade-off. Overlay networking often improves flexibility and operational simplicity. Underlay approaches frequently reduce overhead and improve communication efficiency.
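To put a number on one part of that overhead, the sketch below uses VXLAN over IPv4 as an example encapsulation. That choice is my assumption for illustration; other overlay encapsulations have different header sizes. It computes how much of each underlay packet is left for the pod once the extra headers are accounted for.

```python
# Bytes added to each packet by VXLAN over IPv4, counted against the underlay IP MTU.
VXLAN_IPV4_OVERHEAD_BYTES = {
    "outer IPv4 header": 20,
    "outer UDP header": 8,
    "VXLAN header": 8,
    "inner Ethernet header": 14,
}


def pod_mtu(underlay_mtu: int, overhead_bytes: dict) -> int:
    """Largest packet a pod can send without fragmentation on the underlay."""
    return underlay_mtu - sum(overhead_bytes.values())


for underlay in (1500, 9000):
    inner = pod_mtu(underlay, VXLAN_IPV4_OVERHEAD_BYTES)
    print(f"underlay MTU {underlay}: pod MTU {inner}, "
          f"{inner / underlay:.1%} of each packet left for the pod")
```

Header bytes are only the visible part of the cost. The CPU work of encapsulating and decapsulating each packet is separate and has to be measured rather than calculated.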

One of the most common mistakes I've seen is selecting a networking model before understanding workload communication patterns. When that happens, networking decisions are driven by deployment convenience rather than application performance requirements.

How I Evaluate Kubernetes Networking for HPC

When I'm designing Kubernetes networking for performance-sensitive environments, I spend far less time comparing feature lists and far more time understanding how workloads communicate.

The questions I usually ask are:

How frequently do workloads communicate?

How sensitive are they to latency?

How much east-west traffic exists between nodes?

Are workloads exchanging large datasets or small messages?

Do they require direct network access?

Is operational simplicity more important than performance?

The answers usually determine where optimization efforts should focus.
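The question about large datasets versus small messages can be explored with a common first-order model, often called the latency-bandwidth (or alpha-beta) model: transfer time is a fixed per-message latency plus message size divided by bandwidth. The sketch below is an approximation under assumed values (50 microseconds of latency, 10 Gbit/s of bandwidth), not a description of any particular network.

```python
def transfer_time_us(size_bytes: int, latency_us: float, bandwidth_gbps: float) -> float:
    """Estimated time to deliver one message: fixed latency plus serialization time."""
    bytes_per_us = bandwidth_gbps * 1e9 / 8 / 1e6  # Gbit/s -> bytes per microsecond
    return latency_us + size_bytes / bytes_per_us


# Assumed link characteristics, for illustration only.
LATENCY_US = 50.0
BANDWIDTH_GBPS = 10.0

for size in (64, 4096, 1_048_576):
    total = transfer_time_us(size, LATENCY_US, BANDWIDTH_GBPS)
    print(f"{size:>9} B: {total:9.1f} us, latency share {LATENCY_US / total:.0%}")
```

Small messages are dominated by the fixed latency term, so they benefit most from removing per-packet overhead. Large transfers are dominated by bandwidth, so the same optimization moves them much less.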

I've found that many networking debates disappear once communication patterns become clear. Some workloads benefit significantly from networking optimizations because communication directly affects execution time. Others see little measurable improvement because networking is not their primary constraint.
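Measuring is usually more persuasive than debating. The following is a minimal TCP ping-pong sketch that records round-trip times for small messages using only the Python standard library. As written it runs against an echo server on localhost, so the numbers it prints only exercise the loopback path; to compare networking setups, the echo side would run in one pod and the measuring side in another on a different node. The helper names are my own, and purpose-built benchmarking tools will give more rigorous results.

```python
import socket
import statistics
import threading
import time


def echo_server(listener: socket.socket) -> None:
    """Accept one connection and echo back everything received on it."""
    conn, _ = listener.accept()
    with conn:
        while data := conn.recv(4096):
            conn.sendall(data)


def measure_rtt_us(host: str, port: int, message_size: int = 64, rounds: int = 200) -> list:
    """Send `rounds` messages one at a time and return each round trip in microseconds."""
    payload = b"x" * message_size
    samples = []
    with socket.create_connection((host, port)) as sock:
        # Disable Nagle's algorithm so small messages are sent immediately.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        for _ in range(rounds):
            start = time.perf_counter()
            sock.sendall(payload)
            received = 0
            while received < message_size:
                chunk = sock.recv(message_size - received)
                if not chunk:
                    raise ConnectionError("peer closed the connection")
                received += len(chunk)
            samples.append((time.perf_counter() - start) * 1e6)
    return samples


def local_demo(rounds: int = 200) -> list:
    """Run the measurement against an echo server on localhost."""
    listener = socket.create_server(("127.0.0.1", 0))  # port 0: let the OS pick a port
    threading.Thread(target=echo_server, args=(listener,), daemon=True).start()
    port = listener.getsockname()[1]
    samples = measure_rtt_us("127.0.0.1", port, rounds=rounds)
    listener.close()
    return samples


samples = local_demo()
p99 = statistics.quantiles(samples, n=100)[98]
print(f"median RTT: {statistics.median(samples):.1f} us, p99: {p99:.1f} us")
```

Looking at the tail (p99) as well as the median matters for tightly synchronized workloads, because the slowest exchange in each step often sets the pace for every participant.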

This is one reason networking has become a much larger consideration in modern HPC environments. Running distributed analytics platforms, simulation workloads, and GPU-powered computing clusters requires more than fast processors and powerful accelerators. It requires infrastructure where network performance scales alongside compute performance. That holds on any platform, including cloud platforms such as AceCloud that support high-performance Kubernetes deployments, because improving CPUs or GPUs alone rarely solves communication bottlenecks. If data cannot move efficiently between workloads, additional compute resources often deliver diminishing returns.

The most effective environments I've worked with do not treat networking as supporting infrastructure. They treat it as a performance-critical component of the workload itself.

The longer I work with Kubernetes, the less I think about networking as a connectivity problem and the more I think about it as a performance problem.

Most clusters can successfully connect pods and services. The real question is whether they can do so efficiently enough for the workloads they support. Traditional application environments may never expose networking bottlenecks because communication patterns are relatively simple. High-performance computing environments are different. They amplify every inefficiency because workloads spend so much time exchanging information across the network.

Networking performance is rarely determined by a single technology choice. It is the combined result of workload communication patterns, CNI architecture, overlay design, latency characteristics, and infrastructure decisions made throughout the stack. When those pieces align, applications scale more predictably and infrastructure investments produce the performance gains teams expect.

In my experience, the most successful Kubernetes networking strategies begin with understanding how workloads communicate. Once that foundation exists, decisions around CNIs, overlays, and latency optimization become far easier to make and far more likely to deliver meaningful results.