GOMAXPROCS Is Lying to Your Kubernetes Pod (Costing You CPU)

Gabriel Anhaia


You deploy a Go service to a Kubernetes cluster. The pod has resources.limits.cpu: "2". You watch the dashboard. CPU usage hovers around 40%, well under the limit. p99 latency is 800ms. You stare at the flame graph. Most of the time is spent in scheduler code. The service is mostly idle and mostly slow at the same time.

You have probably seen this exact pattern. The thing nobody told you is that your Go process thinks it has 64 cores. It is on a 64-core node. It does not know about the limits.cpu: "2" field. The Linux kernel knows. The container runtime knows. The Go runtime, before version 1.25, did not.

This is the gap that has been costing teams CPU and tail latency for years. Go 1.25 is the first version that closes it in the standard runtime, and only if you upgrade, audit your manifests, and know which two GODEBUG flags can quietly turn the fix off.

What runtime.NumCPU() Actually Returns

Open a Go file. Print the values:

package main

import (
    "fmt"
    "runtime"
)

func main() {
    fmt.Println("NumCPU:", runtime.NumCPU())
    fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0))
}

Build it. Run it on a Mac with 12 cores: NumCPU: 12. Run it on a 64-core EC2 host: NumCPU: 64. Now put the same binary in a container with --cpus=2 and run it on the same 64-core host. Before Go 1.25, you still get NumCPU: 64. The runtime asks the kernel how many CPUs the process may run on (on Linux, via the sched_getaffinity syscall), and a CFS quota like --cpus=2 does not shrink that set. Your threads can still be scheduled on all 64 cores; they just cannot consume more than 2 cores' worth of time per period. So the count reflects the host.

GOMAXPROCS is the number Go uses to size its scheduler. It defaults to whatever NumCPU() returns. So a service with a 2-CPU limit runs a scheduler sized for 64 logical CPUs. The runtime spins up 64 P structures (P = scheduler processor in Go's runtime), runs goroutines on OS threads bound to those Ps, and the kernel CFS scheduler enforces the actual 2-CPU quota by stopping your threads partway through every 100ms CFS period.

This is the part most engineers miss. The limit is real. The kernel enforces it. But your Go runtime is making decisions as if the limit did not exist.

Why That Costs You Latency

The Linux CFS scheduler (Completely Fair Scheduler) enforces CPU limits in 100ms CFS periods. With a 2-CPU limit, your container gets 200 CPU-milliseconds of budget every 100ms of wall time (2 cores x 100ms). Spread that across 64 runnable goroutines on 64 OS threads (one per P), and the scheduler drains the quota in a burst. For the rest of the period, every thread is parked, waiting for the next CFS window.
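
You can see that budget directly from inside the container. A quick check, assuming a cgroup v2 host (on cgroup v1 the quota lives in cpu.cfs_quota_us and cpu.cfs_period_us instead):

# Inside a container started with --cpus=2 (or limits.cpu: "2") on a cgroup v2 host
cat /sys/fs/cgroup/cpu.max
# 200000 100000   <- quota and period in microseconds: 200ms of CPU time per 100ms window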

This is what Kubernetes calls CPU throttling. It shows up in container_cpu_cfs_throttled_periods_total. It does not show up in your "CPU usage" dashboard, because the dashboard averages over a longer window than the CFS period. You see 40% utilization. The kernel sees you blow the quota at millisecond 12 and sit idle until millisecond 100. Repeat ten times per second.

The user-facing effect: a request handler that should take milliseconds gets parked for the rest of the current CFS period. p99 latency explodes. Michal Drozd's writeup of this failure mode describes the same symptom: a pod showing modest average CPU usage on the dashboard while tail latency runs an order of magnitude over budget. The pod looks healthy on every metric your team probably watches.

It gets worse for garbage collection. The Go GC needs all running goroutines to reach a safepoint. If half your scheduler's threads are parked by CFS at the moment GC starts, the stop-the-world phase has to wait for those threads to be scheduled back in. That wait is unbounded by your code; it is bounded by the kernel's scheduling fairness. Uber's writeup of the problem and the original automaxprocs design describes the cascade: throttled threads delay GC signals, GC delays the whole process, the process falls behind, more goroutines pile up, more throttling.

Where the Cost Comes From

The actual cost depends on the gap between NumCPU() and your CPU limit, and on how concurrent your workload is. The cost shows up in two places: wasted CPU (you pay for cores you cannot fully use, plus scheduler overhead managing Ps that get throttled) and worse tail latency. The uber-go/automaxprocs README publishes benchmark numbers that show meaningful p50 and p99 improvements when the runtime value is matched to the cgroup limit. Same code, same load, same hardware. A different value in one runtime variable.

The reason it varies: a small gap (limit of 8 on a 16-core node) wastes less than a big gap (limit of 2 on a 64-core node). Modern bare-metal Kubernetes nodes are often 32, 64, or 96 cores, and pods with limits.cpu of 1 or 2 are common. The bigger the host, the worse the default behavior.

The Workaround Everyone Used: uber-go/automaxprocs

For seven years, the answer to this problem was a single import. Add go.uber.org/automaxprocs to your main package, and the library reads the cgroup files at init() time and calls runtime.GOMAXPROCS() with the correct value before your code starts.

package main

import (
    _ "go.uber.org/automaxprocs"
    // ... your other imports
)

func main() {
    // GOMAXPROCS is already correct here.
}

The blank import does the work. Inside, the library checks /sys/fs/cgroup/cpu.max (cgroup v2) or cpu.cfs_quota_us and cpu.cfs_period_us (cgroup v1), divides quota by period, rounds, and calls runtime.GOMAXPROCS(). The package documentation and the source for cpu_quota_linux.go make the calculation explicit.
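
To make the arithmetic concrete, here is a minimal sketch of the same cgroup v2 calculation. This is not the library's code; cpuQuotaFromCgroupV2 is a hypothetical helper, and its only input is the cpu.max file shown earlier:

package main

import (
    "fmt"
    "math"
    "os"
    "strconv"
    "strings"
)

// cpuQuotaFromCgroupV2 reads a cgroup v2 cpu.max file ("<quota> <period>" in
// microseconds, or "max" when no limit is set) and returns the limit as a
// fractional CPU count.
func cpuQuotaFromCgroupV2(path string) (float64, bool, error) {
    raw, err := os.ReadFile(path)
    if err != nil {
        return 0, false, err
    }
    fields := strings.Fields(string(raw))
    if len(fields) < 2 || fields[0] == "max" {
        return 0, false, nil // unlimited
    }
    quota, err := strconv.ParseFloat(fields[0], 64)
    if err != nil {
        return 0, false, err
    }
    period, err := strconv.ParseFloat(fields[1], 64)
    if err != nil {
        return 0, false, err
    }
    return quota / period, true, nil
}

func main() {
    cpus, limited, err := cpuQuotaFromCgroupV2("/sys/fs/cgroup/cpu.max")
    if err != nil || !limited {
        fmt.Println("no cgroup CPU limit detected")
        return
    }
    // automaxprocs rounds down by default (with a floor of 1); note that
    // Go 1.25's runtime default rounds fractional limits up instead.
    fmt.Println("limit:", cpus, "-> GOMAXPROCS:", int(math.Max(1, math.Floor(cpus))))
}

That rounding difference (floor in the library, round-up in the Go 1.25 runtime) is one of the edge cases where the two can disagree on fractional limits.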

Walk into almost any production Go-on-Kubernetes codebase and you will find this import. Some teams pull it in via a shared internal library. Some declare it as a direct dependency in go.mod. Some copy-paste it into a custom init package. The shape is always the same: a one-liner that the framework or platform team added once and nobody has touched since.

The problem with relying on this is that "nobody has touched it" is the failure mode. Junior engineers join the team. They start a new service. They follow the team's Go template. The template predates automaxprocs, or whoever set it up forgot the import. The new service ships without it. Nobody notices for six months because, again, the dashboard looks fine.

What Go 1.25 Changed

Go 1.25, released in August 2025, made GOMAXPROCS container-aware in the standard runtime. The Go blog post on container-aware GOMAXPROCS and the Go 1.25 release notes describe the new behavior:

  • On Linux, the runtime reads the cgroup CPU bandwidth limit at startup.
  • If that limit is lower than runtime.NumCPU(), GOMAXPROCS defaults to the limit (rounded up to allow full utilization of fractional limits; see the example after this list).
  • The runtime periodically rechecks and adjusts GOMAXPROCS if the limit changes (for example, if a Kubernetes admin patches the deployment).
  • The new defaults only apply if GOMAXPROCS is otherwise unspecified. Setting the GOMAXPROCS environment variable or calling runtime.GOMAXPROCS() explicitly disables the new behavior, exactly as before.
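
As a concrete illustration of the rounding rule (the numbers are made up, not from any real manifest):

resources:
  limits:
    cpu: "2.5"   # CFS quota of 250ms per 100ms period

On Go 1.25, the 2.5-CPU limit rounds up, so GOMAXPROCS defaults to 3. On Go 1.24 the same pod would have defaulted to the host's core count.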

There is also a new function:

// runtime.SetDefaultGOMAXPROCS resets GOMAXPROCS to the runtime's
// default (the cgroup-aware value), even if it was previously set
// by the env var or a runtime.GOMAXPROCS call.
runtime.SetDefaultGOMAXPROCS()

This is a way to opt back into the default after a temporary override. Useful in testing, or in services where one code path needs to pin GOMAXPROCS and another needs to release it.
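
A sketch of that pattern, assuming Go 1.25+ (the package and test names are placeholders):

package yourpkg_test

import (
    "runtime"
    "testing"
)

func TestWithPinnedScheduler(t *testing.T) {
    // Pin GOMAXPROCS for this test only, e.g. to make scheduling deterministic.
    runtime.GOMAXPROCS(1)
    // Hand control back to the runtime's cgroup-aware default when the test ends.
    defer runtime.SetDefaultGOMAXPROCS()

    // ... exercise the code that needed a pinned scheduler
}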

The Two GODEBUG Flags You Need to Know About

The Go 1.25 runtime exposes two GODEBUG settings that control the new behavior:

  • containermaxprocs=0 disables the cgroup-aware default. GOMAXPROCS falls back to NumCPU() like Go 1.24 and earlier.
  • updatemaxprocs=0 disables the periodic re-check. GOMAXPROCS is set once at startup and never adjusted, even if the cgroup limit changes.

Set them in your environment:

# k8s deployment env block
env:
  - name: GODEBUG
    value: "containermaxprocs=0,updatemaxprocs=0"

You probably do not want either of these set unless you have a specific reason. The reasons that come up:

  • You ship with automaxprocs already. The library calls runtime.GOMAXPROCS() in init(), and as noted above an explicit call already disables the runtime's new default and its periodic re-check, so the flags are not strictly required. Some teams set them anyway during a migration to make the intent explicit: the value is whatever automaxprocs decided, full stop. In most cases the runtime would pick the same value, so this matters only in edge cases (see the next section).
  • You run on a host where the cgroup detection misbehaves. Edge-case bugs have been reported on the Go issue tracker for specific cgroup v1 setups where the runtime fails to read the container CPU limit. Until those are fixed, containermaxprocs=0 plus a manual GOMAXPROCS value is the workaround.
  • You explicitly want the old behavior for a benchmark or migration test.

The Upgrade Trap

Here is the part that catches teams. You upgrade from Go 1.24 to Go 1.25. You assume the runtime now handles GOMAXPROCS. You delete the automaxprocs import from your services. The next deploy goes out. Production looks fine.

A week later, one service starts seeing throttling again.

The reason: that service has GOMAXPROCS=8 set in its deployment manifest, hardcoded by someone who tuned it manually two years ago. The container has a 2-CPU limit. Go 1.25's cgroup-aware default is explicitly disabled, because GOMAXPROCS is set in the environment.

The runtime is doing exactly what the documentation says. It is also doing the wrong thing for your service, because the env var was set by a person who is no longer on the team, against a CPU limit that has been changed twice since.

The audit you have to run before relying on Go 1.25's behavior:

# Find every place GOMAXPROCS is set in your manifests.
grep -r "GOMAXPROCS" deploy/ k8s/ helm/

# Find every call to runtime.GOMAXPROCS in your code.
grep -rn "runtime.GOMAXPROCS" .

# Find every place automaxprocs is imported.
grep -rn "automaxprocs" .

For each hit, decide: is this still needed, or is it stale tuning that should be deleted? In most cases, the answer is delete. In a few cases (services with cgroup detection bugs, services with very specific scheduler requirements) the answer is keep, and document why.

Sanity-Checking the New Behavior in Production

After an upgrade, you want to confirm the runtime is making the right decision. The cheapest way is to log it on startup:

package main

import (
    "log"
    "runtime"
)

func main() {
    log.Printf(
        "startup: NumCPU=%d GOMAXPROCS=%d",
        runtime.NumCPU(),
        runtime.GOMAXPROCS(0),
    )
    // ... rest of main
}

For a pod with limits.cpu: "2" on a 64-core node, you want a line that says NumCPU=64 GOMAXPROCS=2. If you see GOMAXPROCS=64, the cgroup-aware default did not fire. The most common cause is that you are still on Go 1.24 or earlier. After that, check for a stale GOMAXPROCS env var. Then check whether your runtime is hitting one of the known cgroup v1 detection bugs.
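
Two quick checks from outside the process, with placeholder pod and namespace names (and assuming a cgroup v2 node for the second):

# Is a stale GOMAXPROCS env var still set on the container?
kubectl exec -n my-namespace my-service-pod -- env | grep GOMAXPROCS

# What CPU limit does the container actually see?
kubectl exec -n my-namespace my-service-pod -- cat /sys/fs/cgroup/cpu.max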

For ongoing monitoring, the metric you actually want is container_cpu_cfs_throttled_periods_total over container_cpu_cfs_periods_total. That ratio is the share of CFS windows where your container hit its quota. After a correct GOMAXPROCS fix, this number should drop substantially on services that previously throttled.
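
If you want that ratio as a standing signal, a Prometheus recording rule is one way to keep it on a dashboard. The rule below is a sketch: the metric names come from cAdvisor, but the rule name and the container!="" selector are choices you would adapt to your own setup:

groups:
  - name: cpu-throttling
    rules:
      - record: container:cpu_cfs_throttling:ratio_5m
        expr: |
          rate(container_cpu_cfs_throttled_periods_total{container!=""}[5m])
            /
          rate(container_cpu_cfs_periods_total{container!=""}[5m])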

Should You Still Use automaxprocs?

A reasonable question, given the runtime now does what the library used to do. The pragmatic answer:

If your production fleet is fully on Go 1.25 or newer, on Linux, and you have audited the manifests for stale GOMAXPROCS env vars, the library is no longer pulling its weight. Removing it is a small cleanup and worth doing during your next dependency review.

If any of those conditions fail (one service still on Go 1.22, cgroup v1 hosts with detection bugs, uncertain manifest hygiene) keeping automaxprocs costs you nothing and gives you consistent behavior across the fleet during the migration. The library is still maintained and still correct.

One nuance on timing: the Go 1.25 runtime computes the cgroup-aware default during runtime startup, before any package init() runs, so code that reads runtime.GOMAXPROCS(0) in its own init already sees the adjusted value. With automaxprocs, the value is set by the library's init(), which makes it sensitive to import ordering. If you have packages that read GOMAXPROCS at init time, that ordering is worth checking whichever mechanism you rely on.

What This Changes for New Services

If you are writing a new Go service today, on Go 1.25 or later, deploying to Kubernetes, the playbook is shorter than it has ever been:

  1. Do not set GOMAXPROCS in your deployment env. Let the runtime decide.
  2. Do not call runtime.GOMAXPROCS() in code unless you have a specific reason. Document the reason if you do.
  3. Set resources.requests.cpu to your steady-state usage. Set resources.limits.cpu to your peak. The runtime will follow limits.cpu. (The runtime does not look at requests.cpu; that field affects scheduling and CFS shares, not the CFS quota.) A manifest sketch follows this list.
  4. Log GOMAXPROCS on startup. Check it after every upgrade and every manifest change.
  5. Watch container_cpu_cfs_throttled_periods_total. If it goes up, something has gone wrong with one of the four steps above.
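
Putting steps 1 and 3 into a manifest, a minimal sketch (the service name, image, and numbers are placeholders, not recommendations):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-service
spec:
  selector:
    matchLabels:
      app: my-service
  template:
    metadata:
      labels:
        app: my-service
    spec:
      containers:
        - name: my-service
          image: registry.example.com/my-service:latest
          resources:
            requests:
              cpu: "500m"   # steady-state usage
            limits:
              cpu: "2"      # peak; this is the value the Go 1.25 runtime follows
          # Note what is absent: no GOMAXPROCS env var and no GODEBUG override.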

This is the stack of small decisions the runtime is finally making for you, after years of Kubernetes-Go services pretending the host's CPU count was the limit. The fix is not free. You still have to upgrade, audit, and verify. But the default is finally aligned with what every container-aware Go service has needed since cgroups shipped.

The Go runtime was lying to your pod. As of 1.25, it stopped. The work left is making sure your team did too.


If this was useful

The runtime details that decide whether your Kubernetes service throttles or breathes (the scheduler, the GC's interaction with CFS, what GOMAXPROCS actually controls) are the kind of thing a Go book either skips or burns three chapters on. The Complete Guide to Go Programming is the half of Thinking in Go that covers them at the depth you need to debug production. The companion book, Hexagonal Architecture in Go, is for when the runtime is fine and the project layout is the thing slowing you down.

Thinking in Go — the 2-book series on Go programming and hexagonal architecture
