The Go Service That Slowed Down Without Errors

#go #performance #goroutines

There is a popular phrasing in Go circles that says concurrency is almost free. The phrasing is true at the level of stack allocation and misleading at the level of operating a service. The cheap thing is the per-goroutine memory footprint. The expensive thing is the scheduler's response to unbounded fan-out, leaked workers, and undeclared blocking — and none of those failure modes throw an error.

The shape of the failure is recognizable once you have seen it. Latency on a healthy endpoint creeps from 50 ms to 200 ms to 800 ms over the course of an afternoon. CPU climbs from 30% to 95%. The error count stays at zero. Nothing is wrong; nothing is right. The service is alive in the sense that the request handler returns. It is dead in the sense that nobody can use it.

This piece is an argument that the performance question for a Go service is rarely "is concurrency cheap" and almost always "what bounds your concurrency, and what happens when those bounds are exceeded."

What is actually cheap

A goroutine starts with a 2 KB stack. The number is set by the stackMin constant in runtime/stack.go in the Go runtime source: stackMin = 2048. A POSIX thread on Linux, by contrast, reserves between 2 and 8 MB of virtual address space for its stack by default. Per the pthread_create man page, 2 MB is the per-architecture default on x86-64 when RLIMIT_STACK is unlimited, and a typical shell with ulimit -s 8192 propagates 8 MB into new threads. Either way, the commonly cited comparison, that goroutines are three orders of magnitude cheaper than threads, is real arithmetic: 2 MB / 2 KB = 1,024 and 8 MB / 2 KB = 4,096.

What the comparison hides is everything else the runtime does: growing stacks past that initial 2 KB, scheduling, and coping with the fact that you can now afford to spawn hundreds of thousands of goroutines on a single machine. The cost that scales with concurrency is not the stack. It is everything downstream.

Go's scheduler is described in Dmitry Vyukov's "Scalable Go Scheduler Design Doc" from May 2012, which became the work-stealing scheduler shipped in Go 1.1. The design has three letters worth knowing: M for the OS thread, P for the logical processor that owns a local run queue and an mcache, G for the goroutine. The runtime multiplexes many Gs across a small number of Ms, mediated by Ps. When a P's local run queue empties, it steals from another P's queue.

That model holds together under one assumption: the run queue does not fill up faster than the Ps can drain it. Every failure mode below is a different way of breaking that assumption.

Unbounded fan-out

The first failure shape is the easiest to write and the easiest to ship without noticing.

func handleRequest(req Request) {
    go processAsync(req)
    respondOK()
}

The handler returns in microseconds. The work happens "somewhere." Under load tests with a uniform request distribution, the pattern works fine. Under real traffic (bursty, correlated, with one slow downstream), goroutines are created faster than they finish, and the count climbs with nothing to stop it. The visible symptoms are a climbing scheduler run queue, climbing CPU spent on context switches, and climbing latency on every endpoint sharing the runtime, including the ones that have nothing to do with the fan-out.

The diagnostic is straightforward once you know to look for it. Setting GODEBUG=schedtrace=1000 tells the scheduler to emit a single line to standard error every 1000 ms summarizing its state, per the documented behavior in the runtime package. Each of those lines begins with SCHED. The field that matters is runqueue=N, the depth of the global run queue; the bracketed numbers at the end of the line are the depths of each P's local run queue. Together they count the goroutines that are runnable but still waiting for a P. The interesting number is not any specific threshold; it is the trajectory. A run queue depth that climbs and does not return to its baseline is a saturated scheduler, regardless of whether the absolute value is sixty or six thousand.
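To watch the trajectory rather than eyeball it, the queue depths can be pulled out of each SCHED line mechanically. The following is a minimal sketch, not a tool from the Go distribution: the queued helper is hypothetical, and the sample line in main is hand-written and trimmed to the fields discussed here, not captured output. It assumes only that the line starts with SCHED, contains runqueue=N, and ends with bracketed per-P depths.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

var (
	globalRe = regexp.MustCompile(`runqueue=(\d+)`)
	localRe  = regexp.MustCompile(`\[([\d\s]*)\]`)
)

// queued extracts the global run queue depth and the sum of the per-P
// local run queue depths from one schedtrace line. ok is false when the
// line is not a SCHED line or carries no runqueue field.
func queued(line string) (global, local int, ok bool) {
	if !strings.HasPrefix(line, "SCHED") {
		return 0, 0, false
	}
	m := globalRe.FindStringSubmatch(line)
	if m == nil {
		return 0, 0, false
	}
	global, err := strconv.Atoi(m[1])
	if err != nil {
		return 0, 0, false
	}
	if b := localRe.FindStringSubmatch(line); b != nil {
		for _, f := range strings.Fields(b[1]) {
			n, err := strconv.Atoi(f)
			if err != nil {
				return 0, 0, false
			}
			local += n
		}
	}
	return global, local, true
}

func main() {
	// Illustrative line only: hand-written, with most real fields omitted.
	line := "SCHED 2001ms: gomaxprocs=4 idleprocs=0 threads=9 runqueue=37 [12 9 15 11]"
	g, l, ok := queued(line)
	fmt.Println(g, l, ok) // prints: 37 47 true
}
```

Fed from a bufio.Scanner over the service's standard error, the sum of the two numbers can be plotted or compared against a rolling baseline; the signal is a sum that keeps rising between samples.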

The fix is not "fewer goroutines." It is "an explicit concurrency limit." Two patterns cover almost all cases.

The first is a worker pool with a bounded queue: a fixed-size set of long-lived goroutines reading from a buffered channel. When the channel is full, Submit either blocks (push backpressure upstream) or returns an error (drop the work and report it). Either decision is a design choice. The default of "spawn another goroutine and hope" is the decision that produces the slow-down nobody can explain.

The second is golang.org/x/sync/semaphore, which gives you a weighted semaphore with a context-aware Acquire. The signature is func (s *Weighted) Acquire(ctx context.Context, n int64) error, and the documented behavior is that it blocks until resources are available or ctx is done. The pattern looks like this:

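// At most 10 units of weight can be held at once.
sem := semaphore.NewWeighted(10)

for _, task := range req.SubTasks {
    // Blocks while ten calls are in flight; returns an error if ctx ends first.
    if err := sem.Acquire(ctx, 1); err != nil {
        return err
    }
    go func(t Task) {
        defer sem.Release(1) // free the slot when the call returns
        callExternalService(ctx, t)
    }(task)
}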

The semaphore caps the parallelism at ten regardless of how many subtasks the request brings in. Backpressure is a property of the design, not an emergent property of the runtime giving up.

The leak you find next quarter

The second failure shape is the one that does not show up under load tests, because it does not show up under any tests that finish in seconds. It shows up at week three.

func startWorker(jobs <-chan Job) {
    go func() {
        for job := range jobs {
            process(job)
        }
    }()
}

The pattern is correct. A for ... range over a channel iterates over received values until the channel is closed, as documented in Go by Example's "Range over Channels" page. The bug is the assumption that the channel will be closed. If startWorker is called from request-handling code, and the channel is allocated per-request, the goroutine sits in chan receive forever once the request returns. Go's runtime does not garbage-collect a goroutine that is blocked on a channel — it has long been the explicit position of the language team that doing so would silently hide the bug rather than surface it (see GitHub issue golang/go#19702).

The memory cost of one leaked goroutine is small: a stack measured in single-digit kilobytes, plus whatever it has captured from its closure. The cost of ten thousand of them is not catastrophic by 2026 server standards. The cost of having to figure out which goroutines are leaked, in production, with go tool pprof http://localhost:6060/debug/pprof/goroutine and a stack trace that aliases ten thousand call sites onto five distinct frames, is measured in engineer-days.

A cheap and effective leak detector is runtime.NumGoroutine() exposed as a Prometheus gauge. The function is documented to return "the number of goroutines that currently exist." Under steady load on a service that is not leaking, the number jitters around a baseline. Under any load on a service that is leaking, it goes up and does not come down. The graph does not require an expensive observability stack to read.

For tests, the uber-go/goleak package provides goleak.VerifyNone(t) and goleak.VerifyTestMain(m), which fail the test if extra goroutines are alive at teardown. It is not a complete defense — concurrent tests confuse it, as the README acknowledges — but it raises the floor on what you have to notice yourself.

In production, Uber's LeakProf, described in a November 2022 engineering blog post, scans goroutine profiles for parked goroutines on shared channels and flags them as suspected leaks. The published numbers are striking: across the deployment described in the post, the system found ten critical leaks and produced one false positive. Two of the fixes produced 2.5× and 5× reductions in peak service memory, and one team voluntarily cut its container memory request by 25%. The post describes the typical effect not as outright crashes but as quiet degradation that the existing alerting did not name.

The blocking call you forgot to bound

The third failure shape is the simplest and the most embarrassing. A service makes outbound HTTP calls. The calls almost always finish in 5 ms. Once a quarter, an upstream gets sick, and the calls take 30 seconds.

resp, err := http.Get(url)

http.Get uses http.DefaultClient, which has no overall timeout. The default transport bounds connection setup (a 30-second dial timeout and a 10-second TLS handshake timeout), but once the connection is established nothing bounds the wait for response headers or body. An upstream that accepts the connection and then stalls can hold the request, for practical purposes, indefinitely. Each request handler that calls into this code path holds a goroutine for the duration. A handler that normally completes in 5 ms now holds a goroutine for 30 s; under steady traffic that is a goroutine count six thousand times larger than the steady state.

The runtime does not crash when this happens. It does the work it always does — schedule, multiplex, steal — but with six thousand times the bookkeeping, against a working set that no longer fits in cache. The visible signature is the same shape as unbounded fan-out: latency rising on endpoints that have nothing to do with the slow upstream, CPU climbing, error count flat.

The fix is two layers of timeout, on the assumption that any single layer can be forgotten or wired wrong:

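ctx, cancel := context.WithTimeout(req.Context(), 2*time.Second)
defer cancel()

req2, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
if err != nil {
    return err
}
// In real code, build the client once at startup and reuse it.
client := &http.Client{Timeout: 2 * time.Second}
resp, err := client.Do(req2)
if err != nil {
    return err
}
defer resp.Body.Close()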

The context bound is the one that propagates: callers can cancel, downstream calls inherit the deadline. The http.Client timeout is a belt-and-braces second layer that catches the case where the context is set up and then ignored by an inner library that takes a context.Background() for reasons of its own. Two layers is not paranoia; two layers is a recognition that one of them will be wrong eventually.

Why this is hard to learn from an outage

Each of the three failure shapes (fan-out, leak, blocking call) has the same outward signature: rising latency, rising CPU, flat error count. The remediation in the first hour of the incident is the same in all three cases: scale out, restart, see if it comes back. None of those steps tells you which of the three you had, and none tells you which other piece of code in the same service has the same problem latent in it. The instinct to write a runbook entry titled "if latency rises and errors stay flat, restart" is the instinct of a team that is going to have this exact incident again next month, with a different root cause, indistinguishable from the outside.

The instrumentation that disambiguates is cheap and almost always missing. runtime.NumGoroutine() as a Prometheus gauge — one line in main. net/http/pprof registered on a debug port — one blank import. GODEBUG=schedtrace=1000 set in staging during load tests — one environment variable. Each of these answers a question that is otherwise answered with grep and guesswork at three a.m.

What the cheap-goroutine claim is actually saying

"Goroutines are cheap" is a statement about the runtime's allocation strategy. It is not a statement about the runtime's tolerance for unbounded growth. The runtime is permissive: it will not refuse to create your ten-thousandth goroutine, will not warn you about your unclosed channel, will not surface your stalled HTTP call. The permissiveness is what makes the language feel productive on day one. It is also what makes the day-thirty incident hard to diagnose.

The two questions to ask before every go func() are not new, and they are not subtle. Who closes its channel, and what happens if they don't? What bounds its parallelism, and what happens when those bounds are exceeded? If you cannot answer both, you have not finished writing the function. You have written the first version of the function and let the runtime handle the rest, which is exactly the experience the runtime is engineered to give you. It is a wonderful experience. It is also, by construction, the experience that does not throw an error when the bounds are wrong.

A service that slows down without errors is not a sick service. It is a healthy service operating outside the regime its author bounded for it. The fix is not faster code; the fix is named bounds, and it costs almost nothing to write down — a semaphore, a deadline, a documented life cycle for every goroutine you spawn. The next time you write go on a line by itself, the question worth pausing on is not whether to spawn the goroutine. It is what stops it. If the answer is "the channel will be closed eventually" or "the call will return eventually," the answer is wrong, and the slow afternoon is already on the calendar. You just have not seen the date yet.