- Book: The Complete Guide to Go Programming
- Also by me: Thinking in Go (2-book series) — Complete Guide to Go Programming + Hexagonal Architecture in Go
- My project: Hermes IDE | GitHub — an IDE for developers who ship with Claude Code and other AI coding tools
- Me: xgabriel.com | GitHub
A team I know had a Go service that passed go test -race
on every PR for two years. Coverage was good. The CI step was
green every Friday. Then a customer reported corrupted JSON in a
small fraction of responses. Not enough to alert. Enough to
break a downstream pipeline that was strict about field types.
The race was in a handler that built a response by writing into a
shared *bytes.Buffer from two goroutines. The unit tests never
hit the path under enough load to flip the bits. The integration
tests did not run with -race. CI ran with -race but at a
fraction of production throughput — the scenario the Go docs
warn about. The detector needs the actual interleaving to fire.
No interleaving, no detection, no bug report.
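A minimal sketch of that shape, reconstructed from the
description above (the handler and field names are invented,
not the team's code):
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"sync"
)

// handler builds a JSON response by letting two goroutines write
// into one shared *bytes.Buffer. bytes.Buffer is not safe for
// concurrent use, so the two Fprintf calls race on its internal
// fields and can interleave mid-field once there is enough load
// for both goroutines to actually run at the same time.
func handler(w http.ResponseWriter, r *http.Request) {
	var buf bytes.Buffer
	var wg sync.WaitGroup
	wg.Add(2)
	go func() {
		defer wg.Done()
		fmt.Fprintf(&buf, `"user":%q,`, "alice")
	}()
	go func() {
		defer wg.Done()
		fmt.Fprintf(&buf, `"items":[1,2,3]`)
	}()
	wg.Wait()
	// The fix is one buffer per goroutine joined after wg.Wait(),
	// or a mutex around the shared one.
	fmt.Fprintf(w, "{%s}", buf.Bytes())
}

func main() {
	http.ListenAndServe(":8080", http.HandlerFunc(handler))
}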
This is the part of the race detector nobody writes about. It is
a runtime tool, not a static checker. It finds the races that
actually happen during the run you observe. If your tests do
not exercise the same goroutine concurrency that prod does, you
ship races that pass go test -race and break under real load.
The official Go documentation says the same thing in plainer
words: "the more code is exercised, the better chance of finding
races".
The obvious fix is to run -race against production traffic. The
obvious problem is that the race detector is expensive. The Go
team's own number, from the same docs,
is "the cost of race detection varies by program, but for a
typical program, memory usage may increase by 5-10x and execution
time by 2-20x." Translate that to the autoscaler and you find out
the experiment ate your error budget by lunch.
Three sampling strategies make this workable. None of them are
clever. All of them are easier than the consequence of finding
the next race in production logs.
Strategy 1: a -race canary at 1-5% of traffic
Build two binaries. Ship the normal one to the bulk of your
fleet. Ship the -race build to a small canary that takes a
fraction of traffic at the load balancer.
# Build a -race binary alongside the normal one.
# Only the canary build carries the racecanary tag: it gates
# extra runtime assertions or canary-only telemetry behind
# //go:build racecanary, so a binary shipped by accident cannot
# pretend it was the canary.
go build -o bin/api ./cmd/api
go build -race -tags=racecanary -o bin/api-race ./cmd/api
The racecanary build tag gates source files (or branches
inside them) marked //go:build racecanary — typically extra
runtime assertions, looser invariants, or canary-only telemetry
that you only want on the race build. If you do not have any
such code, drop the tag and let -race itself be the marker.
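If you do have such code, the gated file is nothing exotic. A
sketch, with placeholder package and function names:
//go:build racecanary

package api

// assertInvariant panics loudly in the canary build. A sibling
// file guarded by //go:build !racecanary defines the same
// function as a no-op, so call sites cost nothing in the normal
// binary.
func assertInvariant(ok bool, msg string) {
	if !ok {
		panic("canary invariant violated: " + msg)
	}
}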
The race build's binary is bigger and slower at startup, so do
not run health checks with the same timeout you use for the
normal build. The Go runtime records race reports through
runtime/race. By default it writes the report to stderr (or to
the log_path file) and the process keeps running; set
halt_on_error=1 and it exits on the first report instead.
Either way the report is the finding you are after. You just do
not want to pay the detector's overhead on 100% of traffic.
A Kubernetes-shaped layout that gets you there:
# api-deployment.yaml -- the bulk of the fleet
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 20
  template:
    spec:
      containers:
        - name: api
          image: registry/api:v1.42.0
          resources:
            requests: { cpu: "500m", memory: "512Mi" }
            limits: { cpu: "2", memory: "2Gi" }
---
# api-race-deployment.yaml -- the canary
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-race
  labels:
    app: api
    variant: race
spec:
  replicas: 1
  template:
    spec:
      containers:
        - name: api
          image: registry/api:v1.42.0-race
          env:
            - name: GORACE
              value: "halt_on_error=0 log_path=/var/log/race"
          resources:
            requests: { cpu: "2", memory: "4Gi" }
            limits: { cpu: "4", memory: "8Gi" }
One pod out of twenty-one is roughly 5% of traffic if the
service load-balances evenly across pods. Move it to one pod
out of forty for about 2.5%. The GORACE env var is the runtime
knob. halt_on_error=0 is the default, pinned here so nobody
flips it by accident: the process stays alive after a report
and you collect more than one finding per deploy. log_path
redirects reports to a file the log shipper can pick up; the
runtime appends the pid, so reports land at /var/log/race.<pid>.
The full list of GORACE options is on the Data Race Detector
page of the Go docs.
Two things to watch. First, the canary will report higher p99
latency than the rest of the fleet. That is expected, not an
SLO event. Tag the canary in your observability stack and
exclude it from the SLO window. Second, the canary will die
more often than its siblings: the 5-10x memory overhead makes
OOM kills likely under load, and if you do flip halt_on_error=1
the first report takes the pod down. The replica count keeps it
cycling. Treat each restart as a prompt to read the race log
before the next pod pulls in fresh traffic.
Strategy 2: a -race build in a load-test mirror
The CI run is too small to find races. Production is too risky to
run on 100%. The middle ground is a load-test environment that
replays prod-shaped traffic against a -race build.
The shape that works is shadow traffic. The production load
balancer mirrors a copy of every request to the load-test
environment. The mirrored requests do not return a response to
the user; they exercise the code path. If a race fires in the
mirror, the report lands in the mirror's logs and prod is
untouched.
package shadow

import (
	"bytes"
	"io"
	"net/http"
)

// MirrorTo wraps a handler and forwards a copy of the request
// to a shadow URL. The shadow response is read and dropped.
// Errors from the shadow side never propagate to the caller.
func MirrorTo(shadow string, h http.Handler) http.Handler {
	client := &http.Client{}
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		body, err := io.ReadAll(r.Body)
		if err != nil {
			h.ServeHTTP(w, r)
			return
		}
		r.Body = io.NopCloser(bytes.NewReader(body))
		hdr := r.Header.Clone()
		go func() {
			req, err := http.NewRequest(r.Method, shadow+r.URL.Path, bytes.NewReader(body))
			if err != nil {
				return
			}
			req.Header = hdr
			resp, err := client.Do(req)
			if err != nil {
				return
			}
			io.Copy(io.Discard, resp.Body)
			resp.Body.Close()
		}()
		h.ServeHTTP(w, r)
	})
}
A couple of gotchas. The shadow client runs in its own goroutine
so the real handler returns at the same latency it would have
without the mirror. The user does not pay for the experiment.
The header is cloned before the go func block because the
original handler may mutate r.Header while the goroutine is
still reading it. And the shadow side has to be idempotent or
read-only; sending mirrored writes to a real database doubles
your write traffic. Most teams point the shadow at a clone of
the database, or stub the write path entirely.
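Wiring the mirror in is one line at server start. The shadow
URL and module path below are placeholders:
package main

import (
	"fmt"
	"net/http"

	"example.com/yourservice/shadow" // the MirrorTo package above
)

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/orders", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "ok") // stand-in for the real handler
	})
	// Serve normally; mirror every request, fire and forget, to the
	// -race build running in the load-test environment.
	http.ListenAndServe(":8080",
		shadow.MirrorTo("http://api-race.loadtest.internal:8080", mux))
}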
The advantage over the canary in Strategy 1 is that the shadow
takes 100% of mirrored traffic, which gives the race detector
the highest chance of finding the bug. The disadvantage is the
cost of running the second environment. For services where one
extra pod is acceptable but a doubled fleet is not, run the
shadow on weeknights or during a controlled load test rather
than 24/7.
Strategy 3: -race on the background-job pool
Background workers (cron jobs, queue consumers, ETL) are the
cheapest place to keep -race on permanently. They run on their
own pool, they are not in the request path, and the latency cost
of a 2-20x slowdown shows up only as longer queue drain time, not
as a customer-visible p99 regression.
The same code paths a queue consumer hits are usually shared with
the request handlers. A race that fires in the consumer is a race
the handlers also have. The consumer is just easier to slow down
without anyone noticing.
# Dockerfile.worker — the background worker image
FROM golang:1.24 AS build
WORKDIR /src
COPY . .
RUN go build -race -o /out/worker ./cmd/worker
FROM gcr.io/distroless/cc
COPY --from=build /out/worker /worker
ENV GORACE="halt_on_error=0 log_path=/var/log/race history_size=0"
ENTRYPOINT ["/worker"]
history_size controls how much memory the detector spends on
per-goroutine access history. The valid range is 0-7, the
default is 1, and the history holds 32K * 2**history_size
memory accesses per goroutine, per the race detector
documentation. The history is what the detector uses to
reconstruct the second stack trace in a report, so lowering it
mostly costs report quality rather than detection: dropping to
0 halves the per-goroutine bookkeeping, and the worst case is a
report that says "failed to restore the stack" instead of
showing you the other access. On a worker pool that already
eats RAM, that is usually a fine trade.
The other thing that matters here is queue depth. If your
workers were already running at 80% of capacity before -race,
a 2-20x slowdown means the race build will fall behind. Do one
of two things: scale the
worker pool by 3-5x while the race build is on, or run a
non-race fleet alongside that handles the bulk of jobs and a
race fleet that processes a sample (say, every tenth message).
Both work. The scale-out is simpler.
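If you go the sampling route instead, the consumer-side filter
is small. A sketch, with the queue client and job logic as
placeholders:
package worker

// Message, Ack, Nack, and handleJob stand in for whatever queue
// client and job code you actually use.
type Message struct{ ID string }

func (Message) Ack()  {}
func (Message) Nack() {} // requeue so another consumer picks it up

func handleJob(Message) {}

const sampleEvery = 10

// consumeSampled is what the -race worker runs. Messages it skips
// go back on the queue and are drained by the non-race fleet, so
// nothing is lost; the race build only pays the 2-20x slowdown on
// roughly one message in ten.
func consumeSampled(msgs <-chan Message) {
	n := 0
	for m := range msgs {
		n++
		if n%sampleEvery != 0 {
			m.Nack()
			continue
		}
		handleJob(m)
		m.Ack()
	}
}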
What -race actually catches, and what it does not
The detector finds data races on memory locations: two
goroutines accessing the same variable, at least one of them
writing, with no happens-before relationship between the
accesses. The ThreadSanitizer manual describes the algorithm
the Go race detector is built on: the runtime keeps a
vector-clock "epoch" per goroutine plus shadow state per memory
location, and a pair of conflicting accesses with no
happens-before ordering between them fires a report.
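The smallest program that trips it, for reference:
package main

import "fmt"

// Two goroutines touch n, one of them writes, and nothing orders
// the two accesses. go run -race main.go prints a
// "WARNING: DATA RACE" report for this pair.
func main() {
	n := 0
	done := make(chan struct{})
	go func() {
		n = 42 // unsynchronised write
		close(done)
	}()
	fmt.Println(n) // unsynchronised read; races with the write
	<-done
}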
What it does not find:
- Logic races. Two operations on the same database row without a transaction; a check-then-act on a Redis key. The detector cannot see anything outside the Go process.
- Deadlocks. A goroutine waiting on a mutex another goroutine will never release. The runtime has its own detector for the all-goroutines-blocked case, but the race detector is not it.
- Channel-based races that are not data races. A send on a closed channel panics; that is a runtime check, not a race report.
- Races that never interleave during the run. This is the one that bit the team in the opening story. The detector is observational. No traffic, no observation, no finding.
Cgo is where the detector gets unreliable. Cgo callbacks that
touch Go memory from C threads can confuse the detector if the
C side does its own synchronisation that the Go runtime does
not see. The Go cgo race detector
notes
spell out which patterns work. Syscalls that block in C without
releasing the goroutine through runtime.cgocall will sometimes
be flagged too. If you are mostly pure Go, neither matters; if
you are heavy on cgo, expect to hand-annotate.
You will know it is working when you start seeing race reports
land in the canary log. The first few will look like bugs you
should have caught in CI. They are. CI did not have the traffic.
The canary does.
If this was useful
The race detector, the runtime scheduler, the sync package,
and the GC all live one floor below the Go you write every day.
The Complete Guide to Go Programming walks through that floor
end to end: what runtime/race tracks, why a happens-before
edge fixes a report, and how the scheduler's interleaving
choices decide which races a given run can even surface.
Hexagonal Architecture in
Go is the layer above: structuring the service so the hot
paths the detector sniffs are isolated from the domain code.
