8 race conditions. That's what three months of "I'll add -race later" bought me.
The codebase is a Go backend for a freelance studio automation tool. Around 4,000 lines of application code, a handful of goroutines managing job queues, email polling, and an agent dispatch loop. Perfectly ordinary stuff. I had been telling myself -race was "too slow for CI." It runs in 11s for a 4k-line service. I was wrong.
What the detector actually outputs
When you hit a real data race, the output looks like this:
```
==================
WARNING: DATA RACE
Read at 0x00c0001b4030 by goroutine 18:
  github.com/baodev/flos/internal/dispatch.(*Router).getHandler()
      /home/runner/work/flos/internal/dispatch/router.go:94 +0x6c

Previous write at 0x00c0001b4030 by goroutine 7:
  github.com/baodev/flos/internal/dispatch.(*Router).Register()
      /home/runner/work/flos/internal/dispatch/router.go:61 +0x84

Goroutine 18 (running) created at:
  github.com/baodev/flos/internal/dispatch.(*Router).Start()
      /home/runner/work/flos/internal/dispatch/router.go:112 +0x1e0
==================
```
File and line numbers for both conflicting accesses, which goroutine made each one, and where the reading goroutine was created. It tells you exactly where to look.
The representative case
The most embarrassing one: a map[string]HandlerFunc being read by worker goroutines while a registration goroutine could still be writing to it. Classic. The map wasn't behind a mutex because I "registered everything at startup." Except one code path registered a handler lazily on first use.
```diff
 type Router struct {
-	handlers map[string]HandlerFunc
+	handlers map[string]HandlerFunc
+	mu       sync.RWMutex
 }
 
 func (r *Router) Register(name string, fn HandlerFunc) {
+	r.mu.Lock()
+	defer r.mu.Unlock()
 	r.handlers[name] = fn
 }
 
 func (r *Router) getHandler(name string) HandlerFunc {
+	r.mu.RLock()
+	defer r.mu.RUnlock()
 	return r.handlers[name]
 }
```
12 lines changed. Bug had been live since the initial commit in February.
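For anyone who wants to poke at the fix, here's a minimal, self-contained sketch of the locked Router, plus a guarded version of the lazy "register on first use" path. The `HandlerFunc` signature, `NewRouter`, and `getOrRegister` are assumptions for illustration, not the actual flos code. Run it with `go run -race` and the detector should stay quiet.

```go
package main

import (
	"fmt"
	"sync"
)

// HandlerFunc is a stand-in: the real signature isn't shown in the post,
// so this one is an assumption.
type HandlerFunc func(job string) string

// Router guards its handler map with an RWMutex: concurrent readers can
// proceed in parallel, writers get exclusive access.
type Router struct {
	mu       sync.RWMutex
	handlers map[string]HandlerFunc
}

func NewRouter() *Router {
	return &Router{handlers: make(map[string]HandlerFunc)}
}

func (r *Router) Register(name string, fn HandlerFunc) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.handlers[name] = fn
}

func (r *Router) getHandler(name string) HandlerFunc {
	r.mu.RLock()
	defer r.mu.RUnlock()
	return r.handlers[name]
}

// getOrRegister is a hypothetical helper for the lazy-registration path.
// It tries a cheap read-locked lookup first, then re-checks under the write
// lock so two goroutines racing to register the same name build it only once.
func (r *Router) getOrRegister(name string, build func() HandlerFunc) HandlerFunc {
	if fn := r.getHandler(name); fn != nil {
		return fn
	}
	r.mu.Lock()
	defer r.mu.Unlock()
	if fn, ok := r.handlers[name]; ok {
		return fn
	}
	fn := build()
	r.handlers[name] = fn
	return fn
}

func main() {
	r := NewRouter()
	r.Register("email", func(job string) string { return "email:" + job })

	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Reads and a lazy registration running concurrently: the access
			// pattern that was racy before the mutex went in.
			_ = r.getHandler("email")
			r.getOrRegister("agent", func() HandlerFunc {
				return func(job string) string { return "agent:" + job }
			})
		}()
	}
	wg.Wait()

	fmt.Println(r.getHandler("agent")("42")) // agent:42
}
```

The re-check inside `getOrRegister` matters: between dropping the read lock and taking the write lock, another goroutine may have registered the same name, so the map is checked again before building a new handler.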
Adding it to CI is one line
If you're on GitHub Actions and not already running this, add it to your test job:
```yaml
- name: Test with race detector
  run: go test -race -count=1 -timeout=120s ./...
```
Or if you run tests via a Makefile:
```make
test-race:
	go test -race -count=1 -timeout=120s ./...
```
The -count=1 flag disables the test result cache, so every CI run actually executes the tests. Without it, go test can return cached results even with -race, which defeats the point.
What the other 7 were
I won't detail all of them. Mostly they were the same pattern: shared state accessed from spawned goroutines, written once somewhere "safe" and read everywhere else, with no synchronization because the write "always finished first." The race detector disagreed with that assumption on 7 separate occasions.
Two of them were in test helpers, not production code. Still real races: test helpers spin up goroutines too, and a flaky test that fails once every 40 runs is its own kind of tax.
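A sketch of what a race-free test helper can look like, assuming a hypothetical `fakeMailer` test double that the code under test calls from its own goroutines. The mutex guards the recorded slice, the dispatcher waits for its goroutines before returning, and the result is sorted so assertions don't depend on scheduling order. It's written as a plain program so it runs standalone; in a real suite it would live in a `_test.go` file.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// fakeMailer is a hypothetical test double. Without mu, concurrent
// appends from Send and the later read in Sent would race.
type fakeMailer struct {
	mu   sync.Mutex
	sent []string
}

func (f *fakeMailer) Send(to string) {
	f.mu.Lock()
	defer f.mu.Unlock()
	f.sent = append(f.sent, to)
}

// Sent returns a sorted copy, so callers never hold the slice the mutex
// protects and the order doesn't depend on goroutine scheduling.
func (f *fakeMailer) Sent() []string {
	f.mu.Lock()
	defer f.mu.Unlock()
	out := append([]string(nil), f.sent...)
	sort.Strings(out)
	return out
}

// dispatchAll is a hypothetical stand-in for production code that fans
// work out to goroutines. It waits for all of them before returning.
func dispatchAll(m *fakeMailer, recipients []string) {
	var wg sync.WaitGroup
	for _, r := range recipients {
		wg.Add(1)
		go func(r string) {
			defer wg.Done()
			m.Send(r)
		}(r)
	}
	wg.Wait()
}

func main() {
	m := &fakeMailer{}
	dispatchAll(m, []string{"c@x.dev", "a@x.dev", "b@x.dev"})
	fmt.Println(m.Sent()) // [a@x.dev b@x.dev c@x.dev]
}
```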
The honest accounting
Three months of technical debt on a four-person-equivalent codebase (it's mostly me and agents). Eight findings in 11 seconds of wall time. One of those findings was in the agent dispatch path that runs on every job, meaning every job that completed without incident was getting lucky with goroutine scheduling.
That's the uncomfortable part about race conditions: they don't fail loudly. They fail intermittently, or they corrupt state silently, or they don't fail at all on your machine because your CPU happens to schedule goroutines in a forgiving order.
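Silent corruption is easiest to see with a shared counter. A plain `n++` from several goroutines is a read-modify-write, so two goroutines can read the same old value and one increment simply disappears, with no panic and no error. A minimal sketch of the synchronized version using `sync/atomic` (the `countJobs` name and the worker counts are hypothetical; `atomic.Int64` needs Go 1.19 or later):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// countJobs starts `workers` goroutines that each record `perWorker`
// completed jobs. atomic.Int64 makes each increment indivisible, so no
// update is lost regardless of how the goroutines are scheduled.
func countJobs(workers, perWorker int) int64 {
	var n atomic.Int64
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				n.Add(1)
			}
		}()
	}
	wg.Wait()
	return n.Load()
}

func main() {
	fmt.Println(countJobs(8, 1000)) // 8000
}
```

Swap the atomic for `n++` on a plain `int64` and the total may come out below 8000 on some runs, or come out right every time on your machine; `go run -race` should flag the unsynchronized version either way.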
The race detector doesn't care about your scheduler's mood.
Running it weekly now. Should have been in CI from day one — -race exists precisely because humans are bad at reasoning about concurrent memory access under load.