You shipped a Go service. Benchmarks look great. CPU usage is low. Average latency is comfortably within targets.
And yet every now and then your p99 explodes.
This is the part many engineers underestimate: fast systems can still be unpredictable systems. In Go, latency spikes are rarely caused by a single obvious bottleneck. They emerge from the interaction between the runtime, the OS, and your code under real-world load.
Let’s dig into the less obvious reasons your Go service spikes and what you can actually do about them.
The Illusion of “Fast Enough”
Go makes it easy to build services that are consistently good on average. Goroutines are cheap, the standard library is efficient, and deployment is simple.
But averages lie.
Latency-sensitive systems live and die by tail latency: p95, p99, p999. These outliers are where user experience breaks down, SLAs fail, and debugging becomes painful.
If your service is “fast but spiky,” you’re likely dealing with one (or more) of the following.
1. Garbage Collection Isn’t Free (Even When It’s Good)
Go’s garbage collector is excellent, but it is not invisible.
What’s happening
Modern Go uses a concurrent, tri-color mark-and-sweep GC. Most of the work happens alongside your application, but there are still stop-the-world (STW) phases, especially during:
- Stack scanning
- Mark termination
Even if these pauses are short (microseconds to milliseconds), they can stack up under load and show up as latency spikes.
Why it gets worse in production
- High allocation rates increase GC frequency
- Large heaps increase scan time
- Pointer-heavy data structures slow marking
What to look for
- Sudden spikes aligned with GC cycles
- Increased `GOGC` pressure
- High allocation profiles in `pprof`
What actually helps
- Reduce allocations in hot paths
- Reuse objects (`sync.Pool`, carefully)
- Avoid unnecessary pointers
- Flatten data structures when possible
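As an illustration of the `sync.Pool` point, here is a minimal sketch of reusing buffers in a hot path (the `render` function and pool are hypothetical, not from any particular codebase):

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool reuses byte buffers across calls so the hot path stops
// allocating a fresh buffer (and creating GC work) per request.
var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// render writes output into a pooled buffer. The Reset is essential:
// pooled objects keep their old contents between uses.
func render(name string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	defer bufPool.Put(buf)
	buf.Reset()
	buf.WriteString("hello, ")
	buf.WriteString(name)
	return buf.String()
}

func main() {
	fmt.Println(render("world")) // hello, world
}
```

The "carefully" caveat is real: the pool may drop objects at any GC, and forgetting the `Reset` leaks data between requests.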
2. Goroutine Contention and Scheduler Behavior
Goroutines are cheap but not free, and definitely not magic.
What’s happening
Go’s scheduler multiplexes goroutines onto OS threads. Under load:
- Run queues grow
- Context switching increases
- Work stealing adds overhead
If too many goroutines compete for CPU or locks, latency spikes emerge not from raw compute, but from waiting.
Common traps
- Spawning unbounded goroutines per request
- Blocking operations inside goroutines
- Assuming “more concurrency = faster”
Subtle issue: cooperative preemption
Go's preemption is partly cooperative. Since Go 1.14 the runtime can asynchronously preempt tight loops, but a goroutine that rarely reaches a safe point can still delay scheduling fairness.
What to do
- Use worker pools for bounded concurrency
- Avoid long-running CPU loops without yielding
- Profile scheduler latency (`runtime/trace`)
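The worker-pool advice above can be sketched as follows. This is one common shape, not the only one; the `process` helper and its signature are my own for illustration:

```go
package main

import (
	"fmt"
	"sync"
)

// process runs jobs through a fixed pool of workers instead of one
// goroutine per job, bounding concurrency and scheduler pressure.
func process(jobs []int, workers int, handle func(int) int) []int {
	in := make(chan int)
	out := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := range in {
				out <- handle(j)
			}
		}()
	}
	go func() {
		for _, j := range jobs {
			in <- j
		}
		close(in)
		wg.Wait() // close out only after all workers finish
		close(out)
	}()
	var results []int
	for r := range out {
		results = append(results, r)
	}
	return results
}

func main() {
	doubled := process([]int{1, 2, 3, 4}, 2, func(n int) int { return n * 2 })
	fmt.Println(len(doubled)) // 4 results; order is not guaranteed
}
```

The key property is the bound: no matter how many jobs arrive, at most `workers` goroutines compete for CPU at once.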
3. Lock Contention: The Silent Killer
Time spent waiting on mutexes doesn’t show up in CPU profiles, but it absolutely shows up in latency.
What’s happening
Under contention:
- Goroutines block on locks
- Queueing delays increase
- Throughput may remain high, but latency explodes
Where it hides
- Global maps with mutex protection
- Shared caches
- Logging pipelines
- Metrics collectors
Why it’s tricky
You might not notice until traffic scales. Everything works fine until it suddenly doesn’t.
What works
- Reduce lock granularity
- Prefer sharded structures
- Use lock-free or atomic patterns where appropriate
- Measure with mutex profiling (`go test -mutexprofile`)
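The "sharded structures" idea, concretely: split one contended mutex into several independent ones, keyed by hash, so goroutines touching different keys rarely block each other. A minimal sketch (the `shardedMap` type is illustrative; shard count is an arbitrary choice):

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

const shards = 16

// shardedMap replaces one global mutex with 16 independent ones.
// Contention on any single lock drops roughly by the shard count.
type shardedMap struct {
	mu [shards]sync.Mutex
	m  [shards]map[string]int
}

func newShardedMap() *shardedMap {
	s := &shardedMap{}
	for i := range s.m {
		s.m[i] = make(map[string]int)
	}
	return s
}

// shard picks a lock/map pair deterministically from the key.
func (s *shardedMap) shard(key string) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() % shards)
}

func (s *shardedMap) Add(key string, n int) {
	i := s.shard(key)
	s.mu[i].Lock()
	s.m[i][key] += n
	s.mu[i].Unlock()
}

func (s *shardedMap) Get(key string) int {
	i := s.shard(key)
	s.mu[i].Lock()
	defer s.mu[i].Unlock()
	return s.m[i][key]
}

func main() {
	m := newShardedMap()
	m.Add("requests", 1)
	m.Add("requests", 1)
	fmt.Println(m.Get("requests")) // 2
}
```

In a long-running server (rather than a test), `runtime.SetMutexProfileFraction` enables the same mutex profile that `-mutexprofile` collects.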
4. Network and Syscall Variability
Your Go code might be fast. The network is not.
What’s happening
Every request eventually hits:
- TCP stack
- DNS resolution
- Kernel scheduling
- External services
Even tiny variations here can cascade into visible latency spikes.
Common culprits
- DNS lookups without caching
- Connection churn (lack of keep-alives)
- Slow downstream dependencies
- Kernel-level queueing
The hidden factor: tail amplification
If your service calls 5 downstream services, each with p99 latency of 50ms, your combined p99 is much worse.
What helps
- Use connection pooling aggressively
- Set timeouts everywhere (and mean it)
- Cache DNS where possible
- Budget latency across dependencies
5. GC + Scheduler + Syscalls: The Perfect Storm
The real problem is rarely one issue; it’s interaction effects.
A typical spike might look like this:
- GC cycle starts under high allocation pressure
- Goroutines increase due to incoming traffic
- Lock contention rises in shared structures
- A few slow network calls block threads
- Scheduler struggles to keep up
Individually, each is manageable. Together, they create a spike that’s hard to reproduce and harder to debug.
6. Misleading Benchmarks
Your local benchmarks probably didn’t show any of this.
Why?
- No real network variability
- No production traffic patterns
- No contention
- No long-lived heap growth
Benchmarks measure ideal conditions. Production exposes emergent behavior.
7. Observability Gaps
You can’t fix what you can’t see.
Most teams track:
- Average latency
- CPU usage
- Memory usage
But miss:
- GC pause distribution
- Goroutine counts over time
- Scheduler delays
- Mutex contention
- Per-endpoint tail latency
Without these, spikes remain mysterious.
What Actually Works in Practice
If you care about latency consistency, not just speed:
1. Profile under realistic load
Use:
- `pprof` (CPU, heap, allocs, mutex)
- `runtime/trace` for scheduler insights
2. Track the right metrics
- p95/p99 latency (not averages)
- GC pause time
- Goroutine count
- Queue lengths
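Several of these signals are exposed directly by the `runtime/metrics` package. A minimal sketch reading two of them (the metric names are documented identifiers in that package; the helper function is mine):

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

// readRuntimeMetrics samples two series that map onto the list
// above: live goroutine count and the GC pause distribution.
func readRuntimeMetrics() {
	samples := []metrics.Sample{
		{Name: "/sched/goroutines:goroutines"},
		{Name: "/gc/pauses:seconds"}, // a histogram, not a scalar
	}
	metrics.Read(samples)
	fmt.Println("goroutines:", samples[0].Value.Uint64())

	hist := samples[1].Value.Float64Histogram()
	var pauses uint64
	for _, c := range hist.Counts {
		pauses += c
	}
	fmt.Println("recorded GC pauses:", pauses)
}

func main() {
	readRuntimeMetrics()
}
```

Exporting these periodically to your metrics system is what turns "mysterious spike" into "spike aligned with GC pause histogram bucket."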
3. Design for bounded behavior
- Limit concurrency
- Avoid unbounded queues
- Apply backpressure
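One idiomatic way to get all three properties at once is a buffered channel used as a semaphore that rejects instead of queueing. A sketch under those assumptions (the `limiter` type and error are mine):

```go
package main

import (
	"errors"
	"fmt"
)

var errOverloaded = errors.New("overloaded")

// limiter bounds in-flight work with a buffered channel. When all
// slots are taken, new work is rejected immediately (backpressure)
// instead of growing an unbounded queue.
type limiter struct {
	slots chan struct{}
}

func newLimiter(max int) *limiter {
	return &limiter{slots: make(chan struct{}, max)}
}

func (l *limiter) Do(work func()) error {
	select {
	case l.slots <- struct{}{}: // acquired a slot
		defer func() { <-l.slots }()
		work()
		return nil
	default: // full: shed load rather than queue it
		return errOverloaded
	}
}

func main() {
	l := newLimiter(1)
	err := l.Do(func() {
		// While one job holds the only slot, a second is rejected.
		fmt.Println(l.Do(func() {})) // overloaded
	})
	fmt.Println(err) // <nil>
}
```

Returning an explicit overload error lets callers retry, degrade, or fail fast, all of which beat an invisible queue that silently inflates tail latency.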
4. Reduce variability, not just cost
- Stable systems beat “fast on average” systems
- Predictability > peak performance
Final Thought
Go gives you the tools to build extremely fast systems. But it doesn’t guarantee consistent latency that part is on you.
If your service has latency spikes, don’t look for a single bug. Look for interactions under pressure.
Because in production, the question isn’t:
“Is my service fast?”
It’s:
“Is my service predictable when everything starts going wrong?”