You shipped a Go service. Benchmarks look great. CPU usage is low. Average latency is comfortably within targets.
And yet every now and then your p99 explodes.
This is the part many engineers underestimate: fast systems can still be unpredictable systems. In Go, latency spikes are rarely caused by a single obvious bottleneck. They emerge from the interaction between the runtime, the OS, and your code under real-world load.
Let’s dig into the less obvious reasons your Go service spikes and what you can actually do about them.
The Illusion of “Fast Enough”
Go makes it easy to build services that are consistently good on average. Goroutines are cheap, the standard library is efficient, and deployment is simple.
But averages lie.
Latency-sensitive systems live and die by tail latency: p95, p99, p999. These outliers are where user experience breaks down, SLAs fail, and debugging becomes painful.
If your service is “fast but spiky,” you’re likely dealing with one (or more) of the following.
1. Garbage Collection Isn’t Free (Even When It’s Good)
Go’s garbage collector is excellent, but it is not invisible.
What’s happening
Modern Go uses a concurrent, tri-color mark-and-sweep GC. Most of the work happens alongside your application, but there are still stop-the-world (STW) phases, especially during:
- Stack scanning
- Mark termination
Even if these pauses are short (microseconds to milliseconds), they can stack up under load and show up as latency spikes.
Why it gets worse in production
- High allocation rates increase GC frequency
- Large heaps increase scan time
- Pointer-heavy data structures slow marking
What to look for
- Sudden spikes aligned with GC cycles
- Increased `GOGC` pressure
- High allocation profiles in `pprof`
What actually helps
- Reduce allocations in hot paths
- Reuse objects (`sync.Pool`, carefully)
- Avoid unnecessary pointers
- Flatten data structures when possible
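As an illustration of the `sync.Pool` point, here is a minimal sketch of reusing buffers in a hot path (the `render` function and pool are hypothetical, not from any particular codebase):

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool reuses byte buffers across calls so the hot path stops
// allocating a fresh buffer (and creating GC work) per request.
var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// render writes output into a pooled buffer. The Reset is essential:
// pooled objects keep their old contents between uses.
func render(name string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	defer bufPool.Put(buf)
	buf.Reset()
	buf.WriteString("hello, ")
	buf.WriteString(name)
	return buf.String()
}

func main() {
	fmt.Println(render("world")) // hello, world
}
```

The "carefully" caveat is real: the pool may drop objects at any GC, and forgetting the `Reset` leaks data between requests.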
2. Goroutine Contention and Scheduler Behavior
Goroutines are cheap but not free, and definitely not magic.
What’s happening
Go’s scheduler multiplexes goroutines onto OS threads. Under load:
- Run queues grow
- Context switching increases
- Work stealing adds overhead
If too many goroutines compete for CPU or locks, latency spikes emerge not from raw compute, but from waiting.
Common traps
- Spawning unbounded goroutines per request
- Blocking operations inside goroutines
- Assuming “more concurrency = faster”
Subtle issue: cooperative preemption
Go's preemption is partly cooperative. Since Go 1.14 the runtime can asynchronously preempt tight loops, but a goroutine that rarely reaches a safe point can still delay scheduling fairness.
What to do
- Use worker pools for bounded concurrency
- Avoid long-running CPU loops without yielding
- Profile scheduler latency (`runtime/trace`)
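The worker-pool advice above can be sketched as follows. This is one common shape, not the only one; the `process` helper and its signature are my own for illustration:

```go
package main

import (
	"fmt"
	"sync"
)

// process runs jobs through a fixed pool of workers instead of one
// goroutine per job, bounding concurrency and scheduler pressure.
func process(jobs []int, workers int, handle func(int) int) []int {
	in := make(chan int)
	out := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := range in {
				out <- handle(j)
			}
		}()
	}
	go func() {
		for _, j := range jobs {
			in <- j
		}
		close(in)
		wg.Wait() // close out only after all workers finish
		close(out)
	}()
	var results []int
	for r := range out {
		results = append(results, r)
	}
	return results
}

func main() {
	doubled := process([]int{1, 2, 3, 4}, 2, func(n int) int { return n * 2 })
	fmt.Println(len(doubled)) // 4 results; order is not guaranteed
}
```

The key property is the bound: no matter how many jobs arrive, at most `workers` goroutines compete for CPU at once.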
3. Lock Contention: The Silent Killer
Time spent waiting on mutexes doesn’t show up in CPU profiles, but it absolutely shows up in latency.
What’s happening
Under contention:
- Goroutines block on locks
- Queueing delays increase
- Throughput may remain high, but latency explodes
Where it hides
- Global maps with mutex protection
- Shared caches
- Logging pipelines
- Metrics collectors
Why it’s tricky
You might not notice until traffic scales. Everything works fine until it suddenly doesn’t.
What works
- Reduce lock granularity
- Prefer sharded structures
- Use lock-free or atomic patterns where appropriate
- Measure with mutex profiling (`go test -mutexprofile`)
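The "sharded structures" idea, concretely: split one contended mutex into several independent ones, keyed by hash, so goroutines touching different keys rarely block each other. A minimal sketch (the `shardedMap` type is illustrative; shard count is an arbitrary choice):

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

const shards = 16

// shardedMap replaces one global mutex with 16 independent ones.
// Contention on any single lock drops roughly by the shard count.
type shardedMap struct {
	mu [shards]sync.Mutex
	m  [shards]map[string]int
}

func newShardedMap() *shardedMap {
	s := &shardedMap{}
	for i := range s.m {
		s.m[i] = make(map[string]int)
	}
	return s
}

// shard picks a lock/map pair deterministically from the key.
func (s *shardedMap) shard(key string) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() % shards)
}

func (s *shardedMap) Add(key string, n int) {
	i := s.shard(key)
	s.mu[i].Lock()
	s.m[i][key] += n
	s.mu[i].Unlock()
}

func (s *shardedMap) Get(key string) int {
	i := s.shard(key)
	s.mu[i].Lock()
	defer s.mu[i].Unlock()
	return s.m[i][key]
}

func main() {
	m := newShardedMap()
	m.Add("requests", 1)
	m.Add("requests", 1)
	fmt.Println(m.Get("requests")) // 2
}
```

In a long-running server (rather than a test), `runtime.SetMutexProfileFraction` enables the same mutex profile that `-mutexprofile` collects.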
4. Network and Syscall Variability
Your Go code might be fast. The network is not.
What’s happening
Every request eventually hits:
- TCP stack
- DNS resolution
- Kernel scheduling
- External services
Even tiny variations here can cascade into visible latency spikes.
Common culprits
- DNS lookups without caching
- Connection churn (lack of keep-alives)
- Slow downstream dependencies
- Kernel-level queueing
The hidden factor: tail amplification
If your service calls 5 downstream services, each with p99 latency of 50ms, your combined p99 is much worse.
What helps
- Use connection pooling aggressively
- Set timeouts everywhere (and mean it)
- Cache DNS where possible
- Budget latency across dependencies
5. GC + Scheduler + Syscalls: The Perfect Storm
The real problem is rarely one issue; it’s interaction effects.
A typical spike might look like this:
- GC cycle starts under high allocation pressure
- Goroutines increase due to incoming traffic
- Lock contention rises in shared structures
- A few slow network calls block threads
- Scheduler struggles to keep up
Individually, each is manageable. Together, they create a spike that’s hard to reproduce and harder to debug.
6. Misleading Benchmarks
Your local benchmarks probably didn’t show any of this.
Why?
- No real network variability
- No production traffic patterns
- No contention
- No long-lived heap growth
Benchmarks measure ideal conditions. Production exposes emergent behavior.
7. Observability Gaps
You can’t fix what you can’t see.
Most teams track:
- Average latency
- CPU usage
- Memory usage
But miss:
- GC pause distribution
- Goroutine counts over time
- Scheduler delays
- Mutex contention
- Per-endpoint tail latency
Without these, spikes remain mysterious.
What Actually Works in Practice
If you care about latency consistency, not just speed:
1. Profile under realistic load
Use:
- `pprof` (CPU, heap, allocs, mutex)
- `runtime/trace` for scheduler insights
2. Track the right metrics
- p95/p99 latency (not averages)
- GC pause time
- Goroutine count
- Queue lengths
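Several of these signals are exposed directly by the `runtime/metrics` package. A minimal sketch reading two of them (the metric names are documented identifiers in that package; the helper function is mine):

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

// readRuntimeMetrics samples two series that map onto the list
// above: live goroutine count and the GC pause distribution.
func readRuntimeMetrics() {
	samples := []metrics.Sample{
		{Name: "/sched/goroutines:goroutines"},
		{Name: "/gc/pauses:seconds"}, // a histogram, not a scalar
	}
	metrics.Read(samples)
	fmt.Println("goroutines:", samples[0].Value.Uint64())

	hist := samples[1].Value.Float64Histogram()
	var pauses uint64
	for _, c := range hist.Counts {
		pauses += c
	}
	fmt.Println("recorded GC pauses:", pauses)
}

func main() {
	readRuntimeMetrics()
}
```

Exporting these periodically to your metrics system is what turns "mysterious spike" into "spike aligned with GC pause histogram bucket."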
3. Design for bounded behavior
- Limit concurrency
- Avoid unbounded queues
- Apply backpressure
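One idiomatic way to get all three properties at once is a buffered channel used as a semaphore that rejects instead of queueing. A sketch under those assumptions (the `limiter` type and error are mine):

```go
package main

import (
	"errors"
	"fmt"
)

var errOverloaded = errors.New("overloaded")

// limiter bounds in-flight work with a buffered channel. When all
// slots are taken, new work is rejected immediately (backpressure)
// instead of growing an unbounded queue.
type limiter struct {
	slots chan struct{}
}

func newLimiter(max int) *limiter {
	return &limiter{slots: make(chan struct{}, max)}
}

func (l *limiter) Do(work func()) error {
	select {
	case l.slots <- struct{}{}: // acquired a slot
		defer func() { <-l.slots }()
		work()
		return nil
	default: // full: shed load rather than queue it
		return errOverloaded
	}
}

func main() {
	l := newLimiter(1)
	err := l.Do(func() {
		// While one job holds the only slot, a second is rejected.
		fmt.Println(l.Do(func() {})) // overloaded
	})
	fmt.Println(err) // <nil>
}
```

Returning an explicit overload error lets callers retry, degrade, or fail fast, all of which beat an invisible queue that silently inflates tail latency.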
4. Reduce variability, not just cost
- Stable systems beat “fast on average” systems
- Predictability > peak performance
Final Thought
Go gives you the tools to build extremely fast systems. But it doesn’t guarantee consistent latency that part is on you.
If your service has latency spikes, don’t look for a single bug. Look for interactions under pressure.
Because in production, the question isn’t:
“Is my service fast?”
It’s:
“Is my service predictable when everything starts going wrong?”