Serif COLAKEL
Go Memory Profiling & Performance Debugging — Real-World Guide to pprof

Memory issues in Go aren’t always dramatic.
Sometimes they’re silent: your service boots at 200MB, runs fine, then slowly grows to 2GB+ over the day.

No panics.
No spikes in CPU.
Just… memory creep.

If you’ve ever faced this, welcome — this guide is the practical, real-world walkthrough I wish I had years ago.

In this article, we’ll explore:

  • how to correctly use pprof (CPU, heap, block, mutex)
  • how to read flamegraphs
  • how to diagnose true memory leaks
  • how to analyze production memory spikes
  • how concurrency patterns impact memory
  • real-world case studies

Let’s dive in. 🚀


🌡️ 1. pprof Basics (CPU, Memory, Block, Mutex)

Go ships with profiling tools that many languages envy. With almost no setup, you can inspect:

  • CPU profiling → where time is spent
  • Memory (heap) → what allocates / what retains
  • Block profiling → goroutines blocked on channels/locks
  • Mutex profiling → lock contention

Enabling it is simple:

import (
    "log"
    "net/http"
    _ "net/http/pprof" // registers the /debug/pprof/* handlers on the default mux
)

// somewhere in your startup code:
go func() {
    log.Println(http.ListenAndServe("0.0.0.0:6060", nil))
}()

Then:

curl http://localhost:6060/debug/pprof/heap > heap.out
curl "http://localhost:6060/debug/pprof/profile?seconds=30" > cpu.out

Or open the UI:

go tool pprof -http=:9999 heap.out
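Note that block and mutex profiles stay empty until you opt in with a sampling rate. A minimal sketch, with rates that are illustrative rather than prescriptive:

import "runtime"

func init() {
    // Record every blocking event (pass 1 to include them all).
    runtime.SetBlockProfileRate(1)
    // Sample roughly 1 in 100 mutex contention events.
    runtime.SetMutexProfileFraction(100)
}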

🧠 2. alloc_space vs inuse_space (Critical!)

When reading heap profiles, you’ll see:

alloc_space

Total memory allocated over time (cumulative).
→ Great for spotting allocation-heavy functions.

inuse_space

Memory currently retained and in use.
→ This is where memory leaks appear.

Common mistake:
People confuse “allocations” with “leaks”.

A leak shows up in inuse_space, not alloc_space.
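You can ask pprof for either view explicitly when opening a heap profile:

go tool pprof -sample_index=inuse_space heap.out
go tool pprof -sample_index=alloc_space heap.out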


🏥 3. Case Study: “Why is my Go service using 2GB RAM?”

A very real production issue:

  • service normally uses ~200MB
  • memory grows slowly over hours
  • GC runs frequently
  • CPU normal
  • RAM never drops

Step 1 — Capture the heap

curl http://localhost:6060/debug/pprof/heap > heap.out

Step 2 — Inspect

go tool pprof -top heap.out

Look for:

  • large inuse_space
  • suspicious data structures
  • packages that shouldn’t hold memory

Step 3 — Visualize (the real magic)

go tool pprof -http=:9999 heap.out

Typical root causes:

  • unbounded channels
  • slices that grow but never shrink
  • caches with no eviction
  • goroutines stuck holding references
  • large buffers reused incorrectly

Every real memory issue I’ve diagnosed involved the flamegraph.
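As a concrete illustration of the "caches with no eviction" case above (a hypothetical pattern, not code from this incident): a map that is only ever written to keeps every value reachable, so it keeps growing in inuse_space.

// package-level cache with no eviction: every entry stays reachable,
// so the GC can never reclaim it
var cache = map[string][]byte{}

func handleRequest(key string, payload []byte) {
    cache[key] = payload // grows forever under unique keys
}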


🔥 4. Reading Flamegraphs (Fast, Practical Guide)

When you open the pprof UI, the flamegraph is your best friend.

How to read it:

  • wide boxes = lots of memory attributed to that call path
  • leaf boxes (the ends of the stacks) = where the allocation actually happens
  • colors group frames by package, not by cost
  • tall stacks = deep call chains

Look for:

  • a wide block at the very bottom
  • repeated patterns
  • unexpected external packages
  • functions returning large objects

If something looks “too wide,” it probably is.


🟢 5. Safe Production Profiling (Do’s & Don’ts)

Yes — you can run pprof in production.

✔ Safe:

  • heap profiles
  • short CPU profiles (5–15s)
  • mutex/block profiles
  • profiling internal-only endpoints

✘ Dangerous:

  • long CPU profiles on high-traffic systems
  • exposing /debug/pprof publicly
  • profiling during incident response without sampling

Recommended setup:

import (
    "net/http"
    _ "net/http/pprof"
)

func init() {
    go func() {
        // localhost only: never expose /debug/pprof on a public interface
        http.ListenAndServe("127.0.0.1:6060", nil)
    }()
}

Keep it on localhost or behind internal ingress only.
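If your public server also uses http.DefaultServeMux, the blank import above registers /debug/pprof there as well. One way around that, sketched with handlers from net/http/pprof (startPprof is just an illustrative name), is a dedicated localhost-only mux:

import (
    "net/http"
    "net/http/pprof"
)

func startPprof() {
    mux := http.NewServeMux()
    mux.HandleFunc("/debug/pprof/", pprof.Index)          // index page + named profiles (heap, goroutine, ...)
    mux.HandleFunc("/debug/pprof/profile", pprof.Profile) // CPU profile
    mux.HandleFunc("/debug/pprof/trace", pprof.Trace)     // execution trace
    go http.ListenAndServe("127.0.0.1:6060", mux)
}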


🧪 6. Production Memory Profiling Checklist

Before concluding “it’s a leak,” verify:

✔ Compare multiple heap snapshots
✔ Look at inuse_space, not alloc_space
✔ Check GC frequency (GODEBUG=gctrace=1)
✔ Capture goroutine dump (debug/pprof/goroutine)
✔ Inspect channel sizes
✔ Confirm no unbounded caches
✔ Look for large slices/maps retaining data

Memory issues rarely come from one single spot — they’re usually behavioral patterns.
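Two of those checks are one-liners (./your-service stands in for your binary):

GODEBUG=gctrace=1 ./your-service                                              # one summary line per GC cycle on stderr
curl "http://localhost:6060/debug/pprof/goroutine?debug=2" > goroutines.txt   # full goroutine stack dump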


⚡ 7. Memory Spikes During Traffic Surges

Traffic surges often cause:

  • temporary buffer spikes
  • batching behavior
  • backpressure in channels
  • worker pools expanding
  • slow consumers holding memory longer
  • GC pauses misaligned with load

To debug:

curl http://localhost:6060/debug/pprof/heap > spike.out    # captured during the surge
curl http://localhost:6060/debug/pprof/heap > normal.out   # captured at normal load

Then compare:

go tool pprof -diff_base normal.out spike.out

This shows what changed during the spike.
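The same diff can be explored in the web UI:

go tool pprof -http=:9999 -diff_base normal.out spike.out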


🔄 8. How Concurrency Affects Memory

Channels

Buffered channels that fill up → the backlog itself holds memory.
Slow consumers → blocked senders pile up, each goroutine retaining its payload.
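A sketch of the bounded alternative (Job, jobs, and the 1024 cap are illustrative): the buffer size caps the backlog, and a non-blocking send lets the producer shed load instead of growing memory.

type Job struct{ Payload []byte }

// The buffer size is the maximum backlog you are willing to hold in memory.
var jobs = make(chan Job, 1024)

func enqueue(j Job) bool {
    select {
    case jobs <- j:
        return true // enqueued
    default:
        // Queue full: drop, retry later, or return an error to the caller
        // instead of letting memory grow without bound.
        return false
    }
}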

Buffers

Large byte slices retained in memory.
Temporary buffers escaping to the heap accidentally.

sync.Pool

Useful, but not a fix-all:

  • no guarantee memory is freed immediately
  • may keep pools warm and increase RSS
  • bad reuse of objects → stale references
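A minimal sketch of the reuse pattern done safely (bufPool and process are illustrative names): resetting the buffer on Get avoids the stale-reference problem from the last bullet.

import (
    "bytes"
    "sync"
)

var bufPool = sync.Pool{
    New: func() any { return new(bytes.Buffer) },
}

func process(data []byte) {
    buf := bufPool.Get().(*bytes.Buffer)
    buf.Reset()            // drop whatever the previous user left behind
    defer bufPool.Put(buf) // return it when done, and keep no references afterwards
    buf.Write(data)
    // ... use buf ...
}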

Workers

Worker pools that scale with load can mask where memory growth is coming from.

If concurrency increases under load → memory increases too; a fixed-size pool keeps that bounded (see the sketch below).
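A sketch of a fixed-size pool, reusing the Job type from the channel example above (startWorkers and its parameters are illustrative): capping concurrency caps the number of in-flight payloads, which caps their memory.

func startWorkers(jobs <-chan Job, workers int, handle func(Job)) {
    // A fixed number of workers means at most `workers` jobs are in flight at once.
    for i := 0; i < workers; i++ {
        go func() {
            for j := range jobs {
                handle(j)
            }
        }()
    }
}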


🎯 Final Thoughts

Memory profiling is one of the most powerful debugging skills in Go — and one of the most underrated.

If you master:

  • heap profiles
  • flamegraphs
  • GC tracing
  • concurrency behavior

…you’ll be able to diagnose 90% of real-world performance issues in modern Go microservices.

Happy coding! 🚀
