What is pprof
pprof is Go's built-in profiling tool that helps you understand where your program spends CPU time, how it uses memory, where goroutines block, and more.
It's part of the standard library (runtime/pprof and net/http/pprof) and can be used both locally and in production via HTTP endpoints.
How it works under the hood
The Go runtime includes internal hooks and counters that collect low-level statistics:
- CPU - samples stack traces at regular intervals (default 100 Hz).
- Memory (heap) — records allocation data from the garbage collector.
- Block / Mutex — tracks delays caused by synchronization (e.g., sync.Mutex, channel, select).
- Goroutine — captures snapshots of all running goroutines and their stack traces.
pprof gathers this data and can:
- Write it to files (`pprof.WriteHeapProfile`, `pprof.StartCPUProfile`; see the sketch below);
- Expose it via HTTP endpoints (`net/http/pprof`);
- Export it in a format compatible with `go tool pprof`, Speedscope, or Parca.
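Besides the dedicated helpers, `runtime/pprof` also exposes the other profile types as named profiles through `pprof.Lookup`. A minimal sketch of dumping one to a file (the file name and error handling are illustrative):

```go
package main

import (
	"os"
	"runtime/pprof"
)

func main() {
	// Dump the current goroutine profile to a file.
	// Other named profiles: "heap", "allocs", "block", "mutex", "threadcreate".
	f, err := os.Create("goroutine.pprof")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// debug=0 writes the compressed binary format understood by `go tool pprof`.
	if err := pprof.Lookup("goroutine").WriteTo(f, 0); err != nil {
		panic(err)
	}
}
```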
Main profile types
| Type | Focus | When to use | What it shows | How to get |
|---|---|---|---|---|
| CPU | Execution time | High CPU load, slow processing | Where CPU time is spent | `pprof.StartCPUProfile(file)` or `/debug/pprof/profile` |
| Heap | Memory usage | Memory leaks, OOM, high RAM | Memory allocations (live and temporary) | `pprof.WriteHeapProfile(file)` or `/debug/pprof/heap` |
| Goroutine | Concurrency snapshot | Deadlocks, leaks, hanging requests | Stack traces of all goroutines | `/debug/pprof/goroutine` |
| Block | Waiting time | Latency, thread stalls | Where goroutines are blocked | `/debug/pprof/block` |
| Mutex | Lock contention | Poor scalability | Where mutexes are most frequently held | `/debug/pprof/mutex` |
| Allocs | Allocation frequency | GC pressure, short-lived allocations | All memory allocations, including freed ones | `/debug/pprof/allocs` |
CPU Profile - where processing time goes
The Go runtime samples stack traces about 100 times per second during execution. This tells you which functions consume the most CPU time - i.e. where the CPU is actually being used.
When to use
- The app is slow or CPU-bound;
- You want to identify hot paths;
- You're optimizing loops, parsing, serialization, or number crunching.

#### Typical findings
- Slow `json.Marshal` in loops;
- Overuse of `fmt.Sprintf` (see the sketch below);
- Too many small allocations per iteration.
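As a sketch of the `fmt.Sprintf` finding, here is the kind of loop that tends to dominate a CPU profile, next to a cheaper variant; the function names and sizes are made up for illustration:

```go
package main

import (
	"fmt"
	"strconv"
)

// buildIDsSlow is the kind of loop a CPU profile flags:
// fmt.Sprintf runs the full formatting machinery and allocates on every iteration.
func buildIDsSlow(n int) []string {
	ids := make([]string, 0, n)
	for i := 0; i < n; i++ {
		ids = append(ids, fmt.Sprintf("id-%d", i))
	}
	return ids
}

// buildIDsFast does the same work with strconv.Itoa,
// which is noticeably cheaper in hot paths.
func buildIDsFast(n int) []string {
	ids := make([]string, 0, n)
	for i := 0; i < n; i++ {
		ids = append(ids, "id-"+strconv.Itoa(i))
	}
	return ids
}

func main() {
	_ = buildIDsSlow(1_000_000)
	_ = buildIDsFast(1_000_000)
}
```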
Command
go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30
How to read
- top shows the most expensive functions;
- web (or Speedscope) visualizes a flamegraph (width = time).
Heap Profile - memory usage
Shows how much memory is allocated and where allocations happen. Data is collected from the garbage collector (GC).
When to use
- Memory usage keeps growing;
- There's a memory leak;
- You need to know who allocates too often or too much.

#### Typical findings
- Temporary objects inside loops;
- Unbounded caches or slices (see the sketch below);
- Unreleased references keeping memory alive.
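A hedged sketch of the "unbounded cache" pattern: nothing ever evicts entries, so an inuse_space heap profile points straight at the function that fills the map. The names and sizes are illustrative:

```go
package main

import "strconv"

// cache grows without bound because nothing is ever evicted.
// In an inuse_space heap profile, the allocation site that fills it
// (handle below) shows up as the dominant live memory.
var cache = map[string][]byte{}

func handle(key string, payload []byte) {
	cache[key] = payload // every request pins its payload for the process lifetime
}

func main() {
	for i := 0; i < 100_000; i++ {
		handle("req-"+strconv.Itoa(i), make([]byte, 1024))
	}
}
```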
Command
go tool pprof http://localhost:6060/debug/pprof/heap
How to read
- AllocObjects / AllocSpace → total allocations;
- InUseObjects / InUseSpace → currently live objects.
Use flags like --alloc_space or --inuse_space to switch views.
Goroutine Profile - snapshot of all goroutines
Captures stack traces of all goroutines at a single point in time.
When to use
- The app freezes or stops responding;
- Goroutine count keeps increasing;
- You suspect a deadlock or goroutine leak.

#### Typical findings
- Goroutines stuck on a channel receive (`<-ch`) with no sender (see the sketch below);
- A `WaitGroup` that never reaches zero;
- An infinite `select {}` without a `default`.
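A minimal sketch of the first pattern, with illustrative names: each call leaks one goroutine parked on a channel receive forever, and the dump from `?debug=2` shows it in the `chan receive` state:

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// leak starts a goroutine that waits on a channel nobody writes to.
// In /debug/pprof/goroutine?debug=2 it appears as "chan receive",
// and the count grows by one per call.
func leak() {
	ch := make(chan int)
	go func() {
		<-ch // blocks forever: no sender exists
	}()
}

func main() {
	for i := 0; i < 10; i++ {
		leak()
	}
	time.Sleep(100 * time.Millisecond)
	fmt.Println("goroutines:", runtime.NumGoroutine()) // roughly 11: main + 10 leaked
}
```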
Command
curl http://localhost:6060/debug/pprof/goroutine?debug=2
How to read
You'll see text stack traces. Look for "sleep", "chan receive", "mutex", etc.
Usually it's easy to spot where execution is stuck.
Block Profile - where goroutines are waiting
Tracks how long goroutines are blocked, waiting on synchronization primitives (channels, mutexes, conditions).
When to use
- The app hangs under low load;
- High latency in simple operations;
- You want to find wait points that slow down performance.

#### Typical findings
- Overloaded channels (see the sketch below);
- Shared data structures causing contention;
- Backpressure in worker queues.
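To illustrate the backpressure case, here is a sketch where many producers feed one slow consumer over an unbuffered channel; with block profiling enabled, the wait time concentrates on the channel send. The port, sizes, and timings are arbitrary:

```go
package main

import (
	"net/http"
	_ "net/http/pprof"
	"runtime"
	"time"
)

func main() {
	runtime.SetBlockProfileRate(1) // record every blocking event (fine for an experiment, too costly for prod)

	go http.ListenAndServe("localhost:6060", nil)

	jobs := make(chan int) // unbuffered: a send blocks until the worker is ready

	// One deliberately slow worker.
	go func() {
		for range jobs {
			time.Sleep(10 * time.Millisecond)
		}
	}()

	// Producers spend most of their time blocked on the send;
	// /debug/pprof/block attributes that wait to this line.
	for i := 0; i < 1000; i++ {
		jobs <- i
	}
}
```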
Enable in code
runtime.SetBlockProfileRate(1)
Command
go tool pprof http://localhost:6060/debug/pprof/block
Mutex Profile - lock contention
Records how long goroutines hold a mutex and how long others wait for it.
When to use
- CPU usage is low but app is slow;
- Throughput doesn't scale with concurrency;
- You suspect shared locks or global bottlenecks.

#### Typical findings
- A global `sync.Mutex` in a hot path (see the sketch below);
- Shared maps without sharding;
- Logging or metrics inside critical sections.
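A sketch of the "global `sync.Mutex` in a hot path" finding, with made-up types: every goroutine funnels through one lock, so the mutex profile attributes the contention to `Inc`. Sharding the map (or `sync.Map` for read-heavy workloads) is the usual fix:

```go
package main

import (
	"runtime"
	"sync"
)

// Counters is a single shared structure that every goroutine fights over.
// With mutex profiling enabled, /debug/pprof/mutex points at Inc.
type Counters struct {
	mu sync.Mutex
	m  map[string]int
}

func (c *Counters) Inc(key string) {
	c.mu.Lock()
	c.m[key]++
	c.mu.Unlock()
}

func main() {
	runtime.SetMutexProfileFraction(1) // sample every contention event (experiment only)

	c := &Counters{m: map[string]int{}}

	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 100_000; j++ {
				c.Inc("hits")
			}
		}()
	}
	wg.Wait()
}
```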
Enable in code
runtime.SetMutexProfileFraction(1)
Command
go tool pprof http://localhost:6060/debug/pprof/mutex
Allocs Profile - all allocations (including freed ones)
Shows all memory allocations, not just the ones still in use.
Useful to understand allocation rate and GC pressure.
When to use
- The app allocates too frequently (high GC load);
- You're optimizing short-lived, high-throughput operations.

#### Typical findings
- Repeated string concatenations (`+`, `fmt.Sprintf`);
- Allocating a new `[]byte` on each request (see the sketch below);
- Inefficient `append` or map usage.
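For the "[]byte on each request" finding, one common fix is reusing buffers through `sync.Pool`; an allocs profile taken before and after shows the drop in allocation count. The handler shape below is illustrative:

```go
package main

import (
	"bytes"
	"sync"
)

// bufPool hands out reusable buffers instead of allocating a fresh
// bytes.Buffer (and its backing []byte) on every request.
var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

func render(payload string) []byte {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()
	defer bufPool.Put(buf)

	buf.WriteString(`{"data":"`)
	buf.WriteString(payload)
	buf.WriteString(`"}`)

	// Copy out the result; the buffer itself goes back to the pool.
	out := make([]byte, buf.Len())
	copy(out, buf.Bytes())
	return out
}

func main() {
	for i := 0; i < 100_000; i++ {
		_ = render("hello")
	}
}
```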
Command
go tool pprof http://localhost:6060/debug/pprof/allocs
Warning! Block and Mutex profiling should not be enabled permanently in production.
Use sampling values to reduce overhead, for example:
```go
runtime.SetBlockProfileRate(10000)   // sample roughly one blocking event per 10,000 ns spent blocked
runtime.SetMutexProfileFraction(100) // report on average 1 of every 100 mutex contention events
```
Using pprof via HTTP
Enable profiling on a running service:

```go
package main

import (
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers on the default mux
)

func main() {
	go func() {
		http.ListenAndServe("localhost:6060", nil)
	}()
	// your app logic
}
```

Open the following link in your browser: `http://localhost:6060/debug/pprof/`

Then run a go command to collect a profile:

go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30
Using pprof directly in code
Recording a CPU profile to a file:

```go
f, _ := os.Create("cpu.pprof")
pprof.StartCPUProfile(f)
defer pprof.StopCPUProfile()

// Run your workload
workload()
```

Analyze it with:

```
go tool pprof cpu.pprof
(pprof) top
(pprof) web
```
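A similar sketch for writing a heap snapshot to a file; forcing a GC first makes the in-use numbers current. The file name and `workload` function are placeholders:

```go
package main

import (
	"os"
	"runtime"
	"runtime/pprof"
)

func main() {
	workload() // whatever you want to measure

	f, err := os.Create("heap.pprof")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	runtime.GC() // get up-to-date in-use statistics before the snapshot
	if err := pprof.WriteHeapProfile(f); err != nil {
		panic(err)
	}
}

func workload() {
	_ = make([]byte, 64<<20) // placeholder allocation, purely illustrative
}
```

Analyze it the same way: `go tool pprof heap.pprof`.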
Best practices for production
- Restrict access to `/debug/pprof/` (e.g., Basic Auth, internal IPs, an env flag; see the sketch after this list).
- Don't run CPU profiling all the time - it adds roughly 5–10% overhead.
- Capture CPU profiles for at least 10–30 seconds.
- Heap profiles are safe to collect in production.
- For visualization, use:
  - `go tool pprof -http` — quick interactive inspection in the browser;
  - Speedscope — fast and intuitive;
  - Parca — continuous profiling.
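One way to implement the first point, assuming you control the HTTP setup yourself: serve pprof from its own internal-only listener instead of the application's public mux. The addresses and routes here are illustrative:

```go
package main

import (
	"net/http"
	"net/http/pprof"
)

func main() {
	// Public API on its own mux: no profiling endpoints exposed here.
	api := http.NewServeMux()
	api.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	go http.ListenAndServe(":8080", api)

	// pprof only on localhost (or an internal interface / behind auth).
	debug := http.NewServeMux()
	debug.HandleFunc("/debug/pprof/", pprof.Index)
	debug.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
	debug.HandleFunc("/debug/pprof/profile", pprof.Profile)
	debug.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
	debug.HandleFunc("/debug/pprof/trace", pprof.Trace)
	http.ListenAndServe("localhost:6060", debug)
}
```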
Reading profiling results
What a pprof graph represents
pprof generates a call graph, where:
- Node (box) - a function;
- Edge (arrow) - a call from one function to another;
- Node weight - how much CPU time or memory that function consumed;
- Edge weight - how much cost was passed to the callee functions.
In short, the graph shows who calls whom and where time or memory is being spent.
Red flags to look for:
- Wide bottom block → hottest function
- `runtime.mallocgc` dominating → too many allocations
- `sync.(*Mutex).Lock` high → contention
- Many narrow repeated blocks → allocations inside a loop
Common visualization modes
top
Shows a summary table:
```
(pprof) top
Showing nodes accounting for 34.29s, 85.73% of 40.00s total
      flat  flat%   sum%        cum   cum%
    10.23s  25.6%  25.6%     18.92s  47.3%  main.work
     8.69s  21.7%  47.3%     10.35s  25.9%  processData
```
Column meanings:
- flat - time spent inside the function itself (excluding callees);
- cum (cumulative) - total time including callees;
- flat% / cum% - the same values relative to total runtime.

##### Highlights:
- Large flat time - optimize that specific function.
- Large cum time but small flat - the problem is in a callee.
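A tiny illustration of that last point, with made-up functions: `handle` shows a large cum but almost no flat time, because nearly all of its cost sits in its callee `parse`:

```go
package main

import "strings"

// handle shows up with small "flat" but large "cum":
// its own work is trivial, the cost is in the callee.
func handle(payload string) int {
	return parse(payload)
}

// parse carries the real cost, so its "flat" time is large.
func parse(payload string) int {
	return len(strings.Fields(payload))
}

func main() {
	s := strings.Repeat("word ", 1_000)
	total := 0
	for i := 0; i < 10_000; i++ {
		total += handle(s)
	}
	_ = total
}
```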
list funcName
Shows annotated source code lines:
```
(pprof) list compute
Total: 40s
ROUTINE ======================== compute in main.go
    10.00s     15.00s (flat, cum) 37.50% of Total
```
##### Highlights:
- You can see which specific lines consume CPU or allocate memory — ideal for micro-optimizations.
web (or go tool pprof -http=:6060)
Opens an interactive call graph in your browser.
Color & size meaning:
- Red - functions consuming the most resources
- Yellow - medium cost
- Green - minor impact
- Box width - time or memory weight
- Arrow - function call relationship

##### Highlights:
- The wider and redder the box, the hotter the function.
Flamegraph
A flamegraph is the most intuitive format.
Example:
```
main
└── handleRequest
    ├── parseJSON
    └── processData
        ├── validate
        └── saveToDB
```
Each rectangle = a function:
- X-axis = total time (width = cost)
- Y-axis = call stack depth
- Bottom → top = call chain (from main to leaf functions)

##### Highlights:
- If `parseJSON` is wide - JSON encoding is CPU-heavy.
- If `saveToDB` dominates - DB operations are the bottleneck.

##### Example interpretation
For example, your flamegraph shows:
main → handle → json.Marshal → reflect.Value.Interface
and reflect.Value.Interface takes 40% of CPU time.
That's a clear indicator of slow reflection-based serialization - replace it with a faster encoder such as jsoniter or easyjson.
Practical reading tips
- Start with the widest blocks at the bottom — they consume the most time or memory.
- Ignore runtime internals (`runtime.*`, `syscall.*`) unless something abnormal shows up.
- Look for repeating narrow peaks — they often mean inefficient work inside loops.
- If CPU looks fine but the app hangs, check block or mutex profiles — likely synchronization issues, not CPU load.
- Compare before/after:

go tool pprof -diff_base heap1.pprof heap2.pprof
Heap vs Allocs
| Profile | Shows | Common misunderstanding |
|---|---|---|
| heap (inuse) | live objects currently in memory | "memory leak" — but may just be a cache or buffer |
| allocs | all allocations since process start (even freed) | engineers read "growth = leak" — but this counter is cumulative and always grows |
Visualization tools comparison
| Tool | Format | Best for |
|---|---|---|
| CLI (top, list) | Text | Quick inspection, remote servers |
| Web UI (pprof -http) | Interactive graph | Exploring call hierarchy |
| Speedscope | Visual | Immediate hotspot identification |
| Parca | Continuous profiling | Real-time production monitoring |
Conclusion
Profiling is one of the most valuable tools for diagnosing performance issues in Go applications, and pprof provides everything you need - from understanding CPU hotspots to uncovering memory leaks, goroutine leaks, synchronization bottlenecks, and inefficient allocation patterns.
The key to using pprof effectively is knowing which profile to capture, how to interpret what you see, and how to compare snapshots over time. CPU profiles reveal hot paths, heap profiles uncover leaks or excessive allocations, goroutine dumps expose deadlocks or runaway concurrency, while block and mutex profiles highlight contention that's invisible to standard metrics.
Most importantly:
- Always start from the widest blocks in flamegraphs.
- Use diffing to compare “before/after” optimizations.
- Enable advanced profiles (block/mutex) only when needed.
- Treat pprof as part of your standard debugging workflow - not as a last resort.
Mastering pprof turns performance debugging from guesswork into a repeatable, data-driven process. Once your team gets comfortable reading profiles, performance problems that used to take days can be solved in minutes.
Measure, don't guess - profile first.