Davi Orlandi

Go profiling using pprof

Have you ever spent nights trying to understand why your application is using so many resources? The CPU is on fire, memory keeps growing, latency goes up… and the code looks fine. Chances are, you're guessing instead of verifying.

In this article, we'll see how to track down memory leaks, wasted resources, and poorly optimized code using Go's built-in profiling tool, pprof.

What is pprof?

pprof is Go's built-in profiling tool that lets you collect and analyze runtime data from your application, such as CPU usage, memory allocations, goroutines, and blocking operations.

In simple terms, pprof answers questions like:

  • Why is my application slow?
  • Where is CPU time being spent?
  • What is allocating so much memory?
  • Are goroutines leaking?

Spoiler: pprof does not magically optimize your code. It just tells you where you messed up.


How pprof works

pprof works in two steps: collect (gathering metrics from your application) and analyze (exploring the collected metrics with go tool pprof).

At runtime, Go samples execution and records metrics. These samples are aggregated into profiles that can be visualized and explored using pprof commands.

You might think this is a lot like tracing, but under the hood Go does statistical profiling: your program is periodically interrupted (for CPU profiles, roughly 100 times per second by default) and the runtime records what was running at that exact moment. Over time, these snapshots form an accurate picture of resource usage.

A simple way to enable profiling in your app is through an HTTP server. For example:

main.go
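
A minimal version could look like this (the blank import of net/http/pprof registers the profiling handlers on the default mux as a side effect):

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // side effect: registers the /debug/pprof/ handlers
)

func main() {
	// DefaultServeMux now also serves the profiling endpoints.
	log.Fatal(http.ListenAndServe("localhost:8080", nil))
}
```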

This exposes profiling endpoints under /debug/pprof/, such as /debug/pprof/profile (CPU), /debug/pprof/heap (Memory), etc.

Collecting profiles

When you hit a pprof endpoint like /debug/pprof/profile, what you get back is not a report, it’s a binary blob. If you try to open it in the browser, you’ll download a file and see what looks like complete garbage.

That's expected: this endpoint returns a serialized pprof profile (a protobuf, usually gzip-compressed) containing the raw sampled data collected by the Go runtime: stack traces, counters, timestamps. Not charts. Not tables. Not answers. This binary mess is not meant for humans.

And that’s where pprof comes in.

pprof is the tool that takes this raw, unreadable data and decodes, aggregates, and correlates it, turning it into human-readable output.
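
By the way, nothing forces you to analyze a profile live. You can save the raw blob to disk and feed it to pprof later, which is handy for sharing profiles or comparing runs:

```bash
# Save the raw (binary) profile to a file...
curl -o cpu.pprof "http://localhost:8080/debug/pprof/profile?seconds=5"

# ...then analyze it offline, no running server required
go tool pprof cpu.pprof
```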

Analyzing profiles

Well, let's start analyzing our profiling output. Go already ships pprof bundled as go tool pprof, but you can install the latest standalone CLI with the command below:

```bash
go install github.com/google/pprof@latest
```

Now, with the web server running, collect a profile using the command below (CPU profile, 10 seconds):

```bash
go tool pprof "http://localhost:8080/debug/pprof/profile?seconds=10"
```

If you did everything right and waited for 10 seconds, you should get an interactive shell. This is where the raw profile data becomes actionable. Once inside pprof, you can execute many commands, such as top, list, web, etc. (run help inside the shell for the full list).

The Big Bad Boy

Imagine we have an endpoint using a lot of CPU, the "Big Bad Boy". For demonstration purposes, I'll use a /work endpoint.

main.go
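
Here's a sketch of what that endpoint could look like; the exact math is irrelevant (heavyComputation and its loop bounds are invented for this demo), it just has to keep the CPU busy:

```go
package main

import (
	"fmt"
	"log"
	"math"
	"net/http"
	_ "net/http/pprof" // keep the profiling endpoints enabled
)

// heavyComputation burns CPU on floating-point math so the
// profiler has something obvious to point at.
func heavyComputation() float64 {
	var result float64
	for i := 0; i < 50_000_000; i++ {
		result += math.Pow(float64(i%100), 1.5) * math.Sin(float64(i))
	}
	return result
}

func main() {
	http.HandleFunc("/work", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "result: %f\n", heavyComputation())
	})
	log.Fatal(http.ListenAndServe("localhost:8080", nil))
}
```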

With the app running, hit the /work endpoint from another terminal using curl to trigger some CPU load:

```bash
curl http://localhost:8080/work
```

You can repeat it a few times or use a load tool like hey or ab to generate more pressure.

Now, collect a 10-second CPU profile from the running server:

```bash
go tool pprof "http://localhost:8080/debug/pprof/profile?seconds=10"
```

Inside the pprof shell, you can list the top CPU-consuming functions with the top command. You'll see output like:

```
(pprof) top
Showing nodes accounting for 7470ms, 92.45% of 8080ms total
Dropped 2 nodes (cum <= 40.40ms)
Showing top 10 nodes out of 35
      flat  flat%   sum%        cum   cum%
    2030ms 25.12% 25.12%     2030ms 25.12%  math.archExp
    1570ms 19.43% 44.55%     1570ms 19.43%  math.IsInf (inline)
    1500ms 18.56% 63.12%     2830ms 35.02%  math.log
     560ms  6.93% 70.05%     5800ms 71.78%  math.pow
     430ms  5.32% 75.37%     1050ms 13.00%  math.sin
     330ms  4.08% 79.46%      330ms  4.08%  runtime.pthread_cond_signal
     320ms  3.96% 83.42%     1560ms 19.31%  math.frexp
     290ms  3.59% 87.00%      290ms  3.59%  math.Float64frombits (inline)
     220ms  2.72% 89.73%      220ms  2.72%  math.IsNaN (inline)
     220ms  2.72% 92.45%      230ms  2.85%  math.normalize (inline)
```

We can see a lot of math package operations, but we don't know where they're being called from: flat is time spent in the function itself, while cum also includes time spent in everything it calls. So let's run the top command again with the -cum flag to sort by cumulative time.

```
(pprof) top -cum
Showing nodes accounting for 2170ms, 26.86% of 8080ms total
Dropped 2 nodes (cum <= 40.40ms)
Showing top 10 nodes out of 35
      flat  flat%   sum%        cum   cum%
     110ms  1.36%  1.36%     7750ms 95.92%  main.heavyComputation
         0     0%  1.36%     7750ms 95.92%  main.main.func1
         0     0%  1.36%     7750ms 95.92%  net/http.(*ServeMux).ServeHTTP
         0     0%  1.36%     7750ms 95.92%  net/http.(*conn).serve
         0     0%  1.36%     7750ms 95.92%  net/http.HandlerFunc.ServeHTTP
         0     0%  1.36%     7750ms 95.92%  net/http.serverHandler.ServeHTTP
         0     0%  8.29%     2830ms 35.02%  math.Log (inline)
```

Spotted! main.heavyComputation is responsible for 95.92% of the total CPU time. That's our "Big Bad Boy."
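
From here, a natural next step is the list command, which annotates a function's source lines with their flat and cum costs so you can see exactly which line hurts (the name is treated as a regex, and pprof needs access to the source files to show the code):

```
(pprof) list heavyComputation
```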


You can do the same for memory, goroutine, block, and mutex profiles:

  • Memory (heap) – see allocations and leaks:

```bash
go tool pprof http://localhost:8080/debug/pprof/heap
```

  • Goroutines – detect leaks or deadlocks:

```bash
go tool pprof http://localhost:8080/debug/pprof/goroutine
```

  • Block – see time spent waiting on channels, mutexes, etc. (opt-in; see the note after this list):

```bash
go tool pprof http://localhost:8080/debug/pprof/block
```

  • Mutex – see contention hotspots (also opt-in; see below):

```bash
go tool pprof http://localhost:8080/debug/pprof/mutex
```
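
One caveat: the block and mutex profiles are disabled by default and come back empty unless you opt in from code. A minimal sketch (a rate/fraction of 1 records every event, handy for debugging but too expensive to leave on in production):

```go
package main

import "runtime"

func init() {
	// Block and mutex profiling are opt-in.
	runtime.SetBlockProfileRate(1)     // sample every blocking event
	runtime.SetMutexProfileFraction(1) // sample every mutex contention event
}
```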

All of these can be analyzed in the same way (top, list, web, svg). The web and svg commands produce graphical visualizations (Graphviz required), making it much easier to spot hotspots.


Visualization (Graphviz required)

Graphical visualizations make it much easier to spot CPU or memory hotspots quickly. For that, we need to install Graphviz, an open-source graph visualization tool (see the commands after this list).

  • web: opens a call graph in your browser
  • svg: outputs an SVG call graph
  • pdf: outputs a PDF graph
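
Installing Graphviz depends on your platform, for example:

```bash
# macOS
brew install graphviz

# Debian/Ubuntu
sudo apt-get install graphviz
```

There's also pprof's built-in web UI, which serves the call graph, flame graph, and source views from a local HTTP server, no shell commands needed:

```bash
go tool pprof -http=:8081 "http://localhost:8080/debug/pprof/profile?seconds=10"
```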

Don’t spray and pray

pprof has multiple profile types because performance problems love to hide in different places. Each type answers a different question:

  • CPU – where time is spent
  • Heap – memory usage and leaks
  • Goroutines – leaks or deadlocks
  • Block – wait times
  • Mutex – contention

Always choose the profile that matches the problem.
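
For goroutine leaks in particular, there's a handy shortcut that skips the pprof shell entirely: the debug=2 query parameter returns a plain-text dump with the full stack trace of every live goroutine:

```bash
curl "http://localhost:8080/debug/pprof/goroutine?debug=2"
```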

ADDITIONAL: Profiling in Kubernetes

If your app runs in a Kubernetes cluster, you can port-forward the pprof port from your pod:

```bash
kubectl port-forward deploy/my-app 6060:6060
```

Access pprof locally:

```bash
go tool pprof "http://localhost:6060/debug/pprof/profile?seconds=10"
```

Analyze interactively as usual (top, list).

Useful Links & References

  • Go pprof docs: https://pkg.go.dev/net/http/pprof
  • Runtime profiling: https://pkg.go.dev/runtime/pprof
  • go tool pprof guide: https://go.dev/doc/diagnostics#profiling
  • Official Go blog post about pprof: https://go.dev/blog/pprof


So, Thanks!

If this article saved you even one late night of guessing and blaming the garbage collector, it already did its job 😄. Thanks for reading!

See you soon ;)
