Serif COLAKEL

🧠 Detecting and Preventing Goroutine Leaks in Production (Leak Detection in Go)

Goroutines are one of Go’s biggest superpowers — lightweight, fast, and easy to spin up. But with great power comes great responsibility.
Left unchecked, goroutines can silently leak, grow in number over time, consume memory, and eventually bring down your service.

Goroutine leaks are sneaky. They often don’t break your code immediately…
but they slowly eat your system alive.

In this guide, we’ll explore:

  • What causes goroutine leaks
  • Real-world patterns that accidentally leak
  • How to debug them (pprof, trace, runtime APIs)
  • How to prevent leaks using context cancellation and proper channel patterns
  • Production-ready best practices

Let’s dive in. 🚀


❗ What Is a Goroutine Leak?

A goroutine leak happens when a goroutine never exits, usually because it is:

  • blocked on a channel
  • waiting on a select case that never fires
  • stuck on I/O
  • waiting for a context that is never cancelled
  • part of a consumer/producer pipeline with no exit condition

Over time, the number of goroutines grows:

fmt.Println(runtime.NumGoroutine())

If this spikes over hours/days, you probably have a leak.
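
As a quick sanity check inside your own service, you can log the count on a ticker and watch the trend (a minimal sketch; the interval and logger are up to you):

func logGoroutineCount() {
    ticker := time.NewTicker(30 * time.Second)
    defer ticker.Stop()

    for range ticker.C {
        log.Printf("goroutines: %d", runtime.NumGoroutine())
    }
}

// start once at startup: go logGoroutineCount()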


🐛 A Classic Example of a Goroutine Leak

Here’s a typical bug many developers run into:

func worker(jobs <-chan int) {
    for {
        job := <-jobs
        fmt.Println("processing", job)
    }
}

func main() {
    jobs := make(chan int)

    go worker(jobs)

    // Never sends jobs
    time.Sleep(5 * time.Second)
}

What happens?

  • worker() blocks forever on <-jobs
  • No jobs ever arrive
  • The goroutine never exits
  • Boom — leak

Now imagine this in a loop spawning workers every request.
That’s how production incidents happen.


⚠️ Real-World Leak Scenario: Cancelling Requests Without Cancelling Goroutines

This is one of the most common causes of goroutine leaks in production.

func fetch(ctx context.Context) error {
    ch := make(chan string)

    go func() {
        time.Sleep(5 * time.Second)
        ch <- "done"
    }()

    select {
    case <-ctx.Done():
        return ctx.Err()
    case v := <-ch:
        fmt.Println("response:", v)
        return nil
    }
}

Problem:
If the context is cancelled, the goroutine is not stopped.
It keeps sleeping, then blocks forever trying to send on ch with nobody left to receive → goroutine leak.


🧯 Fixing the Leak with Cancellation Propagation

Use context to tell goroutines when to stop:

ch := make(chan string, 1) // buffered, so the send below can never block

go func() {
    defer close(ch)

    select {
    case <-time.After(5 * time.Second):
        ch <- "done"
    case <-ctx.Done():
        return
    }
}()

Now the goroutine exits as soon as the context is cancelled, and because ch is buffered the send can never block, so nothing leaks even if the caller has already returned.


🕳️ Hidden Leak: Unbounded Goroutine Spawning

Sneaky production issue:

http.HandleFunc("/search", func(w http.ResponseWriter, r *http.Request) {
    go expensiveOperation()
})

Under load, if expensiveOperation runs for even a minute per call:
100 req/sec → 100 new goroutines/sec → 6,000 alive at any moment → goodbye memory.

Fix: use a worker pool (bounded concurrency).
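
A minimal sketch of one way to bound it, assuming the handler only needs to hand the work off and that expensiveOperation takes the query it needs (names and sizes are illustrative):

// a bounded job queue drained by a fixed number of workers
var searchJobs = make(chan string, 100)

func startSearchWorkers(n int) {
    for i := 0; i < n; i++ {
        go func() {
            for q := range searchJobs {
                expensiveOperation(q) // at most n of these run at once
            }
        }()
    }
}

// register with http.HandleFunc("/search", searchHandler)
func searchHandler(w http.ResponseWriter, r *http.Request) {
    select {
    case searchJobs <- r.URL.Query().Get("q"):
        w.WriteHeader(http.StatusAccepted)
    default:
        http.Error(w, "queue full", http.StatusServiceUnavailable) // shed load instead of leaking
    }
}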


🧵 Leaks in Channels and Pipelines

Pipeline pattern gone wrong:

func generator() <-chan int {
    ch := make(chan int)

    go func() {
        for i := 0; i < 10; i++ {
            ch <- i
        }
    }()

    return ch
}

Problem: the channel is never closed, so a consumer ranging over it blocks forever once the values run out.

Fix:

go func() {
    defer close(ch)
    for i := 0; i < 10; i++ {
        ch <- i
    }
}()
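
With the producer owning the close, the consumer can simply range over the channel and exits cleanly once it is drained:

for v := range generator() {
    fmt.Println("received", v)
}

If the consumer might stop early, pass a context into generator() too, so the producer can also bail out instead of blocking on its next send.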

🔎 How to Detect Goroutine Leaks in Production

1. runtime.NumGoroutine()

Track this metric in Prometheus:

go_goroutines

If it constantly grows → 🔥 leak.
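
If you use the official client (github.com/prometheus/client_golang), its default registry already includes the Go runtime collector, so exposing go_goroutines is just a matter of serving the metrics handler (port is illustrative):

import (
    "log"
    "net/http"

    "github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
    // go_goroutines (plus GC and thread stats) comes from the default Go collector
    http.Handle("/metrics", promhttp.Handler())
    log.Fatal(http.ListenAndServe(":2112", nil))
}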


2. pprof (Goroutine Profiles)

Enable pprof:

import _ "net/http/pprof"

go http.ListenAndServe(":6060", nil)

Then inspect goroutines:

go tool pprof http://localhost:6060/debug/pprof/goroutine

Look for:

  • goroutines stuck in chan receive or chan send
  • goroutines waiting on locks (semacquire)
  • goroutines parked in sleep
  • the same stack repeated thousands of times
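
For a quick look without the interactive tool, the plain-text endpoint dumps every goroutine's stack, including how long each has been blocked:

curl http://localhost:6060/debug/pprof/goroutine?debug=2

Thousands of identical stacks parked in chan receive for minutes is the classic leak signature.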

3. Execution trace (go tool trace)

The execution tracer gives a timeline of every goroutine's lifecycle:

go test -trace trace.out
go tool trace trace.out

You'll see goroutines that:

  • never complete
  • block on channels
  • block on network I/O
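
In production you rarely run go test; with net/http/pprof enabled (as above) you can capture a short live trace instead:

curl -o trace.out "http://localhost:6060/debug/pprof/trace?seconds=5"
go tool trace trace.out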

4. Heap Growth

Leaked goroutines keep memory alive through everything they reference:

  • closure variables
  • buffers
  • context
  • network state

If heap grows in parallel with goroutines → confirmed leak.
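
The heap profile from the same pprof endpoint tells you what that retained memory actually is:

go tool pprof http://localhost:6060/debug/pprof/heap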


🛠 Best Practices to Prevent Goroutine Leaks

1. Always use context with goroutines

go func(ctx context.Context) {
    select {
    case <-ctx.Done():
        return
    case v := <-ch:
        _ = v
    }
}(ctx)

2. Close channels when done

The producer closes the channel, never the consumer:

close(results)
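
A minimal sketch of that ownership rule:

results := make(chan int)

go func() {
    defer close(results) // the producer closes; consumers only ever read
    for i := 0; i < 5; i++ {
        results <- i
    }
}()

for r := range results { // exits once results is closed and drained
    fmt.Println(r)
}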

3. Avoid unbounded goroutine creation

Use a worker pool or a semaphore to cap concurrency:

sem := make(chan struct{}, 10) // max 10 concurrent tasks

sem <- struct{}{}
go func() {
    defer func() { <-sem }()
    doWork()
}()

4. Use select with a default case to avoid blocked senders

select {
case jobs <- job:
default:
    log.Println("job queue full")
}

5. Timeouts for every external call

Never trust remote systems:

ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
defer cancel()
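
For example, wiring that timeout into an outbound HTTP call (callAPI and the URL are illustrative):

func callAPI(parent context.Context, url string) error {
    ctx, cancel := context.WithTimeout(parent, 3*time.Second)
    defer cancel()

    req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
    if err != nil {
        return err
    }

    resp, err := http.DefaultClient.Do(req) // fails with a context error once the deadline passes
    if err != nil {
        return err
    }
    defer resp.Body.Close()

    _, err = io.Copy(io.Discard, resp.Body) // drain the body so the connection can be reused
    return err
}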

6. Monitor goroutine count aggressively

Set alerts:

# Prometheus alerting rule (thresholds are illustrative)
- alert: PossibleGoroutineLeak
  expr: go_goroutines > 2000 and deriv(go_goroutines[30m]) > 0
  for: 30m

📌 Checklist: Before shipping Go code to production

  • ✅ Do all goroutines have a way to exit?
  • ✅ Are contexts cancelled?
  • ✅ Are channels closed when done?
  • ✅ Is concurrency bounded?
  • ✅ Do blocking operations inside goroutines have a timeout or cancellation path?
  • ✅ Do we monitor goroutine count?

🎯 Final Thoughts

Goroutine leaks are one of the most common hidden problems in Go microservices — especially under real production load.

By understanding how leaks happen and using proper patterns (context cancellation, closing channels, bounded concurrency), you can:

  • prevent memory bloat
  • avoid production outages
  • keep your services reliable and fast
  • confidently observe and debug concurrency issues

Happy debugging & happy coding! 🔥🐹
