Serif COLAKEL

🧠 Detecting and Preventing Goroutine Leaks in Production (Leak Detection in Go)

Goroutines are one of Go’s biggest superpowers — lightweight, fast, and easy to spin up. But with great power comes great responsibility.
Left unchecked, goroutines can silently leak, grow in number over time, consume memory, and eventually bring down your service.

Goroutine leaks are sneaky. They often don’t break your code immediately…
but they slowly eat your system alive.

In this guide, we’ll explore:

  • What causes goroutine leaks
  • Real-world patterns that accidentally leak
  • How to debug them (pprof, trace, runtime APIs)
  • How to prevent leaks using context cancellation and proper channel patterns
  • Production-ready best practices

Let’s dive in. 🚀


❗ What Is a Goroutine Leak?

A goroutine leak happens when a goroutine never exits, usually because it is:

  • blocked on a channel
  • waiting on a select case that never fires
  • stuck on I/O
  • waiting for a context that is never cancelled
  • part of a consumer/producer pipeline with no exit condition

Over time, the number of goroutines grows:

fmt.Println(runtime.NumGoroutine())

If this spikes over hours/days, you probably have a leak.
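
As a quick sanity check inside your own service, you can log the count on a ticker and watch the trend (a minimal sketch; the interval and logger are up to you):

func logGoroutineCount() {
    ticker := time.NewTicker(30 * time.Second)
    defer ticker.Stop()

    for range ticker.C {
        log.Printf("goroutines: %d", runtime.NumGoroutine())
    }
}

// start once at startup: go logGoroutineCount()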


🐛 A Classic Example of a Goroutine Leak

Here’s a typical bug many developers run into:

func worker(jobs <-chan int) {
    for {
        job := <-jobs
        fmt.Println("processing", job)
    }
}

func main() {
    jobs := make(chan int)

    go worker(jobs)

    // Never sends jobs
    time.Sleep(5 * time.Second)
}

What happens?

  • worker() blocks forever on <-jobs
  • No jobs ever arrive
  • The goroutine never exits
  • Boom — leak

Now imagine this in a loop spawning workers every request.
That’s how production incidents happen.


⚠️ Real-World Leak Scenario: Cancelling Requests Without Cancelling Goroutines

This is one of the most common causes of goroutine leaks in production.

func fetch(ctx context.Context) error {
    ch := make(chan string)

    go func() {
        time.Sleep(5 * time.Second)
        ch <- "done"
    }()

    select {
    case <-ctx.Done():
        return ctx.Err()
    case v := <-ch:
        fmt.Println("response:", v)
        return nil
    }
}

Problem:
If the context is cancelled, the goroutine is not stopped.
It keeps sleeping, then blocks forever trying to send on ch with nobody left to receive → goroutine leak.


🧯 Fixing the Leak with Cancellation Propagation

Use context to tell goroutines when to stop:

ch := make(chan string, 1) // buffered, so the send below can never block

go func() {
    defer close(ch)

    select {
    case <-time.After(5 * time.Second):
        ch <- "done"
    case <-ctx.Done():
        return
    }
}()

Now the goroutine exits as soon as the context is cancelled, and because ch is buffered the send can never block, so nothing leaks even if the caller has already returned.


🕳️ Hidden Leak: Unbounded Goroutine Spawning

Sneaky production issue:

http.HandleFunc("/search", func(w http.ResponseWriter, r *http.Request) {
    go expensiveOperation()
})

Under load, if expensiveOperation runs for even a minute per call:
100 req/sec → 100 new goroutines/sec → 6,000 alive at any moment → goodbye memory.

Fix: use a worker pool (bounded concurrency).
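
A minimal sketch of one way to bound it, assuming the handler only needs to hand the work off and that expensiveOperation takes the query it needs (names and sizes are illustrative):

// a bounded job queue drained by a fixed number of workers
var searchJobs = make(chan string, 100)

func startSearchWorkers(n int) {
    for i := 0; i < n; i++ {
        go func() {
            for q := range searchJobs {
                expensiveOperation(q) // at most n of these run at once
            }
        }()
    }
}

// register with http.HandleFunc("/search", searchHandler)
func searchHandler(w http.ResponseWriter, r *http.Request) {
    select {
    case searchJobs <- r.URL.Query().Get("q"):
        w.WriteHeader(http.StatusAccepted)
    default:
        http.Error(w, "queue full", http.StatusServiceUnavailable) // shed load instead of leaking
    }
}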


🧵 Leaks in Channels and Pipelines

Pipeline pattern gone wrong:

func generator() <-chan int {
    ch := make(chan int)

    go func() {
        for i := 0; i < 10; i++ {
            ch <- i
        }
    }()

    return ch
}

Problem: the channel is never closed, so a consumer ranging over it blocks forever once the values run out.

Fix:

go func() {
    defer close(ch)
    for i := 0; i < 10; i++ {
        ch <- i
    }
}()
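
With the producer owning the close, the consumer can simply range over the channel and exits cleanly once it is drained:

for v := range generator() {
    fmt.Println("received", v)
}

If the consumer might stop early, pass a context into generator() too, so the producer can also bail out instead of blocking on its next send.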

🔎 How to Detect Goroutine Leaks in Production

1. runtime.NumGoroutine()

Track this metric in Prometheus:

go_goroutines

If it constantly grows → 🔥 leak.
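
If you use the official client (github.com/prometheus/client_golang), its default registry already includes the Go runtime collector, so exposing go_goroutines is just a matter of serving the metrics handler (port is illustrative):

import (
    "log"
    "net/http"

    "github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
    // go_goroutines (plus GC and thread stats) comes from the default Go collector
    http.Handle("/metrics", promhttp.Handler())
    log.Fatal(http.ListenAndServe(":2112", nil))
}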


2. pprof (Goroutine Profiles)

Enable pprof:

import _ "net/http/pprof"

go http.ListenAndServe(":6060", nil)

Then inspect goroutines:

go tool pprof http://localhost:6060/debug/pprof/goroutine

Look for:

  • goroutines stuck in chan receive or chan send
  • goroutines waiting on locks (semacquire)
  • goroutines parked in sleep
  • the same stack repeated thousands of times
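
For a quick look without the interactive tool, the plain-text endpoint dumps every goroutine's stack, including how long each has been blocked:

curl http://localhost:6060/debug/pprof/goroutine?debug=2

Thousands of identical stacks parked in chan receive for minutes is the classic leak signature.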

3. Execution trace (go tool trace)

The execution tracer gives a timeline of every goroutine's lifecycle:

go test -trace trace.out
go tool trace trace.out

You'll see goroutines that:

  • never complete
  • block on channels
  • block on network I/O
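
In production you rarely run go test; with net/http/pprof enabled (as above) you can capture a short live trace instead:

curl -o trace.out "http://localhost:6060/debug/pprof/trace?seconds=5"
go tool trace trace.out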

4. Heap Growth

Leaked goroutines keep memory alive through everything they reference:

  • closure variables
  • buffers
  • context
  • network state

If heap grows in parallel with goroutines → confirmed leak.
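
The heap profile from the same pprof endpoint tells you what that retained memory actually is:

go tool pprof http://localhost:6060/debug/pprof/heap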


🛠 Best Practices to Prevent Goroutine Leaks

1. Always use context with goroutines

go func(ctx context.Context) {
    select {
    case <-ctx.Done():
        return
    case v := <-ch:
        _ = v
    }
}(ctx)

2. Close channels when done

The producer closes the channel, never the consumer:

close(results)
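
A minimal sketch of that ownership rule:

results := make(chan int)

go func() {
    defer close(results) // the producer closes; consumers only ever read
    for i := 0; i < 5; i++ {
        results <- i
    }
}()

for r := range results { // exits once results is closed and drained
    fmt.Println(r)
}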

3. Avoid unbounded goroutine creation

Use a worker pool or a semaphore to cap concurrency:

sem := make(chan struct{}, 10) // max 10 concurrent tasks

sem <- struct{}{}
go func() {
    defer func() { <-sem }()
    doWork()
}()

4. Use select with a default case to avoid blocked senders

select {
case jobs <- job:
default:
    log.Println("job queue full")
}

5. Timeouts for every external call

Never trust remote systems:

ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
defer cancel()
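
For example, wiring that timeout into an outbound HTTP call (callAPI and the URL are illustrative):

func callAPI(parent context.Context, url string) error {
    ctx, cancel := context.WithTimeout(parent, 3*time.Second)
    defer cancel()

    req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
    if err != nil {
        return err
    }

    resp, err := http.DefaultClient.Do(req) // fails with a context error once the deadline passes
    if err != nil {
        return err
    }
    defer resp.Body.Close()

    _, err = io.Copy(io.Discard, resp.Body) // drain the body so the connection can be reused
    return err
}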

6. Monitor goroutine count aggressively

Set alerts:

# Prometheus alerting rule (thresholds are illustrative)
- alert: PossibleGoroutineLeak
  expr: go_goroutines > 2000 and deriv(go_goroutines[30m]) > 0
  for: 30m

📌 Checklist: Before shipping Go code to production

  • ✅ Do all goroutines have a way to exit?
  • ✅ Are contexts cancelled?
  • ✅ Are channels closed when done?
  • ✅ Is concurrency bounded?
  • ✅ Do blocking operations inside goroutines have a timeout or cancellation path?
  • ✅ Do we monitor goroutine count?

🎯 Final Thoughts

Goroutine leaks are one of the most common hidden problems in Go microservices — especially under real production load.

By understanding how leaks happen and using proper patterns (context cancellation, closing channels, bounded concurrency), you can:

  • prevent memory bloat
  • avoid production outages
  • keep your services reliable and fast
  • confidently observe and debug concurrency issues

Happy debugging & happy coding! 🔥🐹
