
Young Gao

Originally published at younggao.hashnode.dev

Graceful Shutdown in Go: Patterns Every Production Service Needs

Your service just got a SIGTERM. You have roughly 30 seconds before Kubernetes sends SIGKILL. In that window, you need to finish in-flight requests, flush buffered data, close database connections, and deregister from service discovery — all without dropping a single user request.

Get it wrong and you get data loss, broken client connections, and 502s during every deploy. Get it right and your deployments become invisible to users.

This article walks through the patterns that make graceful shutdown reliable in production Go services.

Why Graceful Shutdown Matters

Three things go wrong when a service dies abruptly:

Data loss. Buffered writes — log batches, metrics, queue messages — vanish. Database transactions in progress get rolled back (best case) or leave inconsistent state (worst case).

Connection drops. In-flight HTTP requests get TCP RST. gRPC streams break mid-message. WebSocket clients see unexpected disconnections. Depending on client retry logic, this can cascade.

Load balancer draining failures. Kubernetes removes the pod from the Service endpoints, but endpoint propagation is not instant — if your process exits before kube-proxy and the load balancers pick up the change, they keep sending traffic to a dead pod. The standard fix is to keep serving for a few seconds after receiving SIGTERM — but only if your shutdown sequence is correct.

The Foundation: os.Signal + context.Context

Every graceful shutdown in Go starts with the same two primitives: signal notification and context cancellation.

func main() {
    // Create a context that cancels on SIGINT or SIGTERM
    ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
    defer stop()

    // Pass ctx to everything that needs to know about shutdown
    if err := run(ctx); err != nil {
        log.Fatalf("service exited with error: %v", err)
    }
}

signal.NotifyContext (added in Go 1.16) is the cleanest way to wire OS signals into the context tree. When the signal arrives, the context's Done() channel closes, and every goroutine watching that context knows it's time to wrap up.

The key principle: propagate context everywhere. Every HTTP handler, every database query, every background worker should accept a context.Context. This is how you make shutdown cooperative rather than forceful.

HTTP Server Graceful Shutdown

http.Server.Shutdown() does exactly what you want: it stops accepting new connections, waits for in-flight requests to complete, and then returns.

func runHTTPServer(ctx context.Context) error {
    mux := http.NewServeMux()
    mux.HandleFunc("/api/data", handleData)

    srv := &http.Server{
        Addr:    ":8080",
        Handler: mux,
    }

    // Start server in a goroutine
    errCh := make(chan error, 1)
    go func() {
        if err := srv.ListenAndServe(); !errors.Is(err, http.ErrServerClosed) {
            errCh <- err
        }
        close(errCh)
    }()

    // Wait for shutdown signal
    <-ctx.Done()
    log.Println("shutting down HTTP server...")

    // Give in-flight requests a deadline to finish
    shutdownCtx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
    defer cancel()

    if err := srv.Shutdown(shutdownCtx); err != nil {
        return fmt.Errorf("http server shutdown: %w", err)
    }

    // Check if server goroutine hit an error before shutdown
    return <-errCh
}

Notice the shutdown context uses context.Background(), not the already-cancelled parent context. This is a common mistake — if you derive the shutdown timeout from the cancelled context, the timeout is already expired and Shutdown() returns immediately, dropping in-flight requests.

gRPC Graceful Stop

gRPC has its own graceful shutdown method that mirrors the HTTP pattern:

func runGRPCServer(ctx context.Context) error {
    lis, err := net.Listen("tcp", ":9090")
    if err != nil {
        return fmt.Errorf("listen: %w", err)
    }

    grpcServer := grpc.NewServer()
    pb.RegisterMyServiceServer(grpcServer, &myService{})

    // Enable reflection for debugging
    reflection.Register(grpcServer)

    errCh := make(chan error, 1)
    go func() {
        if err := grpcServer.Serve(lis); err != nil {
            errCh <- err
        }
        close(errCh)
    }()

    <-ctx.Done()
    log.Println("shutting down gRPC server...")

    // GracefulStop waits for active RPCs to finish
    // but we add a hard deadline as a safety net
    stopped := make(chan struct{})
    go func() {
        grpcServer.GracefulStop()
        close(stopped)
    }()

    select {
    case <-stopped:
        log.Println("gRPC server stopped gracefully")
    case <-time.After(10 * time.Second):
        log.Println("gRPC server force stop (timeout)")
        grpcServer.Stop()
    }

    return <-errCh
}

GracefulStop() has no built-in timeout, so you must wrap it with a deadline yourself. Without this, a single slow-draining stream can hang your shutdown indefinitely.

Worker Pool Draining with sync.WaitGroup

Background workers — queue consumers, cron jobs, batch processors — need a different pattern. The worker checks ctx.Done() between units of work, and a WaitGroup lets the main goroutine wait for all workers to finish their current task.

func runWorkerPool(ctx context.Context, queue <-chan Job) error {
    const numWorkers = 8
    var wg sync.WaitGroup

    for i := 0; i < numWorkers; i++ {
        wg.Add(1)
        go func(id int) {
            defer wg.Done()
            for {
                select {
                case <-ctx.Done():
                    log.Printf("worker %d: shutting down", id)
                    return
                case job, ok := <-queue:
                    if !ok {
                        return // channel closed
                    }
                    if err := processJob(ctx, job); err != nil {
                        log.Printf("worker %d: job failed: %v", id, err)
                    }
                }
            }
        }(i)
    }

    // Wait for all workers to finish their current job
    wg.Wait()
    return nil
}

The critical detail: processJob receives the context so it can abandon long-running work. If a job takes 5 minutes and your shutdown budget is 30 seconds, the job needs to respect context cancellation or you'll hit SIGKILL.

Database Connection Cleanup

Database connections are the easiest to get right and the most painful to get wrong. Leaked connections exhaust the connection pool on the database side, causing failures across all services.

func setupDatabase(ctx context.Context) (*sql.DB, error) {
    db, err := sql.Open("postgres", os.Getenv("DATABASE_URL"))
    if err != nil {
        return nil, err
    }

    db.SetMaxOpenConns(25)
    db.SetMaxIdleConns(5)
    db.SetConnMaxLifetime(5 * time.Minute)

    if err := db.PingContext(ctx); err != nil {
        db.Close()
        return nil, fmt.Errorf("database ping: %w", err)
    }

    return db, nil
}

The cleanup is simple — db.Close() — but the ordering matters. Close the database after servers and workers have stopped, since they may still be executing queries during their drain period.

Composing Everything with errgroup

Now let's stitch it all together. errgroup from golang.org/x/sync gives us structured concurrency: run components in parallel, shut them all down if any one fails, and collect errors.

func run(ctx context.Context) error {
    db, err := setupDatabase(ctx)
    if err != nil {
        return fmt.Errorf("database: %w", err)
    }
    defer db.Close() // Always last to close

    jobQueue := make(chan Job, 100)

    g, ctx := errgroup.WithContext(ctx)

    // HTTP server
    g.Go(func() error {
        return runHTTPServer(ctx)
    })

    // gRPC server
    g.Go(func() error {
        return runGRPCServer(ctx)
    })

    // Worker pool
    g.Go(func() error {
        return runWorkerPool(ctx, jobQueue)
    })

    // Health monitor (optional: triggers shutdown on critical failures)
    g.Go(func() error {
        return runHealthMonitor(ctx, db)
    })

    log.Println("service started")
    err = g.Wait()
    log.Println("all components stopped")

    return err
}

errgroup.WithContext creates a derived context that cancels when any goroutine in the group returns an error. This means if the HTTP server fails to bind its port, the context cancels, and the gRPC server and workers also begin shutting down. Unified lifecycle management.

Note that db.Close() runs via defer — it executes after g.Wait() returns, guaranteeing all servers and workers have fully drained before we close database connections. Dependency ordering through deferred cleanup is the simplest correct pattern.

Health Check Integration with Kubernetes

Kubernetes needs to know when your pod is ready to receive traffic and when it should stop sending traffic. The readiness probe is your coordination point.

type healthHandler struct {
    ready atomic.Bool
}

func (h *healthHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
    if h.ready.Load() {
        w.WriteHeader(http.StatusOK)
        w.Write([]byte("ok"))
    } else {
        w.WriteHeader(http.StatusServiceUnavailable)
        w.Write([]byte("shutting down"))
    }
}

func runHTTPServer(ctx context.Context) error {
    health := &healthHandler{}
    health.ready.Store(true)

    mux := http.NewServeMux()
    mux.Handle("/healthz", health)
    mux.HandleFunc("/api/data", handleData)

    srv := &http.Server{Addr: ":8080", Handler: mux}

    errCh := make(chan error, 1)
    go func() {
        if err := srv.ListenAndServe(); !errors.Is(err, http.ErrServerClosed) {
            errCh <- err
        }
        close(errCh)
    }()

    <-ctx.Done()

    // Step 1: Mark as not ready — Kubernetes stops sending traffic
    health.ready.Store(false)

    // Step 2: Wait for load balancer to propagate the change
    time.Sleep(5 * time.Second)

    // Step 3: Now shut down the server
    shutdownCtx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
    defer cancel()

    if err := srv.Shutdown(shutdownCtx); err != nil {
        return fmt.Errorf("http server shutdown: %w", err)
    }

    return <-errCh
}

That time.Sleep(5 * time.Second) looks wrong, but it's essential. Kubernetes endpoint propagation is eventually consistent. After your readiness probe starts failing, it takes a few seconds for kube-proxy/iptables rules to update across nodes. During that window, traffic still arrives. The sleep keeps your server alive to handle it.

In your pod spec:

readinessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 2
  failureThreshold: 1
terminationGracePeriodSeconds: 45

Set terminationGracePeriodSeconds to comfortably exceed your total shutdown budget. Here that budget is roughly 5s of propagation sleep + 10s of drain timeout + cleanup headroom, so 45 seconds leaves a wide margin.

Common Mistakes

1. Deriving shutdown context from the cancelled parent.

// WRONG: ctx is already cancelled, timeout has no effect
shutdownCtx, cancel := context.WithTimeout(ctx, 10*time.Second)

// RIGHT: fresh context with its own deadline
shutdownCtx, cancel := context.WithTimeout(context.Background(), 10*time.Second)

2. No shutdown timeout at all. GracefulStop() and custom drain loops can block forever. Always wrap with a deadline and fall back to a forced stop.

3. Wrong cleanup ordering. If you close the database before the HTTP server finishes draining, in-flight request handlers get connection errors. Rule of thumb: shut down in reverse order of initialization. Servers first, then workers, then infrastructure (DB, caches, message brokers).
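The defer stack gives you this reverse ordering for free, since deferred calls run last-in-first-out. A minimal sketch (the names are placeholders; real code would defer db.Close(), srv.Shutdown(), and so on):

```go
package main

import "fmt"

// shutdownOrder records the cleanup sequence so it can be inspected.
var shutdownOrder []string

func stop(name string) { shutdownOrder = append(shutdownOrder, name) }

func run() {
	// Open resources in dependency order and defer each close
	// immediately. Defers run last-in-first-out, so cleanup happens
	// in reverse: servers stop first, the database closes last.
	defer stop("database")
	defer stop("worker pool")
	defer stop("http server")
}

func main() {
	run()
	fmt.Println(shutdownOrder) // [http server worker pool database]
}
```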

4. Forgetting to stop timers and tickers.

ticker := time.NewTicker(30 * time.Second)
defer ticker.Stop() // Don't leak this

for {
    select {
    case <-ctx.Done():
        return nil
    case <-ticker.C:
        doPeriodicWork()
    }
}

5. Not testing shutdown. Send SIGTERM to your service in integration tests. Verify that in-flight requests complete, no errors appear in logs, and the process exits with code 0.

Putting It All Together

The complete pattern:

  1. Trap SIGINT/SIGTERM with signal.NotifyContext
  2. Start all components under an errgroup with the signal-aware context
  3. Each component watches ctx.Done() and begins draining
  4. Mark readiness probe as unhealthy, sleep for LB propagation
  5. Shut down servers (HTTP, gRPC) with a timeout
  6. Wait for workers to finish current tasks
  7. Close infrastructure connections (DB, caches) last via defer

This sequence handles the normal deploy case (Kubernetes rolling update), the crash case (a component fails and triggers group shutdown), and the manual stop case (Ctrl+C in development).

Graceful shutdown isn't glamorous, but it's the difference between deploys that are invisible to users and deploys that generate a spike of errors in your dashboard. Build it in from the start — retrofitting it later is always harder.


*This is part of the **Production Backend Patterns** series. Next up: structured logging and distributed tracing in Go.*


If this article helped you, consider buying me a coffee on Ko-fi! Follow me for more production backend patterns.
