Your service just got a SIGTERM. You have roughly 30 seconds before Kubernetes sends SIGKILL. In that window, you need to finish in-flight requests, flush buffered data, close database connections, and deregister from service discovery — all without dropping a single user request.
Get it wrong and you get data loss, broken client connections, and 502s during every deploy. Get it right and your deployments become invisible to users.
This article walks through the patterns that make graceful shutdown reliable in production Go services.
Why Graceful Shutdown Matters
Three things go wrong when a service dies abruptly:
Data loss. Buffered writes — log batches, metrics, queue messages — vanish. Database transactions in progress get rolled back (best case) or leave inconsistent state (worst case).
Connection drops. In-flight HTTP requests get a TCP RST. gRPC streams break mid-message. WebSocket clients see unexpected disconnections. Depending on client retry logic, this can cascade.
Load balancer draining failures. Kubernetes removes the pod from the Service endpoints, but if your process exits before the kubelet propagates that change, the load balancer still sends traffic to a dead pod. The standard fix is to keep serving for a few seconds after receiving SIGTERM — but only if your shutdown sequence is correct.
The Foundation: os.Signal + context.Context
Every graceful shutdown in Go starts with the same two primitives: signal notification and context cancellation.
func main() {
	// Create a context that cancels on SIGINT or SIGTERM
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
	defer stop()

	// Pass ctx to everything that needs to know about shutdown
	if err := run(ctx); err != nil {
		log.Fatalf("service exited with error: %v", err)
	}
}
signal.NotifyContext (added in Go 1.16) is the cleanest way to wire OS signals into the context tree. When the signal arrives, the context's Done() channel closes, and every goroutine watching that context knows it's time to wrap up.
The key principle: propagate context everywhere. Every HTTP handler, every database query, every background worker should accept a context.Context. This is how you make shutdown cooperative rather than forceful.
HTTP Server Graceful Shutdown
http.Server.Shutdown() does exactly what you want: it stops accepting new connections, waits for in-flight requests to complete, and then returns.
func runHTTPServer(ctx context.Context) error {
	mux := http.NewServeMux()
	mux.HandleFunc("/api/data", handleData)

	srv := &http.Server{
		Addr:    ":8080",
		Handler: mux,
	}

	// Start server in a goroutine
	errCh := make(chan error, 1)
	go func() {
		if err := srv.ListenAndServe(); err != http.ErrServerClosed {
			errCh <- err
		}
		close(errCh)
	}()

	// Wait for shutdown signal
	<-ctx.Done()
	log.Println("shutting down HTTP server...")

	// Give in-flight requests a deadline to finish
	shutdownCtx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	if err := srv.Shutdown(shutdownCtx); err != nil {
		return fmt.Errorf("http server shutdown: %w", err)
	}

	// Check if the server goroutine hit an error before shutdown
	return <-errCh
}
Notice the shutdown context uses context.Background(), not the already-cancelled parent context. This is a common mistake — if you derive the shutdown timeout from the cancelled context, the timeout is already expired and Shutdown() returns immediately, dropping in-flight requests.
gRPC Graceful Stop
gRPC has its own graceful shutdown method that mirrors the HTTP pattern:
func runGRPCServer(ctx context.Context) error {
	lis, err := net.Listen("tcp", ":9090")
	if err != nil {
		return fmt.Errorf("listen: %w", err)
	}

	grpcServer := grpc.NewServer()
	pb.RegisterMyServiceServer(grpcServer, &myService{})

	// Enable reflection for debugging
	reflection.Register(grpcServer)

	errCh := make(chan error, 1)
	go func() {
		if err := grpcServer.Serve(lis); err != nil {
			errCh <- err
		}
		close(errCh)
	}()

	<-ctx.Done()
	log.Println("shutting down gRPC server...")

	// GracefulStop waits for active RPCs to finish,
	// but we add a hard deadline as a safety net
	stopped := make(chan struct{})
	go func() {
		grpcServer.GracefulStop()
		close(stopped)
	}()

	select {
	case <-stopped:
		log.Println("gRPC server stopped gracefully")
	case <-time.After(10 * time.Second):
		log.Println("gRPC server force stop (timeout)")
		grpcServer.Stop()
	}
	return <-errCh
}
GracefulStop() has no built-in timeout, so you must wrap it with a deadline yourself. Without this, a single slow-draining stream can hang your shutdown indefinitely.
Worker Pool Draining with sync.WaitGroup
Background workers — queue consumers, cron jobs, batch processors — need a different pattern. The worker checks ctx.Done() between units of work, and a WaitGroup lets the main goroutine wait for all workers to finish their current task.
func runWorkerPool(ctx context.Context, queue <-chan Job) error {
	const numWorkers = 8
	var wg sync.WaitGroup

	for i := 0; i < numWorkers; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for {
				select {
				case <-ctx.Done():
					log.Printf("worker %d: shutting down", id)
					return
				case job, ok := <-queue:
					if !ok {
						return // channel closed
					}
					if err := processJob(ctx, job); err != nil {
						log.Printf("worker %d: job failed: %v", id, err)
					}
				}
			}
		}(i)
	}

	// Wait for all workers to finish their current job
	wg.Wait()
	return nil
}
The critical detail: processJob receives the context so it can abandon long-running work. If a job takes 5 minutes and your shutdown budget is 30 seconds, the job needs to respect context cancellation or you'll hit SIGKILL.
Database Connection Cleanup
Database connections are the easiest to get right and the most painful to get wrong. Leaked connections exhaust the connection pool on the database side, causing failures across all services.
func setupDatabase(ctx context.Context) (*sql.DB, error) {
	db, err := sql.Open("postgres", os.Getenv("DATABASE_URL"))
	if err != nil {
		return nil, err
	}

	db.SetMaxOpenConns(25)
	db.SetMaxIdleConns(5)
	db.SetConnMaxLifetime(5 * time.Minute)

	if err := db.PingContext(ctx); err != nil {
		db.Close()
		return nil, fmt.Errorf("database ping: %w", err)
	}
	return db, nil
}
The cleanup is simple — db.Close() — but the ordering matters. Close the database after servers and workers have stopped, since they may still be executing queries during their drain period.
Composing Everything with errgroup
Now let's stitch it all together. errgroup from golang.org/x/sync gives us structured concurrency: run components in parallel, shut them all down if any one fails, and collect errors.
func run(ctx context.Context) error {
	db, err := setupDatabase(ctx)
	if err != nil {
		return fmt.Errorf("database: %w", err)
	}
	defer db.Close() // Always last to close

	jobQueue := make(chan Job, 100)

	g, ctx := errgroup.WithContext(ctx)

	// HTTP server
	g.Go(func() error {
		return runHTTPServer(ctx)
	})

	// gRPC server
	g.Go(func() error {
		return runGRPCServer(ctx)
	})

	// Worker pool
	g.Go(func() error {
		return runWorkerPool(ctx, jobQueue)
	})

	// Health monitor (optional: triggers shutdown on critical failures)
	g.Go(func() error {
		return runHealthMonitor(ctx, db)
	})

	log.Println("service started")
	err = g.Wait()
	log.Println("all components stopped")
	return err
}
errgroup.WithContext creates a derived context that cancels when any goroutine in the group returns an error. This means if the HTTP server fails to bind its port, the context cancels, and the gRPC server and workers also begin shutting down. Unified lifecycle management.
Note that db.Close() runs via defer — it executes after g.Wait() returns, guaranteeing all servers and workers have fully drained before we close database connections. Dependency ordering through deferred cleanup is the simplest correct pattern.
Health Check Integration with Kubernetes
Kubernetes needs to know when your pod is ready to receive traffic and when it should stop sending traffic. The readiness probe is your coordination point.
type healthHandler struct {
	ready atomic.Bool
}

func (h *healthHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	if h.ready.Load() {
		w.WriteHeader(http.StatusOK)
		w.Write([]byte("ok"))
	} else {
		w.WriteHeader(http.StatusServiceUnavailable)
		w.Write([]byte("shutting down"))
	}
}

func runHTTPServer(ctx context.Context) error {
	health := &healthHandler{}
	health.ready.Store(true)

	mux := http.NewServeMux()
	mux.Handle("/healthz", health)
	mux.HandleFunc("/api/data", handleData)

	srv := &http.Server{Addr: ":8080", Handler: mux}

	errCh := make(chan error, 1)
	go func() {
		if err := srv.ListenAndServe(); err != http.ErrServerClosed {
			errCh <- err
		}
		close(errCh)
	}()

	<-ctx.Done()

	// Step 1: Mark as not ready — Kubernetes stops sending traffic
	health.ready.Store(false)

	// Step 2: Wait for load balancer to propagate the change
	time.Sleep(5 * time.Second)

	// Step 3: Now shut down the server
	shutdownCtx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	if err := srv.Shutdown(shutdownCtx); err != nil {
		return fmt.Errorf("http server shutdown: %w", err)
	}
	return <-errCh
}
That time.Sleep(5 * time.Second) looks wrong, but it's essential. Kubernetes endpoint propagation is eventually consistent. After your readiness probe starts failing, it takes a few seconds for kube-proxy/iptables rules to update across nodes. During that window, traffic still arrives. The sleep keeps your server alive to handle it.
In your pod spec:
readinessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 2
  failureThreshold: 1
terminationGracePeriodSeconds: 45  # pod-level field, a sibling of containers, not part of the probe
Set terminationGracePeriodSeconds to comfortably exceed your total shutdown budget (sleep + drain timeout + cleanup).
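If you would rather not hard-code the sleep in Go, the same delay can live in the pod spec as a preStop hook, which Kubernetes runs after initiating termination but before delivering SIGTERM. A sketch, assuming a sleep binary exists in the container image:

```yaml
lifecycle:
  preStop:
    exec:
      command: ["sleep", "5"]
```

Both approaches buy the same propagation window; the hook keeps the delay in deployment config, while the in-process sleep works identically outside Kubernetes.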
Common Mistakes
1. Deriving shutdown context from the cancelled parent.
// WRONG: ctx is already cancelled, timeout has no effect
shutdownCtx, cancel := context.WithTimeout(ctx, 10*time.Second)
// RIGHT: fresh context with its own deadline
shutdownCtx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
2. No shutdown timeout at all. GracefulStop() and custom drain loops can block forever. Always wrap with a deadline and fall back to a forced stop.
3. Wrong cleanup ordering. If you close the database before the HTTP server finishes draining, in-flight request handlers get connection errors. Rule of thumb: shut down in reverse order of initialization. Servers first, then workers, then infrastructure (DB, caches, message brokers).
4. Forgetting to stop timers and tickers.
ticker := time.NewTicker(30 * time.Second)
defer ticker.Stop() // Don't leak this

for {
	select {
	case <-ctx.Done():
		return nil
	case <-ticker.C:
		doPeriodicWork()
	}
}
5. Not testing shutdown. Send SIGTERM to your service in integration tests. Verify that in-flight requests complete, no errors appear in logs, and the process exits with code 0.
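As a starting point, here is a minimal self-test sketch: it delivers SIGTERM to the current process and verifies that the signal-aware context cancels. It is Unix-only; a real integration test would run the built service binary as a subprocess, send it SIGTERM, and assert on exit code 0 and clean logs.

```go
package main

import (
	"context"
	"fmt"
	"os/signal"
	"syscall"
	"time"
)

// shutdownCancels wires up the same signal.NotifyContext used in main, then
// sends SIGTERM to our own process, as Kubernetes would on pod termination.
// It reports whether the context cancelled within the deadline.
func shutdownCancels() bool {
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGTERM)
	defer stop()

	// While NotifyContext is active, SIGTERM is caught rather than killing us.
	if err := syscall.Kill(syscall.Getpid(), syscall.SIGTERM); err != nil {
		return false
	}
	select {
	case <-ctx.Done():
		return true
	case <-time.After(2 * time.Second):
		return false
	}
}

func main() {
	fmt.Println("shutdown context cancelled:", shutdownCancels())
}
```

Even this tiny check catches the most common regression: someone removing or reordering the signal wiring so the service dies instantly on SIGTERM.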
Putting It All Together
The complete pattern:
- Trap SIGINT/SIGTERM with signal.NotifyContext
- Start all components under an errgroup with the signal-aware context
- Each component watches ctx.Done() and begins draining
- Mark the readiness probe as unhealthy, sleep for LB propagation
- Shut down servers (HTTP, gRPC) with a timeout
- Wait for workers to finish their current tasks
- Close infrastructure connections (DB, caches) last via defer
This sequence handles the normal deploy case (Kubernetes rolling update), the crash case (a component fails and triggers group shutdown), and the manual stop case (Ctrl+C in development).
Graceful shutdown isn't glamorous, but it's the difference between deploys that are invisible to users and deploys that generate a spike of errors in your dashboard. Build it in from the start — retrofitting it later is always harder.
This is part of the Production Backend Patterns series. Next up: structured logging and distributed tracing in Go.
If this article helped you, consider buying me a coffee on Ko-fi! Follow me for more production backend patterns.