DEV Community

Benjamin Cane

Posted on • Originally published at itnext.io on

If every transaction matters, you should understand graceful shutdown

If every transaction matters in your system, then you should take time to understand how to implement Graceful Shutdown properly.

Maximizing Resilience with Graceful Shutdown in Cloud-Native Golang Applications

🤔 Why does graceful shutdown matter?

Without some coordination, in-flight transactions can be lost, partially processed, or worse.

Let’s walk through what happens when we shut down a containerized application.

🛑 Container shutdown

When a scheduler like Kubernetes attempts to stop a container, the application running within the container is sent a POSIX signal.

In this case, a SIGTERM signal.

This SIGTERM is the platform’s way of telling the application that it should terminate.

If the application does not capture (trap) the signal, it stops immediately.

If there are any in-flight requests during that time, they are not responded to.

If there are open database connections or open file handles, they are not closed.

Everything stops, and problems manifest.

😭 Common issues

Aside from transaction timeouts, I’ve seen other issues caused by not implementing graceful shutdown correctly.

One of the most common is exceeding database connections.

Limiting the number of connections a single user can make is very common, especially for shared database infrastructure.

If connections aren’t closed properly, they linger, and are still counted as used.

Lingering connections can prevent new instances from connecting, leaving your applications running but not connected to a database.

I’ve seen similar issues with open file handles, lingering files, ephemeral port ranges exhausted by lingering TCP sessions, and more.

Not cleanly shutting down an application can lead to platform instability.

🙌 How to implement Graceful Shutdown

The first step is to trap the signal.

Different languages implement this differently, but in Go it’s as simple as listening on a channel.

Once you trap the SIGTERM, how you shut down is important.

Don’t just start closing connections; in-flight requests will be impacted.

You need to redirect traffic from the instance and then shut down.

Readiness checks are my go-to approach for triggering traffic redirection.
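As a rough sketch of that approach, the readiness endpoint can start failing as soon as the signal is trapped, so the platform stops routing new requests to the instance while existing ones continue. The `/ready` path, port `8080`, and the names `shuttingDown`, `readinessStatus`, and `readyHandler` are all assumptions for illustration; `atomic.Bool` needs Go 1.19 or later.

```go
package main

import (
	"log"
	"net/http"
	"os"
	"os/signal"
	"sync/atomic"
	"syscall"
)

// shuttingDown flips to true once a termination signal has been trapped.
var shuttingDown atomic.Bool

// readinessStatus returns the HTTP status the readiness check should see.
func readinessStatus() int {
	if shuttingDown.Load() {
		// Tell the platform to stop routing traffic to this instance.
		return http.StatusServiceUnavailable
	}
	return http.StatusOK
}

// readyHandler serves the hypothetical /ready endpoint.
func readyHandler(w http.ResponseWriter, _ *http.Request) {
	w.WriteHeader(readinessStatus())
}

func main() {
	// Register for the signal before doing anything else.
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGTERM)

	go func() {
		<-sigs
		shuttingDown.Store(true) // readiness now fails; in-flight work continues
		log.Println("SIGTERM trapped; readiness check now reports 503")
	}()

	http.HandleFunc("/ready", readyHandler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

This sketch only covers the readiness flip. It deliberately leaves out waiting for requests to finish and stopping the process, and how quickly traffic actually moves away depends on how your platform's checks and routing are configured.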

The key is that you need to redirect traffic away from the instance and wait for requests to finish.

Once the requests are complete, you can safely close connections, file handles, etc. and stop the application.

👷 Graceful shutdown is a lot of work

Implementing graceful shutdown might sound like a lot of work, and it is. However, the return is a stable platform.

Implementing this process is critical when adopting Kubernetes, where pods can be evacuated for maintenance or resource scheduling at any time.

With container runtime platforms, you must assume your application will be shut down at any time. When the scheduler stops your application, the shutdown should be clean.

