The most dangerous moment in your system is a normal deploy

Your service's most dangerous moment isn't when it crashes.

It's a normal deploy.

When you redeploy, the old process gets a SIGTERM — a polite "please stop."

By default, Node hears that and dies instantly. Whatever it was doing gets cut mid-sentence.

Most of the time you get away with it. Until the one request that mattered.

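Here's the thing: SIGTERM is a request, not an execution order. Your process can catch it, and you usually have a few seconds (10 by default with docker stop, 30 on Kubernetes) before the system follows up with SIGKILL, which can't be caught.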

Graceful shutdown is just using those seconds — stop taking new work, finish what's already running, close your connections, then exit.
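Here is a minimal sketch of those four steps for a plain Node HTTP server. The helper name `installGracefulShutdown` and its options (`closeResources`, `graceMs`, `exit`) are made up for illustration, and the 10-second default is an assumption to tune against your own kill timeout.

```javascript
const http = require('node:http');

// Hypothetical helper, not from any library: wires SIGTERM to an orderly
// shutdown of a single HTTP server.
function installGracefulShutdown(server, options = {}) {
  const {
    closeResources = async () => {},     // e.g. close DB pools, flush queues
    graceMs = 10_000,                    // assumed to be shorter than the kill timeout
    exit = (code) => process.exit(code), // injectable so the sketch can be tested
  } = options;

  let shuttingDown = false;

  async function shutdown() {
    if (shuttingDown) return; // ignore repeated signals
    shuttingDown = true;

    // Safety net: if draining outlasts the grace window, exit non-zero
    // ourselves instead of waiting to be killed.
    const forceExit = setTimeout(() => exit(1), graceMs);
    forceExit.unref();

    try {
      // 1. Stop taking new work: close() refuses new connections and calls
      //    back once the existing ones have ended.
      const closed = new Promise((resolve) => server.close(resolve));
      // Idle keep-alive sockets can hold close() open. Node 18.2+ exposes
      // closeIdleConnections(); newer versions also do this inside close().
      if (typeof server.closeIdleConnections === 'function') {
        server.closeIdleConnections();
      }
      // 2. Finish what's already running.
      await closed;
      // 3. Close your connections.
      await closeResources();
      // 4. Then exit.
      clearTimeout(forceExit);
      exit(0);
    } catch (err) {
      console.error('graceful shutdown failed', err);
      exit(1);
    }
  }

  process.once('SIGTERM', shutdown);
  return shutdown;
}
```

Wiring it up would be one call after creating the server, something like `installGracefulShutdown(server, { closeResources: () => db.close() })`, where `db.close()` stands in for whatever cleanup your own connections need.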

I learned this the hard way on a ticketing app that issues refunds through Stripe.

A refund runs as a background job: call Stripe, then record it in our database.

Deploy at the wrong moment without graceful shutdown, and the process dies between those two steps.

Money leaves Stripe. Our database never finds out.

Now a customer has been refunded in our system's blind spot, and not a single log line says it happened.

With graceful shutdown, the deploy waits for that job to finish first. Boring. Which is exactly the goal.
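For a background worker, "waits for that job to finish" could look roughly like this. It is a sketch under assumptions, not the app's actual code: `JobTracker` is a hypothetical in-process tracker, and `issueRefund` / `recordRefund` are stand-ins for the Stripe call and the database write.

```javascript
// Hypothetical in-process tracker for background jobs. A real queue library
// will usually have its own pause/close API for the same purpose.
class JobTracker {
  constructor() {
    this.accepting = true;
    this.inFlight = new Set();
  }

  // Start a job and remember it until it settles.
  run(job) {
    if (!this.accepting) {
      return Promise.reject(new Error('shutting down: job not started'));
    }
    const running = Promise.resolve()
      .then(job)
      .finally(() => this.inFlight.delete(running));
    this.inFlight.add(running);
    return running;
  }

  // Stop accepting jobs, then wait for every started job to settle.
  async drain() {
    this.accepting = false;
    await Promise.allSettled([...this.inFlight]);
  }
}

// The two-step refund from the story. `issueRefund` and `recordRefund` are
// stand-ins for the payment call and the database write.
async function refundJob({ issueRefund, recordRefund }, orderId) {
  const refund = await issueRefund(orderId); // step 1: money moves
  await recordRefund(orderId, refund.id);    // step 2: our database finds out
}

// On SIGTERM, let started refunds reach step 2 before exiting.
const tracker = new JobTracker();
process.once('SIGTERM', async () => {
  await tracker.drain();
  process.exit(0);
});
```

The worker would start each refund with `tracker.run(() => refundJob(deps, orderId))`. The point of the design is that a job which has begun step 1 always gets the chance to reach step 2, while jobs that have not started are refused and left for the next process.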

The takeaway: treat shutdown as a feature, not an afterthought.

Catch SIGTERM, drain in-flight work, and (the part everyone forgets) set your process manager's kill timeout long enough for that work to actually finish. A 2-second grace period on a 10-second job is the same as no grace at all.

What's the worst thing a "routine" deploy has quietly broken for you?

#SystemDesign #SoftwareEngineering #BackendDevelopment #Reliability #DevOps #NodeJS

DEV Community
