DEV Community

Fabrício Peloso

The 54-point production deployment checklist that saves you from 3am rollbacks

You've been there.

A deploy that seemed fine. Then the error rate spikes. Then Slack blows up. Then you're doing a rollback at 3am, cold coffee in hand, explaining to your manager what went wrong.

Most production incidents aren't caused by bad code. They're caused by skipped verifications under pressure.


Why I built this

I worked in industrial automation before moving to DevOps. In that world, systems run 24/7. Downtime isn't a Jira ticket; it's a financial event. You do not skip a step. Ever.

When I moved into cloud infrastructure, I was genuinely surprised at how informal most deployment processes are. Smart engineers, running entirely on mental checklists, under pressure, with no formal verification step.

So I built the checklist I wish I'd had from day one.


What's in it

54 verifications across 4 phases, structured so they appear at exactly the moment you need them:

Phase 1 — Pre-deployment (18 checks)

The checks that matter most:

  • PR approved by 2+ reviewers (not just one tired senior at EOD)
  • CI fully green: unit, integration, e2e, SAST scan
  • DB backup confirmed and tested — not just "the scheduled backup should have run"
  • Migrations tested on staging with rollback also tested
  • Rollback plan documented with estimated time under 10 minutes
  • Feature flags set to off-by-default in production
  • Monitoring and alerts confirmed active for the service
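
As a rough illustration, the checks above can be expressed as an automated gate that runs before the manual pass. This is a Python sketch, not part of the checklist file; `PreDeployState`, its field names, and `pre_deploy_failures` are hypothetical, and how you populate them (CI API, PR API, backup tooling) depends on your stack.

```python
from dataclasses import dataclass, field

REQUIRED_CI_JOBS = ("unit", "integration", "e2e", "sast")
MAX_ROLLBACK_MINUTES = 10  # "estimated time under 10 minutes"


@dataclass
class PreDeployState:
    """Hypothetical snapshot of a release candidate; every field name is illustrative."""
    approvals: int
    ci_results: dict  # job name -> True if the job passed
    backup_restore_tested: bool
    migration_rollback_tested: bool
    rollback_estimate_minutes: float
    flags_default_on: list = field(default_factory=list)  # flags that default to ON in production
    alerts_active: bool = False


def pre_deploy_failures(state):
    """Return a human-readable list of failed checks; an empty list means go."""
    failures = []
    if state.approvals < 2:
        failures.append(f"only {state.approvals} approval(s), need 2+")
    for job in REQUIRED_CI_JOBS:
        # A missing job counts as a failure, not as a pass.
        if not state.ci_results.get(job, False):
            failures.append(f"CI job not green: {job}")
    if not state.backup_restore_tested:
        failures.append("DB backup not confirmed by a test restore")
    if not state.migration_rollback_tested:
        failures.append("migration rollback not tested on staging")
    if state.rollback_estimate_minutes >= MAX_ROLLBACK_MINUTES:
        failures.append("rollback estimate is not under 10 minutes")
    if state.flags_default_on:
        failures.append("flags default to on: " + ", ".join(state.flags_default_on))
    if not state.alerts_active:
        failures.append("monitoring/alerts not confirmed active")
    return failures
```

Returning the full list of failures instead of stopping at the first one is a deliberate choice here: under pressure, you want to see everything that is wrong in one pass.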

Phase 2 — Execution (14 checks)

  • Zero errors in the first 60 seconds — this is when problems are easiest to catch
  • Health checks passing on all replicas, not just the first few
  • P95 latency within SLA throughout the rollout
  • Feature flags enabled progressively: 5% → 25% → 100%
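
The execution checks above can be sketched as a small promotion rule. This Python example is an illustration under stated assumptions: the function names are hypothetical, P95 is computed with the nearest-rank method, and the SLA value and latency samples would come from your own monitoring.

```python
ROLLOUT_STAGES = (5, 25, 100)  # percentages from the checklist


def p95(latencies_ms):
    """Nearest-rank 95th percentile of a non-empty list of latencies."""
    ordered = sorted(latencies_ms)
    # Ceiling of 0.95 * n in integer arithmetic, as a 1-based rank.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def next_stage(current_percent, latencies_ms, error_count, sla_ms):
    """Return the next flag percentage, or 0 to signal 'stop and roll back'."""
    if error_count > 0 or p95(latencies_ms) > sla_ms:
        return 0
    for stage in ROLLOUT_STAGES:
        if stage > current_percent:
            return stage
    return 100  # already fully rolled out
```

For example, `next_stage(5, samples, 0, 200)` returns 25 when the samples meet a 200 ms SLA with no errors, and 0 as soon as either condition fails.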

Phase 3 — Post-deployment (14 checks)

  • Error rate stable for 15 continuous minutes
  • Smoke tests on the critical user paths: login, main API, checkout
  • Business metrics normal — transactions/min, conversion rate, active users
  • 24-hour observation period with an assigned on-call engineer
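
The "stable for 15 continuous minutes" check is easy to get subtly wrong, because a single bad minute should reset the clock. A minimal Python sketch, assuming one error-rate sample per minute (oldest first) and a threshold you choose yourself; both function names are hypothetical:

```python
def stable_minutes(error_rates, threshold):
    """Count the most recent consecutive samples at or below the threshold."""
    count = 0
    for rate in reversed(error_rates):
        if rate > threshold:
            break  # a bad minute ends the streak
        count += 1
    return count


def is_stable(error_rates, threshold, required_minutes=15):
    """True only if the last `required_minutes` samples are all within the threshold."""
    return stable_minutes(error_rates, threshold) >= required_minutes
```

Here "stable" is interpreted as "at or below a fixed threshold"; comparing against the pre-deploy baseline would be a reasonable alternative.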

Phase 4 — Rollback (8 checks)

If you're here, something went wrong. This phase is designed for the worst moments — when you're stressed, when everyone is watching, when every second costs money. The procedure is linear. You don't need to think. You just follow the steps.
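
The idea of a strictly linear procedure can be sketched in a few lines of Python. This is an illustration only: `run_linear` is a hypothetical helper, the step names and actions are supplied by the caller, and stopping at the first failed step (so a human can take over) is an assumption, not something the checklist prescribes.

```python
def run_linear(steps):
    """Run (name, action) pairs strictly in order.

    Each action is a zero-argument callable that returns True on success.
    Execution stops at the first failure. Returns (log, completed), where
    completed is True only if every step succeeded.
    """
    log = []
    for number, (name, action) in enumerate(steps, start=1):
        ok = bool(action())
        log.append(f"{number}. {name}: {'done' if ok else 'FAILED'}")
        if not ok:
            return log, False
    return log, True
```

The numbered log doubles as a timeline for the post-mortem.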


What it looks like in practice

It's a single HTML file. Open it in any browser. Works offline.

Before each deploy:

  1. Fill in the service name, version, owner, environment, and maintenance window
  2. Work through the checklist, checking items as you go
  3. The progress bar shows where you are
  4. Critical items are visually flagged — you can't accidentally skip them
  5. When everything is checked, the status bar turns green: "All verifications complete. Deployment approved for production."
  6. Export a one-click .txt report for audit trail or post-mortem documentation
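
To make the workflow above concrete, here is a minimal Python model of the same logic: progress, critical-item flagging, the approval status, and the text export. It is a sketch and not the code inside the HTML file; `Item`, `progress_percent`, `status`, and `export_report` are hypothetical names, and the "Blocked" and "In progress" messages are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Item:
    text: str
    critical: bool = False
    checked: bool = False


def progress_percent(items):
    """Share of checked items, rounded down to a whole percent."""
    if not items:
        return 0
    return 100 * sum(i.checked for i in items) // len(items)


def status(items):
    """Approved only when every item is checked; critical gaps are named."""
    unchecked_critical = [i.text for i in items if i.critical and not i.checked]
    if unchecked_critical:
        return "Blocked: critical items unchecked: " + "; ".join(unchecked_critical)
    if items and all(i.checked for i in items):
        return "All verifications complete. Deployment approved for production."
    return f"In progress: {progress_percent(items)}%"


def export_report(header, items):
    """Render a plain-text report; header is a dict of the form fields."""
    lines = [f"{key}: {value}" for key, value in header.items()]
    lines.append("")
    lines += [f"[{'x' if i.checked else ' '}] {i.text}" for i in items]
    lines.append("")
    lines.append(status(items))
    return "\n".join(lines)
```

Calling `export_report({"Service": "api", "Version": "1.4.2"}, items)` produces a text block that can be pasted straight into an audit log or post-mortem document; the service name and version here are placeholders.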

No SaaS. No subscription. No account. Just a file that opens and works.


The honest truth about checklists

Checklists work, and the evidence from other high-stakes fields is strong.

Aviation made pre-flight checklists standard practice decades ago, and they are widely credited as a major contributor to its safety record. Surgical teams that adopted the WHO Surgical Safety Checklist reported fewer complications and deaths in the original multi-hospital study. The same principle applies to software deployments.

The problem isn't that engineers don't know what to check. It's that under pressure, with context-switching and deadlines, the mental checklist gets compressed. Items get skipped. Usually nothing happens. Until it does.

A written checklist removes the cognitive load from the critical moment. You're not trying to remember. You're just following a list.


Get it

The full checklist is available here: Production Deployment Checklist — $19

If it prevents one incident, it will have paid for itself several hundred times over.


If you have verifications I missed, things you've learned the hard way, drop them in the comments. I'll incorporate the best ones into the next version.


Tags: #devops #deployment #productivity #kubernetes #sre #cicd #infrastructure
