<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Fabrício Peloso</title>
    <description>The latest articles on DEV Community by Fabrício Peloso (@fabrciowplima).</description>
    <link>https://dev.to/fabrciowplima</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3961154%2F41b1e678-01c2-4ff9-9df3-1937610805f1.jpg</url>
      <title>DEV Community: Fabrício Peloso</title>
      <link>https://dev.to/fabrciowplima</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/fabrciowplima"/>
    <language>en</language>
    <item>
      <title>The 54-point production deployment checklist that saves you from 3am rollbacks</title>
      <dc:creator>Fabrício Peloso</dc:creator>
      <pubDate>Sun, 31 May 2026 12:44:48 +0000</pubDate>
      <link>https://dev.to/fabrciowplima/the-54-point-production-deployment-checklist-that-saves-you-from-3am-rollbacks-22i4</link>
      <guid>https://dev.to/fabrciowplima/the-54-point-production-deployment-checklist-that-saves-you-from-3am-rollbacks-22i4</guid>
      <description>&lt;p&gt;You've been there.&lt;/p&gt;

&lt;p&gt;A deploy that seemed fine. Then the error rate spikes. Then Slack blows up. Then you're doing a rollback at 3am, cold coffee in hand, explaining to your manager what went wrong.&lt;/p&gt;

&lt;p&gt;Most production incidents aren't caused by bad code. They're caused by &lt;strong&gt;skipped verifications under pressure&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why I built this
&lt;/h2&gt;

&lt;p&gt;I worked in industrial automation before moving to DevOps. In that world, systems run 24/7, and downtime isn't a Jira ticket: it's a financial event. You do not skip a step. Ever.&lt;/p&gt;

&lt;p&gt;When I moved into cloud infrastructure, I was genuinely surprised by how informal most deployment processes are: smart engineers running entirely on mental checklists, under pressure, with no formal verification step.&lt;/p&gt;

&lt;p&gt;So I built the checklist I wish I'd had from day one.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's in it
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;54 verifications across 4 phases&lt;/strong&gt;, structured so they appear at exactly the moment you need them:&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 1 — Pre-deployment (18 checks)
&lt;/h3&gt;

&lt;p&gt;The checks that matter most:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;PR approved by 2+ reviewers (not just one tired senior at EOD)&lt;/li&gt;
&lt;li&gt;CI fully green: unit, integration, e2e, SAST scan&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DB backup confirmed and tested&lt;/strong&gt; — not just "the scheduled backup should have run"&lt;/li&gt;
&lt;li&gt;Migrations tested on staging, with their rollback tested too&lt;/li&gt;
&lt;li&gt;Rollback plan documented with estimated time under 10 minutes&lt;/li&gt;
&lt;li&gt;Feature flags set to off-by-default in production&lt;/li&gt;
&lt;li&gt;Monitoring and alerts confirmed active for the service&lt;/li&gt;
&lt;/ul&gt;
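
&lt;p&gt;If your pipeline can see these facts, several Phase 1 checks can be enforced by code instead of by memory. Below is a minimal Python sketch. It assumes a hypothetical &lt;code&gt;PreDeployState&lt;/code&gt; record that your CI or deploy tooling fills in. The field names and the gate function are illustrative and are not part of the checklist product. The thresholds (2+ approvals, rollback under 10 minutes, flags off by default) come straight from the list above.&lt;/p&gt;

```python
from dataclasses import dataclass

MIN_APPROVALS = 2
MAX_ROLLBACK_MINUTES = 10
REQUIRED_CI_STAGES = ("unit", "integration", "e2e", "sast")


@dataclass
class PreDeployState:
    """Hypothetical snapshot of facts collected before a deploy."""
    approvals: int
    ci_results: dict[str, bool]          # CI stage name -> passed?
    backup_restore_tested: bool          # a restore was actually tried, not just scheduled
    rollback_estimate_minutes: float
    prod_flag_defaults: dict[str, bool]  # flag name -> default state in production
    alerts_active: bool


def pre_deploy_blockers(state: PreDeployState) -> list[str]:
    """Return every reason to block the deploy; an empty list means go."""
    blockers = []
    if MIN_APPROVALS > state.approvals:
        blockers.append(f"only {state.approvals} approval(s), need {MIN_APPROVALS}+")
    for stage in REQUIRED_CI_STAGES:
        if not state.ci_results.get(stage, False):
            blockers.append(f"CI stage not green: {stage}")
    if not state.backup_restore_tested:
        blockers.append("DB backup not restore-tested")
    if state.rollback_estimate_minutes >= MAX_ROLLBACK_MINUTES:
        blockers.append(f"rollback estimate not under {MAX_ROLLBACK_MINUTES} minutes")
    for name, on_by_default in sorted(state.prod_flag_defaults.items()):
        if on_by_default:
            blockers.append(f"feature flag {name!r} is on by default")
    if not state.alerts_active:
        blockers.append("monitoring/alerts not confirmed active")
    return blockers


state = PreDeployState(
    approvals=1,
    ci_results={"unit": True, "integration": True, "e2e": True, "sast": True},
    backup_restore_tested=True,
    rollback_estimate_minutes=6,
    prod_flag_defaults={"new_checkout": False},
    alerts_active=True,
)
print(pre_deploy_blockers(state))  # ['only 1 approval(s), need 2+']
```

&lt;p&gt;The gate collects every blocking reason instead of stopping at the first one. That way the person deploying sees the full picture in a single pass.&lt;/p&gt;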

&lt;h3&gt;
  
  
  Phase 2 — Execution (14 checks)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Zero errors in the &lt;strong&gt;first 60 seconds&lt;/strong&gt; — this is when problems are easiest to catch&lt;/li&gt;
&lt;li&gt;Health checks passing on &lt;strong&gt;all&lt;/strong&gt; replicas, not just the first few&lt;/li&gt;
&lt;li&gt;P95 latency within SLA throughout the rollout&lt;/li&gt;
&lt;li&gt;Feature flags enabled progressively: 5% → 25% → 100%&lt;/li&gt;
&lt;/ul&gt;
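
&lt;p&gt;The progressive flag rollout and the P95 check work well together. Here is a minimal Python sketch. It assumes users are bucketed by hashing the flag name together with the user ID. That is a common technique, but not necessarily how your flag service works. It also assumes you have a latency sample and an error count for each stage. The function names, and the convention that &lt;code&gt;0&lt;/code&gt; means "turn the flag off", are hypothetical.&lt;/p&gt;

```python
import hashlib

ROLLOUT_STAGES = (5, 25, 100)  # percent of users at each step


def bucket(flag: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in 0..99."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag: str, user_id: str, percent: int) -> bool:
    # The bucket never changes, so a user in the 5% cohort is also in the 25% cohort.
    return percent > bucket(flag, user_id)


def p95(samples_ms: list[float]) -> float:
    """Nearest-rank 95th percentile, using integer math to avoid float rounding."""
    if not samples_ms:
        raise ValueError("no latency samples")
    ordered = sorted(samples_ms)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n), 1-based
    return ordered[rank - 1]


def next_stage(current: int, latencies_ms: list[float], errors: int, sla_p95_ms: float) -> int:
    """Advance one stage only while healthy; 0 means turn the flag off and investigate."""
    if errors > 0 or p95(latencies_ms) > sla_p95_ms:
        return 0
    i = ROLLOUT_STAGES.index(current)
    return ROLLOUT_STAGES[min(i + 1, len(ROLLOUT_STAGES) - 1)]


print(next_stage(5, [120.0] * 50, errors=0, sla_p95_ms=300.0))  # 25
```

&lt;p&gt;Hashing instead of random sampling keeps a user's assignment stable. No one sees the feature flicker on and off as the rollout moves from 5% to 25% to 100%.&lt;/p&gt;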

&lt;h3&gt;
  
  
  Phase 3 — Post-deployment (14 checks)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Error rate stable for &lt;strong&gt;15 continuous minutes&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Smoke tests on the critical user paths: login, main API, checkout&lt;/li&gt;
&lt;li&gt;Business metrics normal — transactions/min, conversion rate, active users&lt;/li&gt;
&lt;li&gt;24-hour observation period with an assigned on-call engineer&lt;/li&gt;
&lt;/ul&gt;
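
&lt;p&gt;"15 continuous minutes" is easy to fudge by eye, so it helps to define it precisely. The minimal Python sketch below assumes per-minute request and error counts from your metrics system. The &lt;code&gt;MinuteSample&lt;/code&gt; shape and the threshold value are hypothetical. The check passes only if the latest 15 minutes are contiguous and every single minute stays at or below the allowed error rate.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class MinuteSample:
    minute: int    # minute index since the deploy finished (hypothetical unit)
    requests: int
    errors: int


def error_rate_stable(samples: list[MinuteSample], max_error_rate: float, window: int = 15) -> bool:
    """True if the latest `window` minutes are contiguous and each is at or below the threshold."""
    recent = sorted(samples, key=lambda s: s.minute)[-window:]
    if len(recent) != window:
        return False  # not enough data yet
    # A gap in the data means the window is not continuous, so the check fails.
    for prev, cur in zip(recent, recent[1:]):
        if cur.minute != prev.minute + 1:
            return False
    for s in recent:
        # A zero-traffic minute counts as 0% errors here; you may prefer to fail it instead.
        rate = s.errors / s.requests if s.requests else 0.0
        if rate > max_error_rate:
            return False
    return True


samples = [MinuteSample(m, 1000, 2) for m in range(15)]
print(error_rate_stable(samples, max_error_rate=0.005))  # True
```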

&lt;h3&gt;
  
  
  Phase 4 — Rollback (8 checks)
&lt;/h3&gt;

&lt;p&gt;If you're here, something went wrong. This phase is designed for the worst moments — when you're stressed, when everyone is watching, when every second costs money. The procedure is linear. You don't need to think. You just follow the steps.&lt;/p&gt;
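
&lt;p&gt;The "linear, no thinking" property can be made literal. Steps run strictly in order, and the first failure stops the run instead of inviting improvisation. The minimal Python sketch below uses hypothetical step names. Its 10-minute budget comes from the Phase 1 rollback-plan target. The clock is injectable, so the timing logic can be tested without waiting.&lt;/p&gt;

```python
import time
from typing import Callable

Step = tuple[str, Callable[[], bool]]  # (description, action returning success)


def run_rollback(steps: list[Step], budget_s: float = 600.0,
                 clock: Callable[[], float] = time.monotonic) -> list[str]:
    """Execute steps strictly in order and stop at the first failure."""
    log = []
    start = clock()
    over_budget_logged = False
    for number, (description, action) in enumerate(steps, start=1):
        ok = action()
        elapsed = clock() - start
        log.append(f"{number}. {description}: {'OK' if ok else 'FAILED'} at {elapsed:.0f}s")
        if elapsed > budget_s and not over_budget_logged:
            log.append(f"WARNING: past the {budget_s / 60:g}-minute rollback budget")
            over_budget_logged = True
        if not ok:
            log.append("STOP: escalate before doing anything else")
            return log
    log.append("Rollback complete")
    return log


# Hypothetical steps; real ones would call your deploy and flag tooling.
steps = [
    ("Announce the rollback in the incident channel", lambda: True),
    ("Turn off the new feature flags", lambda: True),
    ("Redeploy the previous release", lambda: True),
    ("Confirm health checks on all replicas", lambda: True),
]
for line in run_rollback(steps):
    print(line)
```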




&lt;h2&gt;
  
  
  What it looks like in practice
&lt;/h2&gt;

&lt;p&gt;It's a single HTML file. Open it in any browser. Works offline.&lt;/p&gt;

&lt;p&gt;Before each deploy:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Fill in the service name, version, owner, environment, and maintenance window&lt;/li&gt;
&lt;li&gt;Work through the checklist, checking items as you go&lt;/li&gt;
&lt;li&gt;The progress bar shows where you are&lt;/li&gt;
&lt;li&gt;Critical items are visually flagged — you can't accidentally skip them&lt;/li&gt;
&lt;li&gt;When everything is checked, the status bar turns green: &lt;em&gt;"All verifications complete. Deployment approved for production."&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;Export a .txt report in one click for your audit trail or post-mortem documentation&lt;/li&gt;
&lt;/ol&gt;
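
&lt;p&gt;The tool itself is a single HTML file, and its source isn't shown in this post. The Python sketch below only illustrates the behavior described above: progress, flagged critical items, the approval message, and a plain-text report. The class and field names are hypothetical, and the example values are placeholders.&lt;/p&gt;

```python
from dataclasses import dataclass, field

APPROVED = "All verifications complete. Deployment approved for production."


@dataclass
class Item:
    phase: str
    text: str
    critical: bool = False
    checked: bool = False


@dataclass
class Checklist:
    service: str
    version: str
    owner: str
    environment: str
    window: str
    items: list[Item] = field(default_factory=list)

    def progress(self) -> int:
        """Percent of items checked, rounded down (what a progress bar would show)."""
        if not self.items:
            return 0
        done = sum(item.checked for item in self.items)
        return done * 100 // len(self.items)

    def open_critical(self) -> list[Item]:
        return [item for item in self.items if item.critical and not item.checked]

    def status(self) -> str:
        # Approval requires every item checked, not just the critical ones.
        if self.items and all(item.checked for item in self.items):
            return APPROVED
        return f"{self.progress()}% complete, {len(self.open_critical())} critical item(s) open"

    def report(self) -> str:
        """Plain-text report, suitable for an audit trail or post-mortem."""
        lines = [
            f"Service: {self.service}  Version: {self.version}",
            f"Owner: {self.owner}  Environment: {self.environment}  Window: {self.window}",
            f"Status: {self.status()}",
            "",
        ]
        for item in self.items:
            box = "[x]" if item.checked else "[ ]"
            flag = " (CRITICAL)" if item.critical else ""
            lines.append(f"{box} {item.phase}: {item.text}{flag}")
        return "\n".join(lines)


cl = Checklist("example-service", "1.4.0", "on-call team", "production", "Sat 22:00 UTC")
cl.items = [
    Item("Pre-deployment", "DB backup confirmed and tested", critical=True),
    Item("Pre-deployment", "PR approved by 2+ reviewers"),
]
cl.items[1].checked = True
print(cl.status())  # 50% complete, 1 critical item(s) open
```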

&lt;p&gt;No SaaS. No subscription. No account. Just a file that opens and works.&lt;/p&gt;




&lt;h2&gt;
  
  
  The honest truth about checklists
&lt;/h2&gt;

&lt;p&gt;Checklists work, and the evidence from other high-stakes fields is strong.&lt;/p&gt;

&lt;p&gt;Aviation has relied on standardized pre-flight checklists for decades, and they remain a core part of its safety culture. In surgery, hospitals that adopted the WHO Surgical Safety Checklist saw measurable drops in complications and deaths. The same principle applies to software deployments.&lt;/p&gt;

&lt;p&gt;The problem isn't that engineers don't know what to check. It's that under pressure, with context-switching and deadlines, the mental checklist gets compressed. Items get skipped. Usually nothing happens. Until it does.&lt;/p&gt;

&lt;p&gt;A written checklist removes the cognitive load from the critical moment. You're not trying to remember. You're just following a list.&lt;/p&gt;




&lt;h2&gt;
  
  
  Get it
&lt;/h2&gt;

&lt;p&gt;The full checklist is available here: &lt;strong&gt;&lt;a href="https://fabriciowplima.gumroad.com/l/rdsnfc" rel="noopener noreferrer"&gt;Production Deployment Checklist — $19&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If it prevents a single incident, it will have paid for itself many times over.&lt;/p&gt;




&lt;p&gt;If there's a verification I missed, something you learned the hard way, drop it in the comments. I'll incorporate the best ones into the next version.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Tags: #devops #deployment #productivity #kubernetes #sre #cicd #infrastructure&lt;/em&gt;&lt;/p&gt;

</description>
      <category>devops</category>
      <category>kubernetes</category>
      <category>cicd</category>
      <category>sre</category>
    </item>
  </channel>
</rss>
