<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Aashish Bajpai</title>
    <description>The latest articles on DEV Community by Aashish Bajpai (@aashu320).</description>
    <link>https://dev.to/aashu320</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3989578%2F7ce35c6c-8122-47f9-9f88-0b4822aa8a57.jpg</url>
      <title>DEV Community: Aashish Bajpai</title>
      <link>https://dev.to/aashu320</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/aashu320"/>
    <language>en</language>
    <item>
      <title>5-Minute Post-Deploy Postmortem with SignalPilot</title>
      <dc:creator>Aashish Bajpai</dc:creator>
      <pubDate>Wed, 17 Jun 2026 17:44:23 +0000</pubDate>
      <link>https://dev.to/aashu320/5-minute-post-deploy-postmortem-with-signalpilot-32je</link>
      <guid>https://dev.to/aashu320/5-minute-post-deploy-postmortem-with-signalpilot-32je</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Field Notes #5 · TL;DR&lt;/strong&gt; — &lt;a href="https://github.com/perfsage/signalpilot/releases/tag/v1.0.0" rel="noopener noreferrer"&gt;SignalPilot v1.0&lt;/a&gt; is live. Install with &lt;code&gt;pip install perfsage-signalpilot&lt;/code&gt;, apply read-only RBAC, run &lt;code&gt;signalpilot analyze&lt;/code&gt; — get a ranked HTML report with cited evidence and copy-paste &lt;code&gt;kubectl&lt;/code&gt; fixes in under five minutes. Not another dashboard. &lt;strong&gt;Analysis you can act on.&lt;/strong&gt; &lt;a href="https://perfsage.com/signalpilot/" rel="noopener noreferrer"&gt;Landing page&lt;/a&gt; · &lt;a href="https://github.com/perfsage/signalpilot/blob/main/examples/sample-report.html" rel="noopener noreferrer"&gt;Sample report&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;The MTTR gap nobody talks about&lt;/h2&gt;

&lt;p&gt;Deploy reviews often fail on one question:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Why did errors spike after my last deployment?"&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Not "what's the error rate?" — you can see that in Grafana. The hard part is &lt;strong&gt;defensible correlation&lt;/strong&gt;: linking OOMKilled on pod &lt;code&gt;api-7f3c&lt;/code&gt; to a memory limit change in the deploy diff, a new log fingerprint, and optionally the git commit that touched the heap allocator.&lt;/p&gt;

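&lt;p&gt;That correlation used to cost me &lt;strong&gt;2–3 hours&lt;/strong&gt; of tab-switching. SignalPilot targets &lt;strong&gt;under five minutes&lt;/strong&gt; for typical post-deploy regressions. The table lines up each stage of a manual war room with the SignalPilot step that covers the same ground; the timestamps describe the manual path.&lt;/p&gt;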

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Stage&lt;/th&gt;
&lt;th&gt;Manual war room&lt;/th&gt;
&lt;th&gt;SignalPilot&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;T+0&lt;/td&gt;
&lt;td&gt;Deploy completes&lt;/td&gt;
&lt;td&gt;Deploy completes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;T+5 min&lt;/td&gt;
&lt;td&gt;Someone opens kubectl&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;signalpilot analyze&lt;/code&gt; starts collectors&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;T+20 min&lt;/td&gt;
&lt;td&gt;Grafana dashboard shared&lt;/td&gt;
&lt;td&gt;Deploy diff + events + metrics fused&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;T+60 min&lt;/td&gt;
&lt;td&gt;"Maybe it's memory?"&lt;/td&gt;
&lt;td&gt;Ranked finding: &lt;code&gt;oom_killed&lt;/code&gt; with evidence&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;T+120 min&lt;/td&gt;
&lt;td&gt;Still debating rollback&lt;/td&gt;
&lt;td&gt;Copy-paste &lt;code&gt;kubectl&lt;/code&gt; fix on screen&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;T+180 min&lt;/td&gt;
&lt;td&gt;Postmortem doc started&lt;/td&gt;
&lt;td&gt;HTML report exported; gate ready for CI&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;Install (v1.0.0)&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;perfsage-signalpilot

kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; https://raw.githubusercontent.com/perfsage/signalpilot/v1.0.0/deploy/signalpilot-rbac.yaml

signalpilot analyze my-namespace &lt;span class="nt"&gt;--deployment&lt;/span&gt; my-app &lt;span class="nt"&gt;--output&lt;/span&gt; report.html
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Preview output without a cluster: &lt;a href="https://github.com/perfsage/signalpilot/blob/main/examples/sample-report.html" rel="noopener noreferrer"&gt;sample HTML report on GitHub&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;Walkthrough: &lt;code&gt;oom_killed&lt;/code&gt; after deploy&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Symptom:&lt;/strong&gt; The error rate jumps after a deploy and pods are restarting.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What SignalPilot correlates:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Signal source&lt;/th&gt;
&lt;th&gt;Evidence&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;K8s API&lt;/td&gt;
&lt;td&gt;Container &lt;code&gt;app&lt;/code&gt; OOMKilled, 4 restarts in 10 min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;metrics-server&lt;/td&gt;
&lt;td&gt;Memory working-set at 96% of limit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Deploy diff&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;resources.limits.memory&lt;/code&gt; changed 512Mi → 256Mi&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Logs&lt;/td&gt;
&lt;td&gt;New fingerprint: &lt;code&gt;java.lang.OutOfMemoryError: Java heap space&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Rule fired:&lt;/strong&gt; &lt;code&gt;oom_killed&lt;/code&gt; — confidence ranked HIGH.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Recommended fix (copy-paste from report):&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl &lt;span class="nb"&gt;set &lt;/span&gt;resources deployment/my-app &lt;span class="nt"&gt;-n&lt;/span&gt; my-namespace &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--limits&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;memory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;512Mi &lt;span class="nt"&gt;--requests&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;memory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;256Mi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each finding cites &lt;strong&gt;multiple signal types&lt;/strong&gt; — not a single chart anomaly. That's the difference from staring at one Grafana panel.&lt;/p&gt;
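&lt;p&gt;As a rough illustration of two rows in the evidence table, the Python sketch below checks whether a memory limit was lowered and how close the working set sits to the new limit. It is not SignalPilot's code: &lt;code&gt;parse_memory&lt;/code&gt;, &lt;code&gt;limit_was_lowered&lt;/code&gt;, and &lt;code&gt;utilization&lt;/code&gt; are hypothetical helpers, the 246 MiB working-set sample is invented to land near the 96% in the table, and only a subset of Kubernetes quantity suffixes is handled.&lt;/p&gt;

```python
# Subset of the suffixes Kubernetes accepts for memory quantities.
# Binary suffixes are powers of 1024, decimal suffixes are powers of 1000.
SUFFIXES = {
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4,
    "k": 1000, "M": 1000**2, "G": 1000**3, "T": 1000**4,
}


def parse_memory(quantity: str) -> int:
    """Convert a quantity such as '256Mi' to bytes; bare integers pass through."""
    for suffix, factor in SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * factor)
    return int(quantity)


def limit_was_lowered(old: str, new: str) -> bool:
    """True when the deploy diff reduced the memory limit."""
    return parse_memory(old) > parse_memory(new)


def utilization(working_set_bytes: int, limit: str) -> float:
    """Fraction of the limit consumed by the working set."""
    return working_set_bytes / parse_memory(limit)


# Values from the walkthrough, plus a hypothetical working-set sample.
old_limit, new_limit = "512Mi", "256Mi"
working_set = 246 * 1024**2  # assumption: illustrative reading, not real data

print(limit_was_lowered(old_limit, new_limit))
print(round(utilization(working_set, new_limit) * 100))
```

&lt;p&gt;Either check alone is weak evidence. A lowered limit combined with a working set close to that limit is what makes the OOMKilled explanation defensible.&lt;/p&gt;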




&lt;h2&gt;CI gate: catch regressions before traffic fully shifts&lt;/h2&gt;

&lt;p&gt;Complement load-test SLO gates from &lt;a href="https://perfsage.com/slo-plugin/" rel="noopener noreferrer"&gt;SLO Reporter&lt;/a&gt; with a post-deploy sanity check:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;signalpilot gate my-namespace &lt;span class="nt"&gt;--deployment&lt;/span&gt; my-app &lt;span class="nt"&gt;--junit-xml&lt;/span&gt; results.xml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;GitHub Actions example:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Post-deploy RCA gate&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;pip install perfsage-signalpilot&lt;/span&gt;
    &lt;span class="s"&gt;signalpilot gate production-namespace \&lt;/span&gt;
      &lt;span class="s"&gt;--deployment api \&lt;/span&gt;
      &lt;span class="s"&gt;--junit-xml signalpilot-results.xml&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The gate exits non-zero when any finding is ranked HIGH or above. It uses the same severity model as your SLO gates, applied to a different signal layer.&lt;/p&gt;
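&lt;p&gt;To make the exit-code behaviour concrete, here is a minimal Python sketch of a severity-threshold gate. It is an assumption about the shape of the logic, not SignalPilot's implementation: only HIGH is named in this post, so the other levels in &lt;code&gt;Severity&lt;/code&gt; and the &lt;code&gt;gate_exit_code&lt;/code&gt; helper are hypothetical.&lt;/p&gt;

```python
from enum import IntEnum


class Severity(IntEnum):
    # Assumption: only HIGH appears in the article; the rest are placeholders.
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


def gate_exit_code(findings, threshold=Severity.HIGH):
    """Return 1 if any finding is at or above the threshold, otherwise 0."""
    return 1 if any(severity >= threshold for _, severity in findings) else 0


# Hypothetical findings as (rule name, severity) pairs.
findings = [("oom_killed", Severity.HIGH), ("probe_failure", Severity.LOW)]
print(gate_exit_code(findings))
```

&lt;p&gt;A CI step fails on any non-zero exit code, so a gate of this shape blocks the pipeline without extra scripting.&lt;/p&gt;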




&lt;h2&gt;Deterministic rules first, optional LLM polish&lt;/h2&gt;

&lt;p&gt;I'm not building "AI that fixes prod." SignalPilot's core RCA runs &lt;strong&gt;deterministic rules&lt;/strong&gt; — &lt;code&gt;oom_killed&lt;/code&gt;, &lt;code&gt;cpu_throttled&lt;/code&gt;, &lt;code&gt;crash_loop&lt;/code&gt;, &lt;code&gt;image_pull_error&lt;/code&gt;, &lt;code&gt;probe_failure&lt;/code&gt;, &lt;code&gt;code_regression&lt;/code&gt;, and more. Optional LLM narrative polish is there if you want it; &lt;strong&gt;no API key required&lt;/strong&gt; for ranked findings and kubectl recommendations.&lt;/p&gt;
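&lt;p&gt;For readers unsure what "deterministic rule" means here, the Python sketch below shows the general idea: each rule is a pure function of the collected evidence, so the same evidence always yields the same findings. The rule names come from the list above, but the evidence keys, the 0.9 threshold, and the &lt;code&gt;evaluate&lt;/code&gt; helper are hypothetical and do not reflect SignalPilot's internals.&lt;/p&gt;

```python
def oom_killed(evidence):
    """Fire only when the kill event and the memory signal agree."""
    reasons = evidence.get("terminated_reasons", [])
    # Assumption: 0.9 is an illustrative threshold, not a documented one.
    return "OOMKilled" in reasons and evidence.get("memory_utilization", 0.0) >= 0.9


def crash_loop(evidence):
    """Fire when any container is waiting in CrashLoopBackOff."""
    return "CrashLoopBackOff" in evidence.get("waiting_reasons", [])


# Hypothetical registry mapping rule names to predicates.
RULES = {"oom_killed": oom_killed, "crash_loop": crash_loop}


def evaluate(evidence):
    """Return the sorted names of every rule that fires for this evidence."""
    return sorted(name for name, rule in RULES.items() if rule(evidence))


# Hypothetical evidence shaped after the walkthrough above.
sample = {
    "terminated_reasons": ["OOMKilled"],
    "memory_utilization": 0.96,
    "waiting_reasons": [],
}
print(evaluate(sample))
```

&lt;p&gt;Because nothing in the rules depends on a model call or on randomness, the findings can be reproduced and audited after the incident.&lt;/p&gt;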




&lt;h2&gt;The PerfSage ladder: test → gate → RCA&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://perfsage.com/reveal/" rel="noopener noreferrer"&gt;Reveal&lt;/a&gt;&lt;/strong&gt; — JMeter JTL analysis in the lab&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://perfsage.com/slo-plugin/" rel="noopener noreferrer"&gt;SLO Reporter&lt;/a&gt;&lt;/strong&gt; — CI gates on load tests&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://perfsage.com/signalpilot/" rel="noopener noreferrer"&gt;SignalPilot&lt;/a&gt;&lt;/strong&gt; — post-deploy RCA in production&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Same DNA across all three: each tool &lt;strong&gt;reports the data, then explains what to do next.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;Try it&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Install:&lt;/strong&gt; &lt;code&gt;pip install perfsage-signalpilot&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Repo:&lt;/strong&gt; &lt;a href="https://github.com/perfsage/signalpilot" rel="noopener noreferrer"&gt;github.com/perfsage/signalpilot&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Release:&lt;/strong&gt; &lt;a href="https://github.com/perfsage/signalpilot/releases/tag/v1.0.0" rel="noopener noreferrer"&gt;v1.0.0&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Background:&lt;/strong&gt; &lt;a href="https://perfsage.com/blog/why-im-building-signalpilot-kubernetes-rca/" rel="noopener noreferrer"&gt;Field Notes #4 — why I built it&lt;/a&gt; · &lt;a href="https://perfsage.com/blog/introducing-perfsage-signalpilot-kubernetes-rca/" rel="noopener noreferrer"&gt;Field Notes #3 — quick start&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;War-room stories and feedback welcome on &lt;a href="https://github.com/perfsage/signalpilot/issues" rel="noopener noreferrer"&gt;GitHub Issues&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Field Notes #5 · By Aashish Bajpai&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Originally published at &lt;a href="https://perfsage.com/blog/5-minute-post-deploy-postmortem-signalpilot/" rel="noopener noreferrer"&gt;https://perfsage.com/blog/5-minute-post-deploy-postmortem-signalpilot/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>devops</category>
      <category>opensource</category>
      <category>sre</category>
    </item>
  </channel>
</rss>
