<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Louis Fradin</title>
    <description>The latest articles on DEV Community by Louis Fradin (@louisatanyshift).</description>
    <link>https://dev.to/louisatanyshift</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3976184%2F525ad07c-6ffa-442b-ad46-60ef701ba7bf.jpg</url>
      <title>DEV Community: Louis Fradin</title>
      <link>https://dev.to/louisatanyshift</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/louisatanyshift"/>
    <language>en</language>
    <item>
      <title>We open-sourced the SRE judgment that doesn't fit in a system prompt</title>
      <dc:creator>Louis Fradin</dc:creator>
      <pubDate>Tue, 09 Jun 2026 14:31:54 +0000</pubDate>
      <link>https://dev.to/louisatanyshift/open-source-sre-methodology-skills-an-ai-agent-can-load-apache-20-runnable-offline-against-3olc</link>
      <guid>https://dev.to/louisatanyshift/open-source-sre-methodology-skills-an-ai-agent-can-load-apache-20-runnable-offline-against-3olc</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; &lt;code&gt;sre-skills&lt;/code&gt; is an open-source (Apache-2.0) library of SRE methodology skills an AI agent can load: the decision procedure for working an incident, not just the commands. It runs offline, without credentials, against fixtures checked into the repo, so you can read how an agent reasons before trusting it on prod. One of the five planned skills has shipped so far.&lt;/p&gt;

&lt;p&gt;An AI agent in the incident channel can already grep the logs and run &lt;code&gt;kubectl get pods&lt;/code&gt;. It can read the error rate straight off a Grafana panel. Where it falls down is the judgment part: deciding whether the deploy from twenty minutes ago is the actual cause or a coincidence, and knowing when the dig has gone far enough to wake someone up.&lt;/p&gt;

&lt;p&gt;That judgment is the actual job, and it doesn't fit in a system prompt.&lt;/p&gt;

&lt;p&gt;So we started writing it down as skills an agent can load. &lt;code&gt;sre-skills&lt;/code&gt; is an open library of &lt;strong&gt;methodology-shaped SRE skills&lt;/strong&gt;, Apache-2.0 and vendor-neutral. Five skills are planned:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;investigate a live incident&lt;/li&gt;
&lt;li&gt;weigh change impact before a risky apply&lt;/li&gt;
&lt;li&gt;hand over on-call&lt;/li&gt;
&lt;li&gt;write a postmortem&lt;/li&gt;
&lt;li&gt;audit production readiness&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You don't need a vendor account or our product to run any of them, and each one is built to run end to end against fixtures checked into the repo.&lt;/p&gt;

&lt;p&gt;Here's the honest part: &lt;strong&gt;one of the five is actually finished.&lt;/strong&gt; The other four are scoped, with the structure in place and the methodology still to fill in. The one that's done is &lt;code&gt;incident-investigator&lt;/code&gt;, and we wrote &lt;strong&gt;eleven worked incident scenarios&lt;/strong&gt; for it before we trusted it to ship as the reference template the others copy.&lt;/p&gt;

&lt;p&gt;Methodology-shaped means the skill carries the order of operations, not just the commands. The retry-storm fixture shows the difference. &lt;code&gt;payments-api&lt;/code&gt; is throwing errors, and the most recent deploy is an invoice-formatting refactor whose own commit message swears &lt;code&gt;no behavior change to ledger client&lt;/code&gt;. A naive agent goes straight to the latency graph. The skill checks the deploy history first and treats that innocent-looking refactor as the prime suspect until something clears it, because a change that landed right before the errors started earns a suspicion that a flat graph never will. The same logic governs the exits: which failure modes would let the deploy off the hook, and when to stop guessing and page a human. That ordering is most of the work.&lt;/p&gt;
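
&lt;p&gt;To make the ordering concrete, here is a minimal, hypothetical sketch in Python. It is not code from &lt;code&gt;sre-skills&lt;/code&gt;, whose skills are Markdown. The &lt;code&gt;Deploy&lt;/code&gt; record, the &lt;code&gt;rank_suspects&lt;/code&gt; and &lt;code&gt;next_step&lt;/code&gt; helpers, the 30-minute suspect window, the three-dead-ends paging threshold, and the sample deploy history are all assumptions made for illustration.&lt;/p&gt;

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


# Hypothetical deploy record; sre-skills does not define this type.
@dataclass
class Deploy:
    service: str
    landed_at: datetime
    message: str
    cleared: bool = False  # flipped once evidence rules this deploy out


def rank_suspects(deploys: list[Deploy], incident_start: datetime,
                  window: timedelta = timedelta(minutes=30)) -> list[Deploy]:
    """Deploy history first: any uncleared deploy that landed within `window`
    before the errors began is a suspect, newest first. The commit message is
    never consulted, because "no behavior change" is a claim, not evidence."""
    suspects = [
        d for d in deploys
        if not d.cleared and incident_start >= d.landed_at >= incident_start - window
    ]
    return sorted(suspects, key=lambda d: d.landed_at, reverse=True)


def next_step(deploys: list[Deploy], incident_start: datetime,
              dead_ends: int, max_dead_ends: int = 3) -> str:
    """Pick the next move. The 30-minute window and 3-dead-end page threshold
    are assumed example values, not numbers taken from the skill."""
    if dead_ends >= max_dead_ends:
        return "page a human"
    suspects = rank_suspects(deploys, incident_start)
    if suspects:
        return f"clear or confirm first: {suspects[0].message}"
    return "no recent change to blame; move on to metrics and logs"


# Made-up history loosely modeled on the retry-storm scenario (not the real fixture).
start = datetime(2026, 6, 9, 14, 0)
history = [
    Deploy("payments-api", start - timedelta(hours=3), "bump logging library"),
    Deploy("payments-api", start - timedelta(minutes=20),
           "refactor invoice formatting; no behavior change to ledger client"),
]
print(next_step(history, start, dead_ends=0))
# → clear or confirm first: refactor invoice formatting; no behavior change to ledger client
```

&lt;p&gt;In this sketch the commit message never enters the ranking, so a refactor that swears it changed nothing is still investigated first. The dead-end counter is what turns "keep guessing" into "page someone".&lt;/p&gt;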

&lt;h2&gt;
  
  
  Run it without touching production
&lt;/h2&gt;

&lt;p&gt;You don't need a production system to see it work. The repo ships a snapshot of one failing service (the deploys, logs, and metrics captured during a real incident), and &lt;code&gt;incident-investigator&lt;/code&gt; works that snapshot the way it would work a live page. There's no setup and no credentials, so you can read its whole chain of reasoning before you trust it anywhere near your own stack. Once it proves useful, fork the template and swap your own runbooks in for ours.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where to get it
&lt;/h2&gt;

&lt;p&gt;It installs as a Claude Code plugin:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;/plugin marketplace add anyshift-io/claude-plugins
/plugin &lt;span class="nb"&gt;install &lt;/span&gt;sre-skills@anyshift
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For any other agent, it's a plain &lt;code&gt;git clone&lt;/code&gt; of &lt;a href="https://github.com/anyshift-io/sre-skills" rel="noopener noreferrer"&gt;&lt;code&gt;anyshift-io/sre-skills&lt;/code&gt;&lt;/a&gt;. The skills are Markdown with fixtures next to them, so anything that can read files can load them.&lt;/p&gt;
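
&lt;p&gt;Because the skills are plain files, wiring them into another agent can take only a few lines of glue. The following is a minimal sketch under a stated assumption that we have not checked against the repo: each skill lives in its own directory with a &lt;code&gt;SKILL.md&lt;/code&gt; entry point (the Claude Code skill convention), with its fixtures beside it. &lt;code&gt;load_skills&lt;/code&gt; and &lt;code&gt;skill_prompt&lt;/code&gt; are hypothetical helpers and are not part of &lt;code&gt;sre-skills&lt;/code&gt;.&lt;/p&gt;

```python
from pathlib import Path


# Assumption: one directory per skill, each with a SKILL.md entry point and its
# fixtures beside it. Adjust entry_name if the repo's layout differs.
def load_skills(repo_root: str, entry_name: str = "SKILL.md") -> dict[str, str]:
    """Map each skill's directory name to the Markdown text of its entry point."""
    return {
        entry.parent.name: entry.read_text(encoding="utf-8")
        for entry in sorted(Path(repo_root).rglob(entry_name))
    }


def skill_prompt(skills: dict[str, str], name: str) -> str:
    """Wrap one skill's Markdown so it can be prepended to an agent's context."""
    return f"# Skill: {name}\n\n{skills[name]}"


# Usage, after `git clone https://github.com/anyshift-io/sre-skills`
# (the "incident-investigator" directory name is assumed):
#   skills = load_skills("sre-skills")
#   context = skill_prompt(skills, "incident-investigator")
```

&lt;p&gt;Loading only the entry points keeps fixture data out of the context by default. In this sketch, the agent can still open the fixture files beside each skill when the methodology calls for them.&lt;/p&gt;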

&lt;p&gt;The hard part of finishing the remaining four skills isn't the plumbing. It's pinning down the judgment a good on-call engineer exercises without thinking, in enough detail that an agent can follow it and you can audit where it went wrong. The repo is open to forks and new scenarios. If you've got an incident shape the investigator should handle but doesn't, that's the contribution we want!&lt;/p&gt;

&lt;p&gt;More on what we're building at Anyshift: &lt;a href="https://anyshift.io?utm_source=devto&amp;amp;utm_medium=social&amp;amp;utm_campaign=sre-skills-launch" rel="noopener noreferrer"&gt;anyshift.io&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>sre</category>
      <category>devops</category>
      <category>ai</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
