<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Mohammed</title>
    <description>The latest articles on DEV Community by Mohammed (@g7_eaf9b7f).</description>
    <link>https://dev.to/g7_eaf9b7f</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3879220%2Fe442a2ae-9f71-43a5-910f-c06522cb88cd.png</url>
      <title>DEV Community: Mohammed</title>
      <link>https://dev.to/g7_eaf9b7f</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/g7_eaf9b7f"/>
    <language>en</language>
    <item>
      <title>I built an open-source ops automation platform — here's what I learned</title>
      <dc:creator>Mohammed</dc:creator>
      <pubDate>Tue, 14 Apr 2026 19:43:20 +0000</pubDate>
      <link>https://dev.to/g7_eaf9b7f/i-built-an-open-source-ops-automation-platform-heres-what-i-learned-3ec6</link>
      <guid>https://dev.to/g7_eaf9b7f/i-built-an-open-source-ops-automation-platform-heres-what-i-learned-3ec6</guid>
<description>&lt;p&gt;For the past few months I've been building &lt;a href="https://github.com/infralaneapp/infralane" rel="noopener noreferrer"&gt;Infralane&lt;/a&gt;, an open-source platform for DevOps and IT operations teams. Think of it as a service desk that actually understands infrastructure workflows, not just a form that creates a ticket and throws it into a queue.&lt;/p&gt;

&lt;h2&gt;Why I built it&lt;/h2&gt;

&lt;p&gt;Every DevOps team I've worked on has the same problem: access requests come through Slack, deployments are tracked in spreadsheets, and incident response is a mix of PagerDuty alerts and "who's online?" messages. There are enterprise tools for this, but they're expensive and take forever to set up.&lt;/p&gt;

&lt;p&gt;I wanted something where:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Requests come in with the right fields already defined (not "describe your issue")&lt;/li&gt;
&lt;li&gt;Automation rules handle the boring parts (assign, tag, notify, escalate)&lt;/li&gt;
&lt;li&gt;Sensitive actions need approval before executing&lt;/li&gt;
&lt;li&gt;Everything is auditable&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;What it looks like&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4hthv1c6ir9kd6dthz9c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4hthv1c6ir9kd6dthz9c.png" alt="Dashboard" width="800" height="403"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;How the automation engine works&lt;/h2&gt;

&lt;p&gt;You create rules with a trigger, conditions, and an action.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Trigger&lt;/strong&gt; → when something happens&lt;br&gt;
&lt;strong&gt;Conditions&lt;/strong&gt; → match against ticket fields&lt;br&gt;
&lt;strong&gt;Action&lt;/strong&gt; → do something&lt;/p&gt;

&lt;p&gt;Some real examples:&lt;/p&gt;

&lt;p&gt;Trigger: Ticket created&lt;br&gt;
Conditions: type = incident AND priority = urgent&lt;br&gt;
Action: Assign to on-call operator&lt;/p&gt;

&lt;p&gt;Trigger: SLA breached&lt;br&gt;
Conditions: type = incident AND priority = high&lt;br&gt;
Action: Escalate priority to urgent&lt;/p&gt;

&lt;p&gt;Trigger: Ticket created&lt;br&gt;
Conditions: type = deployment AND environment = production&lt;br&gt;
Action: Require approval before proceeding&lt;/p&gt;

&lt;p&gt;There are 8 action types: assign, change status, change priority, add tag, notify, Slack message, webhook, and escalation chains.&lt;/p&gt;
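&lt;p&gt;To make the trigger, conditions, action model concrete, here is a minimal TypeScript sketch of the three example rules as plain data plus a matcher. It is not Infralane's actual code: the &lt;code&gt;AutomationRule&lt;/code&gt; shape, the trigger strings, and the action names are assumptions made up for illustration.&lt;/p&gt;

```typescript
// Illustrative shapes only; these names are assumptions, not Infralane's schema.
type TicketFields = { [field: string]: string };

interface AutomationRule {
  trigger: string;                          // when something happens
  conditions: { [field: string]: string };  // all must match (AND)
  action: string;                           // what to do
}

// Return the actions of every rule that matches the trigger and the ticket.
function matchRules(
  rules: AutomationRule[],
  trigger: string,
  ticket: TicketFields
): string[] {
  return rules
    .filter((rule) => rule.trigger === trigger)
    .filter((rule) =>
      Object.keys(rule.conditions).every(
        (field) => ticket[field] === rule.conditions[field]
      )
    )
    .map((rule) => rule.action);
}

const exampleRules: AutomationRule[] = [
  {
    trigger: "ticket.created",
    conditions: { type: "incident", priority: "urgent" },
    action: "assign_on_call",
  },
  {
    trigger: "sla.breached",
    conditions: { type: "incident", priority: "high" },
    action: "escalate_to_urgent",
  },
  {
    trigger: "ticket.created",
    conditions: { type: "deployment", environment: "production" },
    action: "require_approval",
  },
];

// A new production deployment ticket matches only the third rule.
const matchedActions = matchRules(exampleRules, "ticket.created", {
  type: "deployment",
  environment: "production",
});
console.log(matchedActions);
```

&lt;p&gt;Keeping conditions as data rather than code is what lets rules be edited from a settings page instead of a deploy.&lt;/p&gt;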

&lt;h3&gt;Under the hood&lt;/h3&gt;

&lt;p&gt;The worker is a separate Node.js process that polls for queued jobs every 5 seconds. It uses &lt;code&gt;SELECT ... FOR UPDATE SKIP LOCKED&lt;/code&gt; in PostgreSQL to claim jobs atomically, so no Redis is needed.&lt;/p&gt;

&lt;p&gt;Failed jobs are retried with exponential backoff and move to a dead-letter queue after 3 attempts. Every state transition is logged so you can trace exactly what happened and why.&lt;/p&gt;
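&lt;p&gt;Here is a rough sketch of both ideas in TypeScript. The table and column names in the query are hypothetical, and the 5-second base delay is an assumption (the real backoff schedule may differ). Only the &lt;code&gt;SKIP LOCKED&lt;/code&gt; claim pattern and the limit of 3 attempts come from the description above.&lt;/p&gt;

```typescript
// Hypothetical table and column names; the real schema may look different.
// The inner SELECT skips rows locked by other workers, so two workers
// polling at the same moment never claim the same job.
const CLAIM_JOB_SQL = `
  UPDATE automation_jobs
     SET status = 'RUNNING'
   WHERE id = (
     SELECT id
       FROM automation_jobs
      WHERE status = 'QUEUED' AND now() >= run_at
      ORDER BY run_at
      LIMIT 1
      FOR UPDATE SKIP LOCKED
   )
  RETURNING id`;

const MAX_ATTEMPTS = 3;      // dead-letter after 3 attempts
const BASE_DELAY_MS = 5000;  // assumption: the base delay is illustrative

type NextStep =
  | { kind: "retry"; delayMs: number }
  | { kind: "dead_letter" };

// Decide what happens after the given (1-based) attempt has failed.
function afterFailure(attempt: number): NextStep {
  if (attempt >= MAX_ATTEMPTS) {
    return { kind: "dead_letter" };
  }
  // Exponential backoff: base, 2x base, 4x base, ...
  return { kind: "retry", delayMs: BASE_DELAY_MS * 2 ** (attempt - 1) };
}

console.log(afterFailure(1), afterFailure(2), afterFailure(3));
```

&lt;p&gt;With these numbers, the first failure retries after 5 seconds, the second after 10 seconds, and the third sends the job to the dead-letter queue.&lt;/p&gt;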

&lt;h2&gt;Approval workflows&lt;/h2&gt;

&lt;p&gt;This is the feature I think makes it more than just another ticketing tool.&lt;/p&gt;

&lt;p&gt;When a rule has "requires approval" enabled, the automation job pauses in a &lt;code&gt;PENDING_APPROVAL&lt;/code&gt; state. The ticket is &lt;strong&gt;locked&lt;/strong&gt;: operators can't resolve or close it until someone approves or rejects the request.&lt;/p&gt;

&lt;p&gt;This means you can enforce rules like "any production deployment needs approval" at the system level. No one can skip the approval by just resolving the ticket directly.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fduuvggfmm59q295tzika.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fduuvggfmm59q295tzika.png" alt="Create Ticket" width="800" height="398"&gt;&lt;/a&gt;                                                                                                                            &lt;/p&gt;

&lt;h2&gt;Three-tier role system&lt;/h2&gt;

&lt;p&gt;Not every user should see everything:&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;th&gt;What they can do&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Requester&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Submit tickets, view their own, comment, rate resolved tickets&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Operator&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Work all tickets, assign, change status, approve/reject, view reports&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Admin&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Everything above + manage settings, automation rules, team, integrations&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
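&lt;p&gt;One way to picture the three tiers is as a cumulative permission map. This is only a sketch: the permission strings are invented for illustration, and it assumes each tier inherits everything from the tier below it, which the role descriptions state explicitly only for Admin.&lt;/p&gt;

```typescript
// Hypothetical permission names derived from the role descriptions;
// they are not Infralane's real identifiers.
type Role = "requester" | "operator" | "admin";

const requesterPerms = [
  "ticket:submit", "ticket:view_own", "ticket:comment", "ticket:rate",
];

// Assumption: each tier also inherits the tier below it.
const operatorPerms = requesterPerms.concat([
  "ticket:work_all", "ticket:assign", "ticket:change_status",
  "approval:decide", "reports:view",
]);

const adminPerms = operatorPerms.concat([
  "settings:manage", "rules:manage", "team:manage", "integrations:manage",
]);

const permissionsByRole: { [R in Role]: string[] } = {
  requester: requesterPerms,
  operator: operatorPerms,
  admin: adminPerms,
};

// True when the role is allowed to perform the given permission.
function can(role: Role, permission: string): boolean {
  return permissionsByRole[role].indexOf(permission) !== -1;
}

console.log(can("requester", "approval:decide"), can("operator", "approval:decide"));
```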

&lt;h2&gt;What I learned building it&lt;/h2&gt;

&lt;h3&gt;Dedup is harder than it looks&lt;/h3&gt;

&lt;p&gt;My first approach used a unique constraint on &lt;code&gt;(ruleId, ticketId, trigger)&lt;/code&gt;. That's too coarse: the same rule should fire multiple times on the same ticket for repeated status changes.&lt;/p&gt;

&lt;p&gt;I ended up using SHA-256 hash keys derived from rule + ticket + trigger + context. The unique constraint on the hash also handles race conditions: if two concurrent emissions both pass the check, the database rejects the duplicate.&lt;/p&gt;
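&lt;p&gt;A minimal sketch of such a key in TypeScript, using Node's built-in &lt;code&gt;crypto&lt;/code&gt; module. The field names and the format of the context string are assumptions, and the in-memory object only stands in for a database table with a unique constraint on the key column.&lt;/p&gt;

```typescript
import { createHash } from "node:crypto";

// Hypothetical field names; the key is derived from
// rule + ticket + trigger + context.
interface Emission {
  ruleId: string;
  ticketId: string;
  trigger: string;
  context: string; // assumption: identifies the specific event, e.g. a transition
}

function dedupKey(e: Emission): string {
  // JSON-encode the parts as an array so field boundaries stay unambiguous.
  const payload = JSON.stringify([e.ruleId, e.ticketId, e.trigger, e.context]);
  return createHash("sha256").update(payload).digest("hex");
}

// In-memory stand-in for a table with a UNIQUE constraint on the key.
const seenKeys: { [key: string]: boolean } = {};

// Returns true if the job was enqueued, false if it was a duplicate.
function enqueueOnce(e: Emission): boolean {
  const key = dedupKey(e);
  if (seenKeys[key]) {
    return false; // the database would reject this insert
  }
  seenKeys[key] = true;
  return true;
}

const firstChange: Emission = {
  ruleId: "rule_1",
  ticketId: "t_42",
  trigger: "status.changed",
  context: "OPEN->IN_PROGRESS#evt_1",
};
// Same rule, ticket, and trigger, but a different status change.
const laterChange: Emission = { ...firstChange, context: "IN_PROGRESS->RESOLVED#evt_2" };

console.log(enqueueOnce(firstChange), enqueueOnce(firstChange), enqueueOnce(laterChange));
```

&lt;p&gt;The exact duplicate is rejected, while the later status change on the same ticket still fires, which is what the coarse three-column constraint could not express.&lt;/p&gt;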

&lt;h3&gt;Automation needs cascade prevention&lt;/h3&gt;

&lt;p&gt;If an automation rule changes a ticket's status, and there's another rule that triggers on status changes... you get infinite loops.&lt;/p&gt;

&lt;p&gt;The fix: executors write to the database directly through Prisma, not through the service functions that emit triggers. As a result, automation actions never re-trigger other automation rules.&lt;/p&gt;
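&lt;p&gt;The split between the two write paths can be sketched like this. The in-memory &lt;code&gt;ticketStore&lt;/code&gt; stands in for Prisma, and all function names are hypothetical. The point is only that one path emits a trigger and the other does not.&lt;/p&gt;

```typescript
// In-memory stand-ins; the real store is Prisma and these names are hypothetical.
type TicketStatus = "OPEN" | "IN_PROGRESS" | "RESOLVED";

const ticketStore: { [id: string]: { status: TicketStatus } } = {
  t1: { status: "OPEN" },
};
const emittedTriggers: string[] = [];

function getTicket(ticketId: string): { status: TicketStatus } {
  const ticket = ticketStore[ticketId];
  if (!ticket) {
    throw new Error("unknown ticket " + ticketId);
  }
  return ticket;
}

// Service path, used by API handlers: writes AND emits a trigger,
// so automation rules get a chance to react.
function changeStatus(ticketId: string, status: TicketStatus): void {
  getTicket(ticketId).status = status;
  emittedTriggers.push("status.changed:" + ticketId);
}

// Executor path, used by automation actions: writes only, never emits,
// so an action cannot set off another rule.
function executorSetStatus(ticketId: string, status: TicketStatus): void {
  getTicket(ticketId).status = status;
}

changeStatus("t1", "IN_PROGRESS");   // a human action: one trigger is emitted
executorSetStatus("t1", "RESOLVED"); // an automation action: no new trigger
console.log(getTicket("t1").status, emittedTriggers.length);
```

&lt;p&gt;The trade-off is that rules cannot be chained on purpose either. That is a deliberate choice here: predictable behavior over expressive power.&lt;/p&gt;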

&lt;h3&gt;Gates must block all paths&lt;/h3&gt;

&lt;p&gt;I built approval workflows that block automation execution. Then someone pointed out: what's stopping an operator from just resolving the ticket manually while the approval is pending?&lt;/p&gt;

&lt;p&gt;Nothing was. So I added ticket locking: while an approval is pending, the ticket's status can't be changed through any path. The API returns &lt;code&gt;409 PENDING_APPROVAL&lt;/code&gt;. The UI shows a warning instead of the status dropdown.&lt;/p&gt;
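&lt;p&gt;A guard like this only works if every status change goes through a single choke point. Here is a minimal sketch of one, with hypothetical type and function names. Only the &lt;code&gt;409&lt;/code&gt; status and the &lt;code&gt;PENDING_APPROVAL&lt;/code&gt; code come from the behavior described above.&lt;/p&gt;

```typescript
// Hypothetical shapes for illustration.
interface TicketState {
  status: string;
  hasPendingApproval: boolean;
}

type StatusChangeResult =
  | { ok: true; ticket: TicketState }
  | { ok: false; httpStatus: 409; code: "PENDING_APPROVAL" };

// Single choke point: every caller that changes a status goes through here,
// so the lock cannot be bypassed by picking a different endpoint.
function changeStatusGuarded(ticket: TicketState, next: string): StatusChangeResult {
  if (ticket.hasPendingApproval) {
    return { ok: false, httpStatus: 409, code: "PENDING_APPROVAL" };
  }
  return { ok: true, ticket: { status: next, hasPendingApproval: false } };
}

const lockedResult = changeStatusGuarded(
  { status: "OPEN", hasPendingApproval: true },
  "RESOLVED"
);
const freeResult = changeStatusGuarded(
  { status: "OPEN", hasPendingApproval: false },
  "RESOLVED"
);
console.log(lockedResult, freeResult);
```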

&lt;h3&gt;Dev fallbacks are production footguns&lt;/h3&gt;

&lt;p&gt;I had a dev fallback for the session signing secret: if the env var was missing, it used a hardcoded default. One bad deployment config and every session is forgeable.&lt;/p&gt;

&lt;p&gt;Now it throws a fatal error in production if the secret is missing. Fail fast, don't fail silently.&lt;/p&gt;
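&lt;p&gt;The check itself is tiny. A sketch, assuming the variable is called &lt;code&gt;SESSION_SECRET&lt;/code&gt; (a made-up name for this example) and that production is detected through &lt;code&gt;NODE_ENV&lt;/code&gt;:&lt;/p&gt;

```typescript
// SESSION_SECRET is a hypothetical variable name used for illustration.
type EnvVars = { [name: string]: string | undefined };

function getSessionSecret(env: EnvVars): string {
  const secret = env["SESSION_SECRET"];
  if (secret) {
    return secret;
  }
  if (env["NODE_ENV"] === "production") {
    // Fail fast: refuse to start rather than sign sessions with a known value.
    throw new Error("SESSION_SECRET must be set in production");
  }
  // Acceptable only on a developer machine.
  return "dev-only-insecure-secret";
}

console.log(getSessionSecret({ NODE_ENV: "development" }));
```

&lt;p&gt;In a real app you would call this once at startup with &lt;code&gt;process.env&lt;/code&gt;, so a misconfigured deployment crashes immediately instead of serving forgeable sessions.&lt;/p&gt;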

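&lt;h2&gt;Stack&lt;/h2&gt;

&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Technology&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Frontend&lt;/td&gt;
&lt;td&gt;Next.js 15, React, TypeScript, Tailwind CSS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Backend&lt;/td&gt;
&lt;td&gt;Next.js API Routes, Prisma ORM&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Database&lt;/td&gt;
&lt;td&gt;PostgreSQL 16&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Auth&lt;/td&gt;
&lt;td&gt;HMAC-SHA256 session cookies + Slack OAuth&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Worker&lt;/td&gt;
&lt;td&gt;Standalone Node.js process&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Real-time&lt;/td&gt;
&lt;td&gt;Server-Sent Events (SSE)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;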

&lt;h2&gt;Self-host it&lt;/h2&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/infralaneapp/infralane.git
cd infralane
docker compose up -d
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;App runs at http://localhost:3000. First user to register becomes admin.&lt;/p&gt;

&lt;h2&gt;What's next&lt;/h2&gt;

&lt;p&gt;The core is stable but there's plenty to improve. Some open issues:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Test Slack integration against a real workspace&lt;/li&gt;
&lt;li&gt;Implement actual SMTP email sending (currently a stub)&lt;/li&gt;
&lt;li&gt;Password reset flow&lt;/li&gt;
&lt;li&gt;Rate limiting on more endpoints&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;MIT licensed. If you're running ops workflows and have opinions about what's missing, I'd genuinely like to hear them.&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/infralaneapp/infralane" rel="noopener noreferrer"&gt;github.com/infralaneapp/infralane&lt;/a&gt;&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>devops</category>
      <category>nextjs</category>
      <category>automation</category>
    </item>
  </channel>
</rss>
