Mohammed

I built an open-source ops automation platform — here's what I learned

For the past few months, I've been building Infralane, an open-source platform for DevOps and IT operations teams. Think of it as a service desk that actually understands infrastructure workflows — not just a form that creates a ticket and throws it into a queue.

Why I built it

Every DevOps team I've worked on has the same problem: access requests come through Slack, deployments are tracked in spreadsheets, and incident response is a mix of PagerDuty alerts and "who's online?" messages. There are enterprise tools for this, but they're expensive and take forever to set up.

I wanted something where:

  1. Requests come in with the right fields already defined (not "describe your issue").
  2. Automation rules handle the boring parts (assign, tag, notify, escalate).
  3. Sensitive actions need approval before executing.
  4. Everything is auditable.

What it looks like

Screenshot of the Infralane dashboard showing automation stats

How the automation engine works

You create rules with a trigger, conditions, and an action.

  • Trigger → when something happens
  • Conditions → match against ticket fields
  • Action → do something

Real-world examples:

Trigger: Ticket created
Conditions: type = incident AND priority = urgent
Action: Assign to on-call operator

Trigger: Ticket created
Conditions: type = deployment AND environment = production
Action: Require approval before proceeding

There are 8 action types: assign, change status, change priority, add tag, notify, Slack message, webhook, and escalation chains.
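
To make the trigger/conditions/action shape concrete, here is a minimal TypeScript sketch. It is not Infralane's actual schema: the type names, the trigger strings, and the `matches` helper are assumptions for illustration, and it only models equality conditions joined with AND, as in the examples above.

```typescript
// Hypothetical shapes for illustration; not Infralane's real schema.
type Trigger = "ticket.created" | "ticket.status_changed";

interface Condition {
  field: string;  // ticket field to inspect, e.g. "type"
  equals: string; // value the field must have
}

interface Rule {
  trigger: Trigger;
  conditions: Condition[]; // every condition must hold (AND)
  action: { type: string; requiresApproval?: boolean };
}

type Ticket = Record<string, string>;

// A rule fires when the trigger matches and all conditions hold.
function matches(rule: Rule, trigger: Trigger, ticket: Ticket): boolean {
  return (
    rule.trigger === trigger &&
    rule.conditions.every((c) => ticket[c.field] === c.equals)
  );
}

// The first example above, expressed as data.
const urgentIncident: Rule = {
  trigger: "ticket.created",
  conditions: [
    { field: "type", equals: "incident" },
    { field: "priority", equals: "urgent" },
  ],
  action: { type: "assign" },
};
```

Keeping rules as plain data like this means they can be stored in a table and edited from a settings page without a deploy.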

Under the hood

The worker is a separate Node.js process that polls for queued jobs every 5 seconds. It uses SELECT FOR UPDATE SKIP LOCKED in PostgreSQL for atomic job claiming — no Redis needed.
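
For readers who haven't used this pattern, here is one common shape of a SKIP LOCKED claim query, as a hedged sketch. The table name `automation_jobs`, its columns, the status values, and the `Query` function type are all assumptions, not Infralane's real schema or code.

```typescript
// Assumed table/column names; adjust to your own schema.
const CLAIM_JOB_SQL = `
  UPDATE automation_jobs
  SET status = 'RUNNING', attempts = attempts + 1
  WHERE id = (
    SELECT id FROM automation_jobs
    WHERE status = 'QUEUED' AND run_at <= now()
    ORDER BY run_at
    LIMIT 1
    FOR UPDATE SKIP LOCKED
  )
  RETURNING id, attempts
`;

interface ClaimedJob {
  id: string;
  attempts: number;
}

// Hypothetical query function: runs SQL and resolves with the returned rows.
type Query = (sql: string) => Promise<ClaimedJob[]>;

// Claims at most one job. Rows locked by another worker are skipped,
// so two workers polling at the same moment never get the same job.
async function claimNextJob(query: Query): Promise<ClaimedJob | null> {
  const rows = await query(CLAIM_JOB_SQL);
  return rows[0] ?? null;
}
```

A worker loop would call `claimNextJob` on each poll and go back to sleep when it returns `null`.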

Jobs get exponential backoff on failure and move to a dead-letter queue after 3 attempts. Every state transition is logged so you can trace exactly what happened and why.
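
The retry policy can be sketched as a small pure function. The 3-attempt limit comes from the description above; the 5-second base delay, the doubling factor, and the names are assumptions for illustration.

```typescript
const MAX_ATTEMPTS = 3;     // dead-letter after 3 attempts
const BASE_DELAY_MS = 5000; // assumed base delay

type Outcome =
  | { kind: "retry"; delayMs: number }
  | { kind: "dead_letter" };

// `attempts` counts attempts made so far, including the one that just failed.
function onFailure(attempts: number): Outcome {
  if (attempts >= MAX_ATTEMPTS) return { kind: "dead_letter" };
  // Delay doubles with each failed attempt: 5s, then 10s.
  return { kind: "retry", delayMs: BASE_DELAY_MS * 2 ** (attempts - 1) };
}
```

With these assumed values, the first failure schedules a retry after 5 seconds, the second after 10 seconds, and the third sends the job to the dead-letter queue.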

Approval workflows

This is the feature I think makes it more than just another ticketing tool.

When a rule has "requires approval" enabled, the automation job pauses in a PENDING_APPROVAL state. The ticket gets locked — operators can't resolve or close it until someone approves or rejects.

The Infralane ticket creation interface

Three-tier role system

| Role | What they can do |
| --- | --- |
| Requester | Submit tickets, view their own, comment, rate resolved tickets |
| Operator | Work all tickets, assign, change status, approve/reject, view reports |
| Admin | Everything above + manage settings, automation rules, team, integrations |

What I learned building it

1. Dedup is harder than it looks

My first approach used a unique constraint on (ruleId, ticketId, trigger). That's too coarse — the same rule should fire multiple times on the same ticket for repeated status changes.

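I ended up using SHA-256 hash keys derived from rule + ticket + trigger + context. A unique constraint on the hash handles race conditions: if two processes try to insert the same key at once, the database rejects the second insert instead of creating a duplicate job.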
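
A minimal sketch of such a key, using Node's built-in `crypto` module. The `DedupInput` shape and the idea of sorting context keys are my assumptions here, not necessarily how Infralane builds its keys.

```typescript
import { createHash } from "node:crypto";

// Hypothetical input shape for illustration.
interface DedupInput {
  ruleId: string;
  ticketId: string;
  trigger: string;
  context: Record<string, string>; // e.g. { from: "open", to: "resolved" }
}

function dedupKey(input: DedupInput): string {
  // Sort context keys so logically equal contexts produce the same hash.
  const context = Object.keys(input.context)
    .sort()
    .map((k) => [k, input.context[k]]);
  const payload = JSON.stringify([
    input.ruleId,
    input.ticketId,
    input.trigger,
    context,
  ]);
  return createHash("sha256").update(payload).digest("hex");
}
```

Serializing the parts as a JSON array rather than concatenating strings avoids accidental collisions such as `"ab" + "c"` versus `"a" + "bc"`.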

2. Automation needs cascade prevention

If an automation rule changes a ticket's status, and there's another rule that triggers on status changes... you get infinite loops.

The fix: executors write to the database directly through Prisma, not through the service functions that emit triggers. As a result, automation actions never re-trigger other automation rules.
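
The split between the two write paths looks roughly like this. It is an in-memory sketch: the `Map` stands in for the database, the `emitted` array stands in for the trigger queue, and every name is hypothetical.

```typescript
type Status = "open" | "in_progress" | "resolved";

interface Ticket {
  id: string;
  status: Status;
}

const tickets = new Map<string, Ticket>(); // stand-in for the database
const emitted: string[] = [];              // stand-in for the trigger queue

// Shared low-level write; emits nothing.
function writeStatus(id: string, status: Status): void {
  const ticket = tickets.get(id);
  if (!ticket) throw new Error(`ticket ${id} not found`);
  ticket.status = status;
}

// Service path (API handlers): writes AND emits a trigger for the rule engine.
function changeStatusViaService(id: string, status: Status): void {
  writeStatus(id, status);
  emitted.push(`status_changed:${id}`);
}

// Executor path (automation actions): writes only, so no rule can re-fire.
function changeStatusViaExecutor(id: string, status: Status): void {
  writeStatus(id, status);
}
```

The trade-off is that rules cannot chain off each other's effects, which is the price of ruling out loops entirely.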

3. Gates must block all paths

I built approval workflows that block automation execution, but forgot that an operator could just resolve the ticket manually! I had to add ticket locking — while an approval is pending, the ticket's status is immutable via the API (409 PENDING_APPROVAL).
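
A guard along these lines, called from every code path that can change a ticket's status, is one way to enforce the lock. The `TicketState` shape, the `HttpError` class, and the function name are assumptions; only the 409 status and the PENDING_APPROVAL code come from the behavior described above.

```typescript
// Hypothetical shapes for illustration.
interface TicketState {
  id: string;
  pendingApproval: boolean;
}

class HttpError extends Error {
  readonly status: number;
  readonly code: string;

  constructor(status: number, code: string) {
    super(code);
    this.status = status;
    this.code = code;
  }
}

// Throws if the ticket is locked by a pending approval; otherwise does nothing.
function assertStatusMutable(ticket: TicketState): void {
  if (ticket.pendingApproval) {
    throw new HttpError(409, "PENDING_APPROVAL");
  }
}
```

Putting the check in one shared function makes it harder to forget a path, which is exactly the mistake described here.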

4. Dev fallbacks are production footguns

I had a dev fallback for the session signing secret. One bad deployment config and every session is forgeable. Now, the app throws a fatal error in production if the secret is missing. Fail fast.
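
A fail-fast loader can be as small as this sketch. The variable name `SESSION_SECRET`, the fallback string, and keeping a fallback outside production are assumptions on my part.

```typescript
type Env = Record<string, string | undefined>;

// Returns the session secret, or throws in production if it is missing.
function loadSessionSecret(env: Env): string {
  const secret = env["SESSION_SECRET"]; // assumed variable name
  if (secret) return secret;
  if (env["NODE_ENV"] === "production") {
    // Refuse to start rather than sign sessions with a known value.
    throw new Error("SESSION_SECRET must be set in production");
  }
  return "dev-only-insecure-secret"; // local development only
}
```

Calling `loadSessionSecret(process.env)` once at startup means a misconfigured deployment crashes immediately instead of serving forgeable sessions.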

The Stack

  • Frontend: Next.js 15, React, Tailwind CSS
  • Backend: Next.js API Routes, Prisma ORM
  • Database: PostgreSQL 16
  • Worker: Standalone Node.js process
  • Real-time: Server-Sent Events (SSE)

Self-host it

git clone https://github.com/infralaneapp/infralane.git
cd infralane
docker compose up -d

App runs at http://localhost:3000. The first user to register becomes the admin.

What's next

The core is stable but there's plenty to improve:

[ ] Test Slack integration against a real workspace
[ ] Implement actual SMTP email sending
[ ] Password reset flow
[ ] Rate limiting

MIT licensed. If you're running ops workflows and have opinions about what's missing, I'd genuinely like to hear them!

Infralane

Structured ops. Automated execution.

Infralane is an ops control center for DevOps and IT operations teams. Ticket creation triggers automation rules, approvals gate sensitive actions, and every state change is traceable.

Try it now: Live Demo (log in with admin@infralane.com / 12345678)

Key Features

  • Structured ticket intake — Typed requests (access, deployment, incident, infrastructure) with custom field schemas and templates
  • Automation engine — Rules that trigger on ticket events, evaluate conditions, and execute actions (assign, change status, notify, escalate, webhook)
  • Approval workflows — Gate automation behind human approval with designated approvers and ticket locking
  • Three-tier roles — Requester, Operator, Admin with granular permissions
  • SLA tracking — Configurable response/resolution thresholds with breach detection and auto-escalation
  • Slack integration — OAuth login, DM notifications, interactive approval buttons
  • Knowledge base — Self-service articles linked to ticket types
  • Full audit trail — Every mutation logged with automation job lifecycle events
