For the past few months I've been building Infralane, an open-source platform for DevOps and IT operations teams. Think of it as a service desk that actually understands infrastructure workflows — not just a form that creates a ticket and throws it into a queue.
## Why I built it
Every DevOps team I've worked on has the same problem: access requests come through Slack, deployments are tracked in spreadsheets, and incident response is a mix of PagerDuty alerts and "who's online?" messages. There are enterprise tools for this, but they're expensive and take forever to set up.
I wanted something where:
- Requests come in with the right fields already defined (not "describe your issue")
- Automation rules handle the boring parts (assign, tag, notify, escalate)
- Sensitive actions need approval before executing
- Everything is auditable
## How the automation engine works
You create rules with three parts:

- **Trigger** → when something happens
- **Conditions** → match against ticket fields
- **Action** → do something
Some real examples:

| Trigger | Conditions | Action |
|---------|------------|--------|
| Ticket created | type = incident AND priority = urgent | Assign to on-call operator |
| SLA breached | type = incident AND priority = high | Escalate priority to urgent |
| Ticket created | type = deployment AND environment = production | Require approval before proceeding |
There are 8 action types: assign, change status, change priority, add tag, notify, Slack message, webhook, and escalation chains.
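To make the trigger/conditions/action model concrete, here's a minimal sketch of what a rule might look like in code. The type names, field names, and matching logic are my illustration of the model described above, not Infralane's actual schema:

```typescript
// Hypothetical rule shape — names are illustrative, not Infralane's real API.
type Trigger = "TICKET_CREATED" | "STATUS_CHANGED" | "SLA_BREACHED";

interface Rule {
  trigger: Trigger;
  conditions: Record<string, string>; // ticket field → required value
  action: { type: string; params?: Record<string, unknown> };
  requiresApproval?: boolean;
}

const urgentIncidentRule: Rule = {
  trigger: "TICKET_CREATED",
  conditions: { type: "incident", priority: "urgent" },
  action: { type: "assign", params: { target: "on-call" } },
};

// A rule matches when every condition equals the ticket's field value.
function matches(rule: Rule, ticket: Record<string, string>): boolean {
  return Object.entries(rule.conditions).every(([k, v]) => ticket[k] === v);
}
```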
### Under the hood
The worker is a separate Node.js process that polls for queued jobs every 5 seconds. It uses PostgreSQL's `FOR UPDATE SKIP LOCKED` for atomic job claiming — no Redis needed.
Jobs get exponential backoff on failure and move to a dead-letter queue after 3 attempts. Every state transition is logged so you can trace exactly what happened and why.
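A rough sketch of the claim-and-retry mechanics described above. The SQL shape, column names, base delay, and cap are my assumptions, not the worker's actual code:

```typescript
// One worker claims a queued job atomically. SKIP LOCKED means concurrent
// workers never block on (or double-claim) a row another worker has locked.
// Table and column names here are illustrative.
const CLAIM_SQL = `
  UPDATE jobs SET status = 'RUNNING', claimed_at = now()
  WHERE id = (
    SELECT id FROM jobs
    WHERE status = 'QUEUED'
    ORDER BY created_at
    LIMIT 1
    FOR UPDATE SKIP LOCKED
  )
  RETURNING id;
`;

// Exponential backoff: delay doubles each attempt. Base and cap are assumed.
function backoffMs(attempt: number, baseMs = 5_000, capMs = 300_000): number {
  return Math.min(baseMs * 2 ** (attempt - 1), capMs);
}

const MAX_ATTEMPTS = 3; // after this, the job moves to the dead-letter queue
```

The nice property of `SKIP LOCKED` is that the database itself is the queue: no separate broker, and a crashed worker's lock simply disappears when its transaction ends.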
## Approval workflows
This is the feature I think makes it more than just another ticketing tool.
When a rule has "requires approval" enabled, the automation job pauses in a PENDING_APPROVAL state. The ticket gets locked — operators can't resolve or close it until someone approves or rejects.
This means you can enforce rules like "any production deployment needs approval" at the system level. No one can skip the approval by just resolving the ticket directly.
## Three-tier role system
Not every user should see everything:
| Role | What they can do |
|------|-----------------|
| Requester | Submit tickets, view their own, comment, rate resolved tickets |
| Operator | Work all tickets, assign, change status, approve/reject, view reports |
| Admin | Everything above + manage settings, automation rules, team, integrations |
## What I learned building it
### Dedup is harder than it looks
My first approach used a unique constraint on (ruleId, ticketId, trigger). That's too coarse — the same rule should fire multiple times on the same ticket for repeated status changes.
I ended up using SHA-256 hash keys derived from rule + ticket + trigger + context. The unique constraint on the hash handles race conditions — if two concurrent emissions both pass the check, the database rejects the duplicate.
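Sketched out, the dedup key might look like this. Exactly which fields feed into `context` is an assumption on my part; the point is that the hash is unique per *distinct* emission, so the same rule can still fire again when the context (say, the new status) differs:

```typescript
import { createHash } from "node:crypto";

// Hypothetical dedup key: same (rule, ticket, trigger, context) → same hash,
// so a unique constraint on this column rejects concurrent duplicates.
function dedupKey(
  ruleId: string,
  ticketId: string,
  trigger: string,
  context: Record<string, string>
): string {
  const payload = JSON.stringify([ruleId, ticketId, trigger, context]);
  return createHash("sha256").update(payload).digest("hex");
}
```

Under a race, both emissions compute the same key, both pass the application-level check, and the database's unique index rejects the second insert — the constraint is the arbiter, not the application.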
### Automation needs cascade prevention
If an automation rule changes a ticket's status, and there's another rule that triggers on status changes... you get infinite loops.
The fix: executors write directly to Prisma, not through the service functions that emit triggers. Automation actions never re-trigger other automation rules.
### Gates must block all paths
I built approval workflows that block automation execution. Then someone pointed out: what's stopping an operator from just resolving the ticket manually while the approval is pending?
Nothing was. So I added ticket locking — while an approval is pending, the ticket's status can't be changed through any path. The API returns 409 PENDING_APPROVAL. The UI shows a warning instead of the status dropdown.
### Dev fallbacks are production footguns
I had a dev fallback for the session signing secret — if the env var was missing, it used a hardcoded default. One bad deployment config and every session is forgeable.
Now it throws a fatal error in production if the secret is missing. Fail fast, not fail silently.
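The fail-fast pattern is roughly this — the env var names and fallback value are my assumptions, not the actual code:

```typescript
// Crash at boot if the signing secret is missing in production, rather than
// silently signing every session with a known default key.
function getSessionSecret(env: Record<string, string | undefined>): string {
  const secret = env.SESSION_SECRET; // assumed variable name
  if (secret) return secret;
  if (env.NODE_ENV === "production") {
    throw new Error("SESSION_SECRET is required in production");
  }
  return "dev-only-secret"; // fallback still allowed for local development
}
```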
## Stack
| Layer | Technology |
|-------|-----------|
| Frontend | Next.js 15, React, TypeScript, Tailwind CSS |
| Backend | Next.js API Routes, Prisma ORM |
| Database | PostgreSQL 16 |
| Auth | HMAC-SHA256 session cookies + Slack OAuth |
| Worker | Standalone Node.js process |
| Real-time | Server-Sent Events (SSE) |
## Self-host it
```bash
git clone https://github.com/infralaneapp/infralane.git
cd infralane
docker compose up -d
```
App runs at http://localhost:3000. First user to register becomes admin.
## What's next
The core is stable but there's plenty to improve. Some open issues:
- Test Slack integration against a real workspace
- Implement actual SMTP email sending (currently a stub)
- Password reset flow
- Rate limiting on more endpoints
MIT licensed. If you're running ops workflows and have opinions about what's missing, I'd genuinely like to hear them.
GitHub: github.com/infralaneapp/infralane