For the past few months I've been building Infralane, an open-source platform for DevOps and IT operations teams. Think of it as a service desk that actually understands infrastructure workflows — not just a form that creates a ticket and throws it into a queue.
## Why I built it
Every DevOps team I've worked on has the same problem: access requests come through Slack, deployments are tracked in spreadsheets, and incident response is a mix of PagerDuty alerts and "who's online?" messages. There are enterprise tools for this, but they're expensive and take forever to set up.
I wanted something where:
- Requests come in with the right fields already defined (not "describe your issue")
- Automation rules handle the boring parts (assign, tag, notify, escalate)
- Sensitive actions need approval before executing
- Everything is auditable
## How the automation engine works
You create rules with three parts:

- **Trigger** → when something happens
- **Conditions** → match against ticket fields
- **Action** → do something
Some real examples:

| Trigger | Conditions | Action |
|---------|------------|--------|
| Ticket created | type = incident AND priority = urgent | Assign to on-call operator |
| SLA breached | type = incident AND priority = high | Escalate priority to urgent |
| Ticket created | type = deployment AND environment = production | Require approval before proceeding |
There are 8 action types: assign, change status, change priority, add tag, notify, Slack message, webhook, and escalation chains.
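To make the trigger/conditions/action model concrete, here's a minimal sketch of what a rule might look like in code. The type names, field names, and matching logic are my illustration of the model described above, not Infralane's actual schema:

```typescript
// Hypothetical rule shape — names are illustrative, not Infralane's real API.
type Trigger = "TICKET_CREATED" | "STATUS_CHANGED" | "SLA_BREACHED";

interface Rule {
  trigger: Trigger;
  conditions: Record<string, string>; // ticket field → required value
  action: { type: string; params?: Record<string, unknown> };
  requiresApproval?: boolean;
}

const urgentIncidentRule: Rule = {
  trigger: "TICKET_CREATED",
  conditions: { type: "incident", priority: "urgent" },
  action: { type: "assign", params: { target: "on-call" } },
};

// A rule matches when every condition equals the ticket's field value.
function matches(rule: Rule, ticket: Record<string, string>): boolean {
  return Object.entries(rule.conditions).every(([k, v]) => ticket[k] === v);
}
```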
### Under the hood
The worker is a separate Node.js process that polls for queued jobs every 5 seconds. It uses PostgreSQL's `FOR UPDATE SKIP LOCKED` for atomic job claiming — no Redis needed.
Jobs get exponential backoff on failure and move to a dead-letter queue after 3 attempts. Every state transition is logged so you can trace exactly what happened and why.
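A rough sketch of the claim-and-retry mechanics described above. The SQL shape, column names, base delay, and cap are my assumptions, not the worker's actual code:

```typescript
// One worker claims a queued job atomically. SKIP LOCKED means concurrent
// workers never block on (or double-claim) a row another worker has locked.
// Table and column names here are illustrative.
const CLAIM_SQL = `
  UPDATE jobs SET status = 'RUNNING', claimed_at = now()
  WHERE id = (
    SELECT id FROM jobs
    WHERE status = 'QUEUED'
    ORDER BY created_at
    LIMIT 1
    FOR UPDATE SKIP LOCKED
  )
  RETURNING id;
`;

// Exponential backoff: delay doubles each attempt. Base and cap are assumed.
function backoffMs(attempt: number, baseMs = 5_000, capMs = 300_000): number {
  return Math.min(baseMs * 2 ** (attempt - 1), capMs);
}

const MAX_ATTEMPTS = 3; // after this, the job moves to the dead-letter queue
```

The nice property of `SKIP LOCKED` is that the database itself is the queue: no separate broker, and a crashed worker's lock simply disappears when its transaction ends.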
## Approval workflows
This is the feature I think makes it more than just another ticketing tool.
When a rule has "requires approval" enabled, the automation job pauses in a PENDING_APPROVAL state. The ticket gets locked — operators can't resolve or close it until someone approves or rejects.
This means you can enforce rules like "any production deployment needs approval" at the system level. No one can skip the approval by just resolving the ticket directly.
## Three-tier role system
Not every user should see everything:
| Role | What they can do |
|------|-----------------|
| Requester | Submit tickets, view their own, comment, rate resolved tickets |
| Operator | Work all tickets, assign, change status, approve/reject, view reports |
| Admin | Everything above + manage settings, automation rules, team, integrations |
## What I learned building it
### Dedup is harder than it looks
My first approach used a unique constraint on (ruleId, ticketId, trigger). That's too coarse — the same rule should fire multiple times on the same ticket for repeated status changes.
I ended up using SHA-256 hash keys derived from rule + ticket + trigger + context. The unique constraint on the hash handles race conditions — if two concurrent emissions both pass the check, the database rejects the duplicate.
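Sketched out, the dedup key might look like this. Exactly which fields feed into `context` is an assumption on my part; the point is that the hash is unique per *distinct* emission, so the same rule can still fire again when the context (say, the new status) differs:

```typescript
import { createHash } from "node:crypto";

// Hypothetical dedup key: same (rule, ticket, trigger, context) → same hash,
// so a unique constraint on this column rejects concurrent duplicates.
function dedupKey(
  ruleId: string,
  ticketId: string,
  trigger: string,
  context: Record<string, string>
): string {
  const payload = JSON.stringify([ruleId, ticketId, trigger, context]);
  return createHash("sha256").update(payload).digest("hex");
}
```

Under a race, both emissions compute the same key, both pass the application-level check, and the database's unique index rejects the second insert — the constraint is the arbiter, not the application.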
### Automation needs cascade prevention
If an automation rule changes a ticket's status, and there's another rule that triggers on status changes... you get infinite loops.
The fix: executors write directly to Prisma, not through the service functions that emit triggers. Automation actions never re-trigger other automation rules.
### Gates must block all paths
I built approval workflows that block automation execution. Then someone pointed out: what's stopping an operator from just resolving the ticket manually while the approval is pending?
Nothing was. So I added ticket locking — while an approval is pending, the ticket's status can't be changed through any path. The API returns 409 PENDING_APPROVAL. The UI shows a warning instead of the status dropdown.
### Dev fallbacks are production footguns
I had a dev fallback for the session signing secret — if the env var was missing, it used a hardcoded default. One bad deployment config and every session is forgeable.
Now it throws a fatal error in production if the secret is missing. Fail fast, not fail silently.
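The fail-fast pattern is roughly this — the env var names and fallback value are my assumptions, not the actual code:

```typescript
// Crash at boot if the signing secret is missing in production, rather than
// silently signing every session with a known default key.
function getSessionSecret(env: Record<string, string | undefined>): string {
  const secret = env.SESSION_SECRET; // assumed variable name
  if (secret) return secret;
  if (env.NODE_ENV === "production") {
    throw new Error("SESSION_SECRET is required in production");
  }
  return "dev-only-secret"; // fallback still allowed for local development
}
```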
## Stack
| Layer | Technology |
|-------|-----------|
| Frontend | Next.js 15, React, TypeScript, Tailwind CSS |
| Backend | Next.js API Routes, Prisma ORM |
| Database | PostgreSQL 16 |
| Auth | HMAC-SHA256 session cookies + Slack OAuth |
| Worker | Standalone Node.js process |
| Real-time | Server-Sent Events (SSE) |
## Self-host it
```bash
git clone https://github.com/infralaneapp/infralane.git
cd infralane
docker compose up -d
```
App runs at http://localhost:3000. First user to register becomes admin.
## What's next
The core is stable but there's plenty to improve. Some open issues:
- Test Slack integration against a real workspace
- Implement actual SMTP email sending (currently a stub)
- Password reset flow
- Rate limiting on more endpoints
MIT licensed. If you're running ops workflows and have opinions about what's missing, I'd genuinely like to hear them.
GitHub: github.com/infralaneapp/infralane