Giga Kovaliovi

Posted on Mar 27

I built an AI tool for incident investigation (looking for honest feedback)

#devops #ai #sre #monitoring

Hey everyone 👋

Over the past couple of weeks, I’ve been building a side project called Opsrift.

It started from a pretty simple frustration: postmortems, handovers, and incident documentation take way too much time, and most of it is repetitive.

But while building it, I realized something more interesting:

The real problem isn’t writing postmortems. It’s understanding what actually happened during an incident.

So I ended up going a bit further than just a generator.

What Opsrift does right now

The platform is focused on incident workflows — mostly for people working in SRE, support, or operations.

Right now it includes:

Postmortem generator

Takes incident data and generates structured postmortems in seconds.

Handover generator

Useful for shift-based teams — turns messy updates into clean handovers.

Runbook generator

Creates structured runbooks based on incident patterns or inputs.

Incident Investigator (main focus)

This is the part I’m most interested in:

Pulls data from tools like Jira, PagerDuty, and Opsgenie

Correlates it with deployments from GitHub

Tries to reconstruct what actually happened (timeline, possible causes, etc.)

The goal is to reduce the time spent jumping between tools during investigations.
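
To make the correlation step concrete, here is a minimal Python sketch. It is not Opsrift's actual code: the `Deployment` and `Incident` shapes, the `candidate_deployments` helper, and the two-hour lookback window are all assumptions chosen for illustration. The idea is simply to list deployments to the affected service that landed shortly before the incident started.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Deployment:
    # Hypothetical shape; a real record would come from the GitHub integration.
    service: str
    sha: str
    deployed_at: datetime


@dataclass
class Incident:
    # Hypothetical shape; a real record would come from Jira, PagerDuty, or Opsgenie.
    key: str
    service: str
    started_at: datetime


def candidate_deployments(incident, deployments, lookback=timedelta(hours=2)):
    """Return deployments to the incident's service that landed within
    `lookback` before the incident started, most recent first."""
    window_start = incident.started_at - lookback
    hits = [
        d for d in deployments
        if d.service == incident.service
        and window_start <= d.deployed_at <= incident.started_at
    ]
    return sorted(hits, key=lambda d: d.deployed_at, reverse=True)


# Example usage with made-up data.
start = datetime(2024, 3, 27, 12, 0)
incident = Incident(key="OPS-1", service="api", started_at=start)
deployments = [
    Deployment("api", "b2", start - timedelta(minutes=90)),
    Deployment("api", "c3", start - timedelta(minutes=10)),
    Deployment("web", "d4", start - timedelta(minutes=5)),
]
for d in candidate_deployments(incident, deployments):
    print(d.sha, d.deployed_at.isoformat())
```

A time window only produces candidates, not causes. A deployment that lands just before an incident is a place to start looking, which is why the output is best treated as "possible causes" rather than a verdict.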

Status page

Basic external communication for incidents.

Integrations

Current integrations:

Jira

PagerDuty

Opsgenie

GitHub

Slack

Confluence

Still early — some of these are rough.

What it’s NOT (yet)

I want to be upfront:

It’s not a replacement for your incident management tools

It’s not perfect at root cause analysis

It’s not “production-grade” in every edge case

Right now it’s closer to:

an AI layer on top of your existing tools to speed up investigation and documentation

Known issues

To save you time:

GitHub login ❌ (bugged right now)

Slack login ❌ (also bugged)

👉 You can still use:

Google login

Email/password signup

Fixing these next.

What I’m trying to figure out

This is where I’d really appreciate help.

I’m trying to validate a few things:

Does the Incident Investigator actually help, or is it just “nice to have”?

Are the outputs accurate enough to be trusted?

Would you use something like this in real workflows?

What’s missing for it to be genuinely useful?

Where I want to take this

Longer term, I’m thinking about moving beyond just generating outputs and more into:

detecting patterns across incidents

identifying unstable services

highlighting teams with high escalation rates

correlating deployments with incidents automatically

Basically:

turning incident data into something you can actually act on
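
As a rough idea of what the first steps could look like, here is a small Python sketch. Everything in it is an assumption made for illustration: the incident records are plain dicts with hypothetical `service`, `team`, and `escalated` fields, and the threshold of three incidents is arbitrary. Nothing like this is claimed to exist in Opsrift today.

```python
from collections import Counter


def unstable_services(incidents, min_incidents=3):
    """Return (service, incident_count) pairs for services with at least
    `min_incidents` incidents, highest count first."""
    counts = Counter(i["service"] for i in incidents)
    return [(s, n) for s, n in counts.most_common() if n >= min_incidents]


def escalation_rates(incidents):
    """Return the fraction of each team's incidents that were escalated."""
    totals, escalated = Counter(), Counter()
    for i in incidents:
        totals[i["team"]] += 1
        if i["escalated"]:
            escalated[i["team"]] += 1
    return {team: escalated[team] / totals[team] for team in totals}


# Example usage with made-up records.
history = [
    {"service": "api", "team": "core", "escalated": True},
    {"service": "api", "team": "core", "escalated": False},
    {"service": "api", "team": "core", "escalated": False},
    {"service": "web", "team": "front", "escalated": True},
]
print(unstable_services(history))
print(escalation_rates(history))
```

Raw counts like these are easy to skew: a busy service will always have more incidents, so a real version would probably need to normalize by traffic or deploy frequency before calling anything "unstable".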

If you want to try it

👉 https://opsrift.com

No pressure — even quick feedback is super helpful.

Final note

I’ve worked in NOC/SOC and incident-heavy environments, so this is very much a “scratch your own itch” project.

That said, I’m aware tools like this can easily become:

too generic

inaccurate

or just another dashboard nobody uses

So I’d rather get honest feedback early.

Even if it’s:

“this doesn’t solve anything for me”

That’s useful.

Thanks in advance 🙌
