DEV Community

Muhammad Yawar Malik

I Built an AI-Powered CLI to Help Debug Production Incidents | Meet Incident Helper

As an SRE and cloud engineer, I’ve been on the front lines of production incidents more times than I care to count. Whether it's a 503 at 3 AM or a deployment rollback that took out half the stack, the mental overhead of figuring out where to start during an incident can be overwhelming.

So I built a tool to change that.

Meet Incident Helper

Incident Helper is an AI-native command-line tool that helps developers, SREs, and DevOps engineers triage and troubleshoot incidents in real-time, right from the terminal.

It’s not just a wrapper around ChatGPT. It’s designed for actual production use, with structured prompts, OS-aware logic, and modular troubleshooting workflows. It keeps context as you walk through the issue and suggests concrete steps that make sense: no vague suggestions, no hand-wavy fluff.

Why I Built This
There’s no shortage of AI-powered copilots for writing code or summarizing docs. But when something breaks in production, we’re still stuck piecing together access logs, scanning dashboards, and hunting Stack Overflow.

I wanted to build a tool that feels like having an incident response teammate who knows your system, understands your OS, remembers your previous steps, and gives you smart next moves, all inside the terminal.

And of course, I wanted it to be open source, community-driven, and something that would genuinely help engineers when they're under pressure.

What It Does
You start Incident Helper by running:

incident-helper start

It greets you, asks you what’s going on, and starts collecting context: your OS, the kind of error, whether you can SSH into the box, and so on. Based on your inputs, it begins suggesting:

  • Commands to check system state
  • Log file locations based on your OS
  • Diagnostic steps for common errors like 502s, 503s, and 4xx-series issues
  • Follow-up questions that actually make sense

It also remembers everything you said earlier, so you don’t have to repeat yourself.
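
To make the OS-aware log suggestions concrete, here is a minimal sketch of how such a lookup could work. The `LOG_PATHS` table and `suggest_log_paths` function are hypothetical names for illustration, not taken from the Incident Helper source, and the paths are only the conventional defaults for syslog-based setups on each distribution.

```python
# Hypothetical sketch: names and structure are illustrative only,
# not the actual Incident Helper implementation.
LOG_PATHS = {
    # Conventional default locations; a given host may be configured differently.
    "ubuntu": {"system": "/var/log/syslog", "auth": "/var/log/auth.log"},
    "centos": {"system": "/var/log/messages", "auth": "/var/log/secure"},
}


def suggest_log_paths(os_name: str) -> list[str]:
    """Return candidate log files for an OS, or an empty list if it is unknown."""
    paths = LOG_PATHS.get(os_name.strip().lower(), {})
    return sorted(paths.values())


if __name__ == "__main__":
    for path in suggest_log_paths("Ubuntu"):
        print(path)
```

Returning an empty list for an unknown OS lets the caller fall back to asking the user instead of guessing.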

Oh, and it supports local LLMs via Ollama, so if you don’t want to use OpenAI or pay for API calls, you’re totally good.
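
For readers who haven't used Ollama, the sketch below shows roughly what a call to a local model looks like over Ollama's REST API, using only the standard library. It assumes an Ollama server is running on its default port with the `mistral` model pulled. The helper names `build_payload` and `ask_local_llm` are my own for this example, and Incident Helper itself may talk to Ollama differently.

```python
import json
import urllib.request

# Ollama's default local endpoint for one-shot text generation.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(prompt: str, model: str = "mistral") -> bytes:
    """Encode a non-streaming generate request as JSON bytes."""
    body = {"model": model, "prompt": prompt, "stream": False}
    return json.dumps(body).encode("utf-8")


def ask_local_llm(prompt: str, model: str = "mistral", timeout: float = 60.0) -> str:
    """Send the prompt to a locally running Ollama server and return the reply text."""
    request = urllib.request.Request(
        OLLAMA_URL,
        data=build_payload(prompt, model),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # With stream=False, Ollama returns a single JSON object
        # whose "response" field holds the generated text.
        return json.loads(response.read())["response"]
```

Since everything goes to `localhost`, no prompt text leaves the machine, which matters when the prompt contains production log lines.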

What Makes It Different
Incident Helper is:

  • Conversational: It uses AI to guide you like a human teammate would
  • OS-aware: Knows the difference between Ubuntu, CentOS, Amazon Linux, and even Windows (coming soon)
  • Extensible: Has modular resolvers that let you plug in support for HTTP issues, deployment failures, network glitches, etc.
  • Context-sensitive: Tracks what you’ve already shared so follow-ups make sense
  • Open Source: Licensed under MIT, ready for contributions

This isn’t just another AI wrapper that parrots search results. It’s built for engineers in the trenches.
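
As an illustration of the "modular resolvers" idea, here is one way a pluggable resolver registry could be sketched. Everything in it (the `resolver` decorator, `RESOLVERS`, `resolve_http`, `suggest`) is a hypothetical design for this example, not the project's actual interface.

```python
from typing import Callable, Dict, List

# Hypothetical registry: maps an issue category to a function that
# returns suggested diagnostic commands for the current context.
Resolver = Callable[[dict], List[str]]
RESOLVERS: Dict[str, Resolver] = {}


def resolver(category: str) -> Callable[[Resolver], Resolver]:
    """Decorator that registers a resolver under an issue category."""
    def register(func: Resolver) -> Resolver:
        RESOLVERS[category] = func
        return func
    return register


@resolver("http")
def resolve_http(context: dict) -> List[str]:
    """Suggest first checks for HTTP 5xx errors from a systemd-managed service."""
    service = context.get("service", "nginx")
    return [
        f"systemctl status {service}",
        f"journalctl -u {service} --since '15 minutes ago'",
    ]


def suggest(category: str, context: dict) -> List[str]:
    """Dispatch to the registered resolver, or return nothing if none exists."""
    handler = RESOLVERS.get(category)
    return handler(context) if handler else []
```

With this shape, supporting a new class of incident means adding one decorated function, and the dispatch code never changes.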

Under the Hood

  • Built with Python and Typer for a clean CLI experience
  • Uses Ollama to run local LLMs like Mistral, with no API fees or external API calls
  • Modular architecture with pluggable “resolvers” and “OS adapters”
  • prompts.py builds structured instructions for the LLM
  • Designed for easy extension and community plugins
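
To show what "structured instructions" can mean in practice, here is a small sketch of a prompt builder that folds the session context into every request. The `IncidentContext` fields and the `build_prompt` wording are assumptions for illustration. The real `prompts.py` may be organized quite differently.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class IncidentContext:
    """Facts gathered so far in a session (hypothetical structure)."""
    os_name: str
    symptom: str
    can_ssh: bool
    steps_taken: List[str] = field(default_factory=list)


def build_prompt(ctx: IncidentContext, question: str) -> str:
    """Assemble a structured prompt so the model sees all prior context."""
    steps = "\n".join(f"- {step}" for step in ctx.steps_taken) or "- none yet"
    return (
        "You are assisting with a production incident.\n"
        f"OS: {ctx.os_name}\n"
        f"Symptom: {ctx.symptom}\n"
        f"SSH access: {'yes' if ctx.can_ssh else 'no'}\n"
        f"Steps already taken:\n{steps}\n"
        f"Question: {question}\n"
        "Reply with concrete commands and one follow-up question."
    )
```

Because the full context is rebuilt into each prompt, the model never depends on remembering earlier turns, which is one simple way to get the "remembers what you said" behavior.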

What’s Coming Next
Here’s what I plan to add soon:

  • Better diagnostic resolvers (for deploys, DB issues, etc.)
  • Windows server support
  • More intelligent session memory
  • A plugin system so others can ship resolvers as pip packages
  • Real-world examples and demo logs

Looking for Collaborators

This is an early version, so expect rough edges. No judgment here: come build it with me.

I’m looking to grow this into a true OSS ecosystem. If you’re:

  • An SRE or DevOps engineer who wants smarter incident tooling
  • A Python developer who enjoys CLI tools
  • An AI tinkerer who loves building on top of LLMs
  • Someone who’s just tired of debugging production alone

then come help build it.

👉 GitHub: https://github.com/malikyawar/incident-helper
👉 Drop a star, open an issue, or suggest a resolver

Final Thoughts
Incidents are stressful. They happen at the worst times. You shouldn’t have to choose between flipping through dashboards and playing “log detective” while your pager keeps going off.

Incident Helper is my attempt to bring AI where it actually matters: into the debugging loop. It’s just getting started, and I’d love to have you help shape it.

Let’s make incident response suck a little less.
