PagerPilot

Vedant Mahajan — Sat, 06 Jun 2026 11:45:28 +0000

This is a submission for the GitHub Finish-Up-A-Thon Challenge
(https://dev.to/challenges/github-2026-05-21)

What I Built

I built PagerPilot, an AI-powered SRE incident investigator that helps on-call engineers move from noisy alerts to an actionable incident brief.

PagerPilot uses a multi-agent workflow:

Router Agent — identifies the affected service and decides which sources to check.
Source Fetch Agent — gathers signals from PagerDuty, Datadog, GitHub, and StatusGator.
Analyst Agent — correlates incidents, metrics, alerts, deploys, commits, and third-party status signals.
Reporter Agent — generates a structured incident brief with root-cause hypotheses, confidence scores, timeline, blast radius, and recommended actions.
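
To make the hand-off between the four agents concrete, here is a minimal sketch of the pipeline shape. It is not PagerPilot's actual code: the `Investigation` class, the function names, the toy routing rule, and the confidence formula are all assumptions made for illustration, and the fetch step is stubbed rather than calling any real API.

```python
from dataclasses import dataclass, field


@dataclass
class Investigation:
    """Hypothetical state object passed from agent to agent."""
    alert_text: str
    service: str = ""
    sources: list[str] = field(default_factory=list)
    signals: dict[str, list[str]] = field(default_factory=dict)
    hypotheses: list[tuple[str, float]] = field(default_factory=list)
    brief: str = ""


def router_agent(inv):
    # Toy routing rule: treat the first word of the alert as the service name.
    inv.service = inv.alert_text.split()[0]
    inv.sources = ["pagerduty", "datadog", "github", "statusgator"]
    return inv


def source_fetch_agent(inv):
    # Stubbed fetch: a real connector would call each tool's API here.
    inv.signals = {src: [f"{src} signal for {inv.service}"] for src in inv.sources}
    return inv


def analyst_agent(inv):
    # Placeholder correlation: one hypothesis whose confidence grows with evidence count.
    evidence = sum(len(items) for items in inv.signals.values())
    confidence = min(1.0, 0.2 * evidence)
    inv.hypotheses = [(f"recent deploy to {inv.service}", confidence)]
    return inv


def reporter_agent(inv):
    # Render the hypotheses as a short plain-text brief.
    lines = [f"Incident brief for {inv.service}"]
    for cause, conf in inv.hypotheses:
        lines.append(f"- {cause} (confidence {conf:.2f})")
    inv.brief = "\n".join(lines)
    return inv


def run_pipeline(alert_text):
    """Run the four stages in order, each one enriching the shared state."""
    inv = Investigation(alert_text=alert_text)
    for agent in (router_agent, source_fetch_agent, analyst_agent, reporter_agent):
        inv = agent(inv)
    return inv
```

Calling `run_pipeline("checkout-api latency high")` fills in the service, the stubbed signals, one hypothesis, and a text brief. Passing a single state object through each stage keeps every agent easy to test in isolation.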

The goal is simple: reduce the time engineers spend jumping between dashboards during an incident and give them one trusted starting point for triage.

Repository: https://github.com/Vedant817/PagerPilot

You can run the project locally:

   git clone https://github.com/Vedant817/PagerPilot.git
   cd PagerPilot
   pip install -r requirements.txt
   python -m backend

Then open:

   http://localhost:8000/ui

Useful endpoints:

   GET  /health
   GET  /api/v1/diagnostics
   GET  /api/v1/incidents
   POST /api/v1/investigate

Example investigation payload:

   {
     "incident_id": "INC-001"
   }
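
If you would rather trigger an investigation from a script than from the UI, a small standard-library client could look like the sketch below. The base URL is taken from the local run instructions above; the JSON content type, the JSON response body, and the function names are assumptions, so check them against the repository before relying on them.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # local address from the run instructions


def build_investigate_request(incident_id, base_url=BASE_URL):
    """Build (but do not send) a POST request for /api/v1/investigate."""
    body = json.dumps({"incident_id": incident_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/v1/investigate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def investigate(incident_id, timeout=60):
    """Send the request and decode the response, assuming it is JSON."""
    req = build_investigate_request(incident_id)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the backend running, `investigate("INC-001")` would send the payload shown above. Building the request separately from sending it makes the payload easy to inspect without a live server.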

PagerPilot returns a full incident brief, including:

executive summary
evidence timeline
ranked root-cause hypotheses
confidence score
recommended remediation steps
service impact and blast radius
session logs from the agent pipeline
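
One way to picture the brief is as a small typed structure. The sketch below mirrors the items listed above, but every field name is hypothetical, and attaching a confidence value to each hypothesis (rather than one score for the whole brief) is an assumption; the real response schema lives in the repository.

```python
from dataclasses import dataclass, field


@dataclass
class Hypothesis:
    cause: str
    confidence: float  # assumed to be in the range 0.0 to 1.0


@dataclass
class IncidentBrief:
    executive_summary: str
    timeline: list[str] = field(default_factory=list)
    hypotheses: list[Hypothesis] = field(default_factory=list)
    remediation_steps: list[str] = field(default_factory=list)
    blast_radius: list[str] = field(default_factory=list)
    session_logs: list[str] = field(default_factory=list)

    def ranked_hypotheses(self):
        """Return hypotheses ordered from most to least confident."""
        return sorted(self.hypotheses, key=lambda h: h.confidence, reverse=True)
```

Keeping the ranking in a method rather than relying on insertion order means the brief stays correctly ranked even if hypotheses are appended out of order.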

The Comeback Story

PagerPilot started as an unfinished incident-investigation prototype. The idea was strong, but the project needed extensive finishing work before it felt reliable.

Before the final push, the app had several issues:

missing runtime dependencies
fragile environment parsing
incomplete connector behaviour
weak error handling
frontend rendering bugs
unsafe dynamic HTML rendering
inconsistent investigation responses
evaluation output that failed on some terminals
runtime configuration that was not production-friendly

I brought the project into a finished state by:

hardening the FastAPI backend
adding diagnostics for connector readiness
improving session and failure responses
making the analyst's logic more reliable
improving timestamp handling and timeline sorting
strengthening secret redaction
fixing frontend failure states
reducing unsafe DOM rendering patterns
making the evaluation runner portable
cleaning up config, Docker ignore rules, and runtime startup
committing changes in separate logical commits instead of one large dump
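
Two of these items, timestamp handling with timeline sorting and secret redaction, are easy to illustrate in a few lines. The sketch below is not the code in the repository: the function names, the `timestamp` key, the choice to treat naive timestamps as UTC, and the redaction pattern are all assumptions picked for the example.

```python
import re
from datetime import datetime, timezone


def parse_timestamp(value):
    """Parse an ISO 8601 string into a timezone-aware UTC datetime."""
    # Older Python versions reject a trailing "Z", so map it to an explicit offset.
    dt = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if dt.tzinfo is None:
        # Assumption: naive timestamps are already in UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)


def sort_timeline(events):
    """Return events ordered by their normalized timestamp."""
    return sorted(events, key=lambda e: parse_timestamp(e["timestamp"]))


# Illustrative pattern only: matches "key=value" or "key: value" for a few secret-like names.
SECRET_PATTERN = re.compile(
    r"(?i)\b(api[_-]?key|token|secret|password)\b(\s*[=:]\s*)(\S+)"
)


def redact(text):
    """Replace the value part of secret-looking pairs with a placeholder."""
    return SECRET_PATTERN.sub(r"\1\2[REDACTED]", text)
```

Normalizing everything to UTC before sorting matters because events from different tools may arrive with different offsets, and comparing the raw strings would order them incorrectly. The redaction pattern is deliberately narrow; a production version would need a broader set of key names and token formats.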

The final result is a working AI SRE investigator that can run locally, connect to real tools, and serve as a practical incident-briefing assistant.

My Experience with GitHub Copilot

GitHub Copilot helped me reach the finish line faster by acting as a pair programmer while I cleaned up the project.

It was especially useful for:

reasoning through backend failure cases
improving FastAPI response models
refactoring repeated config parsing logic
tightening connector error handling
improving frontend rendering safety
writing cleaner validation and smoke-test flows
organising fixes into focused commits

The biggest value was momentum. Instead of getting stuck on small implementation details, I could keep moving through the backlog: diagnose, patch, test, commit, and repeat.

PagerPilot feels like the kind of project that benefits from AI assistance twice: first as a coding partner during development, and second as the actual product experience for engineers responding to incidents.

DEV Community: Vedant Mahajan
