This is a submission for the GitHub Finish-Up-A-Thon Challenge
(https://dev.to/challenges/github-2026-05-21)
What I Built
I built PagerPilot, an AI-powered SRE incident investigator that helps on-call engineers move from noisy alerts to an actionable incident brief.
PagerPilot uses a multi-agent workflow:
- Router Agent — identifies the affected service and decides which sources to check.
- Source Fetch Agent — gathers signals from PagerDuty, Datadog, GitHub, and StatusGator.
- Analyst Agent — correlates incidents, metrics, alerts, deploys, commits, and third-party status signals.
- Reporter Agent — generates a structured incident brief with root-cause hypotheses, confidence scores, timeline, blast radius, and recommended actions.
The goal is simple: reduce the time engineers spend jumping between dashboards during an incident and give them one trusted starting point for triage.
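To make the hand-off between the four agents concrete, here is a minimal sketch of how such a pipeline could be wired together. Every function name, dictionary key, and confidence value below is a hypothetical stand-in chosen for illustration, not code from the PagerPilot repository, and the analyst step is reduced to two toy rules.

```python
# Hypothetical sketch of a router -> fetch -> analyst -> reporter pipeline.
# None of these names are taken from the PagerPilot codebase.

def router_agent(alert: dict) -> dict:
    """Pick the affected service and the sources worth checking."""
    sources = ["pagerduty", "datadog"]
    if alert.get("recent_deploy"):
        sources.append("github")
    if alert.get("third_party"):
        sources.append("statusgator")
    return {"service": alert["service"], "sources": sources}


def source_fetch_agent(plan: dict, fetchers: dict) -> dict:
    """Call one fetcher per selected source; a failing source is recorded, not fatal."""
    signals = {}
    for name in plan["sources"]:
        try:
            signals[name] = fetchers[name](plan["service"])
        except Exception as exc:  # keep going if one connector is down or missing
            signals[name] = {"error": str(exc)}
    return signals


def analyst_agent(signals: dict) -> list:
    """Toy correlation rules standing in for the real analysis."""
    hypotheses = []
    if signals.get("github", {}).get("recent_deploys"):
        hypotheses.append({"cause": "recent deploy", "confidence": 0.7})
    if signals.get("statusgator", {}).get("degraded"):
        hypotheses.append({"cause": "third-party outage", "confidence": 0.5})
    return hypotheses


def reporter_agent(plan: dict, signals: dict, hypotheses: list) -> dict:
    """Assemble a small brief from the earlier stages."""
    failed = [name for name, data in signals.items() if "error" in data]
    return {
        "service": plan["service"],
        "sources_checked": plan["sources"],
        "sources_failed": failed,
        "hypotheses": hypotheses,
    }


def investigate(alert: dict, fetchers: dict) -> dict:
    plan = router_agent(alert)
    signals = source_fetch_agent(plan, fetchers)
    return reporter_agent(plan, signals, analyst_agent(signals))
```

Passing the fetchers in as plain callables keeps the sketch testable without real PagerDuty, Datadog, GitHub, or StatusGator credentials.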
Repository: https://github.com/Vedant817/PagerPilot
You can run the project locally:
git clone https://github.com/Vedant817/PagerPilot.git
cd PagerPilot
pip install -r requirements.txt
python -m backend
Then open:
http://localhost:8000/ui
Useful endpoints:
GET /health
GET /api/v1/diagnostics
GET /api/v1/incidents
POST /api/v1/investigate
**Example investigation payload:**
{
  "incident_id": "INC-001"
}
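For readers who prefer to call the API from a script, the following sketch builds that request with the Python standard library. The base URL and endpoint path come from the instructions above; the function names, the timeout, and the assumptions that the endpoint needs no authentication and returns a JSON body are mine.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # local address from the run instructions


def build_investigate_request(incident_id: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request for /api/v1/investigate."""
    body = json.dumps({"incident_id": incident_id}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/v1/investigate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def investigate(incident_id: str, timeout: float = 60.0) -> dict:
    """Send the request and decode the response, assuming it is JSON."""
    request = build_investigate_request(incident_id)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the backend running locally, `investigate("INC-001")` would send the payload shown above.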
**PagerPilot returns a full incident brief, including:**
- executive summary
- evidence timeline
- ranked root-cause hypotheses
- confidence score
- recommended remediation steps
- service impact and blast radius
- session logs from the agent pipeline
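One way to picture a brief with these parts is as a small typed structure. The sketch below is an assumption about the shape, not the actual response schema: the class and field names are hypothetical, and it assumes each hypothesis carries its own confidence between 0 and 1.

```python
from dataclasses import dataclass, field


@dataclass
class Hypothesis:
    cause: str
    confidence: float  # assumed to lie in [0, 1]


@dataclass
class IncidentBrief:
    # Field names are illustrative; the real API may name or nest these differently.
    executive_summary: str
    timeline: list = field(default_factory=list)
    hypotheses: list = field(default_factory=list)
    remediation_steps: list = field(default_factory=list)
    blast_radius: list = field(default_factory=list)
    session_logs: list = field(default_factory=list)

    def ranked_hypotheses(self) -> list:
        """Return hypotheses ordered from most to least confident."""
        return sorted(self.hypotheses, key=lambda h: h.confidence, reverse=True)
```

Keeping the ranking in a method rather than in the stored order means a consumer sees the same ordering no matter how the hypotheses were appended.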
The Comeback Story
PagerPilot started as an unfinished incident-investigation prototype. The idea was strong, but the project needed extensive finishing work before it felt reliable.
Before the final push, the app had several issues:
- missing runtime dependencies
- fragile environment parsing
- incomplete connector behaviour
- weak error handling
- frontend rendering bugs
- unsafe dynamic HTML rendering
- inconsistent investigation responses
- evaluation output that failed on some terminals
- runtime configuration that was not production-friendly
I brought the project into a finished state by:
- hardening the FastAPI backend
- adding diagnostics for connector readiness
- improving session and failure responses
- making the analyst's logic more reliable
- improving timestamp handling and timeline sorting
- strengthening secret redaction
- fixing frontend failure states
- reducing unsafe DOM rendering patterns
- making the evaluation runner portable
- cleaning up config, Docker ignore rules, and runtime startup
- committing changes in separate logical commits instead of one large dump
The final result is a working AI SRE investigator that can run locally, connect to real tools, and serve as a practical incident-briefing assistant.
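Two of the fixes in that list, timeline sorting and secret redaction, are easy to illustrate in isolation. The sketch below shows one plausible approach and is not the code from the repository: the function names, the event shape, the keyword list in the regular expression, and the choice to treat timestamps without an offset as UTC are all assumptions.

```python
import re
from datetime import datetime, timezone


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 string into a timezone-aware UTC datetime."""
    # Older Python versions reject a trailing "Z", so map it to an explicit offset.
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    return parsed.astimezone(timezone.utc)


def sort_timeline(events: list) -> list:
    """Order events by actual instant rather than by raw string comparison."""
    return sorted(events, key=lambda event: parse_timestamp(event["timestamp"]))


# Illustrative keyword list; a real redactor would likely cover more patterns.
SECRET_PATTERN = re.compile(
    r"(api[_-]?key|token|secret|password)(\s*[=:]\s*)(\S+)",
    re.IGNORECASE,
)


def redact(text: str) -> str:
    """Replace the value part of key=value or key: value secrets."""
    return SECRET_PATTERN.sub(r"\1\2[REDACTED]", text)
```

Normalising to UTC before sorting matters when sources report in different offsets: an event stamped `12:00:00+05:00` happened before one stamped `09:30:00Z`, even though a plain string sort would put it second.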
My Experience with GitHub Copilot
GitHub Copilot helped me reach the finish line faster by acting as a pair programmer while I cleaned up the project.
It was especially useful for:
- reasoning through backend failure cases
- improving FastAPI response models
- refactoring repeated config parsing logic
- tightening connector error handling
- improving frontend rendering safety
- writing cleaner validation and smoke-test flows
- organising fixes into focused commits
The biggest value was momentum. Instead of getting stuck on small implementation details, I could keep moving through the backlog: diagnose, patch, test, commit, and repeat.
PagerPilot feels like the kind of project that benefits from AI assistance twice: first as a coding partner during development, and second as the actual product experience for engineers responding to incidents.