Agastya Khati

Posted on Feb 15

Copilot Sherlock: Governed Incident Investigation CLI

#devchallenge #githubchallenge #cli #githubcopilot


This is a submission for the GitHub Copilot CLI Challenge

Copilot Sherlock — Governed Incident Investigation CLI

AI proposes. Humans decide. The system verifies.

What I Built

I built Sherlock, a CLI-based incident investigation system that demonstrates how to use GitHub Copilot CLI inside a real production-style workflow, not just as a code generator or chat assistant.

Sherlock turns raw production evidence (logs, metrics, deployments) into a governed, auditable incident decision with a strict lifecycle:

AI performs bounded reasoning
Humans explicitly decide (ACCEPT / MODIFY / REJECT)
The system enforces authority and lifecycle rules
Outcomes are recorded as append-only organizational memory
All artifacts are cryptographically verifiable
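
One common way to make an append-only record cryptographically verifiable is a hash chain, where each entry stores the hash of the entry before it. The sketch below is illustrative only and is not taken from the Sherlock repository; the names `append_entry`, `verify_log`, and `GENESIS`, and the entry layout, are assumptions.

```python
import hashlib
import json

# Placeholder "previous hash" for the first entry (an assumed convention).
GENESIS = "0" * 64


def _digest(prev_hash: str, payload: dict) -> str:
    # Canonical JSON (sorted keys) so the same payload always hashes identically.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(log: list, payload: dict) -> list:
    """Return a new log with one more entry; earlier entries are not edited."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {
        "payload": payload,
        "prev": prev_hash,
        "hash": _digest(prev_hash, payload),
    }
    return log + [entry]


def verify_log(log: list) -> bool:
    """Recompute every hash; an edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        expected = _digest(prev_hash, entry["payload"])
        if entry["prev"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True
```

Anyone holding a copy of the log can run `verify_log` without trusting the tool that wrote it, which is the sense of "externally verifiable" illustrated here.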

AI never decides. Humans always decide.

This is not an AI demo.
It’s an incident lifecycle system with AI inside it.

The 90-Second Pitch

Most incident tools do one of two things:

Give you AI suggestions with no governance
Give you checklists with no intelligence

Sherlock separates intelligence from authority:

AI proposes multiple competing hypotheses with evidence and confidence
Humans approve or reject the outcome
The system enforces lifecycle gates, memory isolation, and execution rules
Every decision is immutable and externally verifiable
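
The separation of intelligence from authority can be sketched as a small gate that only a human actor can pass. This is a minimal illustration under assumptions, not Sherlock's actual code: `Proposal`, `Outcome`, `finalize`, `GovernanceError`, and the `is_human` flag are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    ACCEPT = "ACCEPT"
    MODIFY = "MODIFY"
    REJECT = "REJECT"


class GovernanceError(Exception):
    """Raised when a lifecycle rule is violated."""


@dataclass(frozen=True)
class Proposal:
    incident_id: str
    summary: str
    author: str  # e.g. "copilot"; the AI only ever authors proposals


@dataclass(frozen=True)
class Outcome:
    proposal: Proposal  # the AI output is kept, even when it is rejected
    decision: Decision
    decided_by: str
    final_summary: str


def finalize(
    proposal: Proposal,
    decision: Decision,
    decided_by: str,
    is_human: bool,
    revised_summary: Optional[str] = None,
) -> Outcome:
    """Turn a proposal into an immutable outcome; only humans may call this."""
    if not is_human:
        raise GovernanceError("only a human can finalize an incident")
    if decision is Decision.MODIFY:
        if not revised_summary:
            raise GovernanceError("MODIFY requires a revised summary")
        final = revised_summary
    elif decision is Decision.ACCEPT:
        final = proposal.summary
    else:
        final = ""  # REJECT: nothing is recorded as the final root cause
    return Outcome(proposal, decision, decided_by, final)
```

Because both dataclasses are frozen, an `Outcome` cannot be altered after it is created, and a rejected proposal still travels with the decision that rejected it.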

There are no feedback loops, no auto-approval, and no adaptive behavior.

This is what production-grade AI assistance looks like.

Demo

Video Walkthrough (Full End-to-End Run)

GitHub Repository

https://github.com/kris70lesgo/sherlock-demo

One-Command Demo

./run-demo.sh

This executes the entire incident lifecycle:

Evidence normalization and trust validation
Scope reduction (thousands of log lines → a few signals)
AI hypothesis reasoning using GitHub Copilot CLI
Mandatory human governance (ACCEPT / MODIFY / REJECT)
Organizational memory write (append-only; existing records are read-only)
Operational integration (Slack / JIRA simulation)
Cryptographic provenance and trust report generation
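
To give a feel for the scope-reduction step in the list above, here is a rough sketch of collapsing many log lines into a few counted signals. It is an assumption about how such a step could work, not Sherlock's implementation; the severity keywords, the `<n>` placeholder, and the function name `reduce_scope` are hypothetical.

```python
import re
from collections import Counter

# Only lines carrying one of these severities are treated as signals (assumed).
ERROR_PATTERN = re.compile(r"\b(ERROR|FATAL|CRITICAL)\b")
# Digit runs are masked so messages that differ only in numbers group together.
NUMBER_PATTERN = re.compile(r"\d+")


def reduce_scope(lines, top_n=3):
    """Return the top_n most frequent error messages as (message, count) pairs."""
    signals = Counter()
    for line in lines:
        match = ERROR_PATTERN.search(line)
        if match is None:
            continue  # informational noise is dropped
        message = line[match.end():].strip(" :-")
        signals[NUMBER_PATTERN.sub("<n>", message)] += 1
    return signals.most_common(top_n)
```

Only the reduced signals would then be handed to the reasoning phase, which keeps the prompt small and the evidence traceable.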

Fresh Incident IDs

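Organizational memory is append-only, so rerunning an incident ID that has already been recorded can conflict with the existing entry. To avoid that, pass a fresh incident ID: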

./sherlock investigate INC-999
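
Why a reused ID conflicts can be illustrated with a write-once file per incident. This is a hypothetical sketch: the one-JSON-file-per-incident layout and the name `record_outcome` are assumptions, not the repository's actual storage format.

```python
import json
import tempfile
from pathlib import Path


def record_outcome(memory_dir: Path, incident_id: str, outcome: dict) -> Path:
    """Write one record per incident; an existing record is never overwritten."""
    memory_dir.mkdir(parents=True, exist_ok=True)
    path = memory_dir / f"{incident_id}.json"
    # Mode "x" creates the file and raises FileExistsError if it already exists.
    with open(path, "x", encoding="utf-8") as handle:
        json.dump(outcome, handle, sort_keys=True)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        memory = Path(tmp) / "memory"
        record_outcome(memory, "INC-999", {"decision": "ACCEPT"})
        try:
            record_outcome(memory, "INC-999", {"decision": "REJECT"})
        except FileExistsError:
            print("INC-999 is already recorded; use a fresh incident ID")
```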

Copilot Auth vs Offline Mode

Sherlock uses GitHub Copilot CLI for Phase 3 reasoning when authenticated.

If Copilot is not authenticated, Sherlock falls back to an offline post-mortem generator so the demo still completes.
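
A fallback like this can be sketched as a simple availability check before Phase 3. The sketch assumes that a successful `gh auth status` is a good enough signal that Copilot can be used; how Sherlock actually detects authentication is not shown in this post, and the names `copilot_available` and `choose_reasoner` are hypothetical.

```python
import shutil
import subprocess


def copilot_available(run=subprocess.run, which=shutil.which) -> bool:
    """Return True if the gh CLI is installed and an account is logged in."""
    if which("gh") is None:
        return False
    # `gh auth status` exits with a non-zero code when no account is logged in.
    result = run(["gh", "auth", "status"], capture_output=True, text=True)
    return result.returncode == 0


def choose_reasoner(available: bool) -> str:
    """Pick the reasoning backend; the offline generator keeps the demo running."""
    return "copilot" if available else "offline"
```

The `run` and `which` parameters exist only so the check can be exercised without a real `gh` installation.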

To authenticate Copilot:

gh auth login

My Experience with GitHub Copilot CLI

Copilot CLI was used as a reasoning engine, not as an authority.

Specifically:

Copilot generates multi-hypothesis RCAs from scoped evidence
Each hypothesis includes:
- Evidence FOR
- Evidence AGAINST
- Confidence contribution
Confidence is budgeted and uncertainty is explicit
Hypotheses can be ruled out with justification
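
One way to read "budgeted" confidence is that all active hypotheses share a fixed pool, and whatever is left unspent is reported as explicit uncertainty. The sketch below encodes that interpretation; the 100-point budget, the field names, and `remaining_uncertainty` are assumptions rather than Sherlock's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class Hypothesis:
    name: str
    confidence: int  # points drawn from the shared budget
    evidence_for: list = field(default_factory=list)
    evidence_against: list = field(default_factory=list)
    ruled_out_reason: str = ""  # non-empty means ruled out, with justification


def remaining_uncertainty(hypotheses, budget=100):
    """Return the unspent budget; raise if active hypotheses overspend it."""
    active = [h for h in hypotheses if not h.ruled_out_reason]
    if any(h.confidence < 0 for h in active):
        raise ValueError("confidence cannot be negative")
    spent = sum(h.confidence for h in active)
    if spent > budget:
        raise ValueError(f"confidence budget exceeded: {spent} > {budget}")
    return budget - spent
```

Under this reading, two hypotheses at 60 and 25 leave 15 points of stated uncertainty, and a ruled-out hypothesis no longer draws from the pool.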

Crucially:

Copilot cannot finalize anything
It cannot approve itself
It cannot modify governance rules
Its output is preserved even when rejected

The most valuable aspect of Copilot CLI here wasn’t speed; it was structured reasoning under constraints.
Once those constraints were clear, Copilot became a reliable component in a larger system instead of a black box.

Why This Matters

Sherlock demonstrates:

How to use Copilot CLI inside a production workflow
How to prevent AI feedback loops
How to enforce human authority mechanically
How to make AI output externally verifiable
How to integrate AI without letting it run the system

Most tools ask you to trust them.

Sherlock gives you a way to verify them.

That’s the difference between a demo and a product.

Final Notes

Language: Bash + Python
Interface: CLI (interactive and non-interactive modes)
Architecture: 7-phase, strictly isolated pipeline
Status: Frozen, validated, demo-ready

Thanks for checking it out.
