This is a submission for the GitHub Copilot CLI Challenge
## What I Built
OSCALFlow - a GitHub CLI extension for automating federal compliance docs (NIST 800-53, FedRAMP).
Repository: https://github.com/ivproduced/OSCALFlow
If you've ever had to document compliance controls manually, you know it's brutal. Security teams can spend 200+ hours per system just filling out paperwork. I built this to scan codebases and auto-detect which controls are already implemented.
The interesting part: I used gh copilot to build features that call gh copilot inside the tool. So the CLI extension I made with Copilot now uses Copilot to validate code and suggest implementations. Pretty meta.
What it does:

- `gh oscal scan` - Detects 50+ controls from your code (150+ patterns across 8 languages)
- `gh oscal scan --ai-validate` - Shells out to `gh copilot explain` to verify whether implementations actually meet NIST requirements
- `gh oscal suggest` - Detects your stack and calls `gh copilot suggest` for implementation guidance
- `gh oscal generate` - Creates OSCAL System Security Plan templates
- `gh oscal export` - Exports HTML reports
I tested it on a real federal system (FedChat) and got 19.8% auto-detection - 48 controls documented automatically, saving about 24 hours of work. Not perfect coverage but way better than starting from zero.
## Demo
🎬 Video Walkthrough: https://youtu.be/3vqtV-HDFg4
Install it:

```bash
gh extension install ivproduced/oscalflow
```
Basic scan example:

```bash
$ gh oscal scan Test_Case/FedChat

Found 58 signals → 48 controls (19.8% coverage)
Time saved: ~24 hours

✓ AC-2 (Account Management)
✓ AU-3 (Audit Record Content)
✓ SC-5 (Denial of Service Protection)
... (48 total)
```
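To make the pattern-matching idea concrete, here's a minimal sketch of how a scanner like this could map code signals to control IDs. The patterns and control mappings below are illustrative guesses, not OSCALFlow's actual rule set:

```javascript
// Hypothetical control-to-pattern table. A real scanner would have
// 150+ patterns; these three entries are just examples.
const CONTROL_PATTERNS = {
  "AC-2": [/createUser\s*\(/, /deactivateAccount/i],          // Account Management
  "AU-3": [/logger\.(info|warn|error)\s*\(/, /audit[_-]?log/i], // Audit Record Content
  "SC-5": [/rateLimit/i, /express-rate-limit/],                 // DoS Protection
};

// Return the control IDs whose patterns match the given source text.
function detectControls(source) {
  const hits = [];
  for (const [control, patterns] of Object.entries(CONTROL_PATTERNS)) {
    if (patterns.some((re) => re.test(source))) hits.push(control);
  }
  return hits;
}
```

A real scanner would walk the whole repo and weigh multiple signals per control before counting it as detected, but the core signal-to-control mapping can stay this simple.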
Where it gets interesting - AI validation:

```bash
$ gh oscal scan . --ai-validate --ai-limit 5

AI validating with gh copilot...

✅ AU-3 [VERIFIED]
   Found: Audit middleware
   AI says: "Logs include user_id, timestamp, action, IP.
             Meets AU-3 requirements."

✅ SC-5 [VERIFIED]
   Found: Rate limiting
   AI says: "express-rate-limit prevents resource exhaustion.
             Complies with SC-5."

❌ SC-2 [FAILED]
   Found: Multi-tenancy
   AI says: "Shared database lacks logical separation.
             SC-2 requires dedicated resources per tenant."
```
Pattern matching finds "might be there." AI validation confirms "actually works."
Get implementation help:
```bash
$ gh oscal suggest AC-2 backend/ --output guide.md

Detected: Python, FastAPI, SQLAlchemy
Asking gh copilot for AC-2 implementation...

Saved 8 steps to guide.md:
- Database schema changes
- FastAPI endpoints
- SQLAlchemy queries
- Audit logging
```
Full workflow:

```bash
# Start with a template
gh oscal generate --baseline moderate --system "MyApp" -o ssp.json

# Scan your code
gh oscal scan . --ai-validate --output results.json

# Get help implementing gaps
gh oscal suggest AC-2 . --output guide.md

# Export report
gh oscal export ssp.json -o
```
## My Experience with GitHub Copilot CLI
**The meta-moment:** I used `gh copilot` terminal sessions to build a tool that calls `gh copilot` as part of its features. Recursive productivity at its finest.
Three features I built with Copilot CLI that now use Copilot CLI:
**1. OSCAL Catalog Parser** (~200 lines)
Needed to parse the 10MB NIST 800-53 JSON catalog. Ran this:
```bash
$ gh copilot suggest -t shell \
    "Parse 10MB NIST OSCAL catalog JSON, extract control IDs,
     titles and descriptions into searchable format"
```
Got working code in one session. Just worked.
**2. AI Validator**
This one shells out to `gh copilot explain` to verify implementations:
```bash
$ gh copilot suggest -t shell \
    "Create Node function that calls gh copilot explain with code
     context and NIST requirement, parse response"
```
It gave me the `execSync` pattern, error handling, output parsing - built it in an hour.
**3. AI Suggester**
Detects your stack (Python/Node/Docker) and calls `gh copilot suggest`:
```bash
$ gh copilot suggest -t shell \
    "Detect tech stack from files, build prompt for gh copilot
     suggest, execute and capture output"
```
Got the whole tech detection + CLI orchestration in one go.
What changed for me:
Before: Alt-tab to browser, search NIST docs, read generic StackOverflow, try to adapt to my stack. 2-3 hours per control.
After: Stay in terminal, `gh copilot suggest` with my exact question, get code that works with my stack. 20-30 minutes per control.
Used `gh copilot` 10+ times while building this. Never left the terminal. No iteration needed - first suggestions were production-ready.
The meta part is that I built a compliance tool using Copilot CLI, and now that tool uses Copilot CLI to teach compliance. It's recursive but actually makes sense - why wouldn't a CLI extension leverage other CLI extensions?