Tevin Harris
OSCALFlow: Automate NIST 800-53 Compliance Documentation from Your Codebase

This is a submission for the GitHub Copilot CLI Challenge

What I Built

OSCALFlow - a GitHub CLI extension for automating federal compliance docs (NIST 800-53, FedRAMP).

Repository: https://github.com/ivproduced/OSCALFlow

If you've ever had to document compliance controls manually, you know it's brutal. Security teams can spend 200+ hours per system just filling out paperwork. I built this to scan codebases and auto-detect which controls are already implemented.

The interesting part: I used gh copilot to build features that call gh copilot inside the tool. So the CLI extension I made with Copilot now uses Copilot to validate code and suggest implementations. Pretty meta.

What it does:

  • gh oscal scan - Detects 50+ controls from your code (150+ patterns across 8 languages)
  • gh oscal scan --ai-validate - Shells out to gh copilot explain to verify if implementations actually meet NIST requirements
  • gh oscal suggest - Detects your stack and calls gh copilot suggest for implementation guidance
  • gh oscal generate - Creates OSCAL System Security Plan templates
  • gh oscal export - Exports HTML reports

I tested it on a real federal system (FedChat) and got 19.8% auto-detection - 48 controls documented automatically, saving about 24 hours of work. Not perfect coverage but way better than starting from zero.
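Under the hood, that detection step boils down to a table of signal patterns mapped to control IDs. Here's a minimal sketch of the idea in Node — the patterns and mappings below are illustrative stand-ins, not OSCALFlow's actual ruleset:

```javascript
// Illustrative signal-to-control mapping (hypothetical patterns, not the real ruleset)
const PATTERNS = [
  { control: "AC-2", name: "Account Management",           regex: /create_user|useradd/ },
  { control: "AU-3", name: "Audit Record Content",         regex: /audit_log|auditLogger/ },
  { control: "SC-5", name: "Denial of Service Protection", regex: /rate[-_]?limit/i },
];

// Return every control whose signal pattern appears in the source text
function scanSource(source) {
  return PATTERNS.filter(p => p.regex.test(source))
                 .map(p => ({ control: p.control, name: p.name }));
}

const hits = scanSource("app.use(rateLimit()); audit_log(req.user);");
console.log(hits.map(h => h.control).join(", ")); // AU-3, SC-5
```

Multiply that by 150+ patterns across 8 languages and you get the coverage numbers above.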

Demo

🎬 Video Walkthrough: https://youtu.be/3vqtV-HDFg4


Install it:

```
gh extension install ivproduced/oscalflow
```

Basic scan example:

```
$ gh oscal scan Test_Case/FedChat

Found 58 signals → 48 controls (19.8% coverage)
Time saved: ~24 hours

✓ AC-2 (Account Management)
✓ AU-3 (Audit Record Content)
✓ SC-5 (Denial of Service Protection)
... (48 total)
```

Where it gets interesting - AI validation:

```
$ gh oscal scan . --ai-validate --ai-limit 5

AI validating with gh copilot...

✅ AU-3 [VERIFIED]
   Found: Audit middleware
   AI says: "Logs include user_id, timestamp, action, IP.
            Meets AU-3 requirements."

✅ SC-5 [VERIFIED]
   Found: Rate limiting
   AI says: "express-rate-limit prevents resource exhaustion.
            Complies with SC-5."

❌ SC-2 [FAILED]
   Found: Multi-tenancy
   AI says: "Shared database lacks logical separation.
            SC-2 requires dedicated resources per tenant."
```

Pattern matching finds "might be there." AI validation confirms "actually works."
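Turning Copilot's free-text answer into that VERIFIED/FAILED badge can be as simple as a keyword heuristic. A sketch of the idea — the keyword lists are my assumption, not OSCALFlow's actual parser:

```javascript
// Hypothetical verdict classifier for the AI's free-text answer
function classifyVerdict(aiResponse) {
  const text = aiResponse.toLowerCase();
  if (/\b(meets|complies|satisfies)\b/.test(text)) return "VERIFIED";
  if (/\b(lacks|missing|fails|does not)\b/.test(text)) return "FAILED";
  return "UNKNOWN"; // ambiguous answers go to a human reviewer
}

console.log(classifyVerdict("Logs include user_id and timestamp. Meets AU-3 requirements.")); // VERIFIED
console.log(classifyVerdict("Shared database lacks logical separation."));                    // FAILED
```

The "UNKNOWN" fallback matters: compliance verdicts you can't classify confidently should land on a human's desk, not in the SSP.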

Get implementation help:

```
$ gh oscal suggest AC-2 backend/ --output guide.md

Detected: Python, FastAPI, SQLAlchemy
Asking gh copilot for AC-2 implementation...

Saved 8 steps to guide.md:
  - Database schema changes
  - FastAPI endpoints
  - SQLAlchemy queries
  - Audit logging
```

Full workflow:

```
# Start with a template
gh oscal generate --baseline moderate --system "MyApp" -o ssp.json

# Scan your code
gh oscal scan . --ai-validate --output results.json

# Get help implementing gaps
gh oscal suggest AC-2 . --output guide.md

# Export report
gh oscal export ssp.json -o 
```

## My Experience with GitHub Copilot CLI

**The meta-moment:** I used `gh copilot` terminal sessions to build a tool that calls `gh copilot` as part of its features. Recursive productivity at its finest.

Three features I built with Copilot CLI that now use Copilot CLI:

**1. OSCAL Catalog Parser** (~200 lines)
Needed to parse the 10MB NIST 800-53 JSON catalog. Ran this:
```bash
$ gh copilot suggest -t shell

"Parse 10MB NIST OSCAL catalog JSON, extract control IDs,
titles and descriptions into searchable format"
```

Got working code in one session. Just worked.

**2. AI Validator** 
This one shells out to `gh copilot explain` to verify implementations:
```bash
$ gh copilot suggest -t shell

"Create Node function that calls gh copilot explain with code
context and NIST requirement, parse response"
```

It gave me the `execSync` pattern, error handling, output parsing - built it in an hour.

**3. AI Suggester**
Detects your stack (Python/Node/Docker) and calls `gh copilot suggest`:
```bash
$ gh copilot suggest -t shell

"Detect tech stack from files, build prompt for gh copilot
suggest, execute and capture output"
```

Got the whole tech detection + CLI orchestration in one go.

**What changed for me:**

Before: Alt-tab to browser, search NIST docs, read generic StackOverflow, try to adapt to my stack. 2-3 hours per control.

After: Stay in terminal, `gh copilot suggest` with my exact question, get code that works with my stack. 20-30 minutes per control.

Used `gh copilot` 10+ times while building this. Never left the terminal. No iteration needed - first suggestions were production-ready.

The meta part is that I built a compliance tool using Copilot CLI, and now that tool uses Copilot CLI to teach compliance. It's recursive but actually makes sense - why wouldn't a CLI extension leverage other CLI extensions?
