*This is a submission for the GitHub Copilot CLI Challenge.*

## What I Built
RealityCheck CLI is a Python command-line tool that transforms legal contract PDFs into structured, actionable risk intelligence — not summaries, but real decision-grade analysis you can act on before signing.
Most people sign contracts they can't fully parse. RealityCheck makes the risk explicit, structured, and actionable.
## The Problem
You receive a consulting agreement, employment contract, or NDA. It's 8 pages of dense legal text. You skim it, maybe worry about a clause or two, and sign anyway. Sound familiar?
The gap between "I read it" and "I understand the risk" is where people get burned — unlimited liability exposure, one-sided termination rights, overbroad IP assignments, missing payment protections.
## The Solution
RealityCheck CLI takes any contract PDF and produces:
- 5 quantified risk metrics — Overall Risk Score (1-100), Power Imbalance (0-100), Ambiguity Index (0-100), Protection Coverage (0-100), and an original Leverage Index™ (0-100) showing your negotiation strength
- Clause-by-clause classification across 7 legal categories (Non-Compete, IP Transfer, Liability, Termination, Financial Risk, Privacy, Neutral)
- Signal detection — flags vague language ("sole discretion", "without notice"), one-sided rights, liability expansion, and missing protections
- Missing protections scan — checks for 6 critical protections: payment timeline, termination notice, cure period, liability cap, breach notification window, IP retention
- Negotiation-ready outputs — auto-generated email drafts with specific clause rewrites, ready to send to the counterparty
- Contract comparison — diff two versions of a contract to catch new risks, expanded liability, or extended non-compete duration between drafts
- Optional LLM enrichment — plug in Google Gemini for deeper clause classification alongside the fast heuristic engine
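The signal-detection layer is pattern-based at its core. As a rough illustration of how heuristic vague-language flagging can work (the pattern set, function name, and signal shape here are hypothetical, not RealityCheck's actual code):

```python
import re

# Illustrative vague-language patterns; the real engine covers many more.
VAGUE_PATTERNS = {
    "sole discretion": r"\bsole\s+discretion\b",
    "without notice": r"\bwithout\s+(?:prior\s+)?notice\b",
    "from time to time": r"\bfrom\s+time\s+to\s+time\b",
}

def detect_vague_language(clause_text: str) -> list[dict]:
    """Flag each vague phrase found in a clause as a HIGH-severity signal."""
    return [
        {"type": "VAGUE_LANGUAGE", "phrase": label, "severity": "HIGH"}
        for label, pattern in VAGUE_PATTERNS.items()
        if re.search(pattern, clause_text, re.IGNORECASE)
    ]

clause = "The Employer may amend these duties at its sole discretion, without notice."
print(detect_vague_language(clause))
```

Because everything is plain regex, this layer runs fully offline with no API calls.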
## Architecture

```
PDF → [ingest] → [clauses] → [analysis] → [scoring] → [negotiation] → [output]
                                  ↕                         ↕
                            [llm_client]               [comparison]
```

The tool is modular by design — 9 internal packages wired through a single orchestration pipeline:
| Module | Purpose |
|---|---|
| `ingest/` | PDF extraction via pdfplumber + header/footer removal |
| `clauses/` | Clause segmentation by heading detection + text normalization |
| `analysis/` | Heuristic classification engine + optional Gemini LLM enrichment |
| `scoring/` | Weighted multi-factor risk engine with category-specific weights |
| `negotiation/` | Email drafts + clause rewrite suggestions |
| `comparison/` | Smart clause matching + delta analysis with legal-domain flags |
| `output/` | Rich terminal rendering + JSON artifact export |
| `config/` | Environment-based settings (API keys, thresholds) |
| `cli/` | Typer-based CLI with `analyze` and `compare` commands |
## Key Design Decisions
- Heuristic-first, LLM-optional — Works fully offline with regex pattern matching. No API key needed for the core analysis. LLM only enriches, never replaces.
- Weighted multi-factor scoring — Not a single naive score, but 5 complementary metrics with category-specific weights (Liability: 0.22, Financial Risk: 0.20, IP Transfer: 0.17, etc.)
- Actionable by default — Doesn't just flag risk — generates a negotiation email draft and clause rewrites you can actually send.
- Comparison as a first-class feature — Smart clause matching (70% title similarity + 30% text similarity) with domain-specific flags like non-compete duration parsing and liability expansion detection.
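The per-category contribution shown in the demo output below is simply score × weight. A minimal sketch using the weights quoted above (the Non-Compete weight is not stated in this post, so it is a placeholder; the full engine also folds in penalties for vague language and missing protections, which this sketch omits):

```python
# Category weights as quoted in the post; NON_COMPETE is illustrative only.
CATEGORY_WEIGHTS = {
    "LIABILITY": 0.22,
    "FINANCIAL_RISK": 0.20,
    "IP_TRANSFER": 0.17,
    "TERMINATION": 0.12,
    "NON_COMPETE": 0.10,  # not quoted in the post -- placeholder value
    "PRIVACY": 0.09,
    "NEUTRAL": 0.05,
}

def category_contribution(category: str, score: float) -> float:
    """Each category's contribution to the overall score: score x weight."""
    return round(score * CATEGORY_WEIGHTS[category], 2)

# Reproduces the rows of the Category Risk Summary table in Demo 1:
print(category_contribution("IP_TRANSFER", 57))   # 9.69
print(category_contribution("TERMINATION", 55))   # 6.6
```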
## Demo

**GitHub Repository:** [github.com/Anandqwe/realitycheck-cli](https://github.com/Anandqwe/realitycheck-cli)
### Setup

```powershell
git clone https://github.com/Anandqwe/realitycheck-cli.git
cd realitycheck-cli
python -m venv venv
.\venv\Scripts\activate
pip install -r requirements.txt
```
### Demo 1: Analyzing a Real Employment Contract (contract.pdf)

The repo includes a real employment contract template (contract.pdf) — a multi-page agreement with clauses covering probation, compensation, termination, confidentiality, IP assignment, and more.

```powershell
python -m realitycheck_cli analyze .\contract.pdf
```
**Terminal output:**

```
╭──────────────────────── Analysis ────────────────────────╮
│ RealityCheck CLI                                         │
│ Contract: contract.pdf                                   │
│ Clauses analyzed: 19                                     │
╰──────────────────────────────────────────────────────────╯
╭─ Overall Risk Score ─╮ ╭─ Power Imbalance Score ─╮ ╭─ Leverage Index (TM) ─╮
│        40/100        │ │         41/100          │ │        54/100         │
╰──────────────────────╯ ╰─────────────────────────╯ ╰───────────────────────╯
```
The tool parsed all 19 clauses from the PDF, classified each one, and produced:
- Overall Risk: 40/100 — Moderate risk level
- Power Imbalance: 41/100 — Slightly favors the employer
- Leverage Index: 54/100 — Borderline negotiation position
**Category Breakdown:**

```
                Category Risk Summary
┏━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Category       ┃ Score ┃ Weight ┃ Contribution ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━┩
│ IP_TRANSFER    │    57 │   0.17 │         9.69 │
│ TERMINATION    │    55 │   0.12 │         6.60 │
│ PRIVACY        │    52 │   0.09 │         4.68 │
│ NEUTRAL        │    36 │   0.05 │         1.80 │
└────────────────┴───────┴────────┴──────────────┘
```
IP Transfer and Termination clauses are the primary risk drivers. The tool detected an "Assignment (Transfer of Contract)" clause attempting broad IP assignment, and termination clauses with limited employee protections.
**Ambiguity Detection:**
The tool caught a "sole discretion" clause in the Duties section — the employer can unilaterally modify duties "in the sole discretion of the Employer." This gets flagged as VAGUE_LANGUAGE with HIGH severity.
**Missing Protections:**

```
╭──────────────── Missing Protections ─────────────────╮
│ - payment timeline                                   │
│ - cure period                                        │
│ - liability cap                                      │
│ - breach notification window                         │
│ - ip retained                                        │
╰──────────────────────────────────────────────────────╯
```
5 out of 6 critical protections are missing from this contract — a significant gap.
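To make the missing-protections scan concrete, here is a hypothetical keyword-based version (the phrase lists and function name are illustrative; the actual checks are presumably richer patterns):

```python
# Illustrative trigger phrases per protection; all names are assumptions.
PROTECTION_KEYWORDS = {
    "payment_timeline": ["payment within", "net 30", "invoice due"],
    "termination_notice": ["notice of termination", "days' notice"],
    "cure_period": ["cure period", "opportunity to cure"],
    "liability_cap": ["liability shall not exceed", "limited to the fees"],
    "breach_notification_window": ["notify of breach", "breach notification"],
    "ip_retained": ["retains all rights", "pre-existing intellectual property"],
}

def missing_protections(contract_text: str) -> list[str]:
    """Return the protections for which no trigger phrase appears at all."""
    text = contract_text.lower()
    return [
        name for name, phrases in PROTECTION_KEYWORDS.items()
        if not any(phrase in text for phrase in phrases)
    ]

contract = "The Employer shall give written notice of termination."
print(missing_protections(contract))  # everything except termination_notice
```

An absence check like this is what lets the tool flag risk that never appears on the page, which clause-level classification alone would miss.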
**Auto-Generated Negotiation Email:**

```
╭──────────── Negotiation Draft (Preview) ─────────────╮
│ Subject: Proposed revisions for contract             │
│                                                      │
│ Priority clauses to discuss:                         │
│ - Assignment (Transfer Of Contract Of Employment)    │
│   (C-008, risk 57/100): Narrow IP assignment to      │
│   deliverables created under this agreement.         │
│ - Probation (C-003, risk 55/100): Require written    │
│   notice and a cure period before termination.       │
│                                                      │
│ Additional protections requested:                    │
│ - Add explicit language for: payment timeline        │
│ - Add explicit language for: liability cap           │
│ - Add explicit language for: breach notification     │
╰──────────────────────────────────────────────────────╯
```
This email draft is ready to copy-paste and send to the counterparty. No more staring at a contract wondering what to push back on.
### Demo 2: Full Pipeline with the Demo Script

The project includes a PowerShell demo script (`demo.ps1`) that runs the complete pipeline — analyze both versions, then compare:

```powershell
.\demo.ps1 -Baseline .\baseline.pdf -Revised .\revised.pdf
```
This executes 3 steps automatically:
- Step 1: Analyze the baseline contract → produces risk scores, missing protections, negotiation draft
- Step 2: Analyze the revised contract → same analysis on the new version
- Step 3: Compare both → generates a delta report
**Comparison Output:**

```
╭─────────────────── Comparison ───────────────────────╮
│ Baseline: baseline.pdf                               │
│ Revised: revised.pdf                                 │
╰──────────────────────────────────────────────────────╯
╭─ Baseline Risk ─╮ ╭─ Revised Risk ─╮ ╭─ Risk Delta ─╮
│       17        │ │       17       │ │      +0      │
╰─────────────────╯ ╰────────────────╯ ╰──────────────╯
╭─ Baseline Leverage ─╮ ╭─ Revised Leverage ─╮ ╭─ Leverage Delta ─╮
│         60          │ │         60         │ │        +0        │
╰─────────────────────╯ ╰────────────────────╯ ╰──────────────────╯
```
The comparison engine uses smart clause matching (70% title similarity + 30% text similarity) to pair clauses across versions and flag:
- NEW_RISK — new high-risk clauses or risk increases ≥20 points
- EXPANDED_LIABILITY — new liability expansion language detected
- EXTENDED_NON_COMPETE — duration increases (parses days/months/years)
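The 70/30 weighted matcher can be sketched like this, with `difflib` standing in for whatever similarity measure the project actually uses (function names are illustrative):

```python
from difflib import SequenceMatcher

def _sim(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match_score(title_a: str, text_a: str, title_b: str, text_b: str) -> float:
    """Weighted clause similarity: 70% title, 30% body text."""
    return 0.70 * _sim(title_a, title_b) + 0.30 * _sim(text_a, text_b)

score = match_score(
    "Termination", "Either party may terminate with 30 days notice.",
    "Termination of Agreement", "Either party may terminate with 14 days notice.",
)
print(score >= 0.55)  # above the 0.55 threshold, so the clauses are paired
```

Weighting the title heavily makes matching robust to body rewrites between drafts, while the threshold keeps genuinely new clauses from being force-paired with unrelated ones.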
### Demo 3: JSON Artifact Export

Every analysis produces structured JSON artifacts for downstream workflows:

```powershell
python -m realitycheck_cli analyze .\contract.pdf --json-output .\artifacts\contract.analysis.json
```
```json
{
  "summary": {
    "overall_risk_score": 40,
    "power_imbalance_score": 41,
    "ambiguity_index": 5,
    "protection_coverage_score": 15,
    "leverage_index": 54,
    "missing_protections": [
      "payment_timeline",
      "cure_period",
      "liability_cap",
      "breach_notification_window",
      "ip_retained"
    ]
  },
  "negotiation_email": "Subject: Proposed revisions for contract..."
}
```
Each clause includes its category, risk score, risk level, signals, rewrite suggestion, and negotiation points — fully structured for integration into legal tech workflows, dashboards, or review pipelines.
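For example, a downstream gate could consume the artifact like this (field names follow the sample JSON above; the function name and review thresholds are illustrative):

```python
import json

# Parse an artifact shaped like the summary shown above.
artifact = json.loads("""
{
  "summary": {
    "overall_risk_score": 40,
    "missing_protections": ["payment_timeline", "cure_period",
                            "liability_cap", "breach_notification_window",
                            "ip_retained"]
  }
}
""")

def needs_legal_review(summary: dict) -> bool:
    """Gate a signing workflow on risk score or protection gaps."""
    return (summary["overall_risk_score"] >= 60
            or len(summary["missing_protections"]) >= 4)

print(needs_legal_review(artifact["summary"]))  # True: five protections missing
```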
### Demo 4: LLM-Enriched Analysis (Optional)

For deeper analysis, plug in Google Gemini:

```powershell
$env:GEMINI_API_KEY = "your-key"
python -m realitycheck_cli analyze .\contract.pdf --use-llm
```
The LLM enrichment adds structured signals on top of the heuristic baseline — it doesn't replace the pattern engine, it supplements it. Signals from both engines are merged with deduplication.
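A merge with deduplication can be sketched as follows; the `(clause_id, type, phrase)` key is an assumption about the signal schema, not the project's actual code:

```python
def merge_signals(heuristic: list[dict], llm: list[dict]) -> list[dict]:
    """Combine both signal lists, keeping the first occurrence of each key."""
    merged, seen = [], set()
    for signal in heuristic + llm:  # heuristic first, so it wins duplicates
        key = (signal["clause_id"], signal["type"], signal.get("phrase"))
        if key not in seen:
            seen.add(key)
            merged.append(signal)
    return merged

heuristic = [{"clause_id": "C-004", "type": "VAGUE_LANGUAGE", "phrase": "sole discretion"}]
llm = [
    {"clause_id": "C-004", "type": "VAGUE_LANGUAGE", "phrase": "sole discretion"},  # dropped
    {"clause_id": "C-004", "type": "ONE_SIDED_RIGHT", "phrase": None},
]
print(len(merge_signals(heuristic, llm)))  # 2
```

Ordering the heuristic signals first means the offline engine stays authoritative and the LLM can only add, never overwrite.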
## Commands Quick Reference

| Command | What it does |
|---|---|
| `python -m realitycheck_cli analyze contract.pdf` | Analyze a single contract |
| `python -m realitycheck_cli analyze contract.pdf --use-llm` | Analyze with Gemini enrichment |
| `python -m realitycheck_cli analyze contract.pdf -j output.json` | Export JSON artifacts |
| `python -m realitycheck_cli compare baseline.pdf revised.pdf` | Compare two contract versions |
| `.\demo.ps1 -Baseline baseline.pdf -Revised revised.pdf` | Run the full demo pipeline |
| `.\demo.ps1 -Baseline baseline.pdf -Revised revised.pdf -UseLLM` | Demo with LLM enrichment |
## My Experience with GitHub Copilot CLI
GitHub Copilot was my co-pilot throughout this entire build — from architecture decisions to implementation details.
### Scaffolding the Architecture

When I started, I had the idea but not the structure. I described what I wanted to Copilot:

> "A CLI tool that parses legal PDFs, classifies clause risk, detects power imbalance, and generates negotiation outputs."
Copilot helped me design the modular architecture — separating concerns into ingest/, clauses/, analysis/, scoring/, negotiation/, comparison/, and output/ packages. This clean separation made each module independently testable and swappable.
### Building the Heuristic Engine
The pattern-based classification engine in analysis/heuristics.py was built iteratively with Copilot. I'd describe a legal concept — "detect clauses that mention sole discretion or unilateral rights" — and Copilot would generate the regex patterns, signal types, and severity mappings. The result is a comprehensive heuristic engine that covers 7 clause categories, 4 signal types, and 6 missing-protection checks — all without any API calls.
### The Scoring System

The weighted multi-factor scoring system was where Copilot really shone. I asked it to help design a scoring model where:
- Different clause categories have different weights (liability should matter more than neutral clauses)
- Vague language and missing protections should add penalty points
- There should be a composite "Leverage Index" that tells you your negotiation strength
Copilot helped me implement the weighted average in scoring/risk_engine.py, the power imbalance detector in scoring/power_imbalance.py, and the Leverage Index formula in scoring/leverage.py — each with clear, auditable logic rather than a black-box score.
### Rich Terminal Output
The premium terminal output with Rich was built entirely in collaboration with Copilot. Color-coded score cards (red ≥80, yellow ≥60, green <60), formatted tables for category breakdowns, and the negotiation draft preview panel — Copilot generated the Rich markup and helped me iterate on the layout until it felt polished and professional.
### Contract Comparison Engine
The comparison module was the most complex feature. Copilot helped me implement:
- Clause matching with weighted similarity scoring (70% title + 30% text, 0.55 threshold)
- Non-compete duration parsing that converts between days, months, and years for accurate comparison
- Liability expansion detection with domain-specific legal patterns
- Risk flag generation for new risks, expanded scope, and extended terms
### LLM Integration
Integrating Google Gemini as an optional enrichment layer was straightforward with Copilot's help. It generated the structured JSON system prompt, response parsing, schema validation, and the signal-merging logic that deduplicates heuristic and LLM signals by key.
### Testing
Copilot helped scaffold the test suite in tests/ — unit tests for the heuristic engine, scoring calculations, LLM client mocking, and comparison logic. The tests validate that the scoring math is correct and the classification patterns work as expected.
### What Copilot Changed
Without Copilot, this project would have been significantly harder to ship as a solo developer. The legal domain knowledge encoding (regex patterns for clause types, signal detection rules, scoring weights) is the kind of tedious, error-prone work that Copilot accelerates dramatically. It turned what could have been weeks of research and implementation into a focused, iterative build process where I could stay in flow and keep shipping.
The biggest impact was on code quality — Copilot consistently suggested Pydantic models for data validation, proper error handling boundaries, and clean separation of concerns. The codebase ended up more maintainable than most solo projects I've built.
**Tech Stack:** Python 3.10+ | Typer | Rich | pdfplumber | Pydantic | Google Gemini (optional)