
Anand Krishna


RealityCheck CLI — Turn Legal Contracts into Decision-Grade Risk Intelligence


This is a submission for the GitHub Copilot CLI Challenge

What I Built

RealityCheck CLI is a Python command-line tool that transforms legal contract PDFs into structured, actionable risk intelligence — not summaries, but real decision-grade analysis you can act on before signing.

Most people sign contracts they can't fully parse. RealityCheck makes the risk explicit, structured, and actionable.

The Problem

You receive a consulting agreement, employment contract, or NDA. It's 8 pages of dense legal text. You skim it, maybe worry about a clause or two, and sign anyway. Sound familiar?

The gap between "I read it" and "I understand the risk" is where people get burned — unlimited liability exposure, one-sided termination rights, overbroad IP assignments, missing payment protections.

The Solution

RealityCheck CLI takes any contract PDF and produces:

  • 5 quantified risk metrics — Overall Risk Score (1-100), Power Imbalance (0-100), Ambiguity Index (0-100), Protection Coverage (0-100), and an original Leverage Index™ (0-100) showing your negotiation strength
  • Clause-by-clause classification across 7 legal categories (Non-Compete, IP Transfer, Liability, Termination, Financial Risk, Privacy, Neutral)
  • Signal detection — flags vague language ("sole discretion", "without notice"), one-sided rights, liability expansion, and missing protections
  • Missing protections scan — checks for 6 critical protections: payment timeline, termination notice, cure period, liability cap, breach notification window, IP retention
  • Negotiation-ready outputs — auto-generated email drafts with specific clause rewrites, ready to send to the counterparty
  • Contract comparison — diff two versions of a contract to catch new risks, expanded liability, or extended non-compete duration between drafts
  • Optional LLM enrichment — plug in Google Gemini for deeper clause classification alongside the fast heuristic engine
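As a rough illustration of the signal detection bullet above, here is a minimal sketch of regex-based vague-language flagging. Only the two phrases ("sole discretion", "without notice") and the VAGUE_LANGUAGE label come from this post; the function name, the dictionary layout, and the exact patterns are assumptions, not the code in the repository.

```python
import re

# Hypothetical pattern table: the two phrases are quoted in the post,
# the regexes themselves are illustrative.
VAGUE_PATTERNS = {
    "sole discretion": re.compile(r"\bsole\s+discretion\b", re.IGNORECASE),
    "without notice": re.compile(r"\bwithout\s+(?:prior\s+)?notice\b", re.IGNORECASE),
}


def detect_vague_language(clause_text: str) -> list[dict]:
    """Return one signal per vague phrase found in the clause."""
    signals = []
    for phrase, pattern in VAGUE_PATTERNS.items():
        match = pattern.search(clause_text)
        if match:
            signals.append({
                "type": "VAGUE_LANGUAGE",
                "phrase": phrase,
                "excerpt": match.group(0),
            })
    return signals


clause = "The Employer may modify duties in its sole discretion and without prior notice."
for signal in detect_vague_language(clause):
    print(signal["type"], "->", signal["excerpt"])
# VAGUE_LANGUAGE -> sole discretion
# VAGUE_LANGUAGE -> without prior notice
```

Because everything is plain `re`, a check like this runs offline and stays easy to audit, which is the point of the heuristic-first design.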

Architecture

PDF → [ingest] → [clauses] → [analysis] → [scoring] → [negotiation] → [output]
                                  ↕                                        ↕
                              [llm_client]                          [comparison]

The tool is modular by design — 9 internal packages wired through a single orchestration pipeline:

Module         Purpose
ingest/        PDF extraction via pdfplumber + header/footer removal
clauses/       Clause segmentation by heading detection + text normalization
analysis/      Heuristic classification engine + optional Gemini LLM enrichment
scoring/       Weighted multi-factor risk engine with category-specific weights
negotiation/   Email drafts + clause rewrite suggestions
comparison/    Smart clause matching + delta analysis with legal-domain flags
output/        Rich terminal rendering + JSON artifact export
config/        Environment-based settings (API keys, thresholds)
cli/           Typer-based CLI with analyze and compare commands
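To make the clauses/ row more concrete, here is a minimal sketch of clause segmentation by heading detection with whitespace normalization. It assumes headings look like "1. Title" at the start of a line; the real heading rules, the function name, and the dictionary fields are assumptions (the "C-001" style ID only mimics the IDs shown in the negotiation draft).

```python
import re

# Assumed heading shape: one or two digits, a period, then the title.
HEADING_RE = re.compile(r"^\d{1,2}\.\s+(.+)$")


def segment_clauses(text: str) -> list[dict]:
    """Split contract text into clauses at numbered heading lines."""
    clauses = []
    current = None
    for raw_line in text.splitlines():
        line = " ".join(raw_line.split())  # collapse runs of whitespace
        if not line:
            continue
        heading = HEADING_RE.match(line)
        if heading:
            # Start a new clause; IDs are a running counter (hypothetical scheme).
            current = {
                "id": f"C-{len(clauses) + 1:03d}",
                "title": heading.group(1),
                "text": "",
            }
            clauses.append(current)
        elif current is not None:
            # Body lines are appended to the clause opened by the last heading.
            current["text"] = (current["text"] + " " + line).strip()
    return clauses


sample = """1. Duties
The Employee shall perform   the duties assigned.
2. Compensation
Salary is paid monthly."""

for clause in segment_clauses(sample):
    print(clause["id"], clause["title"], "|", clause["text"])
# C-001 Duties | The Employee shall perform the duties assigned.
# C-002 Compensation | Salary is paid monthly.
```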

Key Design Decisions

  • Heuristic-first, LLM-optional: Works fully offline with regex pattern matching. No API key is needed for the core analysis. The LLM only enriches the heuristic results and never replaces them.
  • Weighted multi-factor scoring: Not a single naive score, but 5 complementary metrics, with the category breakdown driven by category-specific weights (Liability: 0.22, Financial Risk: 0.20, IP Transfer: 0.17, etc.).
  • Actionable by default: Doesn't just flag risk; it generates a negotiation email draft and clause rewrites you can actually send.
  • Comparison as a first-class feature: Smart clause matching (70% title similarity + 30% text similarity) with domain-specific flags like non-compete duration parsing and liability expansion detection.

Demo

GitHub Repository: github.com/Anandqwe/realitycheck-cli

Setup

git clone https://github.com/Anandqwe/realitycheck-cli.git
cd realitycheck-cli
python -m venv venv
# Windows (PowerShell); on macOS/Linux use: source venv/bin/activate
.\venv\Scripts\activate
pip install -r requirements.txt

Demo 1: Analyzing a Real Employment Contract (contract.pdf)

The repo includes a real employment contract template (contract.pdf) — a multi-page agreement with clauses covering probation, compensation, termination, confidentiality, IP assignment, and more.

python -m realitycheck_cli analyze .\contract.pdf

Terminal Output:

╭──────────────────────── Analysis ────────────────────────╮
│ RealityCheck CLI                                         │
│ Contract: contract.pdf                                   │
│ Clauses analyzed: 19                                     │
╰──────────────────────────────────────────────────────────╯
╭─ Overall Risk Score ─╮ ╭─ Power Imbalance Score ─╮ ╭─ Leverage Index (TM) ─╮
│        40/100        │ │         41/100          │ │        54/100         │
╰──────────────────────╯ ╰─────────────────────────╯ ╰───────────────────────╯

The tool parsed all 19 clauses from the PDF, classified each one, and produced:

  • Overall Risk: 40/100 — Moderate risk level
  • Power Imbalance: 41/100 — Slightly favors the employer
  • Leverage Index: 54/100 — Borderline negotiation position

Category Breakdown:

              Category Risk Summary
┏━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Category       ┃ Score ┃ Weight ┃ Contribution ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━┩
│ IP_TRANSFER    │    57 │   0.17 │         9.69 │
│ TERMINATION    │    55 │   0.12 │         6.60 │
│ PRIVACY        │    52 │   0.09 │         4.68 │
│ NEUTRAL        │    36 │   0.05 │         1.80 │
└────────────────┴───────┴────────┴──────────────┘

IP Transfer and Termination clauses are the primary risk drivers. The tool detected an "Assignment (Transfer of Contract)" clause attempting broad IP assignment, and termination clauses with limited employee protections.
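The Contribution column in the table above is simply score × weight, which is easy to verify by hand. The sketch below recomputes it from the table's own numbers; the helper name is made up for illustration.

```python
# Scores and weights copied from the Category Risk Summary table.
rows = [
    ("IP_TRANSFER", 57, 0.17),
    ("TERMINATION", 55, 0.12),
    ("PRIVACY", 52, 0.09),
    ("NEUTRAL", 36, 0.05),
]


def contribution(score: int, weight: float) -> float:
    """Weighted contribution of one category, rounded as in the table."""
    return round(score * weight, 2)


for category, score, weight in rows:
    print(f"{category:<12} {contribution(score, weight):.2f}")
# IP_TRANSFER  9.69
# TERMINATION  6.60
# PRIVACY      4.68
# NEUTRAL      1.80

total = round(sum(contribution(score, weight) for _, score, weight in rows), 2)
print(total)  # 22.77
```

The contributions sum to 22.77, not the reported 40/100, so the Overall Risk Score evidently includes more than this table. The scoring design described later mentions penalty points for vague language and missing protections, but the exact combination is not spelled out in this post.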

Ambiguity Detection:

The tool caught a "sole discretion" clause in the Duties section — the employer can unilaterally modify duties "in the sole discretion of the Employer." This gets flagged as VAGUE_LANGUAGE with HIGH severity.

Missing Protections:

╭──────────────── Missing Protections ─────────────────╮
│ - payment timeline                                    │
│ - cure period                                         │
│ - liability cap                                       │
│ - breach notification window                          │
│ - ip retained                                         │
╰──────────────────────────────────────────────────────╯

5 out of 6 critical protections are missing from this contract — a significant gap.
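A missing-protections scan of this kind can be sketched as one pattern per protection, reporting every protection whose pattern never matches. The six keys below follow the names in the tool's output; the regexes and the function name are illustrative guesses, not the patterns used in the repository.

```python
import re

# Keys mirror the tool's output; every pattern here is a hypothetical example.
PROTECTION_PATTERNS = {
    "payment_timeline": r"\bwithin\s+\d+\s+days\s+of\s+(?:receipt|invoice)\b",
    "termination_notice": r"\b\d+\s+days\s+(?:written\s+)?notice\b",
    "cure_period": r"\bcure\s+period\b|\bopportunity\s+to\s+cure\b",
    "liability_cap": r"\bliability\b.{0,80}\bshall\s+not\s+exceed\b",
    "breach_notification_window": r"\bbreach\b.{0,80}\bwithin\s+\d+\s+(?:hours|days)\b",
    "ip_retained": r"\bretains?\s+(?:all\s+)?(?:rights|ownership)\b",
}


def missing_protections(contract_text: str) -> list[str]:
    """List every protection whose pattern does not appear in the text."""
    flags = re.IGNORECASE | re.DOTALL
    return [
        name
        for name, pattern in PROTECTION_PATTERNS.items()
        if not re.search(pattern, contract_text, flags)
    ]


text = "Either party may terminate this Agreement with 30 days written notice."
print(missing_protections(text))
# ['payment_timeline', 'cure_period', 'liability_cap',
#  'breach_notification_window', 'ip_retained']
```

With a termination notice present and nothing else, this toy input reproduces the same five gaps as the demo contract.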

Auto-Generated Negotiation Email:

╭──────────── Negotiation Draft (Preview) ─────────────╮
│ Subject: Proposed revisions for contract              │
│                                                       │
│ Priority clauses to discuss:                          │
│ - Assignment (Transfer Of Contract Of Employment)     │
│   (C-008, risk 57/100): Narrow IP assignment to       │
│   deliverables created under this agreement.          │
│ - Probation (C-003, risk 55/100): Require written     │
│   notice and a cure period before termination.        │
│                                                       │
│ Additional protections requested:                     │
│ - Add explicit language for: payment timeline         │
│ - Add explicit language for: liability cap            │
│ - Add explicit language for: breach notification      │
╰──────────────────────────────────────────────────────╯

This email draft is ready to copy-paste and send to the counterparty. No more staring at a contract wondering what to push back on.

Demo 2: Full Pipeline with the Demo Script

The project includes a PowerShell demo script (demo.ps1) that runs the complete pipeline — analyze both versions, then compare:

.\demo.ps1 -Baseline .\baseline.pdf -Revised .\revised.pdf

This executes 3 steps automatically:

  1. Analyze the baseline contract → produces risk scores, missing protections, negotiation draft
  2. Analyze the revised contract → same analysis on the new version
  3. Compare both → generates a delta report

Comparison Output:

╭─────────────────── Comparison ───────────────────────╮
│ Baseline: baseline.pdf                                │
│ Revised: revised.pdf                                  │
╰──────────────────────────────────────────────────────╯
╭─ Baseline Risk ─╮ ╭─ Revised Risk ─╮ ╭─ Risk Delta ─╮
│       17        │ │       17       │ │      +0      │
╰─────────────────╯ ╰────────────────╯ ╰──────────────╯
╭─ Baseline Leverage ─╮ ╭─ Revised Leverage ─╮ ╭─ Leverage Delta ─╮
│         60          │ │         60         │ │        +0        │
╰─────────────────────╯ ╰────────────────────╯ ╰──────────────────╯

The comparison engine uses smart clause matching (70% title similarity + 30% text similarity) to pair clauses across versions and flag:

  • NEW_RISK — new high-risk clauses or risk increases ≥20 points
  • EXPANDED_LIABILITY — new liability expansion language detected
  • EXTENDED_NON_COMPETE — duration increases (parses days/months/years)

Demo 3: JSON Artifact Export

Every analysis produces structured JSON artifacts for downstream workflows:

python -m realitycheck_cli analyze .\contract.pdf --json-output .\artifacts\contract.analysis.json

JSON output (excerpt):

{
  "summary": {
    "overall_risk_score": 40,
    "power_imbalance_score": 41,
    "ambiguity_index": 5,
    "protection_coverage_score": 15,
    "leverage_index": 54,
    "missing_protections": [
      "payment_timeline",
      "cure_period",
      "liability_cap",
      "breach_notification_window",
      "ip_retained"
    ]
  },
  "negotiation_email": "Subject: Proposed revisions for contract..."
}

Each clause includes its category, risk score, risk level, signals, rewrite suggestion, and negotiation points — fully structured for integration into legal tech workflows, dashboards, or review pipelines.
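As one example of a downstream workflow, the sketch below reads the summary fields from an artifact and decides whether a contract needs a human review. The field names match the JSON excerpt above; the thresholds and the function name are hypothetical, and the artifact is embedded as a string only to keep the example self-contained (in practice it would be read from the file passed to --json-output).

```python
import json

# Abridged artifact with the same "summary" fields as the excerpt above.
ARTIFACT_JSON = """
{
  "summary": {
    "overall_risk_score": 40,
    "leverage_index": 54,
    "missing_protections": ["payment_timeline", "cure_period", "liability_cap",
                            "breach_notification_window", "ip_retained"]
  }
}
"""


def review_reasons(summary: dict, risk_threshold: int = 60, max_missing: int = 3) -> list[str]:
    """Return reasons to escalate a contract; both thresholds are made up."""
    reasons = []
    if summary["overall_risk_score"] >= risk_threshold:
        reasons.append(f"overall risk {summary['overall_risk_score']} >= {risk_threshold}")
    missing = summary["missing_protections"]
    if len(missing) > max_missing:
        reasons.append(f"{len(missing)} missing protections")
    return reasons


summary = json.loads(ARTIFACT_JSON)["summary"]
print(review_reasons(summary))  # ['5 missing protections']
```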

Demo 4: LLM-Enriched Analysis (Optional)

For deeper analysis, plug in Google Gemini:

$env:GEMINI_API_KEY = "your-key"
python -m realitycheck_cli analyze .\contract.pdf --use-llm

The LLM enrichment adds structured signals on top of the heuristic baseline — it doesn't replace the pattern engine, it supplements it. Signals from both engines are merged with deduplication.
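A merge with deduplication can be sketched as a dictionary keyed by signal identity, where heuristic signals are inserted first so the LLM can only add to them. The key used here (signal type plus normalized phrase) is an assumption, as are the field names and the ONE_SIDED_RIGHTS label; only VAGUE_LANGUAGE appears in this post.

```python
def signal_key(signal: dict) -> tuple[str, str]:
    """Identity of a signal: its type plus a case-insensitive phrase (assumed key)."""
    return (signal["type"], signal["phrase"].strip().lower())


def merge_signals(heuristic: list[dict], llm: list[dict]) -> list[dict]:
    """Keep every heuristic signal; add LLM signals only when their key is new."""
    merged = {signal_key(signal): signal for signal in heuristic}
    for signal in llm:
        merged.setdefault(signal_key(signal), signal)
    return list(merged.values())


heuristic = [{"type": "VAGUE_LANGUAGE", "phrase": "sole discretion", "source": "heuristic"}]
llm = [
    {"type": "VAGUE_LANGUAGE", "phrase": "Sole Discretion ", "source": "llm"},
    {"type": "ONE_SIDED_RIGHTS", "phrase": "may terminate at will", "source": "llm"},
]

for signal in merge_signals(heuristic, llm):
    print(signal["type"], signal["source"])
# VAGUE_LANGUAGE heuristic
# ONE_SIDED_RIGHTS llm
```

Inserting heuristic signals first means a duplicate from the LLM never overwrites the pattern engine's version, which matches the "enrich, never replace" rule.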

Commands Quick Reference

Command                                                             What it does
python -m realitycheck_cli analyze contract.pdf                     Analyze a single contract
python -m realitycheck_cli analyze contract.pdf --use-llm           Analyze with Gemini enrichment
python -m realitycheck_cli analyze contract.pdf -j output.json      Export JSON artifacts
python -m realitycheck_cli compare baseline.pdf revised.pdf         Compare two contract versions
.\demo.ps1 -Baseline baseline.pdf -Revised revised.pdf              Run full demo pipeline
.\demo.ps1 -Baseline baseline.pdf -Revised revised.pdf -UseLLM      Demo with LLM

My Experience with GitHub Copilot CLI

GitHub Copilot worked alongside me throughout this build, from architecture decisions to implementation details.

Scaffolding the Architecture

When I started, I had the idea but not the structure. I described what I wanted to Copilot:

"A CLI tool that parses legal PDFs, classifies clause risk, detects power imbalance, and generates negotiation outputs."

Copilot helped me design the modular architecture — separating concerns into ingest/, clauses/, analysis/, scoring/, negotiation/, comparison/, and output/ packages. This clean separation made each module independently testable and swappable.

Building the Heuristic Engine

The pattern-based classification engine in analysis/heuristics.py was built iteratively with Copilot. I'd describe a legal concept — "detect clauses that mention sole discretion or unilateral rights" — and Copilot would generate the regex patterns, signal types, and severity mappings. The result is a comprehensive heuristic engine that covers 7 clause categories, 4 signal types, and 6 missing-protection checks — all without any API calls.

The Scoring System

The weighted multi-factor scoring system was where Copilot really shone. I asked it to help design a scoring model where:

  • Different clause categories have different weights (liability should matter more than neutral clauses)
  • Vague language and missing protections should add penalty points
  • There should be a composite "Leverage Index" that tells you your negotiation strength

Copilot helped me implement the weighted average in scoring/risk_engine.py, the power imbalance detector in scoring/power_imbalance.py, and the Leverage Index formula in scoring/leverage.py — each with clear, auditable logic rather than a black-box score.

Rich Terminal Output

The Rich-based terminal output was built entirely in collaboration with Copilot: color-coded score cards (red ≥80, yellow ≥60, green <60), formatted tables for category breakdowns, and the negotiation draft preview panel. Copilot generated the Rich markup and helped me iterate on the layout until it felt polished and professional.
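The color thresholds map directly to a small helper. This sketch uses only the standard library and returns a Rich-style markup string; the function names are hypothetical, and the actual rendering code in output/ may be organized differently.

```python
def score_color(score: int) -> str:
    """Map a 0-100 score to the color bands described above."""
    if score >= 80:
        return "red"
    if score >= 60:
        return "yellow"
    return "green"


def score_markup(score: int) -> str:
    """Wrap the score in Rich console markup, e.g. "[bold green]40/100[/]"."""
    return f"[bold {score_color(score)}]{score}/100[/]"


for value in (40, 60, 80):
    print(score_markup(value))
# [bold green]40/100[/]
# [bold yellow]60/100[/]
# [bold red]80/100[/]
```

A string like this can be passed to Rich's `Console.print`, which interprets the bracketed tags as styles.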

Contract Comparison Engine

The comparison module was the most complex feature. Copilot helped me implement:

  • Clause matching with weighted similarity scoring (70% title + 30% text, 0.55 threshold)
  • Non-compete duration parsing that converts between days, months, and years for accurate comparison
  • Liability expansion detection with domain-specific legal patterns
  • Risk flag generation for new risks, expanded scope, and extended terms
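The clause matching rule (70% title + 30% text, 0.55 threshold) can be sketched with the standard library's difflib. The weights and threshold come from this post; using SequenceMatcher as the similarity measure, the greedy pairing strategy, and all names below are assumptions, since the post does not say which similarity function the comparison module uses.

```python
from difflib import SequenceMatcher
from typing import Optional

TITLE_WEIGHT = 0.70
TEXT_WEIGHT = 0.30
MATCH_THRESHOLD = 0.55


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def clause_similarity(base: dict, revised: dict) -> float:
    """Weighted blend of title and text similarity."""
    return (
        TITLE_WEIGHT * similarity(base["title"], revised["title"])
        + TEXT_WEIGHT * similarity(base["text"], revised["text"])
    )


def match_clauses(baseline: list[dict], revised: list[dict]) -> list[tuple[str, Optional[str]]]:
    """Greedily pair each baseline clause with its best unused revised clause."""
    pairs = []
    unused = list(range(len(revised)))
    for base in baseline:
        best_index, best_score = None, 0.0
        for index in unused:
            score = clause_similarity(base, revised[index])
            if score > best_score:
                best_index, best_score = index, score
        if best_index is not None and best_score >= MATCH_THRESHOLD:
            unused.remove(best_index)
            pairs.append((base["title"], revised[best_index]["title"]))
        else:
            pairs.append((base["title"], None))  # no counterpart above the threshold
    return pairs


baseline = [{"title": "Termination", "text": "Either party may terminate with 30 days notice."}]
revised = [{"title": "Termination", "text": "The Employer may terminate at any time."}]
print(match_clauses(baseline, revised))  # [('Termination', 'Termination')]
```

Weighting titles more heavily means a clause whose wording changed substantially still pairs with its counterpart as long as the heading is stable: an identical title alone contributes 0.70, already above the 0.55 threshold.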

LLM Integration

Integrating Google Gemini as an optional enrichment layer was straightforward with Copilot's help. It generated the structured JSON system prompt, response parsing, schema validation, and the signal-merging logic that deduplicates heuristic and LLM signals by key.

Testing

Copilot helped scaffold the test suite in tests/ — unit tests for the heuristic engine, scoring calculations, LLM client mocking, and comparison logic. The tests validate that the scoring math is correct and the classification patterns work as expected.

What Copilot Changed

Without Copilot, this project would have been significantly harder to ship as a solo developer. The legal domain knowledge encoding (regex patterns for clause types, signal detection rules, scoring weights) is the kind of tedious, error-prone work that Copilot accelerates dramatically. It turned what could have been weeks of research and implementation into a focused, iterative build process where I could stay in flow and keep shipping.

The biggest impact was on code quality — Copilot consistently suggested Pydantic models for data validation, proper error handling boundaries, and clean separation of concerns. The codebase ended up more maintainable than most solo projects I've built.


Tech Stack: Python 3.10+ | Typer | Rich | pdfplumber | Pydantic | Google Gemini (optional)

Try it: github.com/Anandqwe/realitycheck-cli
