DEV Community

henry

How I Built a Chrome Extension That Analyzes Contracts Using 1,700+ Real Statutes as RAG Grounding

Two months ago I signed a contract with an auto-renewal clause I didn't notice. When I tried to cancel, I owed another year. That $1,200
mistake turned into PactLens — a Chrome extension that reviews contracts before you sign.

The core problem: LLMs hallucinate. You can't just ask an AI "is this clause risky" — it'll confidently make up laws that don't exist. So I
needed real statutes as a source of truth.

The RAG Pipeline

Here's how it works:

  1. User selects contract text on any page → right-clicks → "Review with PactLens"

  2. The text hits our Cloudflare Worker backend, which extracts clause-level semantics

  3. We search 1,700+ pre-loaded statutes using keyword matching + TF-IDF similarity (lexical matching, not embedding-based semantic search)

  4. The top N most relevant statutes are injected into the prompt as context

  5. The AI generates analysis grounded in those actual statutes, not hallucinations
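
To make the retrieval step concrete, here is a minimal sketch of TF-IDF ranking with cosine similarity. It is not PactLens's actual code: the `Statute` shape, the `rankStatutes` name, the tokenizer, and the smoothed IDF formula are all assumptions chosen for illustration.

```typescript
// Hypothetical shape for a pre-loaded statute; the real schema is not shown in this post.
interface Statute {
  id: string;
  text: string;
}

// Lowercase and split on non-alphanumerics. A real tokenizer would also handle stop words.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 1);
}

// Raw term counts for one piece of text.
function termCounts(tokens: string[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const t of tokens) counts.set(t, (counts.get(t) ?? 0) + 1);
  return counts;
}

// Rank statutes against a clause by cosine similarity of TF-IDF vectors.
function rankStatutes(clause: string, statutes: Statute[], topN: number): Statute[] {
  const docs = statutes.map((statute) => ({
    statute,
    counts: termCounts(tokenize(statute.text)),
  }));

  // Document frequency: in how many statutes does each term appear?
  const df = new Map<string, number>();
  docs.forEach((doc) =>
    doc.counts.forEach((_count, term) => df.set(term, (df.get(term) ?? 0) + 1))
  );

  // Smoothed IDF (an assumed variant) so unseen query terms never divide by zero.
  const idf = (term: string): number =>
    Math.log((1 + statutes.length) / (1 + (df.get(term) ?? 0))) + 1;

  const weigh = (counts: Map<string, number>): Map<string, number> => {
    const weights = new Map<string, number>();
    counts.forEach((count, term) => weights.set(term, count * idf(term)));
    return weights;
  };

  const norm = (v: Map<string, number>): number => {
    let sum = 0;
    v.forEach((w) => { sum += w * w; });
    return Math.sqrt(sum);
  };

  const query = weigh(termCounts(tokenize(clause)));
  const queryNorm = norm(query);

  const scored = docs.map(({ statute, counts }) => {
    const doc = weigh(counts);
    let dot = 0;
    query.forEach((w, term) => { dot += w * (doc.get(term) ?? 0); });
    const denom = queryNorm * norm(doc);
    return { statute, score: denom === 0 ? 0 : dot / denom };
  });

  // Drop statutes with no term overlap, then keep the best topN.
  return scored
    .filter((s) => s.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, topN)
    .map((s) => s.statute);
}
```

In a Worker you would likely precompute the statute-side weights instead of rebuilding them per request; this version recomputes everything to keep the sketch short.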

The key insight: ground first, generate second. The AI doesn't need to know every law — it just needs to know how to apply the specific
statutes we feed it.
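
A rough sketch of what "ground first, generate second" can look like at the prompt level. The `RetrievedStatute` and `ChatMessage` types, the `buildGroundedMessages` name, and the prompt wording are hypothetical; the role/content message shape is just the common chat format, not a confirmed detail of how PactLens calls the model.

```typescript
// Hypothetical types for illustration only.
interface RetrievedStatute {
  citation: string;
  text: string;
}

interface ChatMessage {
  role: "system" | "user";
  content: string;
}

// Put the retrieved statutes in the system message and restrict the model to them.
function buildGroundedMessages(
  clause: string,
  statutes: RetrievedStatute[]
): ChatMessage[] {
  // Number each statute so the model can cite it as [1], [2], ...
  const context = statutes
    .map((s, i) => `[${i + 1}] ${s.citation}\n${s.text}`)
    .join("\n\n");

  const system = [
    "You review contract clauses for risk.",
    "Use ONLY the statutes listed below and cite them by bracketed number.",
    "If none of them applies, say so instead of citing other law.",
    "",
    "Statutes:",
    context || "(none retrieved)",
  ].join("\n");

  return [
    { role: "system", content: system },
    { role: "user", content: `Clause to review:\n${clause}` },
  ];
}
```

Numbering the statutes also gives you something to check afterwards: any citation in the answer that is not one of the bracketed numbers can be flagged as ungrounded.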

Why Static Pre-Loading Over a Vector Database

A vector DB would be "cleaner" architecturally. But:

  • Embedding 1,700+ statutes costs money, and embedding each query at request time adds latency
  • Cloudflare Workers don't have great vector DB support
  • The statutes don't change daily — they're stable enough for pre-loading

So I went with plain text matching + TF-IDF stored in KV. Simple, fast, and surprisingly effective.
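
One way the pre-loading could work, sketched under assumptions: build an inverted index once, serialize it to JSON, and store it under a single KV key so the request path is one read with no embedding call. The `KVLike` interface is a hand-written subset of a KV binding (string `get`/`put`), and the key name `statute-index:v1` and all function names are made up for this example.

```typescript
// Minimal subset of a KV binding used here; an assumption, not the full Workers KV type.
interface KVLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string): Promise<void>;
}

// term -> (statuteId -> how often the term appears in that statute)
type Postings = Map<string, Map<string, number>>;

function buildPostings(statutes: { id: string; text: string }[]): Postings {
  const postings: Postings = new Map();
  for (const s of statutes) {
    const tokens = s.text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 1);
    for (const term of tokens) {
      let byStatute = postings.get(term);
      if (!byStatute) {
        byStatute = new Map<string, number>();
        postings.set(term, byStatute);
      }
      byStatute.set(s.id, (byStatute.get(s.id) ?? 0) + 1);
    }
  }
  return postings;
}

// Maps are not directly JSON-serializable, so flatten them to nested entry arrays.
function serializePostings(postings: Postings): string {
  const entries: [string, [string, number][]][] = [];
  postings.forEach((byStatute, term) => {
    const inner: [string, number][] = [];
    byStatute.forEach((count, id) => inner.push([id, count]));
    entries.push([term, inner]);
  });
  return JSON.stringify(entries);
}

function deserializePostings(json: string): Postings {
  const entries = JSON.parse(json) as [string, [string, number][]][];
  const postings: Postings = new Map();
  for (const [term, inner] of entries) postings.set(term, new Map(inner));
  return postings;
}

// Build step (run once, e.g. at deploy time): write the whole index under one key.
async function saveIndex(kv: KVLike, postings: Postings): Promise<void> {
  await kv.put("statute-index:v1", serializePostings(postings));
}

// Request path: a single KV read; an empty index if nothing has been stored yet.
async function loadIndex(kv: KVLike): Promise<Postings> {
  const json = await kv.get("statute-index:v1");
  return json === null ? new Map() : deserializePostings(json);
}
```

KV values have a size cap, so a larger corpus might need the index sharded across several keys; versioning the key name makes it easy to swap in a rebuilt index when statutes do change.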

What I'd Do Differently

  1. Start with the Chrome Web Store review guidelines FIRST, not last
  2. Use a lightweight embedding model to add true semantic search on top of TF-IDF
  3. Add a feedback loop — let users flag incorrect analysis

Stack

  • Hono + Cloudflare Workers (backend)
  • WXT + TypeScript (extension)
  • KV for statute storage
  • DeepSeek for AI inference

Links

Chrome Store: [link]
Web App: https://pactlens.net/contract

Would love feedback from anyone who's worked on RAG systems — especially on making semantic search fast at the edge.
