Two months ago I signed a contract with an auto-renewal clause I didn't notice. When I tried to cancel, I owed another year. That $1,200
mistake turned into PactLens — a Chrome extension that reviews contracts before you sign.
The core problem: LLMs hallucinate. You can't just ask an AI "is this clause risky" — it'll confidently make up laws that don't exist. So I
needed real statutes as a source of truth.
## The RAG Pipeline
Here's how it works:
1. User selects contract text on any page → right-clicks → "Review with PactLens"
2. The text hits our Cloudflare Worker backend, which extracts clause-level semantics
3. We run a lexical search against 1,700+ pre-loaded statutes using keyword matching + TF-IDF similarity
4. The top N most relevant statutes are injected into the prompt as context
5. The AI generates analysis grounded in those actual statutes, not hallucinations
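To make the retrieval step concrete, here is a minimal sketch of TF-IDF ranking with cosine similarity. It is not PactLens's actual code: the `Statute` shape, the tokenizer, the smoothed IDF formula, and the function names are all assumptions chosen for illustration.

```typescript
// Hypothetical statute shape; the real schema is not described in this post.
interface Statute {
  id: string;
  text: string;
}

// Lowercase, split on non-alphanumerics, drop single-character tokens.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 1);
}

// Smoothed inverse document frequency: ln((N + 1) / (df + 1)) + 1.
function buildIdf(docs: string[][]): Map<string, number> {
  const df = new Map<string, number>();
  for (const tokens of docs) {
    for (const t of new Set(tokens)) df.set(t, (df.get(t) ?? 0) + 1);
  }
  const idf = new Map<string, number>();
  for (const [t, count] of df) {
    idf.set(t, Math.log((docs.length + 1) / (count + 1)) + 1);
  }
  return idf;
}

// Term count times IDF; terms that never appear in the corpus are dropped.
function tfidfVector(tokens: string[], idf: Map<string, number>): Map<string, number> {
  const vec = new Map<string, number>();
  for (const t of tokens) {
    const weight = idf.get(t);
    if (weight !== undefined) vec.set(t, (vec.get(t) ?? 0) + weight);
  }
  return vec;
}

function cosine(a: Map<string, number>, b: Map<string, number>): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (const [t, w] of a) {
    normA += w * w;
    dot += w * (b.get(t) ?? 0);
  }
  for (const w of b.values()) normB += w * w;
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Rank statutes against one clause and keep the top n with a non-zero score.
function topStatutes(
  clause: string,
  statutes: Statute[],
  n: number,
): Array<{ statute: Statute; score: number }> {
  const docs = statutes.map((statute) => ({ statute, tokens: tokenize(statute.text) }));
  const idf = buildIdf(docs.map((d) => d.tokens));
  const query = tfidfVector(tokenize(clause), idf);
  return docs
    .map((d) => ({ statute: d.statute, score: cosine(query, tfidfVector(d.tokens, idf)) }))
    .filter((r) => r.score > 0)
    .sort((x, y) => y.score - x.score)
    .slice(0, n);
}
```

This version recomputes the IDF table on every call, which is fine for a demo but wasteful at request time; in practice the weights would be computed once, ahead of time.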
The key insight: ground first, generate second. The AI doesn't need to know every law — it just needs to know how to apply the specific
statutes we feed it.
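A rough sketch of what "ground first, generate second" can look like when the prompt is assembled. The prompt wording, the `citation` field, and `buildGroundedPrompt` are illustrative assumptions, not the exact prompt PactLens sends.

```typescript
// Hypothetical shape for a retrieved statute.
interface RetrievedStatute {
  citation: string;
  text: string;
}

// Put the retrieved statutes in front of the clause and tell the model
// to stay inside them.
function buildGroundedPrompt(clause: string, statutes: RetrievedStatute[]): string {
  const context = statutes
    .map((s, i) => `[${i + 1}] ${s.citation}\n${s.text}`)
    .join("\n\n");
  return [
    "You are reviewing a contract clause.",
    "Use ONLY the statutes listed below and cite them by their bracketed number.",
    "If none of them applies, say so instead of citing any other law.",
    "",
    "Statutes:",
    context.length > 0 ? context : "(none retrieved)",
    "",
    "Clause:",
    clause,
  ].join("\n");
}
```

Numbering the statutes gives the model something short to cite, and makes it possible to check afterwards that every citation in the answer points at a statute that was actually supplied.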
## Why Static Pre-Loading Over a Vector Database
A vector DB would be "cleaner" architecturally. But:
- Embedding 1,700+ statutes costs money, and embedding each query at request time adds latency
- Cloudflare Workers don't have great vector DB support
- The statutes don't change daily — they're stable enough for pre-loading
So I went with plain text matching + TF-IDF stored in KV. Simple, fast, and surprisingly effective.
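One way to pre-load TF-IDF data into a key-value store, sketched under assumptions: the index is built once offline, serialized as JSON, and read back at request time. The `KeyValueStore` interface is a stand-in modeled on the string `get`/`put` calls of Workers KV, and `INDEX_KEY`, `StatuteIndex`, and the function names are hypothetical.

```typescript
// Stand-in for a KV binding: string values, async get and put.
interface KeyValueStore {
  get(key: string): Promise<string | null>;
  put(key: string, value: string): Promise<void>;
}

// JSON-friendly index: maps are stored as [term, weight] pairs.
interface StatuteIndex {
  idf: Array<[string, number]>;
  statutes: Array<{ id: string; weights: Array<[string, number]>; norm: number }>;
}

const INDEX_KEY = "statute-index:v1"; // hypothetical key name

function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 1);
}

// Offline step: compute IDF, per-statute TF-IDF weights, and vector norms.
function buildIndex(statutes: Array<{ id: string; text: string }>): StatuteIndex {
  const docs = statutes.map((s) => ({ id: s.id, tokens: tokenize(s.text) }));
  const df = new Map<string, number>();
  for (const d of docs) {
    for (const t of new Set(d.tokens)) df.set(t, (df.get(t) ?? 0) + 1);
  }
  const idf = new Map<string, number>();
  for (const [t, count] of df) idf.set(t, Math.log((docs.length + 1) / (count + 1)) + 1);
  const indexed = docs.map((d) => {
    const weights = new Map<string, number>();
    for (const t of d.tokens) weights.set(t, (weights.get(t) ?? 0) + (idf.get(t) ?? 0));
    let normSq = 0;
    for (const w of weights.values()) normSq += w * w;
    return { id: d.id, weights: [...weights], norm: Math.sqrt(normSq) };
  });
  return { idf: [...idf], statutes: indexed };
}

async function saveIndex(kv: KeyValueStore, index: StatuteIndex): Promise<void> {
  await kv.put(INDEX_KEY, JSON.stringify(index));
}

async function loadIndex(kv: KeyValueStore): Promise<StatuteIndex | null> {
  const raw = await kv.get(INDEX_KEY);
  return raw === null ? null : (JSON.parse(raw) as StatuteIndex);
}

// Request-time step: only the query vector is computed; statute weights are reused.
function rankWithIndex(
  clause: string,
  index: StatuteIndex,
  n: number,
): Array<{ id: string; score: number }> {
  const idf = new Map(index.idf);
  const query = new Map<string, number>();
  for (const t of tokenize(clause)) {
    const w = idf.get(t);
    if (w !== undefined) query.set(t, (query.get(t) ?? 0) + w);
  }
  let queryNormSq = 0;
  for (const w of query.values()) queryNormSq += w * w;
  if (queryNormSq === 0) return [];
  const queryNorm = Math.sqrt(queryNormSq);
  return index.statutes
    .map((s) => {
      let dot = 0;
      for (const [t, w] of s.weights) dot += w * (query.get(t) ?? 0);
      return { id: s.id, score: s.norm === 0 ? 0 : dot / (queryNorm * s.norm) };
    })
    .filter((r) => r.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, n);
}
```

Storing the whole index under one key keeps the request path to a single read, at the cost of parsing the full JSON on each cold start. Whether that trade-off holds for 1,700+ statutes depends on the serialized size, which this sketch does not measure.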
## What I'd Do Differently
- Start with the Chrome Web Store review guidelines FIRST, not last
- Use a lightweight embedding model for real semantic search, since TF-IDF only matches shared words
- Add a feedback loop — let users flag incorrect analysis
## Stack
- Hono + Cloudflare Workers (backend)
- WXT + TypeScript (extension)
- KV for statute storage
- DeepSeek for AI inference
## Links
Chrome Store: [link]
Web App: https://pactlens.net/contract
Would love feedback from anyone who's worked on RAG systems — especially on making semantic search fast at the edge.