Law firms are hemorrhaging money on AI tools they barely control. Harvey AI? ~$1,200/user/month. CoCounsel? Hundreds more. And every one of them sends your most sensitive client documents to someone else's cloud.
I got tired of watching that happen. So I built LegalLens — a fully open-source, self-hosted AI document intelligence platform that does everything those tools do, runs on your own machine, and costs exactly $0.
The Problem Nobody Wants to Talk About
Picture this: A junior associate uploads a confidential M&A agreement to an AI tool to get a quick summary. The document vanishes into a third-party server farm. The firm pays $1,200 per user per month for that privilege. And if the AI hallucinates a clause that doesn't exist? There's no citation to verify it.
This isn't hypothetical. It's the current state of legal AI.
Attorney-client privilege. Trade secrets. Sealed settlement terms. These aren't files you casually hand to a SaaS product. But the alternative — doing it all manually — means associates spending 6-hour stretches reading every line of a 200-page supply chain contract looking for indemnification clauses buried on page 147.
There had to be a better way.
What LegalLens Actually Does
LegalLens is an open-source AI-powered document intelligence platform. Upload a contract, pleading, court order, or NDA. It reads it, understands it, and lets you interrogate it like you would a senior partner who's already read the whole thing.
Here's what it packs in:
9 AI-Powered Analysis Features
| Feature | What You Get |
|---|---|
| Summary | Key points, parties, document type — instantly |
| Risk Analysis | Risk score 0–100, individual risks ranked by severity |
| Compliance Checklist | 15+ standard provisions, scored pass/fail/needs review |
| Obligations | Every duty, right, and restriction — with deadlines |
| Timeline | Chronological events, milestones, and expirations |
| Document Comparison | Side-by-side diff of two contracts |
| Legal Memo | AI-generated brief from your saved research |
| Query Expansion | Suggests legal synonyms to widen your search |
| RAG Q&A | Ask anything, get answers with numbered citations [1][2] |
Clause Finder with 12 Pre-Built Templates
Stop ctrl+F-ing through 300 pages. Pre-built clause search templates cover:
- Indemnification
- Force Majeure
- Limitation of Liability
- Termination
- Confidentiality / NDA
- Governing Law
- Intellectual Property
- Non-Compete
- Payment Terms
- Assignment
- Representations & Warranties
- Notices
Document Intelligence — Auto-Extracted
The moment a document is uploaded, LegalLens automatically pulls:
- Party names from agreement headers
- Key dates — effective dates, deadlines, expirations
- Dollar amounts and currencies
- Defined terms (capitalized terms in quotes)
- Governing law and jurisdiction
- Legal references — statutes and case citations
- Document type — Contract, Pleading, Court Order, Memorandum, etc.
No prompting. No manual setup. It just works.
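To make the idea concrete, here's a minimal sketch of how rule-based extraction like this can work. This is illustrative only, not the actual LegalLens code: the function name and the exact patterns are my own, and a production extractor would need far more robust patterns (date formats, currency symbols, citation grammars).

```python
import re

def extract_entities(text: str) -> dict:
    """Toy sketch of rule-based entity extraction for a legal document."""
    # Dollar amounts like $500,000 or $1.2M
    amounts = re.findall(r"\$[\d,]+(?:\.\d+)?(?:\s?(?:million|M|K))?", text)
    # Defined terms: capitalized phrases in double quotes, e.g. ("Effective Date")
    defined_terms = re.findall(r'"([A-Z][A-Za-z ]+)"', text)
    # Simple long-form dates: January 1, 2025
    dates = re.findall(
        r"\b(?:January|February|March|April|May|June|July|August|"
        r"September|October|November|December)\s+\d{1,2},\s+\d{4}\b",
        text,
    )
    return {"amounts": amounts, "defined_terms": defined_terms, "dates": dates}
```

Patterns like these handle the easy 80%; the remaining ambiguity (relative dates, foreign currencies, nested definitions) is where the LLM layer earns its keep.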
The Real Differentiator: Your Data Stays With You
Every commercial legal AI tool sends your documents somewhere. LegalLens does not.
With Ollama (free, local LLM), the entire pipeline runs on your machine:
Document upload -> text extraction -> vector embeddings -> ChromaDB
All on your hardware. Nothing leaves.
If you prefer cloud LLMs (Claude, GPT-4, Azure OpenAI), you can configure those too. The system supports a fallback chain — it tries providers in order until one responds. But the choice is always yours.
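The fallback-chain idea is simple enough to sketch in a few lines. This is a hand-rolled illustration, not the repo's actual provider code, and the function names are hypothetical:

```python
from typing import Callable

def ask_with_fallback(
    prompt: str,
    providers: list[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each configured LLM provider in order; return the first answer.

    `providers` is an ordered list of (name, call) pairs, where `call`
    takes a prompt and returns a completion, or raises on failure.
    """
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # network error, rate limit, model missing...
            errors.append(f"{name}: {exc}")
    raise RuntimeError("All LLM providers failed: " + "; ".join(errors))
```

Wire Ollama first and a cloud provider second, and you get local-first privacy with a cloud safety net, or drop the cloud entries entirely to stay fully offline.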
Setup Is One Command
Seriously. That's it:
```bash
git clone https://github.com/LakshmiSravyaVedantham/legal-lens.git
cd legal-lens
cp .env.example .env
docker compose up --build
# Open http://localhost
```
Want to run it fully offline with a local LLM?
```bash
docker compose --profile ollama up --build
```
Then pull a model:
```bash
ollama pull llama3.1:8b
```
Everything — FastAPI backend, React frontend, MongoDB, ChromaDB, Nginx — comes up in one shot. The semantic search engine works immediately. AI features activate the moment you point to any LLM.
How It's Built (For the Devs in the Room)
No LangChain. Direct LLM integration — simpler, more debuggable, easier to extend.
```
Frontend (React 19 + TypeScript + Tailwind CSS 4)
                  |
               REST API
                  |
      FastAPI Backend (async Python)
      |           |            |
  ChromaDB     MongoDB     LLM Layer
  (vectors)   (metadata)   (Ollama / Claude / GPT / Azure)
      |
sentence-transformers
(384-dim embeddings, 80MB model, no GPU required)
```
| Layer | Tech |
|---|---|
| Frontend | React 19 + TypeScript + Vite + Tailwind CSS 4 |
| Backend | Python FastAPI (async) |
| Database | MongoDB with Motor async driver |
| Vector DB | ChromaDB |
| Embeddings | all-MiniLM-L6-v2 — 384-dim, runs on CPU |
| LLM | Ollama / Anthropic / OpenAI / Azure (pluggable) |
| Auth | JWT + bcrypt + Fernet encryption |
| Deploy | Docker Compose |
| CI/CD | GitHub Actions — lint, test, Docker build |
The embedding model runs on CPU. No GPU required to get started.
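The core retrieval loop — embed the query, score it against stored chunk vectors, return the closest matches — can be sketched without any heavy dependencies. The toy bag-of-words "embedding" below is a stand-in I'm using purely for illustration; the real pipeline uses all-MiniLM-L6-v2 dense vectors via sentence-transformers and stores them in ChromaDB:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words vector. Stand-in for all-MiniLM-L6-v2's
    # 384-dim dense embeddings in the real system.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def search(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank document chunks by similarity to the query; return top k."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]
```

Swap `embed` for a real sentence-transformers model and `search` for a ChromaDB collection query, and this is the shape of the semantic search that works the moment the containers come up.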
LegalLens vs. The Tools You're Paying For
| Feature | LegalLens | Harvey AI | CoCounsel | Luminance |
|---|---|---|---|---|
| Monthly Cost | $0 | ~$1,200/user | ~$250/bundle | Enterprise |
| Semantic Search | YES | YES | YES | YES |
| RAG Q&A with Citations | YES | YES | YES | — |
| Risk Analysis | YES | YES | — | YES |
| Clause Finder | YES | — | — | YES |
| Document Comparison | YES | YES | — | YES |
| Legal Memo Generation | YES | YES | YES | — |
| 100% Data Privacy | YES | NO | NO | Partial |
| Works Fully Offline | YES | NO | NO | NO |
| Multi-LLM Support | YES | NO | NO | NO |
| Self-Hosted | YES | NO | NO | NO |
| Open Source | YES | NO | NO | NO |
It's Not Just for Lawyers
Here's who can actually use this right now:
Startup founders reviewing term sheets and investor agreements before signing anything.
Freelancers and consultants who deal with client contracts and NDAs regularly but can't afford legal review for every document.
Small law firms that need AI tooling but are priced out of enterprise solutions.
Legal ops teams at companies that need document analysis at scale without a six-figure SaaS bill.
Law students and researchers studying contract patterns, clause variations, or compliance across jurisdictions.
Developers who want to build legal AI features into their own products using an open, well-structured codebase.
What Makes the Q&A Actually Trustworthy
One of the biggest criticisms of legal AI is hallucination — the model confidently states something that isn't in the document. LegalLens addresses this directly.
Every answer from the Q&A engine includes numbered citations that link back to the exact document and page number. If the answer says "the indemnification clause limits liability to $500,000 [1]", citation [1] takes you directly to the paragraph in the original file.
You can verify. You can audit. The AI is a tool, not an oracle.
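The mechanism behind numbered citations is worth seeing. One common approach — sketched here with field names I've made up for illustration, not the actual LegalLens schema — is to number the retrieved chunks, keep a citation map, and instruct the model to cite only those numbers:

```python
def build_cited_prompt(question: str, chunks: list[dict]) -> tuple[str, dict]:
    """Number retrieved chunks and build a grounded prompt.

    Each chunk dict is assumed to carry `text`, `doc`, and `page` keys
    (illustrative field names, not the real schema).
    """
    citations = {}
    context_lines = []
    for i, chunk in enumerate(chunks, start=1):
        citations[i] = {"doc": chunk["doc"], "page": chunk["page"]}
        context_lines.append(f"[{i}] {chunk['text']}")
    prompt = (
        "Answer using ONLY the numbered excerpts below, and cite them "
        "as [n] after each claim.\n\n"
        + "\n".join(context_lines)
        + f"\n\nQuestion: {question}"
    )
    return prompt, citations
```

When the model answers with "[1]", the citation map resolves it back to a specific document and page — which is exactly what makes the answer auditable.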
Enterprise-Grade Platform Features
This isn't a weekend script. It's built to run in production:
- Multi-tenant — organization-based isolation with role-based access control
- JWT auth — access + refresh tokens, bcrypt password hashing
- Rate limiting — per-endpoint limits on auth, AI calls, uploads, and search
- Audit logging — every upload, deletion, and analysis is logged with IP tracking
- Command palette (Cmd+K) — spotlight-style navigation across the entire app
- Analytics dashboard — search trends, top queries, activity timelines
- Cache-first AI — analysis results cached in MongoDB so repeated queries don't re-hit the LLM
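The cache-first pattern is straightforward: key the cache on a hash of the document and analysis type, and only hit the LLM on a miss. A minimal sketch — with a plain dict standing in for the MongoDB collection, and all names my own:

```python
import hashlib

class AnalysisCache:
    """Cache-first wrapper: hash (analysis type, doc text) and only
    call the LLM on a cache miss. A dict stands in for MongoDB here."""

    def __init__(self, llm_call):
        self.store = {}          # stand-in for a MongoDB collection
        self.llm_call = llm_call
        self.misses = 0

    def analyze(self, doc_text: str, analysis_type: str) -> str:
        key = hashlib.sha256(f"{analysis_type}:{doc_text}".encode()).hexdigest()
        if key not in self.store:
            self.misses += 1
            self.store[key] = self.llm_call(doc_text, analysis_type)
        return self.store[key]
```

Because the key covers the full document text, any edit to the document naturally invalidates the cached analysis.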
Run It, Break It, Build On It
The repo is live, the CI is green, and the code is MIT licensed:
github.com/LakshmiSravyaVedantham/legal-lens
If you want to contribute, here's what would make the biggest impact right now:
- OCR for scanned PDFs — Tesseract or PaddleOCR integration
- Hybrid search — BM25 + semantic for better recall
- Jurisdiction-specific clause templates — GDPR, CCPA, HIPAA
- Accessibility improvements — keyboard navigation, screen reader support
- More document types — patent filings, regulatory submissions
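For anyone eyeing the hybrid-search item: a common way to combine BM25 and semantic rankings is Reciprocal Rank Fusion, which needs only the two ranked lists. This is a generic sketch of the technique, not code from the repo:

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse multiple ranked lists (e.g. BM25 and semantic) with RRF:
    score(d) = sum over rankings of 1 / (k + rank(d))."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

RRF needs no score normalization between the two retrievers, which is what makes it an easy first step toward better recall.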
The development setup is straightforward:
```bash
# Backend
python3 -m venv .venv && source .venv/bin/activate
pip install -r backend/requirements.txt
uvicorn backend.main:app --reload --port 8000

# Frontend (new terminal)
cd frontend && npm install && npm run dev
```
Full Swagger docs at http://localhost:8000/docs once you're running.
The Bottom Line
Legal AI shouldn't be a $1,200/month luxury. Document intelligence shouldn't mean surrendering privileged client files to a third-party server. And "AI-powered" shouldn't be a black box with no citations and no accountability.
LegalLens is the answer I built because I wanted it to exist. It's free, it's open, and it works.
Star the repo. Try it. Tell me what breaks.