After testing several local LLMs on real legal documents (contracts, NDAs, service agreements), here's an honest breakdown of what's actually useful and where local models fall short.
TL;DR: Local LLMs handle first-pass extraction well. Complex legal interpretation — not yet.
Why local LLMs for legal documents?
The compliance issue is real:
- Uploading client contracts to ChatGPT may violate NDA terms
- GDPR Article 28 requires a data processing agreement — most cloud AI providers don't have one for individual users
- Many firms have explicit policies banning cloud AI for client documents
Local models (via Ollama) solve the compliance problem. The question is whether they're good enough to be useful.
Test setup
- Models tested: llama3.1:8b, llama3.1:70b (Q4), qwen2.5:7b, qwen2.5:14b, mistral:7b
- Document types: NDAs, service agreements, consulting contracts, lease agreements
- Hardware: 16GB RAM (8b models), 32GB RAM (70b)
- Evaluation: Manual review by a lawyer friend (non-scientific, practical assessment)
What works well ✅
1. Extracting structured information
Task: "List all parties and their roles."
All 8b+ models do this reliably. Output:
PARTIES:
- Acme Corp (the "Company") — service provider
- John Smith Consulting LLC (the "Consultant") — independent contractor
- Mentioned as guarantor: Jane Smith (personal guarantee, Section 8.2)
Accuracy: ~95% on standard contracts.
2. Key date extraction
Task: "List all dates and deadlines."
Works well. Models catch effective dates, termination dates, notice periods, payment due dates.
Miss rate: ~5-10% for dates buried in complex conditional clauses.
3. Payment terms summary
Task: "Summarize payment terms."
Reliable on standard payment structures. Handles:
- Fixed fee contracts
- Milestone-based payments
- Retainer agreements
- Net-30/60/90 terms
Struggles with: complex multi-tier pricing, earn-out structures, revenue share formulas.
4. "Does this contract have a non-compete clause?"
Simple yes/no questions work well across all tested models. Useful for quick triage.
5. Plain-language summary of a section
Task: "Explain Section 7 in plain language."
This is where local LLMs genuinely help non-lawyers. Translating legalese to plain English is a strong use case even for smaller models.
What doesn't work ❌
1. Complex legal interpretation
Task: "Does this indemnification clause put the contractor at unreasonable risk?"
Models either give a wishy-washy non-answer or confidently state something incorrect. Don't use for risk assessment.
2. Identifying all unusual clauses in a long contract
On 50+ page contracts, models miss unusual provisions that appear late in the document. Context window limitations matter here.
Workaround: Process section-by-section, then synthesize.
3. Comparing two contracts
"How does this NDA differ from a standard NDA?" requires the model to have a reliable internal reference. Results are inconsistent.
4. Jurisdiction-specific analysis
Anything requiring knowledge of specific case law or state-specific provisions — unreliable.
Model comparison for legal documents
| Model | Speed | Extraction accuracy | Clause identification | Recommended for |
|---|---|---|---|---|
| qwen2.5:3b | Fast | 85% | 75% | Quick triage, 8GB machines |
| llama3.1:8b | Medium | 90% | 82% | Daily use, 16GB machines |
| qwen2.5:14b | Slow | 92% | 85% | Higher accuracy needed |
| llama3.1:70b | Very slow | 94% | 88% | Best quality, 32GB+ |
| mistral:7b | Fast | 83% | 72% | Not recommended for legal |
The right use case
Local LLMs for legal documents = first-pass reading assistant, not a lawyer.
Practical workflow:
- Run local LLM → get structured extraction in 2-3 minutes
- Review summary to identify sections that need deep reading
- Do the deep reading yourself (or with a real lawyer)
Time saved: reading a 40-page contract to find the 5 things you actually need to review goes from 90 minutes → 15 minutes.
System prompt that works best
After many iterations, this structure produces the most reliable legal document analysis:
You are a legal document analyst. Your task is to EXTRACT information, not interpret it.
For the provided contract, produce a structured report with these exact sections:
PARTIES: [List each named party, their defined term, and role]
EFFECTIVE DATE: [Date the agreement begins]
TERM: [Duration and termination conditions]
KEY OBLIGATIONS: [Bullet list per party]
PAYMENT: [All payment-related terms]
TERMINATION: [How either party can end the agreement]
UNUSUAL PROVISIONS: [Any non-standard clauses worth flagging]
Be precise. If information is not present, write "Not specified."
Do not give legal advice or risk assessments.
Full tool
I built a Windows app around this workflow: local PDF/DOCX processing, 10 domain modes, batch processing for multiple contracts. Available at https://journeyer376.gumroad.com/l/ussytd.
It's aimed at lawyers and consultants who want the productivity benefit of AI without the compliance risk of cloud tools.
What document types are you using local LLMs for? Would be interested in test cases from others.
Top comments (0)