A field manual for legal engineers who have been burned by confident wrong answers.
Three things most legal AI products get backwards:
- Documents leave your machine before you get an answer.
- The model talks first, citations get stapled on later.
- "Page 47" counts as verification.
Picard OSS flips all three.
It is an open-source, local-first legal document assistant: upload PDFs, search with BM25 or multi-constraint CARP, chat with citation-grade answers, run tabular extraction across a matter, and click any [N] marker to jump to the exact bounding box on the source PDF. When retrieval finds nothing, Picard refuses. No LLM call. No improvisation.
Repo: github.com/iamsaurabhc/picard-oss
Hosted sibling: picard.law (enterprise SaaS)
Current release: v0.2.0
You can run it from source, Docker, or a native installer. No Supabase. No Neo4j. No "trust our cloud with privilege."
Chapter 1: Your machine, your matter
Everything that matters stays under .picard-data/ on disk:
```text
.picard-data/
├── picard.db   # chunks, FTS5, entities, chat, tabular
├── pdfs/       # raw PDF bytes
└── models/     # fastembed ONNX, optional GLiNER / Presidio
```
Parsing, OCR, indexes, and PDF storage are local. The only optional outbound traffic is your LLM provider (OpenAI, Anthropic, etc.) or fully local Ollama. Documents do not egress for search, indexing, or viewing.
liteparse extracts layout-aware chunks with normalized bounding boxes. Digital PDFs parse at 150 DPI. Scans route through local PaddleOCR (optional sidecar) or Tesseract at 300 DPI. Every citation downstream inherits spatial provenance from day one.
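To make "normalized bounding boxes" concrete, here is a minimal sketch. It is not liteparse's code: it assumes boxes arrive as pixel coordinates with a top-left origin and converts them to fractions of the page, so the same box can be redrawn at any zoom level. `BBox`, `normalize_bbox`, and `to_pixels` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Rectangle with a top-left origin: (x0, y0) to (x1, y1)."""
    x0: float
    y0: float
    x1: float
    y1: float


def normalize_bbox(box: BBox, page_width_px: float, page_height_px: float) -> BBox:
    """Convert pixel coordinates into page fractions clamped to [0, 1]."""
    if page_width_px <= 0 or page_height_px <= 0:
        raise ValueError("page dimensions must be positive")

    def clamp(value: float) -> float:
        return min(1.0, max(0.0, value))

    return BBox(
        clamp(box.x0 / page_width_px),
        clamp(box.y0 / page_height_px),
        clamp(box.x1 / page_width_px),
        clamp(box.y1 / page_height_px),
    )


def to_pixels(box: BBox, width_px: float, height_px: float) -> BBox:
    """Scale a normalized box back to pixels for a viewer of any size."""
    return BBox(box.x0 * width_px, box.y0 * height_px,
                box.x1 * width_px, box.y1 * height_px)


# A US Letter page (8.5 x 11 in) rendered at 150 DPI is 1275 x 1650 px.
page_w, page_h = 8.5 * 150, 11 * 150
norm = normalize_bbox(BBox(127.5, 165.0, 637.5, 330.0), page_w, page_h)
```

Storing fractions rather than pixels means a box captured at 150 DPI still lines up when the viewer renders the page at a different resolution.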
One command for developers:
```bash
git clone https://github.com/iamsaurabhc/picard-oss
cd picard-oss
cp .env.example backend/.env
./scripts/start.sh
# → http://localhost:3000
```
Or skip the terminal entirely. See Chapter 6.
Chapter 2: Evidence before eloquence
Picard inherits an evidence contract from production legal AI at picard.law. The contract is simple and ruthless:
| Rule | Behavior |
|---|---|
| Citations assigned before synthesis | [1], [2], [N] map to real chunks with page + bbox before the LLM writes |
| Refuse gate on zero evidence | No retrieval → no LLM → honest refusal |
| Bbox-grounded UX | Click [N] → MultiHighlightPDFViewer highlights the precise region |
| Post-synthesis validation | Unsupported amounts, dates, and drift get stripped |
```text
Query → retrieve (FTS5 / CARP / hybrid) → refuse if empty
      → citation map [1..N] → LLM synthesize → stream → click [N] → bbox
```
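The flow above can be sketched in a few lines. This is an illustration of the contract, not Picard's kernel: `Chunk`, `answer`, `REFUSAL`, and the two callables are hypothetical names, and the retriever and "LLM" here are stand-ins.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Chunk:
    doc: str
    page: int
    bbox: tuple[float, float, float, float]
    text: str


REFUSAL = "No supporting evidence was found in the selected documents."


def answer(
    query: str,
    retrieve: Callable[[str], list[Chunk]],
    synthesize: Callable[[str, dict[int, Chunk]], str],
) -> tuple[str, dict[int, Chunk]]:
    chunks = retrieve(query)
    if not chunks:
        # Refuse gate: the synthesizer is never called on zero evidence.
        return REFUSAL, {}
    # The citation map is fixed before any text is generated.
    citations = dict(enumerate(chunks, start=1))
    return synthesize(query, citations), citations


corpus = [Chunk("msa.pdf", 12, (0.1, 0.4, 0.9, 0.5),
                "Liability is capped at fees paid.")]
llm_calls: list[str] = []


def keyword_retrieve(query: str) -> list[Chunk]:
    terms = query.lower().split()
    return [c for c in corpus if any(t in c.text.lower() for t in terms)]


def fake_llm(query: str, citations: dict[int, Chunk]) -> str:
    llm_calls.append(query)
    return "Liability is capped at fees paid [1]."


text, cites = answer("liability cap", keyword_retrieve, fake_llm)
refused, no_cites = answer("indemnity", keyword_retrieve, fake_llm)
```

The second call returns the refusal string and leaves `llm_calls` untouched, which is the whole point of putting the gate ahead of synthesis.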
Most RAG pipelines treat the LLM as the protagonist. Picard treats retrieval as the judge and the model as a clerk who may only speak from the record.
A refused answer is not a bug. It is the system doing what your partner wished the chatbot had done at 11:47 PM.
Chapter 3: Relevance beats similarity (and hybrid knows when to help)
Legal retrieval fails when vector search returns semantically similar but legally wrong text. A limitation of liability clause from Agreement B is not helpful when you asked about Agreement A.
Picard's core engine is SQLite FTS5 (BM25): exact phrase matching, explainable scores, sub-millisecond search, zero vector DB to provision.
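For readers who have not used FTS5 directly, here is a minimal sketch with Python's standard `sqlite3` module. The table name, columns, and sample rows are made up for illustration and are not Picard's schema, and the example assumes your SQLite build includes FTS5 (most current Python builds do).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE chunks USING fts5(doc, page UNINDEXED, body)")
conn.executemany(
    "INSERT INTO chunks (doc, page, body) VALUES (?, ?, ?)",
    [
        ("agreement_a.pdf", 14,
         "Limitation of liability: liability is capped at fees paid."),
        ("agreement_b.pdf", 9,
         "The supplier shall limit its exposure where reasonable."),
        ("agreement_a.pdf", 3,
         "This agreement is governed by the laws of India."),
    ],
)


def search(phrase: str, limit: int = 5) -> list[tuple]:
    # Double quotes make FTS5 treat the input as one exact phrase.
    match = '"' + phrase.replace('"', '""') + '"'
    return conn.execute(
        "SELECT doc, page, bm25(chunks) AS score FROM chunks "
        "WHERE chunks MATCH ? ORDER BY score LIMIT ?",
        (match, limit),
    ).fetchall()


hits = search("limitation of liability")
misses = search("limit liability")
```

FTS5's `bm25()` returns lower values for better matches, so ordering by the score ascending puts the best hit first. The phrase query matches only the chunk that contains those words in that order, and the near-miss phrase returns nothing.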
For conjunctive questions ("party X + date Y + condition Z across 100K pages"), CARP (Constraint-Aware Retrieval Protocol) intersects entity constraints at the page level. No Neo4j cluster. No keyword soup. Auditable bundle formation with diagnostics on the Search page.
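The page-level intersection idea can be sketched with plain sets. This is a toy model under stated assumptions, not the CARP implementation: the `page_entities` mapping stands in for whatever entity index the real system keeps, and `build_index` and `intersect` are hypothetical helpers.

```python
from collections import defaultdict

PageKey = tuple[str, int]  # (document, page number)


def build_index(page_entities: dict[PageKey, set[str]]) -> dict[str, set[PageKey]]:
    """Invert page -> entities into entity -> pages."""
    index: dict[str, set[PageKey]] = defaultdict(set)
    for page, entities in page_entities.items():
        for entity in entities:
            index[entity.lower()].add(page)
    return index


def intersect(index: dict[str, set[PageKey]], constraints: list[str]) -> list[PageKey]:
    """Return only the pages on which every constraint is present."""
    if not constraints:
        return []
    page_sets = [index.get(c.lower(), set()) for c in constraints]
    return sorted(set.intersection(*page_sets))


page_entities = {
    ("lease.pdf", 4): {"Acme Corp", "2021-03-01"},
    ("lease.pdf", 7): {"Acme Corp", "2021-03-01", "termination for convenience"},
    ("nda.pdf", 2): {"Acme Corp", "termination for convenience"},
}
index = build_index(page_entities)
pages = intersect(index, ["Acme Corp", "2021-03-01", "termination for convenience"])
none = intersect(index, ["Globex", "2021-03-01"])
```

Every page in the result can be explained by pointing at the constraints it satisfied, and an unknown party produces an empty set rather than a nearest neighbor.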
Hybrid search: local embeddings, FTS still wins
Picard also ships hybrid retrieval with a local ONNX embedding model (default: BAAI/bge-small-en-v1.5 via fastembed). Vectors live in SQLite as normalized float32 BLOBs. Optional sqlite-vec ANN on Python 3.13+.
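Storing vectors as normalized float32 BLOBs needs nothing beyond the standard library. The sketch below is an assumption about the general technique, not Picard's schema: the `chunk_vectors` table and the helper names are hypothetical, and the two-dimensional vectors are toy values.

```python
import math
import sqlite3
from array import array


def to_blob(vector: list[float]) -> bytes:
    """L2-normalize a vector and pack it as float32 bytes."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return array("f", (x / norm for x in vector)).tobytes()


def from_blob(blob: bytes) -> array:
    out = array("f")
    out.frombytes(blob)
    return out


def cosine(a: bytes, b: bytes) -> float:
    # Both inputs are unit length, so the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(from_blob(a), from_blob(b)))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunk_vectors (chunk_id INTEGER PRIMARY KEY, embedding BLOB)")
conn.execute("INSERT INTO chunk_vectors VALUES (?, ?)", (1, to_blob([3.0, 4.0])))
conn.execute("INSERT INTO chunk_vectors VALUES (?, ?)", (2, to_blob([4.0, -3.0])))

query = to_blob([6.0, 8.0])
scores = {
    chunk_id: cosine(query, blob)
    for chunk_id, blob in conn.execute("SELECT chunk_id, embedding FROM chunk_vectors")
}
```

Normalizing at write time keeps query-time scoring to a single dot product per row, which is why a brute-force scan stays practical before any ANN index is involved.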
The fusion is FTS-first weighted RRF, not "vectors replace keywords":
- Strong FTS hits? Vectors stay on the bench.
- Empty FTS pool? Vector fallback kicks in.
- Mixed case? Weighted reciprocal rank fusion (w_fts=0.6, k=60) merges both signals.
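The three cases above can be sketched as follows. Weighted RRF scores each chunk as the sum of `weight / (k + rank)` over the lists it appears in. Two things here are assumptions rather than documented behavior: the vector weight is taken as `1 - w_fts`, and "strong FTS hits" is approximated by a hit-count threshold (`strong_fts_hits`).

```python
def weighted_rrf(fts_ranked: list[str], vec_ranked: list[str],
                 w_fts: float = 0.6, k: int = 60) -> list[str]:
    """Merge two ranked lists of chunk ids with weighted reciprocal rank fusion."""
    w_vec = 1.0 - w_fts  # assumption: the two weights sum to 1
    scores: dict[str, float] = {}
    for weight, ranked in ((w_fts, fts_ranked), (w_vec, vec_ranked)):
        for rank, chunk_id in enumerate(ranked, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + weight / (k + rank)
    # Highest score first; ties broken by id so the output is deterministic.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


def fuse(fts_ranked: list[str], vec_ranked: list[str],
         strong_fts_hits: int = 5) -> list[str]:
    """FTS-first policy: vectors only help when keywords fall short."""
    if len(fts_ranked) >= strong_fts_hits:  # hypothetical definition of "strong"
        return list(fts_ranked)
    if not fts_ranked:
        return list(vec_ranked)
    return weighted_rrf(fts_ranked, vec_ranked)


merged = weighted_rrf(["a", "b"], ["b", "c"])
```

In the example, `b` ranks first because it appears in both lists, while `a` beats `c` because a top FTS rank carries more weight than a second-place vector rank.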
```ini
# backend/.env
ENABLE_HYBRID_SEARCH=true
```

```bash
./scripts/start.sh                            # downloads ONNX model into .picard-data/models/fastembed
./scripts/backfill-embeddings.sh              # index existing PDFs
./scripts/backfill-embeddings.sh --vec-index  # page-level vectors
```
Embeddings never phone home. The model caches on disk. Ingest indexes vectors automatically when hybrid is on.
Design principle: relevance over similarity. Vectors bridge paraphrase gaps; FTS5 and CARP keep legal integrity.
Chapter 4: The PII airlock
Local-first storage does not mean you want client names and Aadhaar numbers riding along to OpenAI.
Picard ships a PII shield: detect locally, mask before cloud LLM calls, restore in the response stream.
| Layer | What happens |
|---|---|
| Regex (always on) | Email, Indian phone, PAN, Aadhaar |
| Presidio (optional pack) | Names, locations, and richer entity types via spaCy |
| Ollama bypass | Fully local inference skips masking entirely |
| Chat UI | "PII shield" toggle in the chat header |
| Tabular | Server default protects cell extraction prompts |
Placeholders look like <EMAIL_ADDRESS_1> or <PERSON_1>. The PIIProxy registers text per request; model_router anonymizes at the litellm boundary; StreamingPIIRestorer puts originals back before you see them.
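A rough sketch of the mask-then-restore round trip, using only the regex layer. None of this is the actual `PIIProxy` or `StreamingPIIRestorer` code: the patterns are simplified, the `IN_PAN` and `IN_AADHAAR` labels are assumed names, and `mask` and `restore_stream` are hypothetical helpers.

```python
import re
from typing import Iterable, Iterator

PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "IN_PAN": re.compile(r"\b[A-Z]{5}[0-9]{4}[A-Z]\b"),          # 5 letters, 4 digits, 1 letter
    "IN_AADHAAR": re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}\b"),  # 12 digits, optionally grouped
}


def mask(text: str) -> tuple[str, dict[str, str]]:
    """Replace detected values with numbered placeholders; return text and mapping."""
    mapping: dict[str, str] = {}   # placeholder -> original
    reverse: dict[str, str] = {}   # original -> placeholder
    counters: dict[str, int] = {}
    for label, pattern in PATTERNS.items():
        def replace(match: re.Match, label: str = label) -> str:
            original = match.group(0)
            if original not in reverse:
                counters[label] = counters.get(label, 0) + 1
                placeholder = f"<{label}_{counters[label]}>"
                reverse[original] = placeholder
                mapping[placeholder] = original
            return reverse[original]
        text = pattern.sub(replace, text)
    return text, mapping


def restore_stream(chunks: Iterable[str], mapping: dict[str, str]) -> Iterator[str]:
    """Restore originals in a token stream, even if a placeholder is split across chunks."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        for placeholder, original in mapping.items():
            buffer = buffer.replace(placeholder, original)
        # Hold back a trailing "<..." that may be an unfinished placeholder.
        cut = buffer.rfind("<")
        if cut != -1 and ">" not in buffer[cut:]:
            ready, buffer = buffer[:cut], buffer[cut:]
        else:
            ready, buffer = buffer, ""
        if ready:
            yield ready
    if buffer:
        yield buffer


masked, mapping = mask(
    "Email priya@example.com about PAN ABCDE1234F and Aadhaar 1234 5678 9012; "
    "cc priya@example.com"
)
restored = "".join(restore_stream(["Reply to <EMAIL_", "ADDRESS_1> regarding <IN_PAN_1>."], mapping))
```

The same value always maps to the same placeholder within a request, so the model can still reason about "the person in `<EMAIL_ADDRESS_1>`" without ever seeing the address.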
Documents in your vault stay raw. Redaction is transit protection for cloud LLMs, not ingest erasure. Install the optional Presidio pack from Settings → Optional components.
For air-gapped or fully local Ollama deployments, the airlock doors stay open because nothing leaves the building anyway.
Chapter 5: The workbench (four surfaces, one contract)
Picard is not a single chat box. It is a workbench for legal document engineering.
Unified dashboard + Vault
The home surface (/) combines Ask and Review modes: attach documents, browse the Vault, stream answers, or spin up tabular reviews without context-switching. The Vault (/vault) is your matter file cabinet: upload, parse status, retry, scope documents into chat.
Citation chat
Streaming Q&A with session history, document scope, workflow intent pinning, and [N] pills wired to the PDF panel. Chat latency profiles (quality | balanced | fast) let you trade depth for time-to-first-token without forking the codebase.
The Citation Kernel (Phase 7.0, shipped) centralizes the evidence path: refuse → map → synthesize → validate → optional citation judge. Chat and agent corpus tools share the same kernel. No weaker "agent mode" shortcut.
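The validate step is the least familiar part of that path, so here is a deliberately small sketch of one way it could work. It is an assumption, not the kernel's logic: it drops any sentence whose amounts or ISO dates do not appear verbatim in the chunks that sentence cites, and it does not attempt to detect paraphrase drift. `validate` and the regexes are hypothetical.

```python
import re

CITATION = re.compile(r"\[(\d+)\]")
# ISO dates such as 2021-03-01, or amounts such as $500,000 and 4.2
FACT = re.compile(r"\d{4}-\d{2}-\d{2}|\$?\d+(?:,\d{3})*(?:\.\d+)?")
SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+")


def validate(answer: str, citations: dict[int, str]) -> str:
    """Keep only sentences whose numbers and dates occur in their cited chunks."""
    kept = []
    for sentence in SENTENCE_SPLIT.split(answer.strip()):
        cited = [int(n) for n in CITATION.findall(sentence)]
        evidence = " ".join(citations.get(n, "") for n in cited)
        # Remove the [N] markers first so their digits are not treated as facts.
        facts = FACT.findall(CITATION.sub("", sentence))
        if all(fact in evidence for fact in facts):
            kept.append(sentence)
    return " ".join(kept)


citations = {1: "The purchase price is $500,000 payable on 2021-03-01."}
draft = (
    "The purchase price is $500,000 [1]. "
    "Payment was due on 2021-04-15 [1]. "
    "The parties agreed to arbitration [1]."
)
clean = validate(draft, citations)
```

The middle sentence is stripped because its date never appears in chunk 1. The arbitration sentence survives only because this sketch checks numbers and dates, which is why a separate grounding check is still needed for non-numeric claims.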
Tabular review
Define columns in natural language. Picard runs FTS5 retrieval per cell, extracts structured JSON via LLM, and links every cell to source markers. SSE batch generation, flags, Excel export, and a review-side chat panel. Ten NDAs in one sitting is a design target, not a demo fantasy.
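A per-cell loop of this kind might look like the sketch below. The function names, the JSON shape (`value` plus `sources`), and the flag strings are all assumptions made for illustration; the extractor is a stub standing in for the LLM call.

```python
import json
from typing import Callable


def review_cell(
    doc_id: str,
    column_prompt: str,
    retrieve: Callable[[str, str], list[str]],
    extract: Callable[[str, dict[int, str]], str],
) -> dict:
    """Fill one cell: retrieve evidence, ask for JSON, keep only valid markers."""
    empty = {"value": None, "sources": []}
    chunks = retrieve(doc_id, column_prompt)
    if not chunks:
        return {**empty, "flag": "no_evidence"}  # no extractor call at all

    numbered = dict(enumerate(chunks, start=1))
    try:
        parsed = json.loads(extract(column_prompt, numbered))
    except json.JSONDecodeError:
        return {**empty, "flag": "invalid_json"}
    if not isinstance(parsed, dict):
        return {**empty, "flag": "invalid_json"}

    # Discard markers that do not point at a retrieved chunk.
    sources = [n for n in parsed.get("sources", [])
               if isinstance(n, int) and n in numbered]
    return {
        "value": parsed.get("value"),
        "sources": sources,
        "flag": None if sources else "unsupported",
    }


docs = {
    "nda_01.pdf": ["This Agreement is governed by the laws of Delaware."],
    "nda_02.pdf": [],
}


def retrieve(doc_id: str, prompt: str) -> list[str]:
    return docs[doc_id]


def fake_extract(prompt: str, numbered: dict[int, str]) -> str:
    return json.dumps({"value": "Delaware", "sources": [1, 7]})


cell = review_cell("nda_01.pdf", "Governing law", retrieve, fake_extract)
blank = review_cell("nda_02.pdf", "Governing law", retrieve, fake_extract)
```

Marker 7 is dropped because no chunk 7 was retrieved, so a cell can only ever link to evidence that actually exists.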
Workflow library
~18 built-in assistant and tabular playbooks ship as validated LightFlow flow_json DAGs. Browse, filter by deployment profile (firm/court), preview the step graph, validate, export JSON. Attach workflows in Chat to pin CARP intent. Seed tabular reviews from tabular workflows.
Run and full agent authoring await Phase 7b/7a, but the library is already a catalog of repeatable legal engineering patterns.
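Validating a workflow DAG mostly means checking for duplicate ids, dangling dependencies, and cycles. The sketch below uses Python's standard `graphlib`. The `steps` / `id` / `depends_on` layout is a hypothetical stand-in, since the real flow_json schema is not shown here.

```python
from graphlib import CycleError, TopologicalSorter


def validate_flow(flow: dict) -> list[str]:
    """Return step ids in a runnable order, or raise ValueError if the DAG is invalid."""
    steps = flow.get("steps", [])
    ids = [step["id"] for step in steps]
    if len(ids) != len(set(ids)):
        raise ValueError("duplicate step id")

    graph = {step["id"]: set(step.get("depends_on", [])) for step in steps}
    for step_id, deps in graph.items():
        missing = deps - set(graph)
        if missing:
            raise ValueError(f"step {step_id!r} depends on unknown step(s): {sorted(missing)}")

    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        raise ValueError(f"flow contains a cycle: {exc.args[1]}") from exc


flow = {
    "steps": [
        {"id": "retrieve"},
        {"id": "extract", "depends_on": ["retrieve"]},
        {"id": "summarize", "depends_on": ["extract"]},
    ]
}
order = validate_flow(flow)
```

A flow that passes this check has at least one deterministic execution order, which is the precondition for any later Run button.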
Deployment profiles
Firm and court profiles filter workflows, gate tools, and tune agent retrieval caps. Court mode blocks risk-scoring patterns and tightens connector defaults. Same evidence contract, different guardrails.
Chapter 6: Download it like normal software
Picard OSS is not "clone repo or nothing." v0.2.0 ships native binaries via Tauri, built in CI on every version tag:
| Platform | Artifact | CI target |
|---|---|---|
| macOS Apple Silicon | .dmg | darwin-aarch64 |
| macOS Intel | .dmg | darwin-x86_64 |
| Windows 64-bit | .exe | windows-x86_64 |
| Windows 32-bit | .exe | windows-i686 |
| Linux (Ubuntu amd64) | .deb | linux-x86_64 |
Downloads publish to GitHub Releases. A manifest.json on gh-pages powers in-app updates (Tauri updater + Settings update check) and the picard.law download page.
macOS: open DMG → Applications (see docs/MACOS_INSTALL.md for Gatekeeper)
Windows: run installer
Linux: sudo dpkg -i Picard*.deb
Docker Compose and GHCR images remain available for teams who prefer containers:
```bash
docker compose up --build
# Optional OCR: docker compose --profile ocr up --build
```
Settings in the app (or the first-run onboarding wizard) stores API keys encrypted under your data directory. Keys never round-trip through the API in plaintext.
Chapter 7: Chester keeps us honest
Picard does not ship "vibes-based QA."
The Chester v. Municipality of Waverly corpus (627 chunks) anchors gold-label regression tests in CI. Metric families have stable IDs used in pytest, eval harnesses, and (roadmap) inline answer panels:
| Family | What it guards |
|---|---|
| R | Snippet recall, precision, bbox coverage |
| C | CARP constraint extraction, page intersection, decoy rejection |
| F | Zero-evidence refuse rate, false refuses |
| CT | [N] marker resolution, pinpoint bbox accuracy |
| FG | Claim-level grounding, cross-bundle conflation |
| AB | Missed refusal, misleading answers |
```bash
cd backend && source .venv/bin/activate
pytest -m corpus -q
./scripts/eval-search.sh
python scripts/eval_scorecard.py
```
Today, retrieval diagnostics appear inline in chat (RetrievalActivityPanel) and on the Search CARP debug panel. Full post-answer CT/FG/AB badges are on the roadmap.
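To give a feel for what the R and F families measure, here is a small sketch of the arithmetic. The exact definitions used by the Chester harness are not reproduced here; the formulas, field names, and edge-case conventions below are assumptions.

```python
def recall_precision(retrieved: set[str], gold: set[str]) -> tuple[float, float]:
    """R-style metrics over chunk ids: share of gold found, share of results that are gold."""
    hits = len(retrieved & gold)
    recall = hits / len(gold) if gold else 1.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision


def refuse_metrics(cases: list[dict]) -> dict[str, float]:
    """F-style metrics. Each case: {"has_evidence": bool, "refused": bool}."""
    unanswerable = [c for c in cases if not c["has_evidence"]]
    answerable = [c for c in cases if c["has_evidence"]]

    def rate(group: list[dict]) -> float:
        return sum(c["refused"] for c in group) / len(group) if group else 0.0

    return {
        # Should be 1.0: every zero-evidence query is refused.
        "zero_evidence_refuse_rate": rate(unanswerable),
        # Should be 0.0: no refusal when evidence exists.
        "false_refuse_rate": rate(answerable),
    }


recall, precision = recall_precision({"c1", "c2", "c9"}, {"c1", "c2", "c3", "c4"})
f_metrics = refuse_metrics([
    {"has_evidence": False, "refused": True},
    {"has_evidence": False, "refused": True},
    {"has_evidence": True, "refused": False},
    {"has_evidence": True, "refused": True},
])
```

Tracking both rates matters because a system that refuses everything scores perfectly on the first metric and fails the second.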
Chapter 8: Stack for the curious
| Layer | Choice |
|---|---|
| Frontend | Next.js 15, React 19, TypeScript, Shadcn UI |
| Backend | Python 3.11+, FastAPI, SQLAlchemy |
| Database | SQLite + FTS5 (WAL) + optional sqlite-vec |
| PDF | liteparse + react-pdf bbox overlay |
| LLM | litellm (OpenAI, Ollama, tiered SLM/LLM optional) |
| Embeddings | fastembed ONNX (bge-small-en-v1.5) |
| PII | Regex + optional Presidio/spaCy |
| Desktop | Tauri (DMG / EXE / DEB) |
| License | AGPL-3.0 |
Optional component packs (install from Settings): PaddleOCR, GLiNER NER, Presidio PII, agent scaffolding.
Chapter 9: Where Picard sits in the ecosystem
```text
┌─────────────────────────────────────────────────────────────┐
│ Picard.law      Production SaaS · GraphRAG · Neo4j          │
└──────────────────────────────┬──────────────────────────────┘
                               │ evidence contract
                               ▼
┌─────────────────────────────────────────────────────────────┐
│ Picard OSS      Local-first · FTS5 + CARP · SQLite          │
│                 PII shield · hybrid · native binaries       │
└──────────────────────────────┬──────────────────────────────┘
                               │ tabular UX + DocPanel patterns
                               ▼
┌─────────────────────────────────────────────────────────────┐
│ Mike OSS        Cloud platform · Supabase · workflows       │
└─────────────────────────────────────────────────────────────┘
```
| | Picard OSS | Picard.law | Mike OSS |
|---|---|---|---|
| Deployment | Your machine | Managed SaaS | Cloud |
| Retrieval | FTS5 + CARP + hybrid | GraphRAG | Vector + workflows |
| PII | Local shield for cloud LLM | Enterprise tiers | Supabase Auth |
| Binaries | Mac/Win/Linux | Hosted | Cloud |
| Best for | Legal engineers, air-gap, eval | Production | Full-stack platform |
| License | AGPL-3.0 | Commercial | AGPL-3.0 |
Chapter 10: Shipped vs. loading
Shipped today (Phases 0-6 + 7.0):
- PDF ingest, OCR, FTS5, CARP, hybrid search
- Citation chat + Citation Kernel
- Tabular review + Excel export
- Workflow library (18 built-ins)
- PII shield + optional Presidio
- Settings, onboarding, encrypted secrets
- Chat latency profiles, deployment profiles
- Docker + native installers for 5 platforms
- Vault, unified dashboard, chat history rail
- Chester eval harness + PII e2e tests in CI
In development (honest roadmap):
- LightFlow workflow execution (Phase 7b): Run button is wired but disabled until deterministic DAG runs land
- Full LightAgent authoring loop (Phase 7a): kernel-first agent chat exists; multi-tool orchestration is scaffolded
- Template drafts from guidelines + CSV (Phase 8)
- Optional URL snapshots for web research, air-gap off by default (Phase 9)
- Inline post-answer quality panel (CT/FG/AB badges)
- WCAG gaps: canvas bbox screen reader exposure, streaming live regions
We would rather tell you what is loading than demo what is missing.
Chapter 11: Open source, open contract
| Use case | License |
|---|---|
| Local dev, PoC, eval on your hardware | AGPL-3.0, no fee |
| Fork/redistribute modified versions | AGPL-3.0, source to users |
| Hosted production without AGPL obligations | Commercial license |
Try it
Download a binary: github.com/iamsaurabhc/picard-oss/releases
Or from source:
```bash
git clone https://github.com/iamsaurabhc/picard-oss
cd picard-oss
./scripts/start.sh
```
Upload a PDF. Wait for parse_status=done. Ask a question. Click [1]. Watch the bbox light up.
If retrieval finds nothing, Picard will refuse. That is the point.
Picard OSS is built by legal engineers who have watched too many models confidently cite the wrong page. Star the repo, run the Chester eval, file an issue when CARP misfires. Evidence before eloquence. Always.
Top comments (2)
"Retrieval as the judge, model as a clerk who may only speak from the record" — that's the whole ballgame, and the refuse-on-zero-evidence gate is underrated. We see the same thing on financial docs: confident-but-wrong answers almost never trace to the model inventing numbers, they trace to retrieval handing it the wrong row. Kindred design choices here (local-first, SQLite FTS5, BM25-first hybrid) — I went the same way with docnest's ingestion engine. One real question: how does CARP handle constraints whose meaning depends on a column header several rows up? Cross-cell table semantics is the part that's broken every parser I've thrown at it.
"Retrieval as judge, model as clerk" is exactly the design bet. And yes: on financial docs the failure mode is almost always the wrong row, not hallucinated numbers. Refuse-on-zero-evidence is underrated until you've watched a partner click through a wrong citation at midnight.
Your CARP question is the one we don't hand-wave: column-header semantics several rows up are not first-class yet.
CARP does constraint intersection on page_entities (+ optional section_key from doc headings). It does not understand that $4.2M in row 14 means "Q3 Revenue" because of a header three rows above. If liteparse keeps that in one table chunk, FTS/hybrid usually saves you. If not, page-level intersection can bundle unrelated co-mentions. We see it on dense financial schedules.
Mitigation today: tabular review passes column semantics through the extraction prompt per cell. Longer term: structured table ingest (row/col + header propagation) so retrieval can refuse when header context is missing, not just when the page is empty.
Docnest sounds like kindred spirits. If you're open to it, I'd be curious what your ingestion engine does for header propagation. That's the layer I'd want to steal if someone has it working.