Saurabh C.

Posted on Jun 7

Picard OSS: Legal AI That Lives on Your Machine, Refuses to Bluff, and Ships Binaries

#ai #opensource #privacy #rag

A field manual for legal engineers who have been burned by confident wrong answers.

Three things most legal AI products get backwards:

Documents leave your machine before you get an answer.
The model talks first, citations get stapled on later.
"Page 47" counts as verification.

Picard OSS flips all three.

It is an open-source, local-first legal document assistant: upload PDFs, search with BM25 or multi-constraint CARP, chat with citation-grade answers, run tabular extraction across a matter, and click any [N] marker to jump to the exact bounding box on the source PDF. When retrieval finds nothing, Picard refuses. No LLM call. No improvisation.

Repo: github.com/iamsaurabhc/picard-oss

Hosted sibling: picard.law (enterprise SaaS)

Current release: v0.2.0

You can run it from source, Docker, or a native installer. No Supabase. No Neo4j. No "trust our cloud with privilege."

Chapter 1: Your machine, your matter

Everything that matters stays under .picard-data/ on disk:

.picard-data/
├── picard.db          # chunks, FTS5, entities, chat, tabular
├── pdfs/              # raw PDF bytes
└── models/            # fastembed ONNX, optional GLiNER / Presidio

Parsing, OCR, indexes, and PDF storage are local. The only optional outbound traffic is your LLM provider (OpenAI, Anthropic, etc.) or fully local Ollama. Documents do not egress for search, indexing, or viewing.

liteparse extracts layout-aware chunks with normalized bounding boxes. Digital PDFs parse at 150 DPI. Scans route through local PaddleOCR (optional sidecar) or Tesseract at 300 DPI. Every citation downstream inherits spatial provenance from day one.
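
To make "normalized bounding boxes" concrete, here is a minimal sketch of how one normalized box could resolve to pixels at either rendering resolution. The `BBox` class, its field names, and the top-left origin are assumptions for illustration, not liteparse's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Hypothetical normalized box: values in [0, 1], origin at the top-left."""
    x0: float
    y0: float
    x1: float
    y1: float

    def to_pixels(self, width_in: float, height_in: float, dpi: int) -> tuple:
        """Scale the box to pixel coordinates for a page rendered at `dpi`."""
        width_px = width_in * dpi
        height_px = height_in * dpi
        return (
            round(self.x0 * width_px),
            round(self.y0 * height_px),
            round(self.x1 * width_px),
            round(self.y1 * height_px),
        )


# A US Letter page (8.5 x 11 in): the same stored box works at both resolutions.
box = BBox(x0=0.2, y0=0.2, x1=0.6, y1=0.3)
digital = box.to_pixels(8.5, 11.0, dpi=150)
scanned = box.to_pixels(8.5, 11.0, dpi=300)
```

Because the stored coordinates are resolution-independent, a viewer can highlight the same region whether the page came from the 150 DPI or the 300 DPI path.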

One command for developers:

git clone https://github.com/iamsaurabhc/picard-oss
cd picard-oss
cp .env.example backend/.env
./scripts/start.sh
# → http://localhost:3000

Or skip the terminal entirely. See Chapter 6.

Chapter 2: Evidence before eloquence

Picard inherits an evidence contract from production legal AI at picard.law. The contract is simple and ruthless:

Rule	Behavior
Citations assigned before synthesis	`[1]`, `[2]`, `[N]` map to real chunks with page + bbox before the LLM writes
Refuse gate on zero evidence	No retrieval → no LLM → honest refusal
Bbox-grounded UX	Click `[N]` → `MultiHighlightPDFViewer` highlights the precise region
Post-synthesis validation	Unsupported amounts, dates, and drift get stripped

Query → retrieve (FTS5 / CARP / hybrid) → refuse if empty
      → citation map [1..N] → LLM synthesize → stream → click [N] → bbox
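
The order of operations in that pipeline can be sketched in a few lines of Python. This is an illustration of the contract, not Picard's code: `Chunk`, `answer`, `REFUSAL`, and the `retrieve` / `synthesize` callables are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    page: int
    bbox: tuple  # normalized (x0, y0, x1, y1), assumed layout
    text: str


REFUSAL = "No supporting passages found in the selected documents."


def answer(query, retrieve, synthesize):
    """Refuse on empty retrieval; otherwise number the evidence, then call the LLM."""
    chunks = retrieve(query)
    if not chunks:
        # Refuse gate: the model is never invoked without evidence.
        return {"answer": REFUSAL, "citations": {}, "llm_called": False}

    # The citation map is fixed before synthesis, so [N] always points at a real chunk.
    citations = {n: chunk for n, chunk in enumerate(chunks, start=1)}
    context = "\n".join(f"[{n}] (p.{c.page}) {c.text}" for n, c in citations.items())
    prompt = f"Answer only from the numbered passages.\n{context}\n\nQuestion: {query}"
    return {"answer": synthesize(prompt), "citations": citations, "llm_called": True}


def fail_if_called(prompt):
    raise AssertionError("LLM must not be called without evidence")


refused = answer("What is the liability cap?", lambda q: [], fail_if_called)

evidence = [Chunk("msa.pdf", 12, (0.1, 0.4, 0.9, 0.45), "Aggregate liability is capped at the fees paid.")]
grounded = answer("What is the liability cap?", lambda q: evidence, lambda p: "Capped at fees paid [1].")
```

The refusal branch returns before any prompt is built, which is what makes "no retrieval, no LLM" a structural guarantee and not a prompt instruction.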

Most RAG pipelines treat the LLM as the protagonist. Picard treats retrieval as the judge and the model as a clerk who may only speak from the record.

A refused answer is not a bug. It is the system doing what your partner wished the chatbot had done at 11:47 PM.

Chapter 3: Relevance beats similarity (and hybrid knows when to help)

Legal retrieval fails when vector search returns semantically similar but legally wrong text. A limitation of liability clause from Agreement B is not helpful when you asked about Agreement A.

Picard's core engine is SQLite FTS5 (BM25): exact phrase matching, explainable scores, sub-millisecond search, zero vector DB to provision.
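
For readers who have not used FTS5 directly, here is a minimal sketch of exact-phrase BM25 search with Python's built-in `sqlite3` module. The table name `chunks_fts` and its columns are assumptions, not Picard's schema, and the sketch needs a Python build whose SQLite includes FTS5 (most do).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# doc_id and page are stored but not tokenized; only `text` is searchable.
conn.execute("CREATE VIRTUAL TABLE chunks_fts USING fts5(doc_id UNINDEXED, page UNINDEXED, text)")
conn.executemany(
    "INSERT INTO chunks_fts (doc_id, page, text) VALUES (?, ?, ?)",
    [
        ("agreement_a", 12, "Limitation of liability: aggregate liability shall not exceed the fees paid."),
        ("agreement_a", 3, "This Agreement is governed by the laws of New York."),
        ("agreement_b", 7, "The Supplier shall indemnify the Customer against third-party claims."),
    ],
)


def search(phrase, limit=5):
    """Exact-phrase search ranked by BM25 (FTS5 returns lower scores for better matches)."""
    # Wrapping the input in double quotes makes FTS5 treat it as one phrase.
    match = '"' + phrase.replace('"', '""') + '"'
    return conn.execute(
        "SELECT doc_id, page, bm25(chunks_fts) AS score FROM chunks_fts "
        "WHERE chunks_fts MATCH ? ORDER BY score LIMIT ?",
        (match, limit),
    ).fetchall()


hits = search("limitation of liability")
misses = search("governing law")  # the corpus says "governed by the laws", so no phrase match
```

An empty result list like `misses` is exactly the signal a refuse gate can act on.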

For conjunctive questions ("party X + date Y + condition Z across 100K pages"), CARP (Constraint-Aware Retrieval Protocol) intersects entity constraints at the page level. No Neo4j cluster. No keyword soup. Auditable bundle formation with diagnostics on the Search page.
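
A rough sketch of what page-level constraint intersection means, under heavy assumptions: the `page_entities` dictionary, the `type:value` entity strings, and `intersect_constraints` are all hypothetical, and real CARP involves constraint extraction and bundle formation that this omits.

```python
# Hypothetical page-level entity index: (doc_id, page) -> normalized entities on that page.
page_entities = {
    ("msa.pdf", 4): {"party:acme corp", "date:2021-03-01"},
    ("msa.pdf", 9): {"party:acme corp", "date:2021-03-01", "condition:termination for convenience"},
    ("nda.pdf", 2): {"party:acme corp", "condition:termination for convenience"},
}


def intersect_constraints(index, constraints):
    """Return pages satisfying every constraint, plus per-constraint page counts."""
    required = set(constraints)
    # One posting list per constraint: the pages on which it appears.
    postings = {c: {page for page, ents in index.items() if c in ents} for c in required}
    matches = set.intersection(*postings.values()) if postings else set()
    diagnostics = {c: len(pages) for c, pages in postings.items()}
    return sorted(matches), diagnostics


pages, diagnostics = intersect_constraints(
    page_entities,
    ["party:acme corp", "date:2021-03-01", "condition:termination for convenience"],
)
```

The per-constraint counts are the kind of diagnostic that makes a result auditable: you can see which constraint narrowed the candidate set, or which one matched nothing at all.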

Hybrid search: local embeddings, FTS still wins

Picard also ships hybrid retrieval with a local ONNX embedding model (default: BAAI/bge-small-en-v1.5 via fastembed). Vectors live in SQLite as normalized float32 BLOBs. Optional sqlite-vec ANN on Python 3.13+.
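
Storing vectors as normalized float32 BLOBs could look roughly like this, using only the standard library. The `chunk_vectors` table and helper names are assumptions; the toy 3-dimensional vectors stand in for real embeddings.

```python
import math
import sqlite3
from array import array


def to_blob(vector):
    """L2-normalize a vector and pack it as float32 bytes."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return array("f", (x / norm for x in vector)).tobytes()


def from_blob(blob):
    """Unpack float32 bytes back into an array of floats."""
    vec = array("f")
    vec.frombytes(blob)
    return vec


def cosine(a, b):
    # Both vectors are unit length, so cosine similarity is just the dot product.
    return sum(x * y for x, y in zip(a, b))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunk_vectors (chunk_id INTEGER PRIMARY KEY, embedding BLOB)")
conn.execute("INSERT INTO chunk_vectors VALUES (?, ?)", (1, to_blob([3.0, 4.0, 0.0])))
conn.execute("INSERT INTO chunk_vectors VALUES (?, ?)", (2, to_blob([0.0, 0.0, 2.0])))

query = from_blob(to_blob([6.0, 8.0, 0.0]))
rows = conn.execute("SELECT chunk_id, embedding FROM chunk_vectors").fetchall()
scores = {chunk_id: cosine(query, from_blob(blob)) for chunk_id, blob in rows}
```

Normalizing at write time is the design choice worth noting: it moves the square root out of the query path, so a brute-force scan is one dot product per row.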

The fusion is FTS-first weighted RRF, not "vectors replace keywords":

Strong FTS hits? Vectors stay on the bench.
Empty FTS pool? Vector fallback kicks in.
Mixed case? Weighted reciprocal rank fusion (w_fts=0.6, k=60) merges both signals.
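
The three cases can be sketched as follows. The values `w_fts=0.6` and `k=60` come from the text above; the vector weight of `1 - w_fts` and the `strong_fts_hits` threshold are assumptions, since the exact rule for a "strong" FTS result is not spelled out here.

```python
def weighted_rrf(fts_ids, vec_ids, w_fts=0.6, k=60):
    """Weighted reciprocal rank fusion: each list contributes weight / (k + rank)."""
    scores = {}
    for weight, ranking in ((w_fts, fts_ids), (1.0 - w_fts, vec_ids)):
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + weight / (k + rank)
    # Highest fused score first; ties broken by id for deterministic output.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


def fts_first(fts_ids, vec_ids, strong_fts_hits=5):
    """FTS-first policy; `strong_fts_hits` is a hypothetical threshold."""
    if len(fts_ids) >= strong_fts_hits:
        return list(fts_ids)  # strong FTS pool: vectors are not consulted
    if not fts_ids:
        return list(vec_ids)  # empty FTS pool: vector fallback
    return weighted_rrf(fts_ids, vec_ids)  # mixed case: fuse both rankings


fused = fts_first(["a", "b"], ["b", "c"])
```

In the mixed example, `b` appears in both lists and outranks `a`, which only FTS found; `c`, seen only by the vector side, lands last.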

# backend/.env
ENABLE_HYBRID_SEARCH=true

./scripts/start.sh                    # downloads ONNX model into .picard-data/models/fastembed
./scripts/backfill-embeddings.sh      # index existing PDFs
./scripts/backfill-embeddings.sh --vec-index   # page-level vectors

Embeddings never phone home. The model caches on disk. Ingest indexes vectors automatically when hybrid is on.

Design principle: relevance over similarity. Vectors bridge paraphrase gaps; FTS5 and CARP keep legal integrity.

Chapter 4: The PII airlock

Local-first storage does not mean you want client names and Aadhaar numbers riding along to OpenAI.

Picard ships a PII shield: detect locally, mask before cloud LLM calls, restore in the response stream.

Layer	What happens
Regex (always on)	Email, Indian phone, PAN, Aadhaar
Presidio (optional pack)	Names, locations, and richer entity types via spaCy
Ollama bypass	Fully local inference skips masking entirely
Chat UI	"PII shield" toggle in the chat header
Tabular	Server default protects cell extraction prompts

Placeholders look like <EMAIL_ADDRESS_1> or <PERSON_1>. The PIIProxy registers text per request; model_router anonymizes at the litellm boundary; StreamingPIIRestorer puts originals back before you see them.
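
A minimal sketch of the mask-then-restore round trip, for illustration only. The regexes are simplified, the labels other than `EMAIL_ADDRESS` are assumed (borrowed from Presidio's entity naming), and `mask` and `StreamingRestorer` are hypothetical stand-ins, not the `PIIProxy` or `StreamingPIIRestorer` implementations.

```python
import re

# Simplified patterns; group names double as placeholder labels.
PII_PATTERN = re.compile(
    r"(?P<EMAIL_ADDRESS>[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,})"
    r"|(?P<IN_AADHAAR>(?<!\d)\d{4}[ -]?\d{4}[ -]?\d{4}(?!\d))"
    r"|(?P<IN_PAN>\b[A-Z]{5}\d{4}[A-Z]\b)"
    r"|(?P<PHONE_NUMBER>(?<!\d)(?:\+91[ -]?)?[6-9]\d{9}(?!\d))"
)


def mask(text):
    """Replace PII with numbered placeholders; return masked text and the mapping."""
    mapping, seen, counters = {}, {}, {}

    def replace(match):
        label, original = match.lastgroup, match.group()
        if (label, original) not in seen:
            counters[label] = counters.get(label, 0) + 1
            placeholder = f"<{label}_{counters[label]}>"
            seen[(label, original)] = placeholder
            mapping[placeholder] = original
        return seen[(label, original)]

    return PII_PATTERN.sub(replace, text), mapping


class StreamingRestorer:
    """Restores placeholders in a token stream, buffering any split across tokens."""

    def __init__(self, mapping):
        self.mapping = mapping
        self.buffer = ""

    def feed(self, token):
        self.buffer += token
        out = []
        while self.buffer:
            start = self.buffer.find("<")
            if start == -1:
                out.append(self.buffer)
                self.buffer = ""
                break
            out.append(self.buffer[:start])
            end = self.buffer.find(">", start)
            if end == -1:
                self.buffer = self.buffer[start:]  # hold a possibly partial placeholder
                break
            candidate = self.buffer[start:end + 1]
            out.append(self.mapping.get(candidate, candidate))
            self.buffer = self.buffer[end + 1:]
        return "".join(out)

    def flush(self):
        """Emit whatever is still buffered at end of stream."""
        out, self.buffer = self.buffer, ""
        return out


masked, mapping = mask("Email jane@example.com, PAN ABCDE1234F, Aadhaar 1234 5678 9012, call +91 9876543210.")
restorer = StreamingRestorer(mapping)
first = restorer.feed("Contact <EMAIL_")
second = restorer.feed("ADDRESS_1> today")
```

One limitation of this sketch: a literal `<` in model output is held back until a `>` arrives or `flush()` is called, which a production restorer would bound with a maximum placeholder length.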

Documents in your vault stay raw. Redaction is transit protection for cloud LLMs, not ingest erasure. Install the optional Presidio pack from Settings → Optional components.

For air-gapped or fully local Ollama deployments, the airlock doors stay open because nothing leaves the building anyway.

Chapter 5: The workbench (four surfaces, one contract)

Picard is not a single chat box. It is a workbench for legal document engineering.

Unified dashboard + Vault

The home surface (/) combines Ask and Review modes: attach documents, browse the Vault, stream answers, or spin up tabular reviews without context-switching. The Vault (/vault) is your matter file cabinet: upload, parse status, retry, scope documents into chat.

Citation chat

Streaming Q&A with session history, document scope, workflow intent pinning, and [N] pills wired to the PDF panel. Chat latency profiles (quality | balanced | fast) let you trade depth for time-to-first-token without forking the codebase.

The Citation Kernel (Phase 7.0, shipped) centralizes the evidence path: refuse → map → synthesize → validate → optional citation judge. Chat and agent corpus tools share the same kernel. No weaker "agent mode" shortcut.
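
The validate step is the least obvious part of that chain, so here is one way it could work, sketched under assumptions: `validate`, the sentence splitter, and the dollar-amount regex are illustrative, and the real kernel also covers dates and drift, which this omits.

```python
import re

MARKER = re.compile(r"\[(\d+)\]")
AMOUNT = re.compile(r"\$\d+(?:,\d{3})*(?:\.\d+)?")


def validate(answer, citations):
    """Drop sentences whose markers do not resolve or whose amounts lack support.

    `citations` maps marker number -> text of the cited chunk.
    """
    kept = []
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        markers = [int(n) for n in MARKER.findall(sentence)]
        if any(n not in citations for n in markers):
            continue  # marker points at nothing in the citation map
        evidence = " ".join(citations[n] for n in markers)
        if any(amount not in evidence for amount in AMOUNT.findall(sentence)):
            continue  # amount does not appear in the cited text
        kept.append(sentence)
    return " ".join(kept)


citations = {1: "The liability cap is $1,000,000 in aggregate."}
draft = "The cap is $1,000,000 [1]. Damages of $5,000,000 were awarded [1]. See also [3]."
validated = validate(draft, citations)
```

Verbatim string matching is deliberately strict here: a figure that is paraphrased or reformatted fails the check, which errs on the side of stripping.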

Tabular review

Define columns in natural language. Picard runs FTS5 retrieval per cell, extracts structured JSON via LLM, and links every cell to source markers. SSE batch generation, flags, Excel export, and a review-side chat panel. Ten NDAs in one sitting is a design target, not a demo fantasy.
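
One cell of that loop might be sketched like this. Everything here is hypothetical: the `extract_cell` function, the `{"value", "sources"}` JSON shape, the flag names, and the stubbed model call are assumptions, not Picard's tabular API.

```python
import json


def extract_cell(column_prompt, chunks, llm):
    """Return {"value", "sources", "flag"} for one (document, column) cell."""
    if not chunks:
        return {"value": None, "sources": [], "flag": "no_evidence"}
    numbered = dict(enumerate(chunks, start=1))
    try:
        data = json.loads(llm(column_prompt, numbered))
    except json.JSONDecodeError:
        return {"value": None, "sources": [], "flag": "invalid_json"}
    if not isinstance(data, dict):
        return {"value": None, "sources": [], "flag": "invalid_json"}
    # Keep only markers that resolve to a retrieved chunk.
    sources = [n for n in data.get("sources", []) if n in numbered]
    flag = None if sources else "unsupported"
    return {"value": data.get("value"), "sources": sources, "flag": flag}


def stub_llm(prompt, numbered):
    # Stand-in for a real model call; cites marker 1 and a marker that does not exist.
    return json.dumps({"value": "2 years", "sources": [1, 7]})


cell = extract_cell("Term of confidentiality", ["Confidentiality survives for 2 years."], stub_llm)
empty = extract_cell("Governing law", [], stub_llm)
```

The cell carries its own flag, so a reviewer can sort a grid by "no_evidence" or "unsupported" instead of re-reading every value.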

Workflow library

~18 built-in assistant and tabular playbooks ship as validated LightFlow flow_json DAGs. Browse, filter by deployment profile (firm/court), preview the step graph, validate, export JSON. Attach workflows in Chat to pin CARP intent. Seed tabular reviews from tabular workflows.

Workflow execution (the Run button) and full agent authoring await Phases 7b and 7a respectively, but the library is already a catalog of repeatable legal engineering patterns.

Deployment profiles

Firm and court profiles filter workflows, gate tools, and tune agent retrieval caps. Court mode blocks risk-scoring patterns and tightens connector defaults. Same evidence contract, different guardrails.

Chapter 6: Download it like normal software

Picard OSS is not "clone repo or nothing." v0.2.0 ships native binaries via Tauri, built in CI on every version tag:

Platform	Artifact	CI target
macOS Apple Silicon	`.dmg`	`darwin-aarch64`
macOS Intel	`.dmg`	`darwin-x86_64`
Windows 64-bit	`.exe`	`windows-x86_64`
Windows 32-bit	`.exe`	`windows-i686`
Linux (Ubuntu amd64)	`.deb`	`linux-x86_64`

Downloads publish to GitHub Releases. A manifest.json on gh-pages powers in-app updates (Tauri updater + Settings update check) and the picard.law download page.
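
As an illustration of how a client might consume such a manifest, here is a small sketch. It assumes the manifest follows the general shape of Tauri's static update JSON (a top-level `version` and a `platforms` map keyed by `OS-ARCH`); the actual fields of Picard's manifest.json are not documented here, and the URL below is a placeholder.

```python
import platform


def platform_key(system=None, machine=None):
    """Map the running OS and CPU to an `OS-ARCH` key such as `darwin-aarch64`."""
    system = (system or platform.system()).lower()
    machine = (machine or platform.machine()).lower()
    arch = {
        "arm64": "aarch64", "aarch64": "aarch64",
        "x86_64": "x86_64", "amd64": "x86_64",
        "i686": "i686", "i386": "i686", "x86": "i686",
    }.get(machine)
    if system not in {"darwin", "windows", "linux"} or arch is None:
        raise ValueError(f"unsupported platform: {system}/{machine}")
    return f"{system}-{arch}"


def parse_version(version):
    """Turn 'v0.2.0' or '0.2.0' into a comparable tuple of integers."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def find_update(manifest, current_version, key):
    """Return the platform entry if the manifest is newer, else None."""
    if parse_version(manifest["version"]) <= parse_version(current_version):
        return None
    return manifest["platforms"].get(key)


# Hypothetical manifest contents for illustration.
manifest = {
    "version": "0.2.1",
    "platforms": {"darwin-aarch64": {"url": "https://example.invalid/Picard.dmg"}},
}
update = find_update(manifest, "v0.2.0", platform_key("Darwin", "arm64"))
```

Comparing versions as integer tuples avoids the string-comparison trap where "0.10.0" sorts before "0.9.0".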

macOS:   open DMG → Applications (see docs/MACOS_INSTALL.md for Gatekeeper)
Windows: run installer
Linux:   sudo dpkg -i Picard*.deb

Docker Compose and GHCR images remain available for teams who prefer containers:

docker compose up --build
# Optional OCR: docker compose --profile ocr up --build

The in-app Settings screen (or the first-run onboarding wizard) stores API keys encrypted under your data directory. Keys never round-trip through the API in plaintext.

Chapter 7: Chester keeps us honest

Picard does not ship "vibes-based QA."

The Chester v. Municipality of Waverly corpus (627 chunks) anchors gold-label regression tests in CI. Metric families have stable IDs used in pytest, eval harnesses, and (roadmap) inline answer panels:

Family	What it guards
R	Snippet recall, precision, bbox coverage
C	CARP constraint extraction, page intersection, decoy rejection
F	Zero-evidence refuse rate, false refuses
CT	`[N]` marker resolution, pinpoint bbox accuracy
FG	Claim-level grounding, cross-bundle conflation
AB	Missed refusal, misleading answers

cd backend && source .venv/bin/activate
pytest -m corpus -q
./scripts/eval-search.sh
python scripts/eval_scorecard.py

Today, retrieval diagnostics appear inline in chat (RetrievalActivityPanel) and on the Search CARP debug panel. Full post-answer CT/FG/AB badges are on the roadmap.
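
For a sense of what the R and F families measure, here is a sketch using the textbook definitions of recall, precision, and refusal rates. These are assumed definitions, not the harness's exact formulas, and the function names are hypothetical.

```python
def snippet_recall_precision(retrieved_ids, gold_ids):
    """R family (assumed definition): overlap between retrieved and gold chunk ids."""
    retrieved, gold = set(retrieved_ids), set(gold_ids)
    hits = len(retrieved & gold)
    recall = hits / len(gold) if gold else 1.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision


def refusal_rates(cases):
    """F family (assumed definition).

    `cases` is a list of (has_gold_evidence, system_refused) pairs.
    """
    unanswerable = [refused for has_evidence, refused in cases if not has_evidence]
    answerable = [refused for has_evidence, refused in cases if has_evidence]
    zero_evidence_refuse_rate = sum(unanswerable) / len(unanswerable) if unanswerable else 1.0
    false_refuse_rate = sum(answerable) / len(answerable) if answerable else 0.0
    return zero_evidence_refuse_rate, false_refuse_rate


recall, precision = snippet_recall_precision(["c1", "c2", "c3"], ["c2", "c3", "c4", "c5"])
zero_rate, false_rate = refusal_rates([(False, True), (False, True), (True, False), (True, True)])
```

The two refusal numbers pull in opposite directions: a system that refuses everything scores perfectly on the first and terribly on the second, which is why both need a regression gate.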

Chapter 8: Stack for the curious

Layer	Choice
Frontend	Next.js 15, React 19, TypeScript, Shadcn UI
Backend	Python 3.11+, FastAPI, SQLAlchemy
Database	SQLite + FTS5 (WAL) + optional sqlite-vec
PDF	liteparse + react-pdf bbox overlay
LLM	litellm (OpenAI, Ollama, tiered SLM/LLM optional)
Embeddings	fastembed ONNX (`bge-small-en-v1.5`)
PII	Regex + optional Presidio/spaCy
Desktop	Tauri (DMG / EXE / DEB)
License	AGPL-3.0

Optional component packs (install from Settings): PaddleOCR, GLiNER NER, Presidio PII, agent scaffolding.

Chapter 9: Where Picard sits in the ecosystem

┌─────────────────────────────────────────────────────────────┐
│  Picard.law          Production SaaS · GraphRAG · Neo4j     │
└──────────────────────────────┬──────────────────────────────┘
                               │ evidence contract
                               ▼
┌─────────────────────────────────────────────────────────────┐
│  Picard OSS          Local-first · FTS5 + CARP · SQLite     │
│                      PII shield · hybrid · native binaries  │
└──────────────────────────────┬──────────────────────────────┘
                               │ tabular UX + DocPanel patterns
                               ▼
┌─────────────────────────────────────────────────────────────┐
│  Mike OSS            Cloud platform · Supabase · workflows  │
└─────────────────────────────────────────────────────────────┘

	Picard OSS	Picard.law	Mike OSS
Deployment	Your machine	Managed SaaS	Cloud
Retrieval	FTS5 + CARP + hybrid	GraphRAG	Vector + workflows
PII	Local shield for cloud LLM	Enterprise tiers	Supabase Auth
Binaries	Mac/Win/Linux	Hosted	Cloud
Best for	Legal engineers, air-gap, eval	Production	Full-stack platform
License	AGPL-3.0	Commercial	AGPL-3.0

Chapter 10: Shipped vs. loading

Shipped today (Phases 0-6 + 7.0):

PDF ingest, OCR, FTS5, CARP, hybrid search
Citation chat + Citation Kernel
Tabular review + Excel export
Workflow library (18 built-ins)
PII shield + optional Presidio
Settings, onboarding, encrypted secrets
Chat latency profiles, deployment profiles
Docker + native installers for 5 platforms
Vault, unified dashboard, chat history rail
Chester eval harness + PII e2e tests in CI

In development (honest roadmap):

LightFlow workflow execution (Phase 7b): Run button is wired but disabled until deterministic DAG runs land
Full LightAgent authoring loop (Phase 7a): kernel-first agent chat exists; multi-tool orchestration is scaffolded
Template drafts from guidelines + CSV (Phase 8)
Optional URL snapshots for web research, off by default so air-gapped deployments stay air-gapped (Phase 9)
Inline post-answer quality panel (CT/FG/AB badges)
WCAG gaps: canvas bbox screen reader exposure, streaming live regions

We would rather tell you what is loading than demo what is missing.

Chapter 11: Open source, open contract

Use case	License
Local dev, PoC, eval on your hardware	AGPL-3.0, no fee
Fork/redistribute modified versions	AGPL-3.0, source to users
Hosted production without AGPL obligations	Commercial license

Community:

Try it

Download a binary: github.com/iamsaurabhc/picard-oss/releases

Or from source:

git clone https://github.com/iamsaurabhc/picard-oss
cd picard-oss
./scripts/start.sh

Upload a PDF. Wait for parse_status=done. Ask a question. Click [1]. Watch the bbox light up.

If retrieval finds nothing, Picard will refuse. That is the point.

Picard OSS is built by legal engineers who have watched too many models confidently cite the wrong page. Star the repo, run the Chester eval, file an issue when CARP misfires. Evidence before eloquence. Always.


Top comments (4)

Gunjan Tailor • Jun 8

"Retrieval as the judge, model as a clerk who may only speak from the record" — that's the whole ballgame, and the refuse-on-zero-evidence gate is underrated. We see the same thing on financial docs: confident-but-wrong answers almost never trace to the model inventing numbers, they trace to retrieval handing it the wrong row. Kindred design choices here (local-first, SQLite FTS5, BM25-first hybrid) — I went the same way with docnest's ingestion engine. One real question: how does CARP handle constraints whose meaning depends on a column header several rows up? Cross-cell table semantics is the part that's broken every parser I've thrown at it.

Saurabh C. • Jun 8 • Edited

"Retrieval as judge, model as clerk" is exactly the design bet. And yes: on financial docs the failure mode is almost always the wrong row, not hallucinated numbers. Refuse-on-zero-evidence is underrated until you've watched a partner click through a wrong citation at midnight.

Your CARP question is the one we don't hand-wave: column-header semantics several rows up are not first-class yet.

CARP does constraint intersection on page_entities (+ optional section_key from doc headings). It does not understand that $4.2M in row 14 means "Q3 Revenue" because of a header three rows above. If liteparse keeps that in one table chunk, FTS/hybrid usually saves you. If not, page-level intersection can bundle unrelated co-mentions. We see it on dense financial schedules.

Mitigation today: tabular review passes column semantics through the extraction prompt per cell. Longer term: structured table ingest (row/col + header propagation) so retrieval can refuse when header context is missing, not just when the page is empty.

Docnest sounds like kindred spirits. If you're open to it, I'd be curious what your ingestion engine does for header propagation. That's the layer I'd want to steal if someone has it working.

Gunjan Tailor • Jun 15

Header propagation is the core thing DocNest is built around. Every table gets stored as {caption, headers, rows[]} — so each row carries its column names at rest, not just at parse time. Headings above the table become a section_id attached to the table node, so retrieval always gets "Q3 Revenue | $4.2M" not just "$4.2M." The rule is simple: if structure was visible to a human reader, it has to survive ingestion as a first-class field, not disappear into a flat string. Happy to walk through the schema if useful — repo is github.com/tailorgunjan93/docnest.
