Saurabh C.

Picard OSS: Legal AI That Lives on Your Machine, Refuses to Bluff, and Ships Binaries

A field manual for legal engineers who have been burned by confident wrong answers.


Three things most legal AI products get backwards:

  1. Documents leave your machine before you get an answer.
  2. The model talks first, citations get stapled on later.
  3. "Page 47" counts as verification.

Picard OSS flips all three.

It is an open-source, local-first legal document assistant: upload PDFs, search with BM25 or multi-constraint CARP, chat with citation-grade answers, run tabular extraction across a matter, and click any [N] marker to jump to the exact bounding box on the source PDF. When retrieval finds nothing, Picard refuses. No LLM call. No improvisation.

Repo: github.com/iamsaurabhc/picard-oss

Hosted sibling: picard.law (enterprise SaaS)

Current release: v0.2.0

You can run it from source, Docker, or a native installer. No Supabase. No Neo4j. No "trust our cloud with privilege."


Chapter 1: Your machine, your matter

Everything that matters stays under .picard-data/ on disk:

.picard-data/
├── picard.db          # chunks, FTS5, entities, chat, tabular
├── pdfs/              # raw PDF bytes
└── models/            # fastembed ONNX, optional GLiNER / Presidio

Parsing, OCR, indexes, and PDF storage are local. The only optional outbound traffic is your LLM provider (OpenAI, Anthropic, etc.) or fully local Ollama. Documents do not egress for search, indexing, or viewing.

liteparse extracts layout-aware chunks with normalized bounding boxes. Digital PDFs parse at 150 DPI. Scans route through local PaddleOCR (optional sidecar) or Tesseract at 300 DPI. Every citation downstream inherits spatial provenance from day one.
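
To make "normalized bounding boxes" concrete, here is a minimal sketch of how a normalized box could be scaled to pixels at the two render resolutions mentioned above. The `BBox` layout (`x0, y0, x1, y1` as fractions of page width and height, top-left origin) and the `to_pixels` helper are assumptions for illustration, not liteparse's actual output format.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    """Hypothetical normalized box: every value is a fraction in [0, 1]."""
    x0: float
    y0: float
    x1: float
    y1: float


def to_pixels(box: BBox, width_in: float, height_in: float, dpi: int) -> tuple[int, int, int, int]:
    """Scale a normalized box to pixel coordinates for a page rendered at `dpi`."""
    page_w = width_in * dpi
    page_h = height_in * dpi
    return (
        round(box.x0 * page_w),
        round(box.y0 * page_h),
        round(box.x1 * page_w),
        round(box.y1 * page_h),
    )


# A clause on a US Letter page (8.5 x 11 in).
clause = BBox(0.2, 0.4, 0.6, 0.5)
print(to_pixels(clause, 8.5, 11, 150))  # digital PDF render -> (255, 660, 765, 825)
print(to_pixels(clause, 8.5, 11, 300))  # scan / OCR render  -> (510, 1320, 1530, 1650)
```

Because the stored box is resolution-independent, the same citation can be highlighted at any zoom level without re-parsing the page.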

One script for developers:

git clone https://github.com/iamsaurabhc/picard-oss
cd picard-oss
cp .env.example backend/.env
./scripts/start.sh
# → http://localhost:3000

Or skip the terminal entirely. See Chapter 6.


Chapter 2: Evidence before eloquence

Picard inherits an evidence contract from production legal AI at picard.law. The contract is simple and ruthless:

| Rule | Behavior |
| --- | --- |
| Citations assigned before synthesis | [1], [2], [N] map to real chunks with page + bbox before the LLM writes |
| Refuse gate on zero evidence | No retrieval → no LLM → honest refusal |
| Bbox-grounded UX | Click [N] → MultiHighlightPDFViewer highlights the precise region |
| Post-synthesis validation | Unsupported amounts, dates, and drift get stripped |

Query → retrieve (FTS5 / CARP / hybrid) → refuse if empty
      → citation map [1..N] → LLM synthesize → stream → click [N] → bbox

Most RAG pipelines treat the LLM as the protagonist. Picard treats retrieval as the judge and the model as a clerk who may only speak from the record.
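
The ordering in the contract (retrieve, refuse if empty, assign citations, only then synthesize) can be sketched in a few lines. This is an illustration under assumptions, not Picard's Citation Kernel: `Chunk`, `REFUSAL`, and `answer_with_evidence` are hypothetical names, and `retrieve` / `synthesize` stand in for whatever search and LLM callables you plug in.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Chunk:
    """Hypothetical evidence record: where the text lives on the source PDF."""
    doc: str
    page: int
    bbox: tuple[float, float, float, float]
    text: str


REFUSAL = "No supporting passages were found in the selected documents."


def answer_with_evidence(
    query: str,
    retrieve: Callable[[str], list[Chunk]],
    synthesize: Callable[[str, str], str],
) -> tuple[str, dict[int, Chunk]]:
    chunks = retrieve(query)
    if not chunks:
        # Refuse gate: zero evidence means the model is never called.
        return REFUSAL, {}
    # Citation numbers are fixed before the model writes a single token.
    citation_map = {n: chunk for n, chunk in enumerate(chunks, start=1)}
    context = "\n".join(f"[{n}] {c.text}" for n, c in citation_map.items())
    return synthesize(query, context), citation_map
```

The returned `citation_map` is what a viewer would need to resolve a clicked `[N]` to a document, page, and box.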

A refused answer is not a bug. It is the system doing what your partner wished the chatbot had done at 11:47 PM.


Chapter 3: Relevance beats similarity (and hybrid knows when to help)

Legal retrieval fails when vector search returns semantically similar but legally wrong text. A limitation of liability clause from Agreement B is not helpful when you asked about Agreement A.

Picard's core engine is SQLite FTS5 (BM25): exact phrase matching, explainable scores, sub-millisecond search, zero vector DB to provision.
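
For readers who have not used FTS5 directly, here is a minimal sketch of exact-phrase search ranked by BM25 using Python's built-in `sqlite3` module. The `chunks` table layout and the `search` helper are assumptions for illustration, not Picard's schema, and the snippet needs a SQLite build with FTS5 enabled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: only `text` is indexed; doc and page ride along as metadata.
conn.execute("CREATE VIRTUAL TABLE chunks USING fts5(doc UNINDEXED, page UNINDEXED, text)")
conn.executemany(
    "INSERT INTO chunks (doc, page, text) VALUES (?, ?, ?)",
    [
        ("agreement_a.pdf", 12, "Limitation of liability. Neither party shall be liable for indirect damages."),
        ("agreement_b.pdf", 7, "The supplier's liability is subject to the limitation set out in Schedule 2."),
        ("agreement_a.pdf", 3, "This Agreement commences on the Effective Date."),
    ],
)


def search(conn: sqlite3.Connection, phrase: str, limit: int = 5) -> list[tuple]:
    """Exact-phrase match; bm25() returns lower values for better matches."""
    match = '"' + phrase.replace('"', '""') + '"'  # quote to force a phrase query
    return conn.execute(
        "SELECT doc, page, bm25(chunks) FROM chunks "
        "WHERE chunks MATCH ? ORDER BY bm25(chunks) LIMIT ?",
        (match, limit),
    ).fetchall()


for doc, page, score in search(conn, "limitation of liability"):
    print(doc, page, score)
```

Only the Agreement A clause matches: Agreement B contains both words but not the phrase, which is the kind of distinction a pure similarity search tends to blur.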

For conjunctive questions ("party X + date Y + condition Z across 100K pages"), CARP (Constraint-Aware Retrieval Protocol) intersects entity constraints at the page level. No Neo4j cluster. No keyword soup. Auditable bundle formation with diagnostics on the Search page.
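
The core idea of page-level intersection can be shown with plain sets. This is a deliberately small sketch, not CARP itself: `build_index` and `intersect` are hypothetical helpers, constraints are matched as exact lowercase strings, and the per-constraint counts only hint at the kind of diagnostics an auditable system would expose.

```python
from collections import defaultdict

PageKey = tuple[str, int]  # (document, page number)


def build_index(page_entities: list[tuple[str, int, str]]) -> dict[str, set[PageKey]]:
    """Map each entity string to the set of pages that mention it."""
    index: dict[str, set[PageKey]] = defaultdict(set)
    for doc, page, entity in page_entities:
        index[entity.lower()].add((doc, page))
    return index


def intersect(index: dict[str, set[PageKey]], constraints: list[str]):
    """Return pages satisfying every constraint, plus per-constraint hit counts."""
    if not constraints:
        return set(), {}
    pools = [index.get(c.lower(), set()) for c in constraints]
    diagnostics = {c: len(pool) for c, pool in zip(constraints, pools)}
    return set.intersection(*pools), diagnostics


index = build_index([
    ("lease.pdf", 3, "Acme Corp"),
    ("lease.pdf", 3, "2021-03-01"),
    ("lease.pdf", 9, "Acme Corp"),
    ("notice.pdf", 1, "2021-03-01"),
])
pages, diagnostics = intersect(index, ["Acme Corp", "2021-03-01"])
print(pages)        # {('lease.pdf', 3)}
print(diagnostics)  # {'Acme Corp': 2, '2021-03-01': 2}
```

An empty intersection is a useful signal in its own right: it tells the caller which constraint had hits and which did not, rather than returning loosely related pages.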

Hybrid search: local embeddings, FTS still wins

Picard also ships hybrid retrieval with a local ONNX embedding model (default: BAAI/bge-small-en-v1.5 via fastembed). Vectors live in SQLite as normalized float32 BLOBs. Optional sqlite-vec ANN on Python 3.13+.
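
Storing normalized float32 vectors as BLOBs needs nothing beyond the standard library. The sketch below assumes a hypothetical `chunk_vectors` table and helper names; it uses toy 2-dimensional vectors in place of real embeddings. Because the vectors are normalized on write, cosine similarity at query time reduces to a dot product.

```python
import math
import sqlite3
from array import array


def to_blob(vec: list[float]) -> bytes:
    """L2-normalize a vector and pack it as float32 bytes."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return array("f", [x / norm for x in vec]).tobytes()


def from_blob(blob: bytes) -> array:
    """Unpack float32 bytes back into an array of floats."""
    vec = array("f")
    vec.frombytes(blob)
    return vec


def dot(a, b) -> float:
    """Cosine similarity for vectors that are already unit length."""
    return sum(x * y for x, y in zip(a, b))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunk_vectors (chunk_id INTEGER PRIMARY KEY, embedding BLOB NOT NULL)")
conn.execute("INSERT INTO chunk_vectors VALUES (?, ?)", (1, to_blob([3.0, 4.0])))

(blob,) = conn.execute("SELECT embedding FROM chunk_vectors WHERE chunk_id = 1").fetchone()
stored = from_blob(blob)
query = from_blob(to_blob([4.0, 3.0]))
print(len(blob), round(dot(stored, query), 4))
```

A brute-force scan like this is fine for small matters; an ANN index such as sqlite-vec only becomes worthwhile as the corpus grows.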

The fusion is FTS-first weighted RRF, not "vectors replace keywords":

  • Strong FTS hits? Vectors stay on the bench.
  • Empty FTS pool? Vector fallback kicks in.
  • Mixed case? Weighted reciprocal rank fusion (w_fts=0.6, k=60) merges both signals.

# backend/.env
ENABLE_HYBRID_SEARCH=true

./scripts/start.sh                    # downloads ONNX model into .picard-data/models/fastembed
./scripts/backfill-embeddings.sh      # index existing PDFs
./scripts/backfill-embeddings.sh --vec-index   # page-level vectors

Embeddings never phone home. The model caches on disk. Ingest indexes vectors automatically when hybrid is on.

Design principle: relevance over similarity. Vectors bridge paraphrase gaps; FTS5 and CARP keep legal integrity.
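
The three fusion cases above can be sketched as follows, using the stated w_fts=0.6 and k=60. Two things are assumptions here: the `strong_fts` threshold (the post does not say how "strong" is decided, so this sketch simply counts FTS hits) and the function names, which are hypothetical.

```python
def weighted_rrf(fts_ranked: list[str], vec_ranked: list[str],
                 w_fts: float = 0.6, k: int = 60) -> list[str]:
    """Weighted reciprocal rank fusion over two ranked lists of chunk IDs."""
    scores: dict[str, float] = {}
    for rank, chunk_id in enumerate(fts_ranked, start=1):
        scores[chunk_id] = scores.get(chunk_id, 0.0) + w_fts / (k + rank)
    for rank, chunk_id in enumerate(vec_ranked, start=1):
        scores[chunk_id] = scores.get(chunk_id, 0.0) + (1.0 - w_fts) / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


def hybrid(fts_ranked: list[str], vec_ranked: list[str], strong_fts: int = 5) -> list[str]:
    """FTS-first policy; `strong_fts` is an assumed cutoff, not a documented setting."""
    if not fts_ranked:
        return list(vec_ranked)        # empty FTS pool: vector fallback
    if len(fts_ranked) >= strong_fts:
        return list(fts_ranked)        # strong FTS: vectors stay on the bench
    return weighted_rrf(fts_ranked, vec_ranked)  # mixed case: fuse both signals


print(hybrid(["a", "b"], ["b", "c"]))  # ['b', 'a', 'c']
```

In the example, `b` wins because it appears in both lists: 0.6/62 + 0.4/61 outscores `a` at 0.6/61, while the vector-only `c` trails at 0.4/62.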


Chapter 4: The PII airlock

Local-first storage does not mean you want client names and Aadhaar numbers riding along to OpenAI.

Picard ships a PII shield: detect locally, mask before cloud LLM calls, restore in the response stream.

| Layer | What happens |
| --- | --- |
| Regex (always on) | Email, Indian phone, PAN, Aadhaar |
| Presidio (optional pack) | Names, locations, and richer entity types via spaCy |
| Ollama bypass | Fully local inference skips masking entirely |
| Chat UI | "PII shield" toggle in the chat header |
| Tabular | Server default protects cell extraction prompts |

Placeholders look like <EMAIL_ADDRESS_1> or <PERSON_1>. The PIIProxy registers text per request; model_router anonymizes at the litellm boundary; StreamingPIIRestorer puts originals back before you see them.
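
The mask-then-restore round trip, including the awkward case where a placeholder is split across two streamed chunks, can be sketched as below. `SketchMasker` and `SketchStreamRestorer` are hypothetical stand-ins written for this example; they are not the PIIProxy or StreamingPIIRestorer classes, and the email regex is a simplification.

```python
import re

EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class SketchMasker:
    """Replace each distinct email with a stable numbered placeholder."""

    def __init__(self) -> None:
        self.originals: dict[str, str] = {}  # placeholder -> original value
        self._seen: dict[str, str] = {}      # original value -> placeholder

    def mask(self, text: str) -> str:
        def replace(match: re.Match) -> str:
            value = match.group(0)
            if value not in self._seen:
                placeholder = f"<EMAIL_ADDRESS_{len(self._seen) + 1}>"
                self._seen[value] = placeholder
                self.originals[placeholder] = value
            return self._seen[value]
        return EMAIL.sub(replace, text)


class SketchStreamRestorer:
    """Restore placeholders in a token stream, buffering partial ones."""

    def __init__(self, originals: dict[str, str]) -> None:
        self.originals = originals
        self._buf = ""

    def feed(self, chunk: str) -> str:
        self._buf += chunk
        cut = self._buf.rfind("<")
        if cut != -1 and ">" not in self._buf[cut:]:
            # Possibly an unfinished placeholder: hold it back for the next chunk.
            ready, self._buf = self._buf[:cut], self._buf[cut:]
        else:
            ready, self._buf = self._buf, ""
        return self._restore(ready)

    def flush(self) -> str:
        """Emit whatever is still buffered once the stream ends."""
        ready, self._buf = self._buf, ""
        return self._restore(ready)

    def _restore(self, text: str) -> str:
        for placeholder, original in self.originals.items():
            text = text.replace(placeholder, original)
        return text


masker = SketchMasker()
print(masker.mask("Email jane@example.com, then jane@example.com again."))
restorer = SketchStreamRestorer(masker.originals)
print(restorer.feed("Contact <EMAIL_ADD") + restorer.feed("RESS_1> today") + restorer.flush())
```

Numbering placeholders per distinct value keeps repeated mentions consistent, so the model can still reason about "the same person" without ever seeing the address.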

Documents in your vault stay raw. Redaction is transit protection for cloud LLMs, not ingest erasure. Install the optional Presidio pack from Settings → Optional components.

For air-gapped or fully local Ollama deployments, the airlock doors stay open because nothing leaves the building anyway.


Chapter 5: The workbench (four surfaces, one contract)

Picard is not a single chat box. It is a workbench for legal document engineering.

Unified dashboard + Vault

The home surface (/) combines Ask and Review modes: attach documents, browse the Vault, stream answers, or spin up tabular reviews without context-switching. The Vault (/vault) is your matter file cabinet: upload, parse status, retry, scope documents into chat.

Citation chat

Streaming Q&A with session history, document scope, workflow intent pinning, and [N] pills wired to the PDF panel. Chat latency profiles (quality | balanced | fast) let you trade depth for time-to-first-token without forking the codebase.

The Citation Kernel (Phase 7.0, shipped) centralizes the evidence path: refuse → map → synthesize → validate → optional citation judge. Chat and agent corpus tools share the same kernel. No weaker "agent mode" shortcut.
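
As an illustration of the "validate" step, the sketch below checks an answer against its citation map and reports dollar amounts that appear in no cited chunk, along with [N] markers that resolve to nothing. It is an assumption-laden toy, not the Citation Kernel: the `validate` helper and both regexes are hypothetical, it covers amounts only (not dates or drift), and it flags problems rather than stripping them.

```python
import re

AMOUNT = re.compile(r"\$\d+(?:,\d{3})*(?:\.\d+)?")
MARKER = re.compile(r"\[(\d+)\]")


def validate(answer: str, citation_map: dict[int, str]) -> tuple[list[str], list[int]]:
    """Return (unsupported amounts, dangling citation markers) for an answer."""
    evidence_amounts: set[str] = set()
    for text in citation_map.values():
        evidence_amounts.update(AMOUNT.findall(text))
    # Compare whole amounts, so "$1,000" is not "supported" by "$1,000,000".
    unsupported = [a for a in AMOUNT.findall(answer) if a not in evidence_amounts]
    dangling = [int(n) for n in MARKER.findall(answer) if int(n) not in citation_map]
    return unsupported, dangling


citations = {
    1: "The cap on liability is $250,000.",
    2: "The notice period is 30 days.",
}
answer = "Liability is capped at $250,000 [1], with a $50,000 deductible [3]."
print(validate(answer, citations))  # (['$50,000'], [3])
```

A real validator would need to normalize formats (for example "$250k" versus "$250,000") before comparing, which this sketch does not attempt.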

Tabular review

Define columns in natural language. Picard runs FTS5 retrieval per cell, extracts structured JSON via LLM, and links every cell to source markers. SSE batch generation, flags, Excel export, and a review-side chat panel. Ten NDAs in one sitting is a design target, not a demo fantasy.

Workflow library

~18 built-in assistant and tabular playbooks ship as validated LightFlow flow_json DAGs. Browse, filter by deployment profile (firm/court), preview the step graph, validate, export JSON. Attach workflows in Chat to pin CARP intent. Seed tabular reviews from tabular workflows.

Workflow execution (the Run button) awaits Phase 7b and full agent authoring awaits Phase 7a, but the library is already a catalog of repeatable legal engineering patterns.

Deployment profiles

Firm and court profiles filter workflows, gate tools, and tune agent retrieval caps. Court mode blocks risk-scoring patterns and tightens connector defaults. Same evidence contract, different guardrails.


Chapter 6: Download it like normal software

Picard OSS is not "clone repo or nothing." v0.2.0 ships native binaries via Tauri, built in CI on every version tag:

| Platform | Artifact | CI target |
| --- | --- | --- |
| macOS Apple Silicon | .dmg | darwin-aarch64 |
| macOS Intel | .dmg | darwin-x86_64 |
| Windows 64-bit | .exe | windows-x86_64 |
| Windows 32-bit | .exe | windows-i686 |
| Linux (Ubuntu amd64) | .deb | linux-x86_64 |

Downloads publish to GitHub Releases. A manifest.json on gh-pages powers in-app updates (Tauri updater + Settings update check) and the picard.law download page.

macOS:   open DMG → Applications (see docs/MACOS_INSTALL.md for Gatekeeper)
Windows: run installer
Linux:   sudo dpkg -i Picard*.deb

Docker Compose and GHCR images remain available for teams who prefer containers:

docker compose up --build
# Optional OCR: docker compose --profile ocr up --build

The in-app Settings page (or the first-run onboarding wizard) stores API keys encrypted under your data directory. Keys never round-trip through the API in plaintext.


Chapter 7: Chester keeps us honest

Picard does not ship "vibes-based QA."

The Chester v. Municipality of Waverly corpus (627 chunks) anchors gold-label regression tests in CI. Metric families have stable IDs used in pytest, eval harnesses, and (roadmap) inline answer panels:

| Family | What it guards |
| --- | --- |
| R | Snippet recall, precision, bbox coverage |
| C | CARP constraint extraction, page intersection, decoy rejection |
| F | Zero-evidence refuse rate, false refuses |
| CT | [N] marker resolution, pinpoint bbox accuracy |
| FG | Claim-level grounding, cross-bundle conflation |
| AB | Missed refusal, misleading answers |

cd backend && source .venv/bin/activate
pytest -m corpus -q
./scripts/eval-search.sh
python scripts/eval_scorecard.py

Today, retrieval diagnostics appear inline in chat (RetrievalActivityPanel) and on the Search CARP debug panel. Full post-answer CT/FG/AB badges are on the roadmap.
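
To show what an F-family style check might compute, here is a small sketch over labeled cases. The metric names and definitions are assumptions made for this example (the actual stable IDs and formulas live in the repo's eval harness): each case records whether gold evidence exists and whether the system refused.

```python
from typing import Optional


def refuse_metrics(cases: list[tuple[bool, bool]]) -> dict[str, Optional[float]]:
    """cases: (has_gold_evidence, system_refused) pairs from a labeled corpus."""
    unanswerable = [refused for has_evidence, refused in cases if not has_evidence]
    answerable = [refused for has_evidence, refused in cases if has_evidence]
    return {
        # Share of zero-evidence questions that were correctly refused.
        "zero_evidence_refuse_rate": (
            sum(unanswerable) / len(unanswerable) if unanswerable else None
        ),
        # Share of answerable questions that were wrongly refused.
        "false_refuse_rate": (
            sum(answerable) / len(answerable) if answerable else None
        ),
    }


labeled = [
    (False, True),   # no evidence, refused: correct
    (False, True),   # no evidence, refused: correct
    (True, False),   # evidence exists, answered: correct
    (True, True),    # evidence exists, refused: false refuse
]
print(refuse_metrics(labeled))
# {'zero_evidence_refuse_rate': 1.0, 'false_refuse_rate': 0.5}
```

Tracking both numbers together matters: a system that refuses everything scores perfectly on the first metric and fails the second.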


Chapter 8: Stack for the curious

| Layer | Choice |
| --- | --- |
| Frontend | Next.js 15, React 19, TypeScript, Shadcn UI |
| Backend | Python 3.11+, FastAPI, SQLAlchemy |
| Database | SQLite + FTS5 (WAL) + optional sqlite-vec |
| PDF | liteparse + react-pdf bbox overlay |
| LLM | litellm (OpenAI, Ollama, tiered SLM/LLM optional) |
| Embeddings | fastembed ONNX (bge-small-en-v1.5) |
| PII | Regex + optional Presidio/spaCy |
| Desktop | Tauri (DMG / EXE / DEB) |
| License | AGPL-3.0 |

Optional component packs (install from Settings): PaddleOCR, GLiNER NER, Presidio PII, agent scaffolding.


Chapter 9: Where Picard sits in the ecosystem

┌─────────────────────────────────────────────────────────────┐
│  Picard.law          Production SaaS · GraphRAG · Neo4j   │
└──────────────────────────────┬──────────────────────────────┘
                               │ evidence contract
                               ▼
┌─────────────────────────────────────────────────────────────┐
│  Picard OSS          Local-first · FTS5 + CARP · SQLite     │
│                      PII shield · hybrid · native binaries  │
└──────────────────────────────┬──────────────────────────────┘
                               │ tabular UX + DocPanel patterns
                               ▼
┌─────────────────────────────────────────────────────────────┐
│  Mike OSS            Cloud platform · Supabase · workflows  │
└─────────────────────────────────────────────────────────────┘

|  | Picard OSS | Picard.law | Mike OSS |
| --- | --- | --- | --- |
| Deployment | Your machine | Managed SaaS | Cloud |
| Retrieval | FTS5 + CARP + hybrid | GraphRAG | Vector + workflows |
| PII | Local shield for cloud LLM | Enterprise tiers | Supabase Auth |
| Binaries | Mac/Win/Linux | Hosted | Cloud |
| Best for | Legal engineers, air-gap, eval | Production | Full-stack platform |
| License | AGPL-3.0 | Commercial | AGPL-3.0 |

Chapter 10: Shipped vs. loading

Shipped today (Phases 0-6 + 7.0):

  • PDF ingest, OCR, FTS5, CARP, hybrid search
  • Citation chat + Citation Kernel
  • Tabular review + Excel export
  • Workflow library (18 built-ins)
  • PII shield + optional Presidio
  • Settings, onboarding, encrypted secrets
  • Chat latency profiles, deployment profiles
  • Docker + native installers for 5 platforms
  • Vault, unified dashboard, chat history rail
  • Chester eval harness + PII e2e tests in CI

In development (honest roadmap):

  • LightFlow workflow execution (Phase 7b): Run button is wired but disabled until deterministic DAG runs land
  • Full LightAgent authoring loop (Phase 7a): kernel-first agent chat exists; multi-tool orchestration is scaffolded
  • Template drafts from guidelines + CSV (Phase 8)
  • Optional URL snapshots for web research, air-gap off by default (Phase 9)
  • Inline post-answer quality panel (CT/FG/AB badges)
  • WCAG gaps: canvas bbox screen reader exposure, streaming live regions

We would rather tell you what is loading than demo what is missing.


Chapter 11: Open source, open contract

| Use case | License |
| --- | --- |
| Local dev, PoC, eval on your hardware | AGPL-3.0, no fee |
| Fork/redistribute modified versions | AGPL-3.0, source to users |
| Hosted production without AGPL obligations | Commercial license |

Community:


Try it

Download a binary: github.com/iamsaurabhc/picard-oss/releases

Or from source:

git clone https://github.com/iamsaurabhc/picard-oss
cd picard-oss
./scripts/start.sh

Upload a PDF. Wait for parse_status=done. Ask a question. Click [1]. Watch the bbox light up.

If retrieval finds nothing, Picard will refuse. That is the point.


Picard OSS is built by legal engineers who have watched too many models confidently cite the wrong page. Star the repo, run the Chester eval, file an issue when CARP misfires. Evidence before eloquence. Always.


Suggested dev.to tags: #opensource #legaltech #rag #privacy #localfirst #python #nextjs #sqlite #ai #citations

Top comments (2)

Gunjan Tailor

"Retrieval as the judge, model as a clerk who may only speak from the record" — that's the whole ballgame, and the refuse-on-zero-evidence gate is underrated. We see the same thing on financial docs: confident-but-wrong answers almost never trace to the model inventing numbers, they trace to retrieval handing it the wrong row. Kindred design choices here (local-first, SQLite FTS5, BM25-first hybrid) — I went the same way with docnest's ingestion engine. One real question: how does CARP handle constraints whose meaning depends on a column header several rows up? Cross-cell table semantics is the part that's broken every parser I've thrown at it.

Saurabh C. • Edited

"Retrieval as judge, model as clerk" is exactly the design bet. And yes: on financial docs the failure mode is almost always the wrong row, not hallucinated numbers. Refuse-on-zero-evidence is underrated until you've watched a partner click through a wrong citation at midnight.

Your CARP question is the one we don't hand-wave: column-header semantics several rows up are not first-class yet.

CARP does constraint intersection on page_entities (+ optional section_key from doc headings). It does not understand that $4.2M in row 14 means "Q3 Revenue" because of a header three rows above. If liteparse keeps that in one table chunk, FTS/hybrid usually saves you. If not, page-level intersection can bundle unrelated co-mentions. We see it on dense financial schedules.

Mitigation today: tabular review passes column semantics through the extraction prompt per cell. Longer term: structured table ingest (row/col + header propagation) so retrieval can refuse when header context is missing, not just when the page is empty.

Docnest sounds like kindred spirits. If you're open to it, I'd be curious what your ingestion engine does for header propagation. That's the layer I'd want to steal if someone has it working.