DEV Community

Adam

I built a 3B lease risk scanner that runs without an external LLM API

I built Lease Lens for the Hugging Face Build Small Hackathon.

The idea is simple: most people sign contracts they do not really read.

That is true for apartment leases, freelance agreements, gym memberships, SaaS terms, and small-business office leases. The risk is not that every contract is malicious. The risk is that a normal person can miss a renewal clause, late-fee stack, deposit condition, indemnity clause, repair burden, or arbitration waiver until it is too late.

Lease Lens is a small-model contract review assistant. It reads a lease or contract, finds risky clauses, quotes the exact language, highlights it in the source text, scores the contract, and drafts a plain-English negotiation email.

Demo: https://youtu.be/M-v3OAKO5-k

Space: https://huggingface.co/spaces/build-small-hackathon/lease-lens

GitHub: https://github.com/bO-05/lease-lens

Model: https://huggingface.co/giladam01/lease-lens-legal-3b

GGUF: https://huggingface.co/giladam01/lease-lens-legal-3b-gguf

Why build this with a small model?

For this problem, the small-model constraint is not just a hackathon rule. It is part of the product.

Contracts can contain private addresses, payments, business terms, and personal details. A user should not have to send that text to a closed external LLM API just to understand whether a lease contains obvious risk.

Lease Lens runs the model inside the Hugging Face Space and also ships a GGUF build for local llama.cpp / Ollama usage. The app does not call an external LLM API.

That gives the project a clear target:

  • useful enough for a first-pass contract risk read
  • small enough to run in constrained environments
  • grounded enough to show exact evidence
  • honest enough to say what it did not check

What it does

The app checks for common contract risk categories:

  • automatic renewal
  • early termination penalties
  • rent or price increases
  • late fees and penalties
  • deposit or prepayment terms
  • non-compete or exclusivity language
  • IP assignment
  • liability and indemnification
  • maintenance or repair burden
  • arbitration, jury waiver, class-action waiver, or governing law

For every accepted flag, Lease Lens shows:

  • the clause category
  • a risk level
  • the exact quoted text
  • why it matters
  • a plain-English pushback suggestion
  • a highlight in the original contract text
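One way to represent an accepted flag, and to highlight its quote in the source text, is sketched below. This is a minimal illustration, not Lease Lens's actual data model. The `Flag` class, its field names and the `[[...]]` highlight markers are hypothetical. Overlapping quotes are merged so that their markers never interleave.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flag:
    # Hypothetical record for one accepted flag; fields mirror the list above.
    category: str
    risk: str       # e.g. "high", "medium", "low"
    quote: str      # exact text copied from the contract
    why: str
    pushback: str


def highlight(contract: str, flags: list[Flag]) -> str:
    """Wrap each flag's quote in [[...]] at its first occurrence in the contract."""
    spans = []
    for flag in flags:
        start = contract.find(flag.quote)
        if start != -1:  # a quote that is not found is simply not highlighted
            spans.append((start, start + len(flag.quote)))
    # Merge overlapping or touching spans so the markers stay well nested.
    merged: list[tuple[int, int]] = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    # Insert markers from the end backwards so earlier offsets stay valid.
    out = contract
    for start, end in reversed(merged):
        out = out[:start] + "[[" + out[start:end] + "]]" + out[end:]
    return out


text = "Tenant shall pay a late fee of 10% of the overdue rent."
flag = Flag("late fees and penalties", "medium", "late fee of 10%",
            "Fees add up quickly after one missed payment.", "Ask for a grace period.")
print(highlight(text, [flag]))
# Tenant shall pay a [[late fee of 10%]] of the overdue rent.
```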

Then it can draft a negotiation email from the grounded flags.

It is not legal advice. It is a review assistant: evidence first, user judgment second.

The model

The shipped model is Llama 3.2 3B, fine-tuned for legal clause extraction.

I fine-tuned on CUAD-style legal clause extraction and evaluated on 100 held-out CUAD extraction items with the same setup across models.

The headline result:

| Model | F1 | Exact match |
|---|---|---|
| Llama 3.2 3B base | 0.119 | 0.010 |
| Lease Lens 3B | 0.406 | 0.280 |
| Llama 3.1 8B base | 0.206 | 0.020 |
| My 8B fine-tune | 0.357 | 0.230 |

The 3B fine-tune improved F1 by about 242% relative to the base 3B model (0.119 to 0.406) and even beat my own 8B fine-tune on the same held-out items (0.406 vs. 0.357 F1).
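The relative gain can be checked from the rounded scores in the table. The snippet below recomputes it. With these three-decimal values, the 3B gain comes out to about +241%, which matches the quoted figure within rounding. The snippet also puts the 8B fine-tune's own gain next to it for comparison.

```python
# F1 / exact-match scores copied from the table above (100 held-out CUAD items).
scores = {
    "Llama 3.2 3B base": (0.119, 0.010),
    "Lease Lens 3B": (0.406, 0.280),
    "Llama 3.1 8B base": (0.206, 0.020),
    "My 8B fine-tune": (0.357, 0.230),
}


def relative_gain(new: float, old: float) -> float:
    """Relative improvement in percent: (new - old) / old * 100."""
    return (new - old) / old * 100


f1 = {name: f for name, (f, _em) in scores.items()}
gain_3b = relative_gain(f1["Lease Lens 3B"], f1["Llama 3.2 3B base"])
gain_8b = relative_gain(f1["My 8B fine-tune"], f1["Llama 3.1 8B base"])
margin = f1["Lease Lens 3B"] - f1["My 8B fine-tune"]

print(f"3B fine-tune vs 3B base: {gain_3b:+.0f}% F1")        # +241% F1
print(f"8B fine-tune vs 8B base: {gain_8b:+.0f}% F1")        # +73% F1
print(f"3B fine-tune minus 8B fine-tune: {margin:+.3f} F1")  # +0.049 F1
```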

That is the part I like most about the project: small did not mean worse by default. For a specific extraction task, a tuned 3B model was enough to become useful.

Grounding matters more than sounding smart

The first version had an important failure mode. Trained mostly on positive examples, the bare model over-extracted: it returned quotes even for clause types that were absent from the contract. In other words, it was too eager to find something.

So the app does not trust generation alone.

Lease Lens wraps the model with deterministic guards:

  1. The quote must appear verbatim in the contract.
  2. Duplicate or near-duplicate quotes are removed.
  3. The quote must contain terms relevant to the clause category.
  4. The UI declares coverage: how many clause groups were checked and how much text was read.
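The four guards are simple enough to sketch in a few functions. This is a minimal illustration under stated assumptions, not the repo's implementation. Several details are my own choices: the keyword lists in `CATEGORY_TERMS`, whitespace-only normalization in the verbatim check, and the 0.9 similarity threshold (here using `difflib.SequenceMatcher`) for near-duplicates.

```python
import difflib
import re

# Hypothetical category vocabularies for guard 3; the app's real lists may differ.
CATEGORY_TERMS = {
    "automatic renewal": ("renew", "extend", "extension"),
    "late fees and penalties": ("late", "fee", "penalt"),
    "deposit or prepayment": ("deposit", "prepay", "security"),
}


def squash_ws(text: str) -> str:
    """Collapse runs of whitespace (e.g. PDF line breaks) into single spaces."""
    return re.sub(r"\s+", " ", text).strip()


def is_verbatim(quote: str, contract: str) -> bool:
    # Guard 1: the quote must occur in the contract exactly, apart from whitespace.
    q = squash_ws(quote)
    return bool(q) and q in squash_ws(contract)


def is_near_duplicate(a: str, b: str, threshold: float = 0.9) -> bool:
    # Guard 2: treat highly similar quotes as duplicates (threshold is an assumption).
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def is_on_topic(quote: str, category: str) -> bool:
    # Guard 3: the quote must mention at least one term tied to its category.
    q = quote.lower()
    return any(term in q for term in CATEGORY_TERMS.get(category, ()))


def filter_flags(candidates: list[tuple[str, str]], contract: str) -> list[tuple[str, str]]:
    """Keep (category, quote) candidates that pass guards 1-3; the first copy wins."""
    accepted: list[tuple[str, str]] = []
    for category, quote in candidates:
        if not (is_verbatim(quote, contract) and is_on_topic(quote, category)):
            continue
        if any(is_near_duplicate(quote, kept) for _, kept in accepted):
            continue
        accepted.append((category, quote))
    return accepted


def coverage_note(groups_checked: int, groups_total: int, chars_read: int, chars_total: int) -> str:
    # Guard 4: state how much was actually checked instead of implying full coverage.
    pct = 100 * chars_read / chars_total if chars_total else 0.0
    return (
        f"Checked {groups_checked}/{groups_total} clause groups; "
        f"read {chars_read:,} of {chars_total:,} characters ({pct:.0f}%)."
    )


contract = (
    "This Lease shall automatically renew for successive one-year terms.\n"
    "Tenant shall pay a late fee of $50."
)
candidates = [
    ("automatic renewal", "automatically renew for successive one-year terms"),
    ("automatic renewal", "shall automatically renew for successive one-year terms"),  # near-duplicate
    ("late fees and penalties", "late fee of $500"),  # not in the contract
    ("deposit or prepayment", "Tenant shall pay"),    # verbatim but off-topic
    ("late fees and penalties", "late fee of $50"),
]
accepted = filter_flags(candidates, contract)
print(accepted)
print(coverage_note(3, 10, 80_000, 120_000))
# Checked 3/10 clause groups; read 80,000 of 120,000 characters (67%).
```

In this toy run, only the first renewal quote and the $50 late fee survive. The design choice is that every guard can only remove model output, never add to it, so a failed guard costs recall rather than inventing evidence.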

For long contracts, the app reads the first 80k characters, splits the text into overlapping windows, routes each clause category only to windows containing relevant keywords, and runs the checks as a batched generation call.
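The long-contract path can be sketched as three steps: cap the text, cut overlapping windows, and pair each category only with the windows that mention its keywords. Everything except the 80k-character cap is an assumption for illustration: the window size, the overlap, the `ROUTING` keywords and the prompt wording.

```python
MAX_CHARS = 80_000  # from the post: only the first 80k characters are read
WINDOW = 4_000      # assumed window size in characters
OVERLAP = 500       # assumed overlap so short clauses split at a boundary still appear whole

# Hypothetical routing keywords; a category is only checked where one of these appears.
ROUTING = {
    "automatic renewal": ("renew", "extension"),
    "deposit or prepayment": ("deposit", "prepay"),
    "arbitration or governing law": ("arbitrat", "jury", "governing law"),
}


def make_windows(text: str, size: int = WINDOW, overlap: int = OVERLAP) -> list[str]:
    """Split the first MAX_CHARS characters into windows that overlap by `overlap`."""
    text = text[:MAX_CHARS]
    if not text:
        return []
    step = size - overlap
    # Stop once the remaining tail is already covered by the previous window.
    return [text[start:start + size] for start in range(0, max(len(text) - overlap, 1), step)]


def route(windows: list[str], routing: dict = ROUTING) -> list[tuple[str, int]]:
    """Return (category, window index) jobs only where a routing keyword appears."""
    jobs = []
    for category, keywords in routing.items():
        for i, window in enumerate(windows):
            lowered = window.lower()
            if any(k in lowered for k in keywords):
                jobs.append((category, i))
    return jobs


def build_batch(windows: list[str], jobs: list[tuple[str, int]]) -> list[str]:
    """One prompt per job, ready to be sent as a single batched generation call."""
    return [
        f"Category: {category}\nQuote the exact clause text or answer NONE.\n\n{windows[i]}"
        for category, i in jobs
    ]


text = "This lease will renew each year. " + "x" * 40 + " Disputes go to arbitration."
windows = make_windows(text, size=40, overlap=10)
jobs = route(windows)
prompts = build_batch(windows, jobs)
print(len(windows), jobs)
# 4 [('automatic renewal', 0), ('arbitration or governing law', 2)]
```

In the toy example, three categories and four windows would give twelve possible prompts, but routing sends only two to the model. The batched generation call itself is omitted because it depends on the serving stack.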

This makes the output less magical, but much more inspectable. A user can look at the quote, look at the highlighted source text, and decide whether it matters.

Real contracts, not just toy examples

The Space includes real executed commercial leases from SEC EDGAR filings.

That matters because benchmark scores are not enough. A demo can look good on short synthetic examples and then fall apart on actual legal documents.

The built-in examples include:

  • an office lease from Alpharetta, GA
  • an office lease amendment from Boston, MA
  • a long office lease from Addison, TX

The Boston example is a good quick demo: Lease Lens finds 3 grounded flags and catches the exact $125,301.33 security-deposit clause.

The Addison example is a stress test: long text, partial coverage, and enough complexity to show why the UI needs to be evidence-first instead of just a chatbot answer.

The UI

I started with a Gradio app, but the final submission needed to feel less like a stock demo and more like a focused tool.

The current UI is a "redline legal evidence desk":

  • a default real SEC lease is loaded on open
  • a clear three-step path: load filing, analyze, draft pushback
  • an explicit analyzing indicator before the GPU call starts
  • a risk docket with a score seal
  • grouped evidence cards
  • highlighted contract text beside the flags
  • a negotiation letter panel

The goal is that a judge can understand the whole product path in under a minute:

  1. Open the Space.
  2. Press Analyze contract.
  3. See the risk docket.
  4. Verify a highlighted quote.
  5. Draft a pushback email.

Modal, GGUF, and Codex

I used Modal for the v2.5 training path and smoke verification.

The smoke run used an A100-40GB GPU, loaded a CUAD smoke split of 400 positive examples and 100 synthesized NONE examples, trained for 60 steps, and finished cleanly in about 160 seconds. I ran it with --no-push and kept it as evidence: it verified the Modal path without overwriting the published model.
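The NONE examples are what teach the model to abstain when a clause type is absent. One plausible way to build such a split from CUAD-style records is sketched below: sample positives and absent-clause examples, and set the target to the literal string NONE for the absent ones. This is my own illustration, not the repo's training script. The `Example` layout, the prompt format and `build_smoke_split` are hypothetical; only the 400/100 counts come from the run described above.

```python
import random
from typing import Optional, TypedDict


class Example(TypedDict):
    context: str           # contract text (or an excerpt of it)
    category: str          # clause category being asked about
    answer: Optional[str]  # gold clause text, or None if the category is absent


def to_pair(ex: Example) -> tuple[str, str]:
    """Turn one example into a (prompt, target) pair; absent clauses become 'NONE'."""
    prompt = f"Category: {ex['category']}\n\n{ex['context']}"
    return prompt, ex["answer"] or "NONE"


def build_smoke_split(examples: list[Example], n_pos: int = 400, n_none: int = 100,
                      seed: int = 0) -> list[tuple[str, str]]:
    """Sample positives plus absent-clause examples and shuffle them together."""
    rng = random.Random(seed)
    positives = [ex for ex in examples if ex["answer"]]
    absent = [ex for ex in examples if not ex["answer"]]
    picked = (rng.sample(positives, min(n_pos, len(positives)))
              + rng.sample(absent, min(n_none, len(absent))))
    rng.shuffle(picked)
    return [to_pair(ex) for ex in picked]


# Toy data: every third example has no answer for its category.
examples: list[Example] = [
    {"context": f"contract {i}", "category": "automatic renewal",
     "answer": None if i % 3 == 0 else f"renews clause {i}"}
    for i in range(600)
]
split = build_smoke_split(examples)
print(len(split), sum(target == "NONE" for _, target in split))
# 500 100
```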

The repo also includes the training script:

https://github.com/bO-05/lease-lens/blob/main/training/finetune_legal_3b_modal_v2.py

For local usage, I published a GGUF build:

```shell
ollama pull hf.co/giladam01/lease-lens-legal-3b-gguf
```
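After pulling, the model can be called from a script through Ollama's local REST API (`POST /api/generate` on the default port 11434). The sketch below uses only the standard library. The prompt wording is a hypothetical placeholder, not the template the model was trained on, and temperature 0 is my assumption of a sensible setting for extraction.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL = "hf.co/giladam01/lease-lens-legal-3b-gguf"  # name used by `ollama pull` above


def build_request(clause_category: str, contract_text: str) -> urllib.request.Request:
    # Hypothetical prompt; the app's real prompt and output format may differ.
    prompt = (
        f"Category: {clause_category}\n"
        "Quote the exact clause text from the contract, or answer NONE.\n\n"
        f"{contract_text}"
    )
    body = json.dumps({
        "model": MODEL,
        "prompt": prompt,
        "stream": False,                  # return one JSON object instead of a stream
        "options": {"temperature": 0},    # assumed: deterministic output for extraction
    }).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def ask(clause_category: str, contract_text: str, timeout: float = 120.0) -> str:
    """Send one extraction request to a running Ollama server and return its raw text."""
    with urllib.request.urlopen(build_request(clause_category, contract_text), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

With `ollama serve` running, `ask("automatic renewal", lease_text)` returns the model's raw answer. That answer should still pass through the same verbatim-quote check as the Space before it is trusted.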

I also built and finalized the submission with OpenAI Codex as my coding agent. The public GitHub history contains Codex-attributed commits, and the repo includes a Codex build log:

https://github.com/bO-05/lease-lens/blob/main/docs/codex-build-log.md

What I would improve next

There are still obvious next steps:

  • better PDF ingestion
  • stronger abstention behavior in the shipped model
  • more held-out real-world contracts
  • a clause-specific calibration layer
  • clearer user controls for risk tolerance
  • richer offline packaging around the GGUF path

The big lesson for me was that a small legal model can be useful if the product does not ask it to be a lawyer.

Ask it to extract. Ground the quote. Highlight the evidence. Show the limitation. Let the human decide.

That is the shape of Lease Lens.

Demo: https://youtu.be/M-v3OAKO5-k

Live Space: https://huggingface.co/spaces/build-small-hackathon/lease-lens

GitHub: https://github.com/bO-05/lease-lens

Model: https://huggingface.co/giladam01/lease-lens-legal-3b

Field notes: https://huggingface.co/blog/giladam01/lease-lens-article
