DEV Community

Trinh Tran Khanh Duy

Building a privacy-first document processor with Ollama + Gradio

A step-by-step guide to building a local AI document processor that makes zero external network calls — useful for processing NDA-bound contracts, confidential reports, or any document you can't upload to ChatGPT.

Architecture overview

PDF/DOCX file
    ↓
pdfplumber / python-docx (text extraction)
    ↓
System prompt + document text
    ↓
Ollama API (localhost:11434)
    ↓
Gradio UI (localhost:7860)
    ↓
Summary / Q&A / entities

Everything runs on localhost. Zero cloud dependencies at runtime.

Prerequisites

  • Python 3.11+
  • Ollama installed and running
  • 8GB+ RAM (16GB recommended)
# Install Ollama (Windows)
winget install Ollama.Ollama

# Pull a model
ollama pull llama3.1:8b
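Before going further, it can help to confirm that the Ollama server is reachable and that the model was actually pulled. The sketch below assumes Ollama's /api/tags endpoint, which lists locally available models. The helper names parse_model_names and model_is_available are illustrative and not part of the original guide, and the sketch uses only the standard library.

```python
import json
import urllib.error
import urllib.request

OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"  # lists locally pulled models

def parse_model_names(payload: dict) -> list[str]:
    """Pull model names out of an /api/tags response body."""
    return [m.get("name", "") for m in payload.get("models", [])]

def model_is_available(model: str, url: str = OLLAMA_TAGS_URL) -> bool:
    """Return True if the local Ollama server is up and reports `model`."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        return False  # server not running, unreachable, or returned bad JSON
    return model in parse_model_names(payload)
```

Calling model_is_available("llama3.1:8b") at startup gives a clear yes/no before any document is processed, instead of a timeout on the first request.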

Core dependencies

pip install gradio pdfplumber python-docx requests

Step 1: Text extraction

import pdfplumber
import docx
from pathlib import Path

def extract_text(file_path: str) -> str:
    path = Path(file_path)
    if path.suffix.lower() == ".pdf":
        with pdfplumber.open(file_path) as pdf:
            return "\n\n".join(
                page.extract_text() or "" for page in pdf.pages
            )
    elif path.suffix.lower() == ".docx":  # python-docx cannot read legacy .doc files
        doc = docx.Document(file_path)
        return "\n".join(p.text for p in doc.paragraphs if p.text.strip())
    raise ValueError(f"Unsupported file type: {path.suffix}")
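Scanned, image-only PDFs have no text layer, so extract_text can return an empty or near-empty string and the model would then receive an essentially blank prompt. A guard along the following lines could catch that early. This is a minimal sketch: require_text is a hypothetical helper, and the 50-character threshold is an arbitrary assumption, not a tuned value.

```python
def require_text(text: str, min_chars: int = 50) -> str:
    """Return whitespace-normalized text, or raise if too little was extracted.

    min_chars is an illustrative threshold, not a tuned value.
    """
    # Collapse runs of spaces/tabs inside each line and drop empty lines
    lines = [" ".join(line.split()) for line in text.splitlines()]
    cleaned = "\n".join(line for line in lines if line)
    if len(cleaned) < min_chars:
        raise ValueError(
            f"Only {len(cleaned)} characters extracted; "
            "the file may be a scanned image without a text layer."
        )
    return cleaned
```

It would typically wrap the extraction call, for example require_text(extract_text(path)), so that the error surfaces before any model time is spent.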

Step 2: Ollama integration

import requests

OLLAMA_URL = "http://localhost:11434/api/generate"

def query_ollama(prompt: str, model: str = "llama3.1:8b") -> str:
    response = requests.post(OLLAMA_URL, json={
        "model": model,
        "prompt": prompt,
        "stream": False,
    }, timeout=120)
    response.raise_for_status()
    return response.json()["response"]

Note: the request goes to http://localhost:11434, the local Ollama server, not a cloud API. No authentication is needed.

Step 3: Domain-specific system prompts

Generic prompts give generic results. The prompts below are tuned for specific document types:

DOMAIN_PROMPTS = {
    "legal": (
        "You are a legal document analyst. Extract and structure the following "
        "from the document:\n"
        "1. PARTIES: All named parties and their roles\n"
        "2. KEY DATES: Effective date, termination, deadlines\n"
        "3. OBLIGATIONS: Each party's obligations\n"
        "4. PAYMENT TERMS: Amounts, schedules, conditions\n"
        "5. UNUSUAL CLAUSES: Non-standard or notable provisions\n"
        "6. GOVERNING LAW: Jurisdiction and dispute resolution\n"
        "Be factual and precise. Do not interpret or give legal advice."
    ),
    "financial": (
        "You are a financial document analyst. Extract:\n"
        "1. AMOUNTS: All monetary values with context\n"
        "2. DATES: Payment dates, fiscal periods, deadlines\n"
        "3. PARTIES: Vendors, clients, counterparties\n"
        "4. TERMS: Payment terms, penalties, conditions\n"
        "5. KEY METRICS: Revenue, costs, margins if present"
    ),
}

def process_document(file_path: str, domain: str, model: str) -> str:
    text = extract_text(file_path)
    system = DOMAIN_PROMPTS.get(domain, "Summarize the key points of this document.")
    prompt = f"{system}\n\nDOCUMENT:\n{text[:12000]}"  # ~12k char limit
    return query_ollama(prompt, model)
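The text[:12000] slice silently drops everything past the limit. One alternative, sketched here as an assumption rather than the author's approach, is to split the document on paragraph boundaries and process each chunk separately. chunk_text is a hypothetical helper.

```python
def chunk_text(text: str, max_chars: int = 12000) -> list[str]:
    """Split text into chunks of at most max_chars, preferring paragraph breaks."""
    chunks: list[str] = []
    current = ""
    for para in text.split("\n\n"):
        # Hard-split any single paragraph longer than the limit
        pieces = [para[i:i + max_chars] for i in range(0, len(para), max_chars)] or [""]
        for piece in pieces:
            candidate = f"{current}\n\n{piece}" if current else piece
            if len(candidate) <= max_chars:
                current = candidate
            else:
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then go through query_ollama with the same system prompt. How the per-chunk results are merged (simple concatenation, or a second summarizing pass) is a design decision this sketch leaves open.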

Step 4: Privacy-safe Gradio UI

import gradio as gr

def build_ui():
    # analytics_enabled belongs to the gr.Blocks constructor, not to launch()
    with gr.Blocks(title="Local Document Processor", analytics_enabled=False) as app:
        gr.Markdown("## Local Document Processor\n*All processing on your machine, no cloud*")

        with gr.Row():
            file_input = gr.File(label="Upload PDF or DOCX", file_types=[".pdf", ".docx"])
            domain = gr.Dropdown(
                choices=list(DOMAIN_PROMPTS.keys()),
                value="legal",
                label="Domain"
            )

        process_btn = gr.Button("Process Document", variant="primary")
        output = gr.Textbox(label="Result", lines=20)

        process_btn.click(
            fn=lambda f, d: process_document(f.name, d, "llama3.1:8b"),
            inputs=[file_input, domain],
            outputs=output,
        )

    return app

if __name__ == "__main__":
    app = build_ui()
    app.launch(
        server_name="127.0.0.1",   # localhost only
        share=False,                # no Gradio tunnel
    )
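The lambda in the click handler raises an AttributeError if the button is pressed before a file is chosen, because Gradio passes None for an empty upload. A named handler could guard against that. The following is a minimal sketch: handle_upload is a hypothetical name, and the processor parameter stands in for process_document so the example stays self-contained.

```python
from typing import Callable

def handle_upload(file_obj, domain: str,
                  processor: Callable[[str, str, str], str],
                  model: str = "llama3.1:8b") -> str:
    """Validate the upload, then delegate to the processing function."""
    if file_obj is None:
        return "Please upload a PDF or DOCX file first."
    # Depending on the Gradio version, the upload arrives as a path string
    # or as an object exposing the path via .name; handle both.
    path = file_obj if isinstance(file_obj, str) else file_obj.name
    try:
        return processor(path, domain, model)
    except Exception as exc:  # show the failure in the UI instead of crashing
        return f"ERROR: {exc}"
```

In the UI it could be wired up as fn=lambda f, d: handle_upload(f, d, process_document), leaving the rest of the click call unchanged.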

Step 5: Batch processing

For processing entire folders:

import zipfile
import tempfile
from pathlib import Path

def batch_process(folder_path: str, domain: str, model: str) -> str:
    results = {}
    for file in Path(folder_path).glob("*"):
        if file.suffix.lower() in (".pdf", ".docx"):
            try:
                results[file.name] = process_document(str(file), domain, model)
            except Exception as e:
                results[file.name] = f"ERROR: {e}"

    # Package results as ZIP
    with tempfile.NamedTemporaryFile(suffix=".zip", delete=False) as tmp:
        with zipfile.ZipFile(tmp.name, "w") as zf:
            for filename, content in results.items():
                zf.writestr(f"{filename}.txt", content)
        return tmp.name
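The packaging step can also be split out of batch_process so that it can be tested without a running model. The sketch below is one possible shape for that, with package_results as a hypothetical helper. It closes the temporary file handle before zipfile reopens the path, so only one handle is open at a time.

```python
import tempfile
import zipfile

def package_results(results: dict[str, str]) -> str:
    """Write each result to <filename>.txt inside a new ZIP; return the ZIP path."""
    # Reserve a temp path, close the handle, then let zipfile reopen it
    with tempfile.NamedTemporaryFile(suffix=".zip", delete=False) as tmp:
        zip_path = tmp.name
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for filename, content in results.items():
            zf.writestr(f"{filename}.txt", content)
    return zip_path
```

Because the file is created with delete=False, the caller is responsible for removing the ZIP once it has been downloaded or copied.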

Performance tips

  • Context window: Truncate documents to ~12,000 characters for reliable results with 8b models
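  • Temperature: Set "options": {"temperature": 0.1} in the request body for factual extraction; lower temperatures make output more deterministic, which tends to reduce hallucination
  • Streaming: Use "stream": True for better UX on long documents, and update the UI as tokens arrive
  • Model selection: qwen2.5:3b for speed, llama3.1:8b for a balance of speed and quality, llama3.1:70b for the highest accuracy if your hardware can run it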

Verification

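Run Wireshark with the filter not host 127.0.0.1 while processing a document, with other network applications closed so their traffic does not clutter the capture. You should see no packets attributable to the document processor, which confirms that no data leaves your machine.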

Full product

The complete version (batch mode, 10 domain types, hardware detection, Windows installer, 12 use-case recipes) is available at https://journeyer376.gumroad.com/l/ussytd for $39.

The architecture above is the core of what it does — the product adds packaging, documentation, and domain prompt iteration aimed at non-developers.


Questions about the architecture or model benchmarks? Happy to answer in the comments.
