A step-by-step guide to building a local AI document processor that makes zero external network calls — useful for processing NDA-bound contracts, confidential reports, or any document you can't upload to ChatGPT.
Architecture overview
PDF/DOCX file
↓
pdfplumber / python-docx (text extraction)
↓
System prompt + document text
↓
Ollama API (localhost:11434)
↓
Gradio UI (localhost:7860)
↓
Summary / Q&A / entities
Everything runs on localhost. Zero cloud dependencies at runtime.
Prerequisites
- Python 3.11+
- Ollama installed and running
- 8GB+ RAM (16GB recommended)
# Install Ollama (Windows)
winget install Ollama.Ollama
# Pull a model
ollama pull llama3.1:8b
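To confirm from Python that the server is up and the model was actually pulled, one option is Ollama's `GET /api/tags` endpoint, which lists locally available models. This is a minimal sketch using only the standard library; `installed_models`, `has_model`, and `fetch_tags` are hypothetical helper names, and the URL assumes the default local port.

```python
import json
import urllib.request

OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"  # assumed default local port


def installed_models(tags_payload: dict) -> list[str]:
    """Extract model names from an /api/tags response body."""
    return [m["name"] for m in tags_payload.get("models", [])]


def has_model(tags_payload: dict, model: str) -> bool:
    """Return True if `model` appears among the locally pulled models."""
    return model in installed_models(tags_payload)


def fetch_tags(url: str = OLLAMA_TAGS_URL, timeout: float = 5.0) -> dict:
    """Query the local Ollama server; raises URLError if it is not running."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `has_model(fetch_tags(), "llama3.1:8b")` at startup would let the app show a clear message instead of failing on the first document.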
Core dependencies
pip install gradio pdfplumber python-docx requests
Step 1: Text extraction
import pdfplumber
import docx
from pathlib import Path

def extract_text(file_path: str) -> str:
    """Return the plain text of a PDF or DOCX file."""
    path = Path(file_path)
    suffix = path.suffix.lower()
    if suffix == ".pdf":
        with pdfplumber.open(file_path) as pdf:
            # Pages without a text layer (e.g. scans) contribute an empty string
            return "\n\n".join(
                page.extract_text() or "" for page in pdf.pages
            )
    elif suffix == ".docx":
        # python-docx cannot read the legacy binary .doc format
        doc = docx.Document(file_path)
        return "\n".join(p.text for p in doc.paragraphs if p.text.strip())
    raise ValueError(f"Unsupported file type: {path.suffix}")
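Note that `extract_text` reads only `doc.paragraphs`, so text inside DOCX tables (payment schedules, for example) is not included. A possible extension, sketched here as a hypothetical helper `tables_to_text`, walks python-docx's `tables`, `rows`, and `cells` attributes and renders each row as a pipe-separated line. The output format is an arbitrary choice, not something the original code prescribes.

```python
def tables_to_text(document) -> str:
    """Render each table of a python-docx Document as pipe-separated rows.

    `document` is expected to expose .tables -> .rows -> .cells -> .text,
    as objects returned by docx.Document(path) do.
    """
    blocks = []
    for table in document.tables:
        rows = [
            " | ".join(cell.text.strip() for cell in row.cells)
            for row in table.rows
        ]
        blocks.append("\n".join(rows))
    # Blank line between tables keeps them visually separate in the prompt
    return "\n\n".join(blocks)
```

The result could be appended to the paragraph text in the `.docx` branch before returning.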
Step 2: Ollama integration
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"

def query_ollama(prompt: str, model: str = "llama3.1:8b") -> str:
    response = requests.post(OLLAMA_URL, json={
        "model": model,
        "prompt": prompt,
        "stream": False,
    }, timeout=120)
    response.raise_for_status()
    return response.json()["response"]
Note: http://localhost:11434 — not a cloud API. No authentication needed.
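The request above waits for the full completion and uses default sampling settings. A variant can be sketched under two assumptions about Ollama's `/api/generate`: sampling settings such as temperature go inside an `options` object, and with `"stream": True` the server returns one JSON object per line, each carrying a `response` fragment and a `done` flag. `build_payload` and `iter_stream_text` are hypothetical names.

```python
import json
from typing import Iterable, Iterator


def build_payload(prompt: str, model: str = "llama3.1:8b",
                  temperature: float = 0.1, stream: bool = True) -> dict:
    """Build a request body for /api/generate with a low temperature."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": stream,
        "options": {"temperature": temperature},
    }


def iter_stream_text(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield text fragments from newline-delimited JSON response lines."""
    for raw in lines:
        if not raw:  # skip blank keep-alive lines
            continue
        chunk = json.loads(raw)
        if chunk.get("response"):
            yield chunk["response"]
        if chunk.get("done"):  # final object marks the end of the stream
            break
```

With `requests`, this would pair with `requests.post(OLLAMA_URL, json=build_payload(prompt), stream=True, timeout=120)` and `iter_stream_text(resp.iter_lines())`. A Gradio handler written as a generator that yields the accumulated text would then update the output box as fragments arrive.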
Step 3: Domain-specific system prompts
Generic prompts give generic results. The prompts below are tuned to specific document types:
DOMAIN_PROMPTS = {
    "legal": (
        "You are a legal document analyst. Extract and structure the following "
        "from the document:\n"
        "1. PARTIES: All named parties and their roles\n"
        "2. KEY DATES: Effective date, termination, deadlines\n"
        "3. OBLIGATIONS: Each party's obligations\n"
        "4. PAYMENT TERMS: Amounts, schedules, conditions\n"
        "5. UNUSUAL CLAUSES: Non-standard or notable provisions\n"
        "6. GOVERNING LAW: Jurisdiction and dispute resolution\n"
        "Be factual and precise. Do not interpret or give legal advice."
    ),
    "financial": (
        "You are a financial document analyst. Extract:\n"
        "1. AMOUNTS: All monetary values with context\n"
        "2. DATES: Payment dates, fiscal periods, deadlines\n"
        "3. PARTIES: Vendors, clients, counterparties\n"
        "4. TERMS: Payment terms, penalties, conditions\n"
        "5. KEY METRICS: Revenue, costs, margins if present"
    ),
}

def process_document(file_path: str, domain: str, model: str) -> str:
    text = extract_text(file_path)
    system = DOMAIN_PROMPTS.get(domain, "Summarize the key points of this document.")
    prompt = f"{system}\n\nDOCUMENT:\n{text[:12000]}"  # ~12k char limit
    return query_ollama(prompt, model)
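The `text[:12000]` slice silently drops everything past the limit, which matters for long contracts. One possible workaround is to split the text into chunks that each fit the limit, preferring paragraph boundaries, and process each chunk separately. The sketch below uses a hypothetical helper `chunk_text`; how per-chunk results are merged afterwards is left open.

```python
def chunk_text(text: str, max_chars: int = 12000) -> list[str]:
    """Split text into chunks of at most max_chars, preferring paragraph breaks."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    chunks: list[str] = []
    current = ""
    for para in text.split("\n\n"):
        # Hard-split any single paragraph that is itself longer than the limit
        pieces = [para[i:i + max_chars] for i in range(0, len(para), max_chars)] or [""]
        for piece in pieces:
            candidate = f"{current}\n\n{piece}" if current else piece
            if len(candidate) <= max_chars:
                current = candidate  # still fits: keep accumulating
            else:
                chunks.append(current)  # flush and start a new chunk
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could be sent through `query_ollama` with the same domain prompt and the outputs joined. Entities that span a chunk boundary may then appear twice or be split, so a final merge pass is worth considering.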
Step 4: Privacy-safe Gradio UI
import gradio as gr

def build_ui():
    # analytics_enabled is an argument of gr.Blocks(), not of launch()
    with gr.Blocks(title="Local Document Processor", analytics_enabled=False) as app:
        gr.Markdown("## Local Document Processor\n*All processing on your machine, no cloud*")
        with gr.Row():
            file_input = gr.File(label="Upload PDF or DOCX", file_types=[".pdf", ".docx"])
            domain = gr.Dropdown(
                choices=list(DOMAIN_PROMPTS.keys()),
                value="legal",
                label="Domain"
            )
        process_btn = gr.Button("Process Document", variant="primary")
        output = gr.Textbox(label="Result", lines=20)

        def on_process(f, d):
            # Recent Gradio versions pass a path string; older ones pass a
            # temp-file object with a .name attribute. Accept both.
            path = f if isinstance(f, str) else f.name
            return process_document(path, d, "llama3.1:8b")

        process_btn.click(
            fn=on_process,
            inputs=[file_input, domain],
            outputs=output,
        )
    return app

if __name__ == "__main__":
    app = build_ui()
    app.launch(
        server_name="127.0.0.1",  # localhost only
        share=False,              # no Gradio tunnel
    )
Step 5: Batch processing
For processing entire folders:
import zipfile
import tempfile
from pathlib import Path

def batch_process(folder_path: str, domain: str, model: str) -> str:
    results = {}
    for file in Path(folder_path).glob("*"):
        if file.suffix.lower() in (".pdf", ".docx"):
            try:
                results[file.name] = process_document(str(file), domain, model)
            except Exception as e:
                results[file.name] = f"ERROR: {e}"
    # Package results as ZIP
    with tempfile.NamedTemporaryFile(suffix=".zip", delete=False) as tmp:
        with zipfile.ZipFile(tmp.name, "w") as zf:
            for filename, content in results.items():
                zf.writestr(f"{filename}.txt", content)
    return tmp.name
Performance tips
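- Context window: Truncate documents to ~12,000 characters for reliable results with 8b models
- Temperature: Set "temperature": 0.1 for factual extraction (less hallucination). In Ollama's API this setting goes inside the request's "options" object.
- Streaming: Use "stream": True for better UX on long documents, updating the UI in real time
- Model selection: qwen2.5:3b for speed, llama3.1:8b as a balanced default, llama3.1:70b for the highest accuracy if your hardware has the memory for it (well beyond the 16GB recommended above)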
Verification
Run Wireshark with the capture filter not host 127.0.0.1 while processing a document. With other network applications closed, you should see no packets generated by the processor, confirming that no document data leaves your machine.
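As a complement to packet capture, a cheap in-code guard can refuse to send document text to any endpoint that is not a loopback address, which protects against a misconfigured URL. This is only a sketch: `is_loopback_url` is a hypothetical helper, and it treats the hostname `localhost` as loopback by assumption rather than resolving it.

```python
import ipaddress
from urllib.parse import urlparse


def is_loopback_url(url: str) -> bool:
    """Return True if the URL's host is 'localhost' or a loopback IP literal."""
    host = urlparse(url).hostname
    if host is None:
        return False
    if host == "localhost":  # assumption: not remapped in the hosts file
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (e.g. a public domain name): treat as non-local
        return False
```

A check such as `assert is_loopback_url(OLLAMA_URL)` at startup would fail fast if the endpoint were ever pointed at a remote host.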
Full product
The complete version (batch mode, 10 domain types, hardware detection, Windows installer, 12 use-case recipes) is available at https://journeyer376.gumroad.com/l/ussytd for $39.
The architecture above is the core of what it does — the product adds packaging, documentation, and domain prompt iteration aimed at non-developers.
Questions about the architecture or model benchmarks? Happy to answer in the comments.