Build a Free, Private “Chat with PDF” App in 70 Lines of Python

You have a private PDF you need to talk to.

But you can’t just upload it to ChatGPT. It might contain sensitive client data, internal business plans, or personal financial information.

So you search for a local solution, but every tutorial drags you through a maze of broken dependencies and cryptic error messages. It’s frustrating. It feels like it should be simple.

This guide is different. It gives you a complete, working setup using free, open-source tools that respect your privacy.

We’ll use Ollama for the brain, LanceDB for the memory, and a specific set of libraries that are guaranteed to play nicely together.

Why This Stack? Privacy, Power, and Zero Cost.

The goal is total control and zero cost.

This architecture is built on two core components that run entirely on your machine.

Ollama lets you run powerful large language models (like Llama 3 or Mistral) locally. It’s your offline brain. You download it once, and it’s yours to use forever, no internet connection or API key required.

LanceDB is an open-source vector database designed for simplicity and speed. It’s your AI’s long-term memory, storing the knowledge from your documents in a way the model can instantly search. It’s serverless and embeds directly into your Python application.

The result?

Everything happens on your laptop. Your documents are never sent to a third-party server. Your conversations are yours alone.

Step 1: Get Your Core Tools Running

Before we dive into the code, we need to set up the foundation. This part is surprisingly simple.

First, download and install Ollama for your operating system (macOS, Windows, or Linux).

Open a new, separate terminal window and run this command. Leave this terminal running in the background.

ollama serve

Note: If you get an error that the server is already running, that’s fine: it means the app started automatically.
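If you want to be sure the server is reachable before moving on, you can hit its default local endpoint (port 11434, unless you have configured it differently):

curl http://localhost:11434

It should reply with a short message along the lines of “Ollama is running”.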

Next, back in your main terminal, pull the models we’ll need. We’ll use Llama 3 for generation and a specialized model for embeddings.

ollama pull llama3
ollama pull nomic-embed-text

That’s it. You now have a world-class LLM running on your machine.
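To double-check that both downloads succeeded, list the models Ollama has available locally:

ollama list

You should see llama3 and nomic-embed-text in the output.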

The Secret Sauce: A Dependency List That Just Works

This is where most tutorials fail you.

They give you code but neglect the most important part: the environment. You end up in “dependency hell,” trying to figure out why libraries refuse to cooperate.

I’ve done the tedious work for you. For managing Python versions like a pro, I recommend pyenv:

pyenv install 3.14.0
pyenv local 3.14.0

Python 3.14.x is the current stable version, and it’s the one I built and tested this tutorial with.

Key Libraries:

pip install lancedb==0.25.3 ollama==0.6.1 pypdf==6.2.0

This specific combination is your shortcut past hours of troubleshooting. It ensures the code runs on the first try.
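If you’d rather pin the versions in a file, the same three lines work as a requirements.txt:

lancedb==0.25.3
ollama==0.6.1
pypdf==6.2.0

You can then keep everything isolated in a project-local virtual environment (a minimal sketch, assuming a Unix-like shell):

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt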

For a quick start, the full repository — including the Python script and requirements.txt — is available on GitHub.

Now, let’s walk through the code.

Step 2: The Code (Your Private RAG Engine in Under 70 Lines)


The environment is ready.

Here is the complete Python script. Make sure to place a PDF file named file.pdf in the same directory.

import ollama
import lancedb
from pypdf import PdfReader

# --- CONFIGURATION ---
DOC_PATH = "file.pdf"  # Ensure this file exists
DB_PATH = "./lancedb_data"
MODEL_EMBED = "nomic-embed-text"
MODEL_GEN = "llama3"

print("--- 🧠 Building Knowledge Base with LanceDB ---")

# 1. INGESTION: Read PDF
try:
    reader = PdfReader(DOC_PATH)
    text = ""
    for page in reader.pages:
        text += page.extract_text() or ""
except FileNotFoundError:
    print(f"❌ Error: Could not find {DOC_PATH}. Please add a PDF file.")
    exit()

# 2. CHUNKING
chunk_size = 1000
text_chunks = [text[i:i+chunk_size] for i in range(0, len(text), chunk_size)]
print(f"📄 Processed {len(text_chunks)} text chunks.")

# 3. EMBEDDING & STORAGE
# LanceDB expects a list of dictionaries (like JSON rows)
data = []
print("⏳ Generating embeddings (this may take a moment)...")

for i, chunk in enumerate(text_chunks):
    # Generate vector using Ollama
    response = ollama.embeddings(model=MODEL_EMBED, prompt=chunk)
    vector = response["embedding"]

    # Prepare row for database
    data.append({
        "id": i,
        "text": chunk,
        "vector": vector
    })

# Connect to LanceDB (creates the folder automatically)
db = lancedb.connect(DB_PATH)

# Create (or overwrite) the table
# The vector dimension is inferred from the data (nomic-embed-text produces 768-dim vectors)
try:
    table = db.create_table("resume", data=data, mode="overwrite")
    print("💾 Knowledge stored in LanceDB (Local Disk).")
except Exception as e:
    print(f"⚠️ Database error: {e}")
    exit()

# 4. RETRIEVAL LOOP
while True:
    query = input("\n❓ Ask about the file (or 'quit'): ")
    if query.lower() == 'quit': break

    # Embed the question
    query_vec = ollama.embeddings(model=MODEL_EMBED, prompt=query)["embedding"]

    # Search LanceDB (Find nearest neighbors)
    # .limit(1) = Get the top 1 most relevant chunk
    results = table.search(query_vec).limit(1).to_list()

    if results:
        best_chunk = results[0]["text"]
        print(f"\n🔍 Context Found:\n'...{best_chunk[:100]}...'")

        # 5. GENERATION
        prompt = f"""
        Use the context below to answer the question.

        Context: {best_chunk}
        Question: {query}
        """

        response = ollama.generate(model=MODEL_GEN, prompt=prompt)
        print(f"\n🤖 Answer: {response['response']}")
    else:
        print("❌ No relevant info found.")
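Save the script under any name you like (chat_with_pdf.py below is just an example) and run it from the same folder as your PDF:

python chat_with_pdf.py

Expect the first run to pause for a moment while every chunk is embedded; the question prompt appears once the knowledge base is built.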

How the Code Works: A Quick Breakdown

This script implements a classic Retrieval-Augmented Generation (RAG) workflow. It looks complex, but it’s just five logical steps.

  1. Ingestion & Chunking
    First, the script loads your PDF and splits its text into manageable 1,000-character pieces (“chunks”). A model can’t process a 100-page document all at once, so this step makes the information digestible.

  2. Embedding & Storage
    Next, it loops through each chunk and asks Ollama’s nomic-embed-text model to convert it into a vector (a list of numbers representing its meaning). These vectors, along with the original text, are saved into a LanceDB table right on your disk. Your knowledge base is now built.

  3. The Query Loop
    The script then enters a loop, waiting for you to ask a question.

  4. Retrieval
    When you enter a query, it’s also converted into a vector. LanceDB then performs a lightning-fast search to find the text chunk with the most similar vector. This is the “retrieval” step: finding the single most relevant piece of context from the entire document to answer your specific question.

  5. Generation
    Finally, the script bundles that relevant chunk with your original question into a new prompt. It sends this to Llama 3, which uses the provided context to generate a precise, accurate answer.

This pipeline, from your question to retrieval to a generated answer, is what makes the magic happen.
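One easy upgrade once the basics work: the script feeds only the single best chunk to Llama 3 (.limit(1)), which can miss answers spread across several passages. Here is a minimal sketch of a multi-chunk variant, assuming you keep the database path and table name from the script above; it reopens the stored table and stuffs the top three chunks into the prompt:

import ollama
import lancedb

DB_PATH = "./lancedb_data"   # same folder the script above created
TABLE_NAME = "resume"        # same table name used in create_table()

def answer(query: str, k: int = 3) -> str:
    """Retrieve the top-k chunks from LanceDB and ask Llama 3 with all of them."""
    table = lancedb.connect(DB_PATH).open_table(TABLE_NAME)
    query_vec = ollama.embeddings(model="nomic-embed-text", prompt=query)["embedding"]
    results = table.search(query_vec).limit(k).to_list()

    # Join the retrieved chunks into one context block
    context = "\n\n---\n\n".join(row["text"] for row in results)
    prompt = f"Use the context below to answer the question.\n\nContext: {context}\nQuestion: {query}"
    return ollama.generate(model="llama3", prompt=prompt)["response"]

print(answer("Summarize the key financial highlights in three bullet points."))

Raising k trades a slightly longer prompt for better recall; three 1,000-character chunks still fit comfortably in Llama 3’s context window.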

Step 3: Put Your Assistant to Work (Real-World Examples)


Running the script is just the start. The real power is in the quality of your questions.

I tested this with a dense, jargon-filled financial document: “Meta Reports Third Quarter 2025 Results.”

Here’s how a simple script transforms into a powerful analytical partner.

Example 1: Instantly Summarize Complex Financial Reports

The document is full of SEC boilerplate and investor-speak. My goal was to cut through the noise and get the most critical information, fast.

My Prompt:

Summarize the key financial highlights from this report in three bullet points. Focus on revenue, net income, and daily active users.

AI’s Response:

🤖 Answer: Here are the key financial highlights summarized in three bullet points:

  • Revenue: Revenue was $51.24 billion, an increase of 26% year-over-year (25% on a constant currency basis).
  • Net Income: No net income figure is mentioned in this report, only revenue and other financial metrics.
  • Daily Active Users (DAU): Family daily active people (DAP) was 3.54 billion on average for September 2025, an increase of 8% year-over-year.

In seconds, you get the core metrics without reading 20 pages. This is perfect for grasping the essentials before a meeting or making a quick decision.

Example 2: Prepare for a High-Stakes Stakeholder Meeting

Now, let’s move beyond simple data extraction.

Imagine you have to present these findings to skeptical stakeholders. You can use the AI to pressure-test your arguments and anticipate difficult questions.

My Prompt:

Assume the role of a skeptical board member. Based on this report, what are the three most challenging questions you would ask me about the company’s performance and future outlook?

AI’s Response:

🤖 Answer: As a skeptical board member, I’d want to drill down into the details and potential risks associated with the company’s performance and future outlook. Here are three challenging questions I’d ask:

  1. What specific factors are driving the uncertainty around dividend payments? You mentioned “market conditions or other factors” as potential risks affecting dividend payments. Can you provide more granularity on what exactly these factors are, and how they might impact our investors? Are we talking about changes in interest rates, economic downturns, or something else entirely?
  2. What is the company doing to mitigate these risks and ensure a stable financial future? Given the uncertainty around dividend payments, I’d want to know if the company has implemented any strategies to reduce its exposure to market fluctuations or other external factors. Are we investing in diversification initiatives, hedging instruments, or other risk-reduction measures?
  3. How confident are you that our quarterly reports will continue to accurately reflect the company’s performance and future outlook? You mention that forward-looking statements are based on assumptions that seem reasonable as of October 29, 2025, but I’d want more assurance that these assumptions won’t change dramatically in the coming quarters. Can we get a sense of what specific factors might lead us to revisit or revise our quarterly reports going forward?

This is a game-changer. Your private AI just became your personal sparring partner, helping you walk into a critical presentation prepared and confident.

This isn’t just a fun project. It’s a foundational step towards building truly sovereign AI applications. The tools are free, the data is yours, and the power is back on your machine.

Now go build something.
