Mick.Net

Posted on Jun 14

Building Document.Bot: A Local-First AI Workspace for Real Documents (PDF/Office)

#ai #productivity

Most AI document workflows still feel like they were designed around chat, not around documents.

You upload a PDF. You ask a question. You upload another file. You ask again. If the answer looks useful, you still have to go back to the original source, verify the passage, copy context into another document, and repeat the same process tomorrow.

That loop is fine for a single file. It breaks down when the work lives in folders: PDFs, Word files, spreadsheets, Markdown notes, reference material, exports, drafts, and half-finished research.

I built Document.Bot around a different assumption: the folder is the workspace.

Document.Bot is a local-first AI workspace for document-heavy work. You connect a folder of PDFs, Word files, spreadsheets, Markdown, and notes; the app indexes it locally; then you can search, inspect original sources, ask AI across the workspace, and draft reviewable, source-backed outputs without repeatedly uploading files into chat.

The architecture follows from that product promise. Document.Bot is a desktop app for macOS and Windows, built with Electron, React, LangChain, LangGraph, and LanceDB.

This is a builder note on why those choices fit the product, where the boundaries are, and what I learned building an AI app that treats retrieval and source inspection as core product features instead of invisible plumbing.

Why I Started With Desktop

A lot of AI products start in the browser. For many categories, that is the right call. Distribution is simpler, updates are immediate, and collaboration is easier to centralize.

Document.Bot was different.

The documents people care about are already on their machines. They are in synced folders, messy project directories, exports from other tools, email attachments, old client folders, and local archives. Asking users to repeatedly upload those files into a web app creates friction and raises trust questions immediately.

Desktop gives the app a more natural relationship with the user's files.

It can work with local folders directly. It can watch and index files in place. It can open original documents and preserve the user's existing folder structure. It can keep document processing on the user's machine unless the user chooses a model or provider that requires remote calls.

That matters for document-heavy work because the user often does not just want "an answer." They want to know where the answer came from. They want to inspect the source. They want to compare a generated draft with the original file. They want the workspace to feel grounded in their own material, not like a temporary upload box.

Electron was the practical choice because I wanted a real desktop product for both macOS and Windows while still moving quickly with a modern web UI stack.

Electron as the Local Application Shell

In Document.Bot, Electron is not just a wrapper around a web page. It is the boundary between the product UI and the local machine.

The main process owns local services: app lifecycle, filesystem access, indexing orchestration, database access, IPC, packaging concerns, and OS integration. The renderer owns the product experience: the workspace, search UI, document panes, chat, settings, and review flows.

That separation is important.

The renderer should not become a privileged grab bag of local capabilities. It should behave like a React application with a clear interface to the system layer. The main process can expose narrowly scoped operations through a preload bridge and IPC: select a folder, read indexed metadata, run a search, open a document, start or monitor an indexing job, and so on.

The pattern looks roughly like this:

Electron main process: local services, filesystem, app lifecycle, IPC, packaging
Preload layer: constrained bridge between privileged code and UI
React renderer: workspace UI, source inspection, chat and review flows
Local indexing layer: document metadata, chunks, embeddings, retrieval state
AI orchestration layer: model calls, tools, agent state, workflow control
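
One way to picture the constrained bridge is as a small allowlist of named operations. This is a minimal sketch, not Document.Bot's actual code: the channel names, payload shapes, and the `createBridge` helper are hypothetical, and the real Electron wiring (a preload script plus IPC handlers in the main process) is left out so the example stays self-contained.

```typescript
// Hypothetical channel map: each channel is one narrowly scoped operation.
type BridgeApi = {
  "workspace:search": (query: string) => string[];
  "document:open": (docId: string) => { opened: boolean };
};

type Channel = keyof BridgeApi;

// Build a dispatcher that forwards calls only for registered channels.
function createBridge(handlers: BridgeApi) {
  const allowed = new Set<string>(Object.keys(handlers));
  return {
    invoke(channel: string, arg: string): unknown {
      if (!allowed.has(channel)) {
        throw new Error(`Channel not allowed: ${channel}`);
      }
      return handlers[channel as Channel](arg);
    },
  };
}

// Illustrative data standing in for indexed workspace metadata.
const indexedTitles = ["contract-2023.pdf", "budget.xlsx", "contract-notes.md"];

const bridge = createBridge({
  "workspace:search": (query) => indexedTitles.filter((t) => t.includes(query)),
  "document:open": (docId) => ({ opened: indexedTitles.includes(docId) }),
});

// The renderer can search, but a channel that was never registered is rejected.
const searchHits = bridge.invoke("workspace:search", "contract") as string[];
```

The point of the shape is that the renderer never receives a general-purpose filesystem or process handle. Anything not in the map simply does not exist from the UI's point of view.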

For build tooling, the app uses electron-vite for the main, preload, and renderer bundles, and electron-builder for desktop packaging. The packaging path matters because Document.Bot depends on native and platform-sensitive pieces: local databases, PDF tooling, OCR/indexing helpers, and model/indexing runtimes. All of them need to behave predictably after the app is packaged, not only in development.

That architecture keeps the UI productive without pretending the app is just a website. It also makes the security posture easier to reason about. The document agent does not need arbitrary shell power to be useful. In fact, giving agents broad local authority by default is one of the fastest ways to turn a useful assistant into an uncomfortable product.

Document.Bot's job is to help users work with their documents, not to let an AI roam the machine.

React for the Product Surface

React is the obvious part of the stack, but it still shapes the product.

Document.Bot has a lot of stateful UI: connected folders, indexing status, search results, file and source tags, document viewers, chat threads, review panels, model settings, and account-related surfaces. React works well for that kind of application because the UI is a composition of many small, state-driven surfaces.

The key design choice is that the interface cannot be "just chat."

Chat is useful, but document work needs more than a message box. Users need to see the corpus. They need to search before asking. They need to inspect source files after receiving an answer. They need to know which files were used, which chunks were retrieved, and whether the output is something they can review.

That is why the UI is built around a workspace rather than a conversation alone.

Search, source tags, viewer panes, and review flows are not secondary features. They are how the user builds trust. A generated answer is much more useful when it is connected back to the underlying files and when the user can jump from a claim to the original source.

This also changes the role of AI in the interface. The model is not the whole product. It is one layer inside a workflow that includes browsing, filtering, inspecting, drafting, and verifying.

LanceDB for Local Retrieval

Once the app is folder-first, local indexing becomes central.

Document.Bot needs to parse documents, split them into useful chunks, store metadata, and retrieve relevant material before asking a model to answer. LanceDB fits this job because it can live with the desktop app and support retrieval over local document chunks, metadata, and embeddings.

In the current architecture, Document.Bot's QMD search layer uses a LanceDB-backed store. That layer keeps the product-facing search/tool API stable while supporting full-text search, vector search, and hybrid retrieval paths over the indexed workspace.

The important product idea is simple: the model should not be the first thing that touches the whole workspace.

Before the AI answers, the app should narrow the problem. It should search the local index, find relevant chunks, preserve source metadata, and make the retrieval process visible enough that the user can inspect it.
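
To make the "narrow first" step concrete, here is a minimal sketch of merging a full-text result list with a vector result list before anything reaches the model. How Document.Bot actually combines its retrieval paths is not described here. Reciprocal rank fusion is assumed purely for illustration, and the `Hit` type, the `fuseRankings` function, and the constant `k = 60` are hypothetical choices.

```typescript
// A retrieved chunk keeps its source path so the UI can show where it came from.
type Hit = { chunkId: string; sourcePath: string };

// Reciprocal rank fusion: each list contributes 1 / (k + rank) for every hit it contains.
function fuseRankings(lists: Hit[][], k = 60): Hit[] {
  const scores = new Map<string, { hit: Hit; score: number }>();
  for (const list of lists) {
    list.forEach((hit, index) => {
      const rank = index + 1;
      const entry = scores.get(hit.chunkId) ?? { hit, score: 0 };
      entry.score += 1 / (k + rank);
      scores.set(hit.chunkId, entry);
    });
  }
  return Array.from(scores.values())
    .sort((a, b) => b.score - a.score)
    .map((entry) => entry.hit);
}

// Illustrative result lists, best match first.
const fullTextHits: Hit[] = [
  { chunkId: "a", sourcePath: "contract.pdf" },
  { chunkId: "b", sourcePath: "notes.md" },
  { chunkId: "c", sourcePath: "budget.xlsx" },
];
const vectorHits: Hit[] = [
  { chunkId: "b", sourcePath: "notes.md" },
  { chunkId: "d", sourcePath: "draft.docx" },
  { chunkId: "a", sourcePath: "contract.pdf" },
];

const fusedHits = fuseRankings([fullTextHits, vectorHits]);
```

Chunks that appear in both lists rise to the top, and every fused hit still carries its `sourcePath`, which is what lets the interface show the user which files fed the answer.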

A local index also gives the app a durable memory of the connected folder without turning the model provider into the document system of record. Original files remain the source of truth. The index is a rebuildable representation of those files.

That distinction matters.

Indexes can get stale. Parsers can improve. Embedding models can change. File formats have edge cases. If the index is treated as a rebuildable cache, the app can recover from those changes. If the index is treated as the canonical document store, small mistakes become much more expensive.

One lesson from building this kind of system: keep indexing durable, but not sacred.

The app should remember enough state to avoid redoing work unnecessarily, but it should always be possible to rebuild from the original folder. The original files should remain inspectable outside the AI workflow.
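
Treating the index as a rebuildable cache can be sketched as a comparison between what is on disk and what the index last recorded. This is an illustrative sketch under assumptions: the `FileFingerprint` shape (size plus modification time) and the `planReindex` function are hypothetical, and a real implementation might also use content hashes.

```typescript
type FileFingerprint = { path: string; size: number; modifiedMs: number };
type IndexPlan = { toIndex: string[]; toRemove: string[]; unchanged: string[] };

// Compare files on disk with index records and decide what work is needed.
function planReindex(onDisk: FileFingerprint[], indexed: FileFingerprint[]): IndexPlan {
  const indexedByPath = new Map<string, FileFingerprint>();
  for (const record of indexed) indexedByPath.set(record.path, record);

  const plan: IndexPlan = { toIndex: [], toRemove: [], unchanged: [] };
  const seen = new Set<string>();
  for (const file of onDisk) {
    seen.add(file.path);
    const prior = indexedByPath.get(file.path);
    const same =
      prior !== undefined &&
      prior.size === file.size &&
      prior.modifiedMs === file.modifiedMs;
    (same ? plan.unchanged : plan.toIndex).push(file.path);
  }
  // Records whose files no longer exist are dropped from the index.
  for (const record of indexed) {
    if (!seen.has(record.path)) plan.toRemove.push(record.path);
  }
  return plan;
}

// Illustrative folder state versus remembered index state.
const filesOnDisk: FileFingerprint[] = [
  { path: "a.pdf", size: 10, modifiedMs: 1 },
  { path: "b.docx", size: 20, modifiedMs: 5 },
  { path: "c.md", size: 5, modifiedMs: 9 },
];
const indexRecords: FileFingerprint[] = [
  { path: "a.pdf", size: 10, modifiedMs: 1 },
  { path: "b.docx", size: 20, modifiedMs: 2 },
  { path: "old.txt", size: 1, modifiedMs: 1 },
];

const incrementalPlan = planReindex(filesOnDisk, indexRecords);
// Passing an empty index is the "rebuild from the original folder" case.
const fullRebuildPlan = planReindex(filesOnDisk, []);
```

The same function covers both the incremental path and the full rebuild, which is the property that keeps the index durable without making it sacred.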

Separating Discovery Text From Document Views

There is another practical lesson hidden inside document indexing: extracted text and high-fidelity document views serve different purposes.

For retrieval, the app needs discovery text. It needs enough clean, structured text to chunk, search, embed, and cite. That text does not need to perfectly reproduce every visual detail of the source document.

For user review, the app needs high-fidelity views. The user may need to see the original PDF layout, tables, formatting, page structure, spreadsheet cells, comments, or surrounding context. The review experience cannot rely only on the text that was extracted for indexing.

Keeping those two layers separate makes the product better.

The retrieval layer can optimize for search quality, chunk boundaries, metadata, and model context. The viewing layer can optimize for human inspection. When a user clicks a source, they should be brought back toward the original artifact, not just shown an isolated chunk in a vacuum.
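
A minimal sketch of that separation: each chunk of discovery text carries a locator pointing back into the original file, and the viewer is handed the locator rather than the chunk text. The `SourceLocator` and `Chunk` types, the fixed-size splitting, and the `path#page=N` target format are all assumptions made for illustration, not a description of Document.Bot's internals.

```typescript
type SourceLocator = { path: string; page: number; charStart: number; charEnd: number };
type Chunk = { text: string; locator: SourceLocator };

// Split one page of extracted text into fixed-size chunks, each with a locator.
function chunkPage(path: string, page: number, text: string, size: number): Chunk[] {
  if (size <= 0) throw new Error("size must be positive");
  const chunks: Chunk[] = [];
  for (let start = 0; start < text.length; start += size) {
    const end = Math.min(start + size, text.length);
    chunks.push({
      text: text.slice(start, end),
      locator: { path, page, charStart: start, charEnd: end },
    });
  }
  return chunks;
}

// What a click on a source would hand to the viewer: a place in the original file.
function toViewerTarget(chunk: Chunk): string {
  return `${chunk.locator.path}#page=${chunk.locator.page}`;
}

// Illustrative page text; real extraction output would be much longer.
const pageChunks = chunkPage("report.pdf", 3, "abcdefghij", 4);
const viewerTargets = pageChunks.map((chunk) => toViewerTarget(chunk));
```

The retrieval layer is free to change chunk size or splitting strategy, because the viewing layer depends only on the locator.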

This is one of the design principles I would repeat in any document AI app: retrieval is not enough. Source inspection is part of the product.

LangChain and LangGraph for AI Workflows

Document.Bot uses LangChain and LangGraph because the AI layer is more than one prompt.

There are provider choices, model choices, retrieval tools, document context, agent state, long-running workflows, and places where the app may need to pause, resume, or let the user review intermediate results.

LangChain is useful for provider and model abstraction. It gives the app a way to work across different model backends without hard-coding every interaction into one provider's API shape.

LangGraph is useful once the workflow becomes stateful. A document assistant often needs more structure than "send messages to a model." It may need to retrieve, reason over sources, ask a follow-up, draft an output, revise it, preserve state, and resume after interruption.

That is where graph-based orchestration fits. It lets the app model the AI workflow as a set of steps with state, rather than as a single opaque call.

Checkpointing and interruptibility matter in desktop software. Users close laptops. Indexing takes time. AI calls fail. A document task may run longer than a simple chat response. The app should be able to handle that without losing the whole workflow.

Document.Bot uses a local checkpointer so thread state can survive beyond a single model call. That comes with engineering tradeoffs: checkpoint storage must be maintained, bounded, and recoverable. But for a desktop app with long-running document work, durable state is part of the product, not just a framework detail.
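
The idea of resumable, checkpointed work can be shown without any framework. The sketch below is plain TypeScript and does not use or imitate LangGraph's API. The `CheckpointStore` class, the `runWorkflow` function, and the step names are hypothetical, and the store lives in memory where a desktop app would persist it to disk.

```typescript
type WorkflowState = { step: number; notes: string[] };
type Step = (state: WorkflowState) => WorkflowState;

// Hypothetical in-memory store keyed by thread id.
class CheckpointStore {
  private saved = new Map<string, string>();
  save(threadId: string, state: WorkflowState): void {
    this.saved.set(threadId, JSON.stringify(state));
  }
  load(threadId: string): WorkflowState | undefined {
    const raw = this.saved.get(threadId);
    return raw === undefined ? undefined : (JSON.parse(raw) as WorkflowState);
  }
}

// Resume from the last checkpoint; `maxSteps` mimics an interruption.
function runWorkflow(
  threadId: string,
  steps: Step[],
  store: CheckpointStore,
  maxSteps = Infinity
): WorkflowState {
  let state: WorkflowState = store.load(threadId) ?? { step: 0, notes: [] };
  let executed = 0;
  while (state.step < steps.length && executed < maxSteps) {
    const step = steps[state.step];
    if (!step) break;
    state = { ...step(state), step: state.step + 1 };
    store.save(threadId, state); // checkpoint after every completed step
    executed += 1;
  }
  return state;
}

const note = (label: string): Step => (s) => ({ ...s, notes: [...s.notes, label] });
const workflowSteps = [note("retrieved"), note("drafted"), note("reviewed")];

const checkpointStore = new CheckpointStore();
// First run is cut off after two steps, as if the laptop lid closed.
const interruptedState = runWorkflow("thread-1", workflowSteps, checkpointStore, 2);
// Second run picks up at step three instead of starting over.
const resumedState = runWorkflow("thread-1", workflowSteps, checkpointStore);
```

Completed steps are not repeated on resume. The cost is the one named above: the saved state has to be stored somewhere, kept bounded, and remain readable after an upgrade.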

The product boundary is just as important as the technical one: Document.Bot is not trying to make document editing blindly autonomous. The goal is reviewable, source-backed output. AI can help search, summarize, compare, and draft, but the user should stay in the loop for judgment and final decisions.

Making Model Choice Explicit

One mistake AI apps can make is hiding too much.

For a casual assistant, hiding complexity can be good. For document-heavy work, users often care what is happening. They may care which model is used, which provider receives context, whether a local or hosted path is involved, and how source material is selected.

That does not mean every user wants a wall of settings. It does mean the app should not blur important boundaries.

Document.Bot's architecture is built with model choice and provider abstraction in mind. The UI can present choices at the product level, while the AI orchestration layer handles the implementation details.

This also keeps the product honest. Different models have different strengths, costs, context windows, and privacy implications. Treating the model as an explicit part of the workflow helps users understand the tradeoffs.

The Security Line: No Arbitrary Shell Power by Default

Local-first AI products need a stricter default posture than web chatbots.

If an app can access local folders and an AI agent can call tools, the permissions model matters. A useful document agent does not automatically need broad filesystem or shell access. Most document workflows can be served with constrained operations: index this folder, search these documents, retrieve these sources, open this file, draft from this context.

That narrower tool surface is easier to explain and easier to trust.

The principle I follow is: give the agent product tools, not machine tools, unless there is a specific reason.

A search tool scoped to the connected document index is very different from shell access. A document-open operation mediated by the app is different from arbitrary filesystem traversal. These distinctions are not academic. They are what let a local AI workspace feel like software rather than a risky automation script.
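
As an illustration of a mediated document-open operation, the sketch below checks that a requested path stays inside the connected folder before the app acts on it. It is a simplified example under stated assumptions: `isInsideWorkspace` and `normalizeSegments` are hypothetical helpers, paths are POSIX-style, and symlinks, Windows drive letters, and case-insensitive filesystems are ignored. A production check would more likely build on Node's `path.resolve` and `fs.realpath`.

```typescript
// Normalize a POSIX-style absolute path, resolving "." and ".." segments.
function normalizeSegments(absPath: string): string[] {
  const out: string[] = [];
  for (const segment of absPath.split("/")) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") out.pop();
    else out.push(segment);
  }
  return out;
}

// True only when `requested` stays inside `workspaceRoot` after normalization.
function isInsideWorkspace(workspaceRoot: string, requested: string): boolean {
  const root = normalizeSegments(workspaceRoot);
  const absolute =
    requested.charAt(0) === "/" ? requested : `${workspaceRoot}/${requested}`;
  const target = normalizeSegments(absolute);
  // Compare whole segments so "/Docs-private" does not pass as "/Docs".
  return (
    target.length >= root.length &&
    root.every((segment, i) => target[i] === segment)
  );
}

// Illustrative connected folder and requests an agent tool might receive.
const workspaceRoot = "/Users/me/Docs";
const allowedRequest = isInsideWorkspace(workspaceRoot, "reports/q1.pdf");
const traversalRequest = isInsideWorkspace(workspaceRoot, "../.ssh/id_rsa");
const siblingRequest = isInsideWorkspace(workspaceRoot, "/Users/me/Docs-private/x.pdf");
```

A tool built this way can only ever say "open this document in the workspace." It has no way to express "read any file on the machine," which is the difference between a product tool and a machine tool.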

What I Would Tell Other Builders

If you are building an AI product around documents, I would avoid starting with the model as the center of the architecture.

Start with the workflow.

Where do the documents live? How does the user verify an answer? What is the source of truth? What happens when indexing fails? Can the user rebuild the index? Can they inspect the original file? Does the AI have only the tools it needs? Is the model choice visible when it matters?

The stack matters, but the product boundaries matter more.

Electron gave Document.Bot a practical desktop shell for macOS and Windows. React made it possible to build a rich workspace UI. LanceDB gave the app a local retrieval layer. LangChain and LangGraph provided structure for provider abstraction, tool use, state, and longer-running agent workflows.

But the core idea is simpler than the stack: document AI should feel grounded.

A good answer is not just fluent. It is connected to the files the user cares about. It can be inspected. It can be reviewed. It does not ask the user to keep re-uploading the same folder into yet another chat window.

That is what I am building with Document.Bot.

You can download or learn more at https://document.bot.

DEV Community