Build a Private AI Search on Your Device: Local RAG in the Browser

#ai #programming #rag #opfs

How many times have you wanted to search your private PDFs, notes, or code files using AI, but hesitated?

We all want the power of AI search. But uploading sensitive documents to external servers is a big privacy risk.

What if you could build a complete search engine that runs 100% inside your browser? No servers, no APIs, and no cost.

At Utilora, we built exactly this. We call it Personal RAG. Here is how we made it work, and how you can do it too.

The Architecture: How to Run RAG on a Web Page

Retrieval-Augmented Generation (RAG) usually requires a backend database, Python servers, and API keys. To make it run entirely on the client side, we combined three modern browser technologies:

Origin Private File System (OPFS): A fast, private storage space in the browser to save indexed document vectors.
Web Workers & Comlink: To run CPU-heavy vector searches without freezing the user interface.
Local Machine Learning Models: Using ONNX Runtime Web and Transformers.js to generate embeddings directly on your CPU or GPU.
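
To make the "vector search" part concrete, here is a minimal sketch of a brute-force cosine-similarity search over embedded chunks. The IndexedChunk shape and the function names are assumptions for illustration, not Utilora's actual code, and a real index may use a smarter search strategy.

```typescript
// Hypothetical record shape: one embedded chunk of a document.
interface IndexedChunk {
  text: string;
  vector: Float32Array;
}

// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) throw new Error("Vector lengths differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom; // zero vectors score 0 instead of NaN
}

// Brute-force search: score every chunk, then keep the k best matches.
function topK(query: Float32Array, index: IndexedChunk[], k: number) {
  return index
    .map((chunk) => ({ text: chunk.text, score: cosineSimilarity(query, chunk.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

Scoring every chunk is linear in the size of the index, which is exactly the kind of loop you want off the main thread.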

Here is the exact flow of how a document is processed:

[ Your File ] ➔ [ Client Parser ] ➔ [ Chunking ] ➔ [ Local ML Embedding ] ➔ [ OPFS Storage ]
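
The chunking step in this flow could look something like the sketch below. It splits text into fixed-size character windows that overlap slightly, so a sentence cut at a boundary still appears whole in one chunk. The name chunkText and the default sizes are assumptions for illustration; the right values depend on your embedding model.

```typescript
// Split text into overlapping windows of at most `chunkSize` characters.
function chunkText(text: string, chunkSize = 500, overlap = 50): string[] {
  if (chunkSize <= 0) throw new Error("chunkSize must be positive");
  if (overlap < 0 || overlap >= chunkSize) {
    throw new Error("overlap must be in [0, chunkSize)");
  }
  const chunks: string[] = [];
  const step = chunkSize - overlap; // how far each window advances
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // this window reached the end
  }
  return chunks;
}

// Example: windows of 4 characters that share 1 character with their neighbor
console.log(chunkText("abcdefghij", 4, 1)); // [ 'abcd', 'defg', 'ghij' ]
```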

Step 1: Storing Vectors Privately (OPFS)

You cannot store millions of embedding values in ordinary browser storage such as localStorage. It is synchronous, stores only strings, and is typically capped at around 5 MB per origin.

Instead, we use the Origin Private File System (OPFS). It gives web apps a private, highly optimized filesystem. Here is a simple look at how we write vector indexes to OPFS:

// Access the private root directory of this origin's OPFS
const root = await navigator.storage.getDirectory();

// Create or open our index file
const fileHandle = await root.getFileHandle("vector-index.db", { create: true });

// Open a writable stream. Changes are committed to the file on close().
// (Inside a Web Worker, createSyncAccessHandle() is the faster, synchronous option.)
const writable = await fileHandle.createWritable();
await writable.write(JSON.stringify(myVectorData)); // write() accepts strings directly
await writable.close();
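
JSON is the simplest format, but it gets bulky for large indexes. As an alternative, here is a sketch that packs vectors into a compact binary buffer. The layout (a two-number header followed by raw float32 values) and the names packVectors and unpackVectors are assumptions made up for this example, not a standard format or the one Utilora uses.

```typescript
// Pack equal-length vectors into one buffer:
// [count, dim] header (2 x uint32), then count * dim float32 values.
// Byte order is the platform's native order, which is fine for same-device storage.
function packVectors(vectors: Float32Array[]): Uint8Array {
  const count = vectors.length;
  const dim = count > 0 ? vectors[0].length : 0;
  const buffer = new ArrayBuffer(8 + count * dim * 4);
  const header = new Uint32Array(buffer, 0, 2);
  header[0] = count;
  header[1] = dim;
  const body = new Float32Array(buffer, 8, count * dim);
  vectors.forEach((vector, i) => {
    if (vector.length !== dim) throw new Error("All vectors must share one dimension");
    body.set(vector, i * dim);
  });
  return new Uint8Array(buffer);
}

// Reverse of packVectors: read the header, then slice out each vector.
function unpackVectors(bytes: Uint8Array): Float32Array[] {
  if (bytes.length < 8) throw new Error("Buffer too small to contain a header");
  const buffer = bytes.slice().buffer; // copy so the views start at offset 0
  const header = new Uint32Array(buffer, 0, 2);
  const count = header[0];
  const dim = header[1];
  const body = new Float32Array(buffer, 8, count * dim);
  const vectors: Float32Array[] = [];
  for (let i = 0; i < count; i++) {
    vectors.push(body.slice(i * dim, (i + 1) * dim));
  }
  return vectors;
}
```

The bytes returned by packVectors can be passed to the stream's write() call in place of the JSON payload. Each value then takes 4 bytes on disk instead of a long decimal string.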

Step 2: Offloading Work to a Web Worker

We use Comlink by Google to easily communicate with a background Web Worker:

// In your main component
import * as Comlink from "comlink";

const worker = new Worker(
    new URL("./rag-indexer.worker.ts", import.meta.url),
    { type: "module" }
);
const localIndexer = Comlink.wrap(worker);

// Run indexer in the background
await localIndexer.processAndEmbedFile(myUploadedFile);
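
For completeness, here is a rough sketch of what the worker side might look like. Only the method name processAndEmbedFile comes from the snippet above. Everything else (createIndexer, the Embed and Save types, the paragraph-based splitting) is a hypothetical stand-in so the example can run without a model or a browser.

```typescript
// Hypothetical dependencies, injected so the sketch runs anywhere. In a real worker,
// `embed` might wrap a Transformers.js pipeline and `save` might write to OPFS.
type Embed = (text: string) => Promise<Float32Array>;
type Save = (fileName: string, vectors: Float32Array[]) => Promise<void>;

// Minimal structural type that the browser's File object satisfies.
interface TextFile {
  name: string;
  text(): Promise<string>;
}

function createIndexer(embed: Embed, save: Save) {
  return {
    // Same method name the main thread calls through Comlink.
    async processAndEmbedFile(file: TextFile): Promise<number> {
      const content = await file.text();
      // Naive split on blank lines; a real PDF parser and chunker would go here.
      const chunks = content
        .split(/\n\s*\n/)
        .map((chunk) => chunk.trim())
        .filter((chunk) => chunk.length > 0);
      const vectors: Float32Array[] = [];
      for (const chunk of chunks) {
        vectors.push(await embed(chunk)); // one chunk at a time
      }
      await save(file.name, vectors);
      return vectors.length; // number of chunks indexed
    },
  };
}

// In rag-indexer.worker.ts you would then expose the object, for example:
//   import * as Comlink from "comlink";
//   Comlink.expose(createIndexer(realEmbed, realSave));
```

Because Comlink turns every call into an async message, returning a small value like the chunk count keeps the traffic between threads light.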

Why Local RAG is a Game Changer

Building with zero backend constraints completely changes how you think about software:

• True Privacy: Privacy is not a text policy on a page. It is built into the architecture. Since there is no backend, we cannot see your files even if we wanted to.
• Completely Free: You do not pay for API keys, vector databases, or server hosting. The user's computer does all the work.
• Instant Offline Access: Once the page and its model files have loaded, you can turn off your internet and it still works.

Try It Yourself

If you want to see this in action, check it out on Utilora (https://utilora.com), our free, open collection of local web utilities.

Drag in a PDF, let it index, and ask questions. Your data never leaves your screen.

Have you built anything using local browser models? Let's chat in the comments below!
