DEV Community

Beck_Moulton

Privacy-First: Building a 100% Local AI Medication Assistant with WebLLM and WebGPU

In the era of AI, privacy is the new luxury—especially when it comes to sensitive medical data. Most healthcare apps rely on cloud-based LLMs, meaning your private prescription history travels across the wire to a remote server. But what if we could bring the brain to the data instead of the data to the brain?

By leveraging WebLLM, WebGPU, and IndexedDB, we can now perform complex Edge AI inference directly in the browser. In this guide, we'll build a 100% localized Medication Assistant that handles fuzzy drug name matching and interaction checks without a single byte of medical data leaving the user's device. Using browser-based AI and private health tech patterns, we are turning the browser into a secure, intelligent vault.

The Architecture

The logic flow ensures that data remains persistent in the browser's storage (IndexedDB) while the WebLLM engine (powered by TVM Runtime) handles the natural language processing using the local GPU.

graph TD
    A[User Input: 'I took Aspirin'] --> B{WebLLM Engine}
    B --> C[Fuzzy Match via Vector/Local DB]
    C --> D[IndexedDB: Drug Records]
    D --> E[Interaction Check Logic]
    E --> F[WebLLM: Natural Language Advice]
    F --> G[UI Update: No Data Sent to Cloud]
    subgraph Browser Context
    B
    D
    E
    F
    end
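The fuzzy-match step (node C in the diagram) never gets its own code in the walkthrough below, so here is a minimal sketch using plain edit distance. `levenshtein` and `fuzzyMatchDrug` are hypothetical helper names, not part of WebLLM or any library:

```typescript
// Classic dynamic-programming Levenshtein distance between two strings.
function levenshtein(a: string, b: string): number {
  const dp: number[][] = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0))
  );
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,                                    // deletion
        dp[i][j - 1] + 1,                                    // insertion
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1)   // substitution
      );
    }
  }
  return dp[a.length][b.length];
}

// Return the closest known drug name, or null if nothing is within maxDistance.
function fuzzyMatchDrug(input: string, knownDrugs: string[], maxDistance = 2): string | null {
  let best: string | null = null;
  let bestDist = Infinity;
  for (const drug of knownDrugs) {
    const d = levenshtein(input.toLowerCase(), drug.toLowerCase());
    if (d < bestDist) {
      bestDist = d;
      best = drug;
    }
  }
  return bestDist <= maxDistance ? best : null;
}
```

With this, a typo like `fuzzyMatchDrug("Asprin", ["Aspirin", "Ibuprofen"])` still resolves to "Aspirin" before the record ever reaches the LLM.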

Prerequisites

To follow this tutorial, you'll need:

  • WebLLM: For running Large Language Models in the browser.
  • TVM Runtime: The WebGPU compilation/runtime layer that WebLLM builds on (it ships with WebLLM; no separate install is needed).
  • TypeScript: For type-safe application logic.
  • IndexedDB: For persistent local storage of medication logs.

1. Initializing the Local Brain (WebLLM)

First, we need to set up the WebLLM engine. This leverages WebGPU to run models like Llama-3 or Mistral locally.

import { CreateMLCEngine, MLCEngine } from "@mlc-ai/web-llm";

async function initAIEngine() {
  const selectedModel = "Llama-3-8B-Instruct-q4f16_1-MLC";

  // Initialize engine with a callback to track progress
  const engine = await CreateMLCEngine(selectedModel, {
    initProgressCallback: (report) => {
      console.log("Loading AI Model:", report.text);
    }
  });

  return engine;
}
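WebGPU is not yet universal, so it is worth gating the initializer behind a feature check. `navigator.gpu` is the standard WebGPU entry point; `supportsWebGPU` and `safeInit` are hypothetical helper names for this sketch:

```typescript
// Returns true only when the current environment exposes WebGPU.
function supportsWebGPU(): boolean {
  return typeof navigator !== "undefined" && "gpu" in navigator;
}

// Run an expensive async loader (e.g. initAIEngine from above) only if
// WebGPU is present; otherwise fail fast with an actionable message.
async function safeInit<T>(loader: () => Promise<T>): Promise<T> {
  if (!supportsWebGPU()) {
    throw new Error("WebGPU not available — use a recent Chromium-based browser on a supported GPU.");
  }
  return loader();
}
```

In the app you would call `await safeInit(initAIEngine)` and show a fallback UI when it throws.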

2. Setting Up the Secure Vault (IndexedDB)

We use IndexedDB to store the user's medication history. Unlike LocalStorage, IndexedDB handles larger datasets efficiently, which is perfect for a local drug database.

import { openDB, IDBPDatabase } from 'idb';

const DB_NAME = 'MedVault';

export async function initDB(): Promise<IDBPDatabase> {
  return openDB(DB_NAME, 1, {
    upgrade(db) {
      if (!db.objectStoreNames.contains('medications')) {
        db.createObjectStore('medications', { keyPath: 'id', autoIncrement: true });
      }
    },
  });
}

async function logMedication(db: IDBPDatabase, drugName: string, dosage: string) {
  await db.add('medications', {
    name: drugName,
    dosage: dosage,
    timestamp: new Date().toISOString()
  });
}
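It also helps to give the stored rows an explicit shape. `MedicationRecord` and `formatHistory` are hypothetical names for this sketch; the formatter is a slightly richer variant of the `history.map(h => h.name).join(", ")` used in the next section:

```typescript
// Shape of one row in the 'medications' object store.
interface MedicationRecord {
  id?: number;       // assigned by IndexedDB via autoIncrement
  name: string;
  dosage: string;
  timestamp: string; // ISO-8601
}

// Turn a history array into the comma-separated context string fed to the LLM.
function formatHistory(records: MedicationRecord[]): string {
  if (records.length === 0) return "nothing currently";
  return records.map((r) => `${r.name} (${r.dosage})`).join(", ");
}
```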

3. Local Inference: Interaction Checking

The "magic" happens when we combine the local database with the LLM. Instead of sending the full list to an API, we inject the local records into the LLM prompt context.

async function checkInteractions(engine: MLCEngine, newDrug: string, history: { name: string }[]) {
  const context = history.map(h => h.name).join(", ");

  const prompt = `
    You are a medical assistant running locally. 
    User is taking: ${context}.
    They want to add: ${newDrug}.
    Are there known major interactions? Answer briefly.
  `;

  const messages = [
    { role: "system", content: "You are a helpful, private medical assistant." },
    { role: "user", content: prompt }
  ];

  const reply = await engine.chat.completions.create({ messages });
  return reply.choices[0].message.content;
}
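WebLLM's OpenAI-compatible API also accepts `stream: true`, which yields delta chunks instead of one blocking reply — handy for rendering advice token by token. The accumulator below is a generic sketch (`collectStream` is a hypothetical name); with a real engine you would adapt the chunk objects from `engine.chat.completions.create({ messages, stream: true })` into plain text deltas:

```typescript
// Accumulate streamed text deltas, invoking a callback per token for live UI updates.
async function collectStream(
  deltas: AsyncIterable<string>,
  onToken?: (token: string) => void
): Promise<string> {
  let full = "";
  for await (const token of deltas) {
    full += token;
    onToken?.(token);
  }
  return full;
}

// Sketch of wiring it to WebLLM (assumes `engine` and `messages` from above):
// const chunks = await engine.chat.completions.create({ messages, stream: true });
// const deltas = (async function* () {
//   for await (const c of chunks) yield c.choices[0]?.delta?.content ?? "";
// })();
// const advice = await collectStream(deltas, (t) => appendToUI(t));
```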

Why Local-First AI Matters

By keeping the inference on the client side, we eliminate latency, reduce server costs, and—most importantly—ensure absolute privacy. The data never leaves the user's machine.

For developers looking to implement these patterns in production environments or exploring more complex "Local-First" architectural designs, I highly recommend checking out the deep-dive articles at WellAlly Tech Blog. They provide excellent resources on scaling private-by-design systems and advanced WebGPU optimizations that go beyond basic implementations.

4. Putting It All Together

In your TypeScript controller, you can now orchestrate the flow:

async function handleUserIntake(userInput: string) {
  // Note: in production, initialize the engine once and reuse it —
  // creating it per call re-downloads and re-compiles the model.
  const engine = await initAIEngine();
  const db = await initDB();

  // 1. Get history from IndexedDB
  const history = await db.getAll('medications');

  // 2. Perform local AI check (ideally parse userInput down to a drug
  //    name first, e.g. via fuzzy matching, before passing it on)
  const advice = await checkInteractions(engine, userInput, history);

  // 3. Update UI & log to DB
  console.log("AI Advice:", advice);
  await logMedication(db, userInput, "As prescribed");
}
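One caveat with `handleUserIntake` as written: it re-creates the engine, and therefore reloads the multi-gigabyte model, on every call. A lazy singleton avoids that. `memoizeAsync` is a hypothetical helper, not a WebLLM API:

```typescript
// Cache the result of an async factory so the expensive work runs only once;
// concurrent callers share the same in-flight promise.
function memoizeAsync<T>(factory: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | null = null;
  return () => (cached ??= factory());
}

// Sketch: wrap the engine initializer once at module scope...
// const getEngine = memoizeAsync(initAIEngine);
// ...then call `await getEngine()` inside handleUserIntake instead.
```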

Conclusion

We've just built a fully functional, privacy-preserving AI assistant using WebLLM and IndexedDB. This approach proves that we don't need to sacrifice user privacy for intelligent features. As WebGPU matures, the boundary of what we can do in the browser will only continue to expand.

What's next?

  • Try implementing Vector Embeddings inside IndexedDB for even faster drug name fuzzy matching!
  • Explore TVM Runtime optimizations to reduce model load times.
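For the embedding idea, the core retrieval step is just cosine similarity over stored vectors — a minimal sketch, assuming the vectors live alongside the drug names in the same IndexedDB database (`cosineSimilarity` and `nearestDrug` are hypothetical names):

```typescript
// Cosine similarity between two equal-length embedding vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1); // guard against zero vectors
}

// Find the stored drug whose embedding is closest to the query embedding.
function nearestDrug(
  query: number[],
  index: { name: string; embedding: number[] }[]
): string | null {
  let best: string | null = null;
  let bestScore = -Infinity;
  for (const entry of index) {
    const score = cosineSimilarity(query, entry.embedding);
    if (score > bestScore) {
      bestScore = score;
      best = entry.name;
    }
  }
  return best;
}
```

A brute-force scan like this is fine for a personal medication list; approximate-nearest-neighbor indexes only pay off at much larger scales.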

Got questions about Edge AI? Let’s discuss in the comments!


Happy coding! For more production-ready AI patterns, visit wellally.tech/blog.
