In the world of healthcare technology, data privacy isn't just a "nice-to-have"—it's a legal and ethical fortress. When dealing with sensitive medical queries like drug-to-drug interactions, sending a patient's medication list to a cloud-based LLM can raise significant compliance eyebrows.
But what if the LLM never left the user's computer?
Today, we're pushing the boundaries of Edge AI and browser-side computing. We're going to build a high-performance, fully local Drug Interaction Retrieval tool using WebGPU, WebLLM, and React. This setup delivers fast, fully on-device inference without a single byte of medical data ever touching a server.
Why WebGPU and WebLLM?
Until recently, running a Large Language Model (LLM) in a browser meant sluggish CPU inference or specialized plugins. WebGPU changes the game by providing low-level access to the GPU's parallel processing power directly through the browser. WebLLM leverages this to run models like Llama-3 or Mistral at native-like speeds.
The Architecture: Local-First Inference
To keep our UI buttery smooth, we don't want the model competing with React for the main thread. Instead, we offload the heavy lifting to a Web Worker that communicates with the GPU via the WebGPU API. (For readability, the demo code later in this post initializes the engine directly in the page; a worker-based variant is sketched right after the diagram.)
graph TD
A[User Input: Drug Names] --> B(React UI Component)
B --> C{Web Worker}
C --> D[WebLLM Engine]
D --> E[WebGPU API]
E --> F[Local GPU / VRAM]
F --> E
E --> D
D --> G[JSON Interaction Report]
G --> B
B --> H[Display Results to User]
style F fill:#f96,stroke:#333,stroke-width:2px
style D fill:#bbf,stroke:#333,stroke-width:2px
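If you want the worker-based setup from the diagram, WebLLM ships a dedicated worker handler. Here is a minimal sketch, assuming a recent @mlc-ai/web-llm version (the worker.ts filename and initializeWorkerEngine name are my own choices):

// worker.ts - runs the LLM engine off the main thread
import { WebWorkerMLCEngineHandler } from "@mlc-ai/web-llm";

const handler = new WebWorkerMLCEngineHandler();
self.onmessage = (msg: MessageEvent) => handler.onmessage(msg);

// engine-worker.ts - drop-in replacement for a main-thread engine
import { CreateWebWorkerMLCEngine } from "@mlc-ai/web-llm";

export async function initializeWorkerEngine(onProgress: (p: number) => void) {
  return CreateWebWorkerMLCEngine(
    new Worker(new URL("./worker.ts", import.meta.url), { type: "module" }),
    "Llama-3-8B-Instruct-q4f16_1-MLC",
    {
      initProgressCallback: (report) =>
        onProgress(Math.round(report.progress * 100)),
    }
  );
}

The returned engine exposes the same chat.completions interface as the main-thread version, so the rest of the code in this post works unchanged.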
Prerequisites
Before we dive into the code, ensure you have:
- A WebGPU-compatible browser (Chrome 113+, Edge 113+).
- Node.js and your favorite package manager.
- The tech stack: React, TypeScript, and WebLLM (the @mlc-ai/web-llm package).
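Since the model weights are a multi-gigabyte download, it's also worth feature-detecting WebGPU before kicking anything off. A minimal check using the standard navigator.gpu entry point (you may need the @webgpu/types package for the typings):

// webgpu-check.ts - verify a usable GPU adapter exists before loading weights
export async function hasWebGPU(): Promise<boolean> {
  if (!("gpu" in navigator)) return false; // API not exposed by this browser
  const adapter = await navigator.gpu.requestAdapter(); // null when no compatible GPU
  return adapter !== null;
}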
Step 1: Setting up the WebLLM Engine
First, we need to initialize the engine. Since medical accuracy is paramount, we'll prompt the model to act as a clinical pharmacist.
// engine.ts
import { CreateMLCEngine, MLCEngine } from "@mlc-ai/web-llm";

// 4-bit quantized Llama 3 8B from WebLLM's prebuilt model list
const modelId = "Llama-3-8B-Instruct-q4f16_1-MLC";

export async function initializeEngine(onProgress: (p: number) => void): Promise<MLCEngine> {
  const engine = await CreateMLCEngine(modelId, {
    // report.progress is a 0-1 fraction; surface it as a percentage
    initProgressCallback: (report) => {
      onProgress(Math.round(report.progress * 100));
    },
  });
  return engine;
}
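Before wiring this into the UI, you can gate it on the hasWebGPU() check from the prerequisites section, so unsupported browsers never start the download. A small sketch (the bootstrap name and error message are illustrative):

// bootstrap.ts - only start the multi-gigabyte download on capable browsers
import { initializeEngine } from "./engine";
import { hasWebGPU } from "./webgpu-check";

export async function bootstrap(onProgress: (p: number) => void) {
  if (!(await hasWebGPU())) {
    throw new Error("WebGPU is not available in this browser");
  }
  return initializeEngine(onProgress);
}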
Step 2: Crafting the Clinical Prompt
The "secret sauce" for a retrieval tool is the System Prompt. We need the LLM to provide structured data regarding contraindications.
// engine.ts (continued)
const SYSTEM_PROMPT = `
You are a professional Clinical Pharmacist.
Analyze the interactions between the drugs provided by the user.
Return the result in the following JSON format:
{
  "interactions": [
    { "drugs": ["Drug A", "Drug B"], "severity": "High/Medium/Low", "description": "..." }
  ],
  "recommendation": "..."
}
`;

export const checkInteractions = async (engine: MLCEngine, drugs: string[]) => {
  const prompt = `Analyze these medications: ${drugs.join(", ")}`;
  const response = await engine.chat.completions.create({
    messages: [
      { role: "system", content: SYSTEM_PROMPT },
      { role: "user", content: prompt },
    ],
    response_format: { type: "json_object" }, // Enforce JSON output
  });
  return JSON.parse(response.choices[0].message.content || "{}");
};
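One caveat: JSON.parse throws on malformed output, and even with response_format set, a small quantized model can occasionally produce invalid JSON. A defensive variant, with an interface that simply mirrors the schema we asked for in the prompt (this shape is our own convention, not something WebLLM enforces):

// report.ts - typed, non-throwing parse of the model's JSON reply
export interface InteractionReport {
  interactions: {
    drugs: string[];
    severity: "High" | "Medium" | "Low";
    description: string;
  }[];
  recommendation: string;
}

export function parseReport(raw: string | null | undefined): InteractionReport | null {
  try {
    return raw ? (JSON.parse(raw) as InteractionReport) : null;
  } catch {
    return null; // let the UI show a "could not parse" state instead of crashing
  }
}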
Step 3: Building the React Interface
Now, let's tie it all together. We want a clean UI that handles the loading state, since the first load downloads several gigabytes of model weights (the browser caches them for subsequent visits).
// InteractionChecker.tsx
import React, { useState } from "react";
import type { MLCEngine } from "@mlc-ai/web-llm";
import { initializeEngine, checkInteractions } from "./engine";

const InteractionChecker: React.FC = () => {
  // null until the user explicitly loads the model (it is a large download)
  const [engine, setEngine] = useState<MLCEngine | null>(null);
  const [progress, setProgress] = useState(0);
  const [input, setInput] = useState("");
  const [results, setResults] = useState<object | null>(null);

  const handleInit = async () => {
    const mlcEngine = await initializeEngine(setProgress);
    setEngine(mlcEngine);
  };

  const handleQuery = async () => {
    if (!engine) return;
    const drugs = input.split(",").map((d) => d.trim()).filter(Boolean);
    const data = await checkInteractions(engine, drugs);
    setResults(data);
  };

  return (
    <div className="p-8 max-w-2xl mx-auto">
      <h1 className="text-2xl font-bold mb-4">💊 Local Drug Safe-Check</h1>
      {!engine ? (
        <button onClick={handleInit} className="bg-blue-600 text-white px-4 py-2 rounded">
          Load Local AI Model ({progress}%)
        </button>
      ) : (
        <div className="space-y-4">
          <input
            className="w-full border p-2 rounded"
            placeholder="Enter drugs (e.g., Aspirin, Warfarin)"
            value={input}
            onChange={(e) => setInput(e.target.value)}
          />
          <button onClick={handleQuery} className="bg-green-600 text-white px-4 py-2 rounded">
            Check Interactions Locally
          </button>
        </div>
      )}
      {results && (
        <div className="mt-6 p-4 bg-gray-100 rounded shadow">
          <pre>{JSON.stringify(results, null, 2)}</pre>
        </div>
      )}
    </div>
  );
};

export default InteractionChecker;
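To try it out, mount the component like any other React 18 root (a standard Vite-style entry point; the file names are assumptions):

// main.tsx
import React from "react";
import { createRoot } from "react-dom/client";
import InteractionChecker from "./InteractionChecker";

createRoot(document.getElementById("root")!).render(
  <React.StrictMode>
    <InteractionChecker />
  </React.StrictMode>
);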
Going Beyond the Basics
While this demo shows the power of WebGPU, production-grade applications often require complex orchestration, such as RAG (Retrieval-Augmented Generation) with local vector databases or sophisticated fallback mechanisms.
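To give a flavor of the RAG direction: retrieval can stay fully local as well. Below is a toy sketch of cosine-similarity search over in-memory embeddings; in a real app, the query embedding would come from a local embedding model and the documents from a drug monograph dataset, both of which are placeholders here:

// retrieval.ts - toy in-browser vector search
type Doc = { text: string; embedding: number[] };

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na * nb) || 1); // guard against zero vectors
}

// Top-k snippets for a query embedding, ready to prepend to the system prompt.
export function topK(query: number[], docs: Doc[], k = 3): Doc[] {
  return [...docs]
    .sort((x, y) => cosine(query, y.embedding) - cosine(query, x.embedding))
    .slice(0, k);
}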
Pro-Tip: For more advanced architectural patterns and production-ready examples of local-first AI, I highly recommend checking out the technical deep-dives at WellAlly Blog. They cover everything from optimized model quantization to specialized healthcare LLM fine-tuning.
Performance & Privacy Wins
By leveraging WebGPU:
- Low Latency: Once the model is loaded into the browser's VRAM, there are no network round-trips, and tokens start appearing almost immediately (see the streaming sketch below).
- Privacy by Design: The "Drug A + Drug B" query never leaves the user's machine. No PHI ever transits a server, which dramatically shrinks your HIPAA surface area and eliminates in-transit data leaks.
- Cost Effective: You aren't paying per-token fees to OpenAI. The user provides the compute!
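Perceived latency improves further if you stream tokens instead of waiting for the full completion. WebLLM's chat API is OpenAI-compatible, so a streaming sketch looks like this (the onToken callback is a stand-in for whatever UI update you use):

// Stream the reply token-by-token so partial output can render immediately
import type { MLCEngine } from "@mlc-ai/web-llm";

export async function streamAnswer(
  engine: MLCEngine,
  prompt: string,
  onToken: (text: string) => void
) {
  const chunks = await engine.chat.completions.create({
    messages: [{ role: "user", content: prompt }],
    stream: true,
  });
  for await (const chunk of chunks) {
    onToken(chunk.choices[0]?.delta?.content ?? "");
  }
}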
Conclusion
Running LLMs locally via WebGPU isn't just a gimmick—it's the future of private, secure, and offline-capable web applications. Whether you're building medical tools or private document analyzers, the browser is becoming a first-class AI powerhouse.
What are you building with WebGPU? Let me know in the comments below! 👇