In an era where data privacy is often the price we pay for convenience, mental health data remains the final frontier of personal sensitivity. Most AI-powered sentiment analysis tools require sending your inner thoughts to a cloud server, creating a massive privacy risk. But what if the model never left your computer?
Today, we're diving deep into Edge AI and Privacy-preserving Machine Learning. By leveraging WebLLM, WebGPU Acceleration, and React, we will build a production-grade mental health emotion classifier that runs 100% locally in your browser. No backend, no API costs, and absolute data sovereignty. This is the future of Local LLM implementation, turning your browser into a high-performance AI engine.
The Architecture: Keeping it Local
To achieve near-native performance, we use WebLLM (powered by Apache TVM) to execute the model directly on the GPU via the WebGPU API. This bypasses the traditional bottleneck of JavaScript's CPU limitations.
```mermaid
graph TD
A[User Input: "I feel overwhelmed today..."] --> B(React UI Layer)
B --> C{WebGPU Support?}
C -- Yes --> D[WebLLM Engine / TVM.js Runtime]
C -- No --> E[Fallback: CPU / Error Message]
D --> F[Local Cache / IndexedDB]
F --> G[Model Execution on GPU]
G --> H[Emotion Classification: 'Anxiety/Stress']
H --> B
style G fill:#f9f,stroke:#333,stroke-width:4px
```
Prerequisites
Before we start, ensure your browser supports WebGPU (Chrome 113+ and Edge 113+ ship it by default).
- Tech Stack: WebLLM, TVM.js (bundled with WebLLM), React, Vite.
- Difficulty: Advanced (Requires understanding of asynchronous hooks and GPU memory management).
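Before loading gigabytes of weights, it pays to feature-detect WebGPU up front (the "WebGPU Support?" branch in the diagram). Here is a minimal sketch; `supportsWebGPU` is a helper name I'm introducing, and in the browser you would pass the global `navigator`:

```typescript
// Feature-detect WebGPU: the `gpu` property only exists on `navigator`
// in browsers that ship the WebGPU API.
export function supportsWebGPU(nav: { gpu?: unknown }): boolean {
  return nav.gpu != null;
}

// In the browser: if (!supportsWebGPU(navigator)) { /* render fallback UI */ }
```

If this returns `false`, render the fallback branch instead of attempting to initialize the engine, since `engine.reload` will fail without a GPU device.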
Step 1: Initializing the WebLLM Engine
First, we need to create a dedicated worker or a hook to manage the model lifecycle. Since loading a 2GB-7GB model can freeze the UI, we'll use an asynchronous initialization pattern.
```typescript
// useWebLLM.ts
import { useState } from 'react';
import * as webllm from "@mlc-ai/web-llm";

export function useWebLLM(modelId: string) {
  const [engine, setEngine] = useState<webllm.MLCEngine | null>(null);
  const [progress, setProgress] = useState(0);

  const initEngine = async () => {
    const newEngine = new webllm.MLCEngine();
    // Callback to track model download progress (0-100%)
    newEngine.setInitProgressCallback((report: webllm.InitProgressReport) => {
      setProgress(Math.round(report.progress * 100));
      console.log(report.text);
    });
    // Downloads (or loads from cache) and compiles the model weights
    await newEngine.reload(modelId);
    setEngine(newEngine);
  };

  return { engine, progress, initEngine };
}
```
Step 2: The Mental Health Classification Logic
We aren't just chatting; we are classifying. We need to craft a system prompt that constrains the model to act as a classifier and return only a fixed label.
```typescript
// classifyEmotion.ts
import * as webllm from "@mlc-ai/web-llm";

export const classifyEmotion = async (engine: webllm.MLCEngine, userInput: string) => {
  const systemPrompt = `
You are a professional mental health assistant.
Analyze the user's input and classify it into one of these categories:
[Joy, Sadness, Anger, Anxiety, Neutral].
Respond only with the category name.
`;
  const messages: webllm.ChatCompletionMessageParam[] = [
    { role: "system", content: systemPrompt },
    { role: "user", content: userInput },
  ];
  const reply = await engine.chat.completions.create({
    messages,
    temperature: 0, // Keep the output deterministic
  });
  // content can be null per the OpenAI-style API; coerce to a string
  return reply.choices[0].message.content ?? "";
};
```
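One practical gotcha: even with `temperature: 0`, small quantized models sometimes wrap the label in extra text ("The category is: Anxiety."). A defensive sketch that maps raw output onto the allowed labels; `normalizeEmotionLabel` is a name I'm introducing, not part of WebLLM:

```typescript
const EMOTION_LABELS = ["Joy", "Sadness", "Anger", "Anxiety", "Neutral"] as const;
export type EmotionLabel = (typeof EMOTION_LABELS)[number];

// Scan the raw model output for the first allowed label (case-insensitive);
// fall back to "Neutral" if the model went off-script.
export function normalizeEmotionLabel(raw: string | null | undefined): EmotionLabel {
  const text = (raw ?? "").toLowerCase();
  for (const label of EMOTION_LABELS) {
    if (text.includes(label.toLowerCase())) return label;
  }
  return "Neutral";
}
```

Piping the model's reply through this before rendering guarantees your UI only ever shows one of the five categories.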
Step 3: UI Integration (React)
Integrating this into a React component requires careful state handling so the engine isn't torn down and re-created on every render.
```typescript
// EmotionApp.tsx
import React, { useState } from 'react';
import { useWebLLM } from './hooks/useWebLLM';
// classifyEmotion from Step 2 (adjust the path to wherever you defined it)
import { classifyEmotion } from './classifyEmotion';

const EmotionApp = () => {
  const { engine, progress, initEngine } = useWebLLM("Llama-3-8B-Instruct-v0.1-q4f16_1-MLC");
  const [input, setInput] = useState("");
  const [result, setResult] = useState("");

  return (
    <div className="p-8 max-w-2xl mx-auto">
      <h1 className="text-2xl font-bold">Local AI Emotion Classifier 🛡️</h1>
      {!engine ? (
        <button
          onClick={initEngine}
          className="bg-blue-500 text-white px-4 py-2 rounded"
        >
          Load Model (Progress: {progress}%)
        </button>
      ) : (
        <div className="mt-4">
          <textarea
            className="w-full p-4 border rounded"
            placeholder="How are you feeling?"
            value={input}
            onChange={(e) => setInput(e.target.value)}
          />
          <button
            onClick={async () => setResult(await classifyEmotion(engine, input))}
            className="mt-2 bg-green-500 text-white px-4 py-2 rounded"
          >
            Analyze Privately
          </button>
          {result && <div className="mt-4 p-4 bg-gray-100 rounded">Detected State: {result}</div>}
        </div>
      )}
    </div>
  );
};

export default EmotionApp;
```
Scaling to Production 🚀
Running LLMs in the browser is heavy. The good news: WebLLM caches downloaded weights in browser storage, so returning users don't re-download gigabytes; still, you should surface cache status in your UI and handle the case where the browser evicts the cache. You should also consider using Web Workers to offload the MLCEngine from the main thread entirely.
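The Web Worker approach can be sketched with WebLLM's worker-engine pair (check the current @mlc-ai/web-llm docs for exact signatures; the file names here are assumptions):

```typescript
// worker.ts — runs the MLCEngine off the main thread
import { WebWorkerMLCEngineHandler } from "@mlc-ai/web-llm";

const handler = new WebWorkerMLCEngineHandler();
self.onmessage = (msg: MessageEvent) => {
  handler.onmessage(msg);
};
```

```typescript
// main thread — same chat.completions API as before, but inference
// happens inside the worker, keeping the UI responsive
import { CreateWebWorkerMLCEngine } from "@mlc-ai/web-llm";

const engine = await CreateWebWorkerMLCEngine(
  new Worker(new URL("./worker.ts", import.meta.url), { type: "module" }),
  "Llama-3-8B-Instruct-v0.1-q4f16_1-MLC",
);
```

Because the worker engine exposes the same completion API, the `classifyEmotion` helper from Step 2 works unchanged.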
For more advanced architecture patterns and production-ready implementations of Edge AI, check out the deep-dive guides over at WellAlly Blog. They cover everything from memory optimization for WebGPU to handling multi-modal inputs in the browser.
Why This Matters
By moving the computation to the Edge, we achieve:
- No Network Latency: no round-trips to a server in Virginia (inference speed still depends on the user's GPU).
- Privacy: data literally never leaves the user's device.
- Cost: $0 in inference tokens. Your user provides the hardware!
The barrier between "Web App" and "AI Workstation" is disappearing. With WebGPU, the browser is no longer a document viewer; it's a high-performance compute node.
What are you building with WebGPU? Drop a comment below! If you found this useful, don't forget to ❤️ and 🦄.