DEV Community

wellallyTech


Your Secrets Stay Local: Building a Privacy-First Mental Health AI with WebLLM and WebGPU

In the era of massive cloud-based LLMs, privacy remains the elephant in the room. This is especially true for mental health and psychological counseling applications, where user data isn't just personal; it's deeply sensitive. Sending a transcript of a therapy session to a third-party API can feel like a breach of trust.

But what if the AI lived entirely inside the user's browser? 🤯

Today, we are diving into sentiment analysis with WebLLM and privacy-first AI engineering. By leveraging WebGPU to run an LLM locally, we can build a sentiment-analysis engine for counseling that runs at near-native speed without a single byte of text ever leaving the client's machine.

The Architecture: 100% Client-Side Inference

Traditional AI apps act as a thin client for a heavy backend. Our approach flips the script. By using TVM.js and WebGPU, we transform the browser into a high-performance inference engine.

graph TD
    User((User Input)) --> ReactUI[React Frontend]
    ReactUI --> EngineInit{Engine Initialized?}
    EngineInit -- No --> WebLLM[WebLLM / TVM.js Runtime]
    WebLLM --> ModelCache[(IndexedDB Model Cache)]
    ModelCache --> WebLLM
    EngineInit -- Yes --> LocalInference[Local WebGPU Inference]
    LocalInference --> SentimentOutput[Sentiment Analysis Result]
    SentimentOutput --> ReactUI
    subgraph Browser Sandbox
    WebLLM
    ModelCache
    LocalInference
    end

Prerequisites

To follow along with this intermediate-level tutorial, you’ll need:

  • React (Vite is recommended)
  • WebLLM SDK: The bridge between the browser and LLMs.
  • WebGPU-compatible browser: Latest Chrome or Edge (WebGPU has shipped by default since Chrome 113).
  • A decent GPU: Even integrated chips work wonders with WebGPU.
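Before loading a multi-gigabyte model, it's worth feature-detecting WebGPU and failing gracefully on unsupported browsers. A minimal sketch (the `supportsWebGPU` helper name is my own; it takes the navigator object as a parameter purely so it can be unit-tested):

```javascript
// Returns true if the given navigator object exposes the WebGPU API.
function supportsWebGPU(nav = globalThis.navigator) {
  return !!nav && "gpu" in nav;
}

// In the app, gate model loading on this check, e.g.:
// if (!supportsWebGPU()) renderUnsupportedBrowserMessage();
```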

Step 1: Setting Up the WebLLM Engine

First, let's install the dependencies:

npm install @mlc-ai/web-llm

The core of our privacy-preserving app is the Engine. We want to initialize this engine and load a quantized model (like Llama-3 or Mistral) optimized for web execution.

import { useState } from "react";
import { CreateWebWorkerMLCEngine } from "@mlc-ai/web-llm";

// Custom hook to manage the LLM lifecycle
export function useLocalLLM() {
  const [engine, setEngine] = useState(null);
  const [loadingProgress, setLoadingProgress] = useState(0);

  const initEngine = async () => {
    // Run inference in a Web Worker to keep the UI thread buttery smooth 🧈
    const worker = new Worker(
      new URL("./worker.ts", import.meta.url),
      { type: "module" }
    );

    // The model ID must match an entry in WebLLM's prebuilt model list
    const engine = await CreateWebWorkerMLCEngine(
      worker,
      "Llama-3-8B-Instruct-q4f16_1-MLC",
      {
        initProgressCallback: (report) => {
          setLoadingProgress(Math.round(report.progress * 100));
        },
      }
    );
    setEngine(engine);
  };

  return { engine, loadingProgress, initEngine };
}
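The hook above references a `worker.ts` entry file that the tutorial doesn't show. A minimal version, following WebLLM's web-worker pattern (verify the exported handler name against the SDK version you install), looks like this:

```typescript
// worker.ts: runs the model entirely off the main thread.
import { WebWorkerMLCEngineHandler } from "@mlc-ai/web-llm";

// The handler receives messages from CreateWebWorkerMLCEngine on the
// main thread and streams back loading progress and chat completions.
const handler = new WebWorkerMLCEngineHandler();
self.onmessage = (msg: MessageEvent) => {
  handler.onmessage(msg);
};
```

With this wiring in place, all GPU work happens inside the worker, so the React UI never blocks during token generation.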

Step 2: The "Counselor" Prompt Engineering

For psychological sentiment analysis, we don't just want "Positive/Negative." We need empathy and nuance. We define a system prompt that stays within the browser's memory.

const SYSTEM_PROMPT = `
  You are a local, privacy-focused mental health assistant. 
  Analyze the user's input for emotional tone, cognitive distortions, and sentiment.
  Provide a structured JSON output with the following keys:
  - sentiment: (String: 'Calm', 'Anxious', 'Depressed', 'Joyful')
  - intensity: (Number: 1-10)
  - feedback: (String: A supportive, empathetic response)

  IMPORTANT: Do not suggest medical diagnoses.
`;

const analyzeSentiment = async (engine, userInput) => {
  const messages = [
    { role: "system", content: SYSTEM_PROMPT },
    { role: "user", content: userInput }
  ];

  const reply = await engine.chat.completions.create({
    messages,
    temperature: 0.7,
    // Ensure the model outputs JSON
    response_format: { type: "json_object" }
  });

  return JSON.parse(reply.choices[0].message.content);
};
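Even with `response_format` set, small quantized models occasionally emit malformed or out-of-range JSON, so it's wise to validate the reply before rendering it. A hypothetical guard (the key names mirror the system prompt above; `parseAnalysis` is my own helper, not part of the WebLLM SDK):

```javascript
// Validate and normalize the model's raw JSON string before rendering it.
const ALLOWED_SENTIMENTS = ["Calm", "Anxious", "Depressed", "Joyful"];

function parseAnalysis(raw) {
  let data;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // Malformed JSON; caller should retry the request
  }
  // Fall back to a neutral label if the model invents a new category
  const sentiment = ALLOWED_SENTIMENTS.includes(data.sentiment)
    ? data.sentiment
    : "Calm";
  // Clamp intensity into the 1-10 range the prompt requests
  const intensity = Math.min(10, Math.max(1, Number(data.intensity) || 1));
  const feedback = typeof data.feedback === "string" ? data.feedback : "";
  return { sentiment, intensity, feedback };
}
```

In `analyzeSentiment`, you would return `parseAnalysis(reply.choices[0].message.content)` instead of calling `JSON.parse` directly.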

The "Official" Way to Scale

While building local-first apps is empowering, productionizing these patterns requires deep knowledge of edge computing and data synchronization. For more advanced architectural patterns and production-ready examples of private AI systems, I highly recommend checking out the technical deep-dives at WellAlly Blog. They cover everything from optimized model quantization to secure local storage strategies that complement the WebLLM workflow.

Step 3: Integrating with React

Finally, let's build the UI. We'll use a simple text area where the user can vent, knowing their data is "air-gapped" by the browser sandbox.

function SentimentApp() {
  const { engine, loadingProgress, initEngine } = useLocalLLM();
  const [input, setInput] = useState("");
  const [result, setResult] = useState(null);

  return (
    <div className="p-8 max-w-2xl mx-auto">
      <h1 className="text-2xl font-bold">SafeSpace: Local AI Counseling 🛡️</h1>

      {!engine ? (
        <button 
          onClick={initEngine}
          className="bg-blue-600 text-white px-4 py-2 rounded"
        >
          Load Local Model ({loadingProgress}%)
        </button>
      ) : (
        <div className="mt-4">
          <textarea 
            className="w-full p-4 border rounded shadow-inner"
            placeholder="How are you feeling today?"
            value={input}
            onChange={(e) => setInput(e.target.value)}
          />
          <button 
            onClick={async () => setResult(await analyzeSentiment(engine, input))}
            className="mt-2 bg-green-600 text-white px-4 py-2 rounded"
          >
            Analyze Privately
          </button>
        </div>
      )}

      {result && (
        <div className="mt-6 p-4 bg-gray-50 rounded-lg border-l-4 border-green-500">
          <h3 className="font-bold">Analysis (Stayed in Browser ✅)</h3>
          <p><strong>Sentiment:</strong> {result.sentiment}</p>
          <p className="italic text-gray-600">"{result.feedback}"</p>
        </div>
      )}
    </div>
  );
}

Why This Matters

  1. Zero Latency (Post-Load): Once the model weights are cached locally (WebLLM persists them via the browser's Cache API or IndexedDB), inference happens at the speed of the user's hardware.
  2. Cost Efficiency: You aren't paying $0.01 per 1k tokens to OpenAI. The user provides the compute! 🥑
  3. Trust: For apps dealing with trauma, addiction, or grief, being able to prove that "we literally cannot see your data" is a massive competitive advantage.

Conclusion

WebLLM and WebGPU are turning browsers into powerful AI workstations. By moving the "brain" to the client, we solve the ultimate privacy paradox in mental health tech.

Are you ready to move your inference to the edge? Drop a comment below if you've experimented with WebGPU or if you have questions about model quantization!

Keep coding, keep building, and stay private. 🚀


For more advanced guides on building secure, high-performance web applications, don't forget to visit the WellAlly Blog.
