Privacy is not just a feature; in mental health applications, it's a fundamental requirement. Sending sensitive daily journals or voice logs to a remote server can be a deal-breaker for users. But how do we perform complex sentiment analysis and mental health scoring without a powerful backend?
The answer lies in Edge AI. By leveraging WebGPU, WebLLM, and Transformers.js, we can now transform the browser into a powerhouse capable of running Large Language Models (LLMs) and specialized Transformer models locally. This "Local-First" approach ensures that personal data never leaves the user's device while providing real-time, high-fidelity insights.
In this tutorial, we will explore how to combine these cutting-edge tools to build a Depression Tendency Analyzer that processes both text and voice input entirely on the client side.
The Architecture: Zero-Backend Intelligence
To achieve a seamless experience, we split the workload: Transformers.js handles the heavy lifting of feature extraction (and optional Speech-to-Text), while WebLLM runs a quantized LLM (like Llama-3 or Mistral) for nuanced sentiment reasoning.
graph TD
A[User Input: Voice/Text] --> B{Input Type}
B -- Voice --> C[Transformers.js: Whisper-base]
B -- Text --> D[Transformers.js: Sentiment Analysis]
C --> E[Text Transcription]
D --> F[Sentiment Vectors]
E --> G[WebLLM: Llama-3-8B-q4f16]
F --> G
G --> H[Final Report: Depression Tendency Score]
H --> I[Local IndexedDB Storage]
style G fill:#f96,stroke:#333,stroke-width:2px
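The last node in the diagram, local IndexedDB storage, never gets its own code in this tutorial, so here is a minimal sketch using the plain IndexedDB API. The database name, store name, and record shape are illustrative assumptions, not part of any library:

```javascript
// Sketch: persisting each analysis report locally in IndexedDB.
// DB_NAME, STORE, and the record shape are illustrative choices.
const DB_NAME = "localsense";
const STORE = "reports";

// Pure helper: build the record we persist (easy to unit-test).
function makeReport(score, text) {
  return { score, text, createdAt: Date.now() };
}

function saveReport(report) {
  return new Promise((resolve, reject) => {
    const open = indexedDB.open(DB_NAME, 1);
    open.onupgradeneeded = () =>
      open.result.createObjectStore(STORE, { autoIncrement: true });
    open.onsuccess = () => {
      const tx = open.result.transaction(STORE, "readwrite");
      tx.objectStore(STORE).add(report);
      tx.oncomplete = () => resolve();
      tx.onerror = () => reject(tx.error);
    };
    open.onerror = () => reject(open.error);
  });
}
```

Because the data never leaves the device, clearing site data wipes the journal history, which is worth communicating to users.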
Prerequisites
Before we dive into the code, ensure your environment meets these requirements:
- Tech Stack: React (Vite), WebLLM, Transformers.js.
- Hardware: A device with a WebGPU-capable GPU, running a browser that supports WebGPU (latest Chrome or Edge; Firefox Nightly behind a flag).
- Knowledge: Intermediate JavaScript/React and basic understanding of LLM prompting.
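Since Step 1 will trigger a multi-gigabyte model download, it is worth feature-detecting WebGPU first so unsupported browsers get a graceful fallback. A small sketch; `webgpuSupport` and `showFallbackMessage` are hypothetical helpers, not part of WebLLM:

```javascript
// Hypothetical helper: WebGPU is exposed as navigator.gpu, so its
// presence is a cheap proxy for "local inference can run here".
function webgpuSupport(nav) {
  return typeof nav === "object" && nav !== null && "gpu" in nav;
}

// In the browser, before offering the "Load AI Model" button:
// if (!webgpuSupport(navigator)) {
//   showFallbackMessage("This app needs a WebGPU-capable browser.");
// }
```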
Step 1: Setting up WebGPU-accelerated LLM
First, let's initialize WebLLM. This library allows us to run models like Llama-3 directly in the browser using the WebGPU API.
// hooks/useWebLLM.ts
import { useState } from "react";
import * as webllm from "@mlc-ai/web-llm";
export function useWebLLM() {
const [engine, setEngine] = useState<webllm.MLCEngineInterface | null>(null);
const [loadingProgress, setLoadingProgress] = useState(0);
const initEngine = async () => {
  // Model IDs change between releases; check webllm.prebuiltAppConfig.model_list
  // for the exact identifiers your installed version ships with.
  const selectedModel = "Llama-3-8B-Instruct-q4f16_1-MLC";
  // Use a distinct name to avoid shadowing the `engine` state variable above.
  const loadedEngine = await webllm.CreateMLCEngine(selectedModel, {
    initProgressCallback: (report) => {
      setLoadingProgress(Math.round(report.progress * 100));
    },
  });
  setEngine(loadedEngine);
};
return { engine, loadingProgress, initEngine };
}
Step 2: Extracting Sentiment with Transformers.js
While the LLM handles reasoning, we use Transformers.js for specialized tasks such as converting speech to text (Whisper) or extracting raw sentiment scores. Under the hood it runs models with ONNX Runtime Web.
import { pipeline } from '@xenova/transformers';

// Cache the pipeline: creating it triggers the model download,
// so do it once rather than on every call.
let classifier;

const analyzeSentiment = async (text) => {
  if (!classifier) {
    classifier = await pipeline('sentiment-analysis', 'Xenova/distilbert-base-uncased-finetuned-sst-2-english');
  }
  const result = await classifier(text);
  // Example output: [{ label: 'NEGATIVE', score: 0.98 }]
  return result;
};
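The voice branch of the diagram works the same way via the automatic-speech-recognition pipeline. A sketch, assuming the `Xenova/whisper-base` checkpoint mentioned in the diagram; `cleanTranscript` is a hypothetical helper:

```javascript
// Sketch: speech-to-text with Transformers.js Whisper, lazily initialized
// because the model download is large.
let transcriber;

async function transcribe(audioUrl) {
  if (!transcriber) {
    const { pipeline } = await import('@xenova/transformers');
    transcriber = await pipeline('automatic-speech-recognition', 'Xenova/whisper-base');
  }
  const output = await transcriber(audioUrl);
  return cleanTranscript(output);
}

// Hypothetical helper: Whisper output has the shape { text: '...' };
// strip padding whitespace and tolerate empty results.
function cleanTranscript(output) {
  return (output?.text ?? '').trim();
}
```

The returned transcription can then be fed into the same sentiment and LLM steps as typed text.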
Step 3: Local Reasoning & Scoring
The core logic combines the raw sentiment with the LLM's ability to detect patterns of depressive thought (e.g., "hopelessness," "lethargy").
const systemPrompt = `
You are a compassionate mental health assistant.
Analyze the user's journal entry for signs of depression tendency.
Output a score from 1-10 and a brief explanation.
Stay objective and remind the user this is not a clinical diagnosis.
`;
const getMentalHealthScore = async (engine, userText) => {
const messages = [
{ role: "system", content: systemPrompt },
{ role: "user", content: userText }
];
const reply = await engine.chat.completions.create({ messages });
return reply.choices[0].message.content;
};
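The diagram routes both the sentiment vectors and the transcription into WebLLM, but the snippet above only passes raw text. One way to combine them is to fold the classifier signal into the system prompt. `buildMessages` is a hypothetical helper, not part of either library:

```javascript
// Sketch: merge the Transformers.js sentiment result (Step 2) into the
// prompt sent to the WebLLM engine (Step 1).
function buildMessages(systemPrompt, sentiment, userText) {
  const hint = sentiment
    ? `\nClassifier signal: ${sentiment.label} (${(sentiment.score * 100).toFixed(0)}% confidence).`
    : "";
  return [
    { role: "system", content: systemPrompt + hint },
    { role: "user", content: userText },
  ];
}

// Usage with the engine from Step 1:
// const messages = buildMessages(systemPrompt, result[0], input);
// const reply = await engine.chat.completions.create({ messages });
```

Grounding the LLM with an explicit classifier score can make its 1-10 rating more consistent across runs.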
The "Official" Way to Optimize Edge AI
Building localized AI is complex. While this tutorial covers the basics of browser-based inference, production-grade applications often require advanced techniques like Model Distillation, KV Cache optimization, and PWA caching strategies to handle large model weights (often 2GB+).
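On the PWA caching point: WebLLM already stores compiled weights in the browser's Cache Storage, but custom assets (ONNX models, WASM runtimes) can be pre-cached the same way so repeat visits skip the download. A sketch under that assumption; `MODEL_CACHE`, `isModelShard`, and `precacheShards` are illustrative names:

```javascript
// Sketch: pre-caching large binary shards with the Cache Storage API.
const MODEL_CACHE = "model-weights-v1";

// Hypothetical helper: only cache large binary artifacts, not HTML/JS.
function isModelShard(url) {
  return /\.(bin|onnx|wasm)(\?.*)?$/.test(url);
}

async function precacheShards(urls) {
  const cache = await caches.open(MODEL_CACHE);
  await cache.addAll(urls.filter(isModelShard));
}
```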
For deep dives into production-ready Edge AI patterns and architectural benchmarks, check out the technical guides at WellAlly Blog, which cover how to bridge the gap between "cool demos" and "enterprise-ready local AI."
🛠 Step 4: Putting it all together in React
import { useState } from "react";
import { useWebLLM } from "./hooks/useWebLLM";
// getMentalHealthScore is the function defined in Step 3.

function App() {
const { engine, loadingProgress, initEngine } = useWebLLM();
const [input, setInput] = useState("");
const [analysis, setAnalysis] = useState("");
return (
<div className="p-8 max-w-2xl mx-auto">
<h1 className="text-3xl font-bold">🧠 LocalSense AI</h1>
<p className="mt-2 text-gray-600">Private, local-first mental health screening.</p>
{!engine ? (
<button
onClick={initEngine}
className="mt-4 px-4 py-2 bg-blue-600 text-white rounded"
>
Load AI Model ({loadingProgress}%)
</button>
) : (
<div className="mt-6">
<textarea
className="w-full p-4 border rounded"
placeholder="How are you feeling today?"
value={input}
onChange={(e) => setInput(e.target.value)}
/>
<button
onClick={async () => {
const res = await getMentalHealthScore(engine, input);
setAnalysis(res);
}}
className="mt-2 px-4 py-2 bg-green-600 text-white rounded"
>
Analyze Privately 🔒
</button>
</div>
)}
{analysis && (
<div className="mt-6 p-4 bg-gray-100 rounded border-l-4 border-blue-500">
<h3 className="font-semibold">Analysis Result:</h3>
<p>{analysis}</p>
</div>
)}
</div>
);
}
Conclusion
By moving computation from the cloud into the browser via WebGPU, we have created a mental health tool that is:
- Private: No data sent to an API.
- Cost-Effective: Zero server costs for inference.
- Performant: Real-time analysis using local hardware.
The era of the "Zero-Backend" AI app is here. While the initial download of model weights can be large, the benefits of privacy and offline availability are unmatched for sensitive domains like healthcare.
What do you think? Is local-first AI the future of healthcare apps? Let me know in the comments!
For more advanced Edge AI patterns, visit wellally.tech/blog.