Beck_Moulton

Private & Fast: Building a Browser-Based Dermatology Screener with WebLLM and WebGPU

In the world of health-tech, privacy is the ultimate feature. Nobody wants to upload sensitive photos of skin lesions to a mysterious cloud server just to get a preliminary health check. But what if we could bring the power of a Vision Transformer (ViT) directly to the user's browser?

Today, we are diving deep into the world of Edge AI and WebGPU acceleration. We’ll build a "Dermatology Initial Screener" that runs entirely client-side. By leveraging WebLLM, TVM Unity, and Transformers.js, we can perform complex lesion analysis with no network round-trips and 100% privacy.

If you are interested in local inference, privacy-first AI, and the future of WebGPU-powered applications, you're in the right place!


The Architecture: Privacy by Design

The goal is simple: The user's photo never leaves their device. We use the browser's GPU to do the heavy lifting that used to require a Python backend with a massive NVIDIA card.

graph TD
    A[User Image Input] --> B[HTML5 Canvas / Pre-processing]
    B --> C{WebGPU Support?}
    C -- Yes --> D[Transformers.js / WebLLM Engine]
    C -- No --> E[WASM Fallback/Error]
    D --> F[Local ViT Model / Vision-Language Model]
    F --> G[Classification & Reasoning]
    G --> H[Instant UI Feedback]
    style F fill:#f96,stroke:#333,stroke-width:2px
    style G fill:#bbf,stroke:#333,stroke-width:2px
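Before anything reaches the model, the "Pre-processing" node above downsizes the photo on a canvas so we aren't feeding a 12-megapixel image into a 224×224 ViT. Here's a quick sketch of that step (resizeForModel is a helper name I made up for this post; Transformers.js pipelines also accept regular URLs if you'd rather skip the data URL):

// Downscale an uploaded image on an HTML5 canvas before inference.
// 224x224 matches the input size of the ViT-base checkpoint used below.
function resizeForModel(imageElement, size = 224) {
  const canvas = document.createElement("canvas");
  canvas.width = size;
  canvas.height = size;
  const ctx = canvas.getContext("2d");
  ctx.drawImage(imageElement, 0, 0, size, size);
  return canvas.toDataURL("image/png"); // a data URL the pipeline can consume
}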

Tech Stack

  • WebGPU: The next-gen API for high-performance graphics and computation.
  • WebLLM: A high-performance in-browser LLM framework powered by TVM Unity.
  • Transformers.js: To run vision models (like ViT or MobileNet) natively in JS.
  • React/Vite: For a snappy frontend experience.

Step 1: Initializing the WebGPU Environment

Before we can run a model, we need to ensure the user's browser is ready for WebGPU. This is the secret sauce that makes in-browser AI run at near-native speeds.

async function initWebGPU() {
  if (!navigator.gpu) {
    // WebGPU ships in recent Chrome/Edge; Safari and Firefox support is still rolling out
    throw new Error("WebGPU is not supported in this browser.");
  }
  const adapter = await navigator.gpu.requestAdapter();
  if (!adapter) {
    throw new Error("No suitable GPU adapter found.");
  }
  const device = await adapter.requestDevice();
  console.log("🚀 WebGPU is ready to roar!");
  return device;
}
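If requestAdapter() comes back empty, you don't have to abandon the user: Transformers.js can also run on its WASM backend, which is slower but works in most browsers, matching the fallback branch in the diagram above. A minimal sketch (device strings as exposed by Transformers.js):

// Pick the best available backend: WebGPU when present, WASM otherwise.
async function pickDevice() {
  if (navigator.gpu) {
    const adapter = await navigator.gpu.requestAdapter();
    if (adapter) return "webgpu";
  }
  console.warn("WebGPU unavailable, falling back to WASM (slower).");
  return "wasm";
}

The returned string can be passed straight into the device option of the pipeline in Step 2.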

Step 2: Loading the Vision Transformer (ViT)

We’ll use Transformers.js to load a quantized Vision Transformer. For a real screener you'd swap in a checkpoint fine-tuned on a dermatology dataset such as HAM10000; the generic ViT below is a stand-in for the demo. Quantization keeps the download small with only a minor accuracy trade-off.

import { pipeline } from '@huggingface/transformers'; // Transformers.js v3 (successor to @xenova/transformers) adds WebGPU support

async function loadScreenerModel() {
  // Stand-in checkpoint: for production, use a ViT fine-tuned on a
  // dermatology dataset such as HAM10000 instead of this ImageNet model.
  const classifier = await pipeline('image-classification', 'Xenova/vit-base-patch16-224', {
    device: 'webgpu', // Magic happens here!
  });
  return classifier;
}
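Worth noting: the bandwidth win comes from the quantized ONNX weights the library downloads. If you want to pick the precision yourself (say, 8-bit rather than the more aggressive 4-bit), Transformers.js v3 exposes a dtype option. A small sketch, assuming the same checkpoint as above:

import { pipeline } from '@huggingface/transformers';

// Same pipeline as above, but with an explicit quantization level.
// 'q8' keeps a bit more accuracy than 'q4' at the cost of a larger download.
const classifier = await pipeline('image-classification', 'Xenova/vit-base-patch16-224', {
  device: 'webgpu',
  dtype: 'q8',
});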

Step 3: Local Reasoning with WebLLM

While a ViT can classify an image, WebLLM (via TVM Unity) allows us to add a "reasoning" layer. We can feed the classification result into a local LLM to explain the findings in plain English—all without a server!

import * as webllm from "@mlc-ai/web-llm";

async function getLocalReasoning(prediction) {
  // Note: for a real app, create the engine once and reuse it; the model
  // download and compilation are expensive (see the sketch below).
  const engine = new webllm.MLCEngine();
  await engine.reload("Llama-3-8B-Instruct-q4f16_1-MLC");

  const confidence = (prediction.score * 100).toFixed(1);
  const prompt = `A skin scan detected "${prediction.label}" with ${confidence}% confidence.
                  Provide a brief, non-diagnostic disclaimer and advice for a dermatologist visit.`;

  const reply = await engine.chat.completions.create({
    messages: [{ role: "user", content: prompt }]
  });
  return reply.choices[0].message.content;
}
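One practical caveat: an 8B model is a multi-gigabyte first-time download, so the engine should be created once and the user should see progress. web-llm ships a CreateMLCEngine helper with an initProgressCallback for exactly this. A sketch (the module-level caching variable is my own convention):

import { CreateMLCEngine } from "@mlc-ai/web-llm";

let enginePromise; // create the engine once, reuse it for every analysis

function getEngine() {
  enginePromise ??= CreateMLCEngine("Llama-3-8B-Instruct-q4f16_1-MLC", {
    // Surface download/compilation progress so the UI can render a loading bar
    initProgressCallback: (report) => console.log(report.text),
  });
  return enginePromise;
}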

The "Official" Way to Build Edge AI

While building a prototype is fun, scaling local AI to production requires a deeper understanding of memory management and model optimization. For more production-ready examples and advanced patterns regarding Edge AI and private data processing, I highly recommend checking out the WellAlly Official Blog.

They provide excellent deep-dives into how to optimize TVM Unity pipelines for enterprise health applications, ensuring your local models are as lean as possible.


Step 4: Putting it All Together (The UI)

In your React component, you'd handle the image upload and trigger the pipeline.

const analyzeSkin = async (imageElement) => {
  setLoading(true);
  try {
    // In production, load the classifier once (e.g. on mount) instead of per call
    const classifier = await loadScreenerModel();
    const results = await classifier(imageElement.src);

    // Get the top result
    const topResult = results[0];

    // Get local LLM reasoning
    const advice = await getLocalReasoning(topResult);

    setReport({ analysis: topResult, advice });
  } catch (err) {
    console.error("Inference failed", err);
  } finally {
    setLoading(false);
  }
};
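And the upload itself is just a file input whose change handler turns the selected file into an image element and hands it to analyzeSkin. Roughly (handler and element names are mine):

// <input type="file" accept="image/*" onChange={handleFileChange} />
const handleFileChange = (event) => {
  const file = event.target.files?.[0];
  if (!file) return;

  const img = new Image();
  img.src = URL.createObjectURL(file); // the photo never leaves the browser
  img.onload = () => analyzeSkin(img);
};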

Why This Matters (The "So What?")

  1. Zero Latency: No waiting for a 5MB high-res photo to upload to a server in Virginia.
  2. Privacy: Medical data is sensitive. Processing it on-device is the gold standard for HIPAA-compliant-ish user experiences.
  3. Offline Capability: This tool could work in remote areas with zero internet after the initial model download.

Conclusion

The browser is no longer just a document viewer; it's a powerful execution environment for Edge AI. By combining WebGPU, WebLLM, and Transformers.js, we can create life-changing tools that respect user privacy by default.

What do you think? Is the future of AI purely local, or will we always need the cloud for the "big" stuff? Let’s chat in the comments! 👇


Happy coding! If you enjoyed this "Learning in Public" journey, don't forget to ❤️ and bookmark! For more advanced AI architecture, visit wellally.tech/blog.
