wellallyTech

Skincare as Code: Building a Privacy-First Skin Analysis App with MediaPipe and ONNX

In an era where personal data is the new gold, healthcare privacy is paramount. Why send sensitive photos of skin conditions to a cloud server when you can process them directly on your phone? 🚀

Today, we're diving deep into On-device Machine Learning (ODML) to build a "Skincare as Code" solution. We will implement a real-time skin lesion analysis and tracking system using a lightweight Vision Transformer (ViT), deployed via ONNX Runtime and MediaPipe within a Flutter ecosystem. This approach leverages Mobile Edge Computing and Privacy-preserving AI to ensure that your health data never leaves your device.

If you are interested in exploring more production-ready patterns and advanced architectural insights for AI-driven health tech, be sure to visit the WellAlly Blog, which served as a major inspiration for this edge-first implementation.


The Architecture: From Camera to Inference

Our pipeline is designed for high performance and low latency. We use MediaPipe for landmark detection to define the "Region of Interest" (ROI) and then pass that specific crop to a quantized Vision Transformer model.

graph TD
    A[Camera Feed - Flutter] --> B{MediaPipe Landmarks}
    B -->|Face/Body Mesh| C[ROI Extraction & Normalization]
    C --> D[ONNX Runtime Inference]
    D --> E[Vision Transformer ViT Model]
    E --> F[Classification & Anomaly Heatmap]
    F --> G[On-Device SQLite History]
    G --> H[Flutter UI Dashboard]
    style E fill:#f96,stroke:#333,stroke-width:2px
    style D fill:#69f,stroke:#333,stroke-width:2px
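Before wiring up the real components, it helps to see the pipeline as composed stages. Here is a minimal Python sketch of that flow; the function names, the fixed ROI, and the mean-intensity "model" are placeholders of my own, not part of MediaPipe or ONNX Runtime:

```python
from dataclasses import dataclass

@dataclass
class Rect:
    left: float
    top: float
    right: float
    bottom: float

def detect_roi(frame):
    # Stand-in for MediaPipe landmark detection: a fixed
    # bounding box around the centre of the frame.
    h, w = len(frame), len(frame[0])
    return Rect(w * 0.25, h * 0.25, w * 0.75, h * 0.75)

def crop(frame, roi):
    # ROI extraction: slice the patch out of the frame.
    return [row[int(roi.left):int(roi.right)]
            for row in frame[int(roi.top):int(roi.bottom)]]

def infer(patch):
    # Stand-in for ONNX Runtime inference: mean intensity as a score.
    total = sum(sum(row) for row in patch)
    count = sum(len(row) for row in patch)
    return {"anomaly_score": total / count}

def run_pipeline(frame):
    roi = detect_roi(frame)
    patch = crop(frame, roi)
    return infer(patch)

frame = [[1] * 8 for _ in range(8)]  # dummy 8x8 grayscale frame
print(run_pipeline(frame))
```

Each stage maps to one node in the diagram, so you can swap the stubs for the real MediaPipe and ONNX calls one at a time.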

Prerequisites

To follow this advanced tutorial, you'll need:

  • Flutter SDK: For cross-platform UI development.
  • MediaPipe: For real-time landmarking.
  • ONNX Runtime (Mobile): For high-performance inference.
  • Python: To export and quantize your pre-trained ViT model.

Step 1: Optimizing the Vision Transformer (ViT)

Standard ViT models (like google/vit-base-patch16-224) are too heavy for mobile. We use a MobileViT or a distilled TinyViT instead. After training on your custom skin dataset (e.g., ISIC), you must export it to ONNX format and apply 8-bit quantization to fit within mobile memory constraints.

import torch
from onnxruntime.quantization import quantize_dynamic, QuantType

# Load your fine-tuned model and switch to inference mode
model = torch.load("skin_vit_model.pt")
model.eval()
dummy_input = torch.randn(1, 3, 224, 224)

# Export to ONNX (opset 14 covers the attention ops ViTs rely on)
torch.onnx.export(model, dummy_input, "skin_analysis.onnx",
                  opset_version=14,
                  input_names=['input'],
                  output_names=['output'])

# Apply dynamic 8-bit quantization for edge deployment
quantize_dynamic("skin_analysis.onnx",
                 "skin_analysis_quant.onnx",
                 weight_type=QuantType.QUInt8)
print("✅ Model optimized for mobile!")

Step 2: Integrating MediaPipe for ROI Extraction

To prevent the model from analyzing background noise, we use MediaPipe Face Mesh or Selfie Segmentation to identify the skin area. In Flutter, we can use the google_mlkit_face_detection package or a custom MediaPipe C++ wrapper.

import 'dart:math';
import 'dart:ui';

// Flutter snippet for extracting a square ROI from MediaPipe landmarks
Rect calculateROI(List<FaceLandmark> landmarks) {
  double minX = double.infinity, minY = double.infinity;
  double maxX = double.negativeInfinity, maxY = double.negativeInfinity;

  for (final landmark in landmarks) {
    minX = min(minX, landmark.position.x);
    minY = min(minY, landmark.position.y);
    maxX = max(maxX, landmark.position.x);
    maxY = max(maxY, landmark.position.y);
  }

  // Expand the bounding box into a square crop, since the ViT
  // expects square (224x224) input patches
  final side = max(maxX - minX, maxY - minY);
  return Rect.fromCenter(
    center: Offset((minX + maxX) / 2, (minY + maxY) / 2),
    width: side,
    height: side,
  );
}

Step 3: On-Device Inference with ONNX Runtime

Now we load the quantized model into our Flutter app. When the CoreMLExecutionProvider is enabled, ONNX Runtime delegates supported operators to CoreML on iOS for hardware acceleration (NNAPI fills the same role on Android).

import 'package:flutter/services.dart' show rootBundle;
import 'package:onnxruntime/onnxruntime.dart';

Future<void> runInference(Uint8List imageBytes) async {
  OrtEnv.instance.init();
  final sessionOptions = OrtSessionOptions();
  // If your ONNX Runtime build exposes execution providers
  // (CoreML on iOS, NNAPI on Android), append them to
  // sessionOptions here for hardware acceleration.

  // Bundled assets are not files on disk: load the model bytes
  // via rootBundle and create the session from a buffer
  final rawModel =
      await rootBundle.load("assets/models/skin_analysis_quant.onnx");
  final session =
      OrtSession.fromBuffer(rawModel.buffer.asUint8List(), sessionOptions);

  // Pre-process image to 224x224, normalize, and shape as NCHW
  final inputOrt = OrtValueTensor.createTensorWithDataList(
    processImage(imageBytes),
    [1, 3, 224, 224],
  );

  final inputs = {'input': inputOrt};
  final runOptions = OrtRunOptions();

  // High-speed inference! 🏎️
  final outputs = await session.runAsync(runOptions, inputs);

  processResults(outputs?[0]?.value);

  inputOrt.release();
  runOptions.release();
  session.release();
}
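The `processImage` helper has to mirror the preprocessing the model saw during training. A minimal Python sketch of the usual ImageNet-style normalization and CHW layout (the mean/std values are an assumption; substitute whatever your training pipeline used):

```python
# ImageNet normalization constants commonly used by
# torchvision/timm ViTs -- verify against your own training config.
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]

def normalize_pixel(rgb):
    """Map one 0-255 RGB pixel into the model's input range."""
    return [((c / 255.0) - m) / s for c, m, s in zip(rgb, MEAN, STD)]

def to_chw_float32(pixels):
    """Flatten HxWx3 pixels into the channel-planar (CHW) float
    list that the NCHW 'input' tensor expects."""
    normalized = [normalize_pixel(p) for row in pixels for p in row]
    # Transpose from interleaved RGB to one plane per channel.
    return [v[c] for c in range(3) for v in normalized]

# A 2x2 all-white image:
white = [[(255, 255, 255)] * 2 for _ in range(2)]
chw = to_chw_float32(white)
print(len(chw))  # 12 values: 3 channels x 2 x 2
```

Getting this step wrong (skipping normalization, or feeding HWC instead of CHW) is the most common cause of a model that exports cleanly but predicts garbage on-device.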

Advanced Tip: Recovery Tracking

To implement the "comparison" feature, we don't just store the diagnosis; we store the embeddings (the vector from the second-to-last layer of the ViT).

By calculating the Cosine Similarity between today's scan and last week's scan, we can mathematically track if a lesion is shrinking or changing in texture. This is far more accurate than simple visual comparison!
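The similarity computation itself is a few lines. A minimal sketch in Python (the embedding values below are made up for illustration):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings from two weekly scans of the same lesion:
last_week = [0.12, 0.80, 0.33, 0.05]
today = [0.10, 0.78, 0.35, 0.04]

similarity = cosine_similarity(last_week, today)
print(f"similarity: {similarity:.4f}")
# A value close to 1.0 suggests little change; a steady drop across
# scans flags a lesion that deserves a closer look.
```

Because only the compact embedding vector is persisted to SQLite (not the photo itself), the history feature stays consistent with the privacy-first design.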

Looking for more? For a deep dive into vector embeddings for medical imaging and how to scale this architecture to millions of devices, check out the specialized tutorials at WellAlly Tech Blog.


Conclusion 🥑

Building "Skincare as Code" isn't just about the AI model; it's about the orchestration of Edge AI, Mobile UI, and Privacy. By combining MediaPipe's precision with the portability of ONNX, we’ve created a tool that empowers users without compromising their data.

Key Takeaways:

  1. Quantization is non-negotiable for ViT models on mobile.
  2. MediaPipe acts as the perfect "gatekeeper" for ROI extraction.
  3. CoreML/NNAPI integration via ONNX Runtime unlocks the hardware acceleration needed for real-time frame rates.

What are you building next on the edge? Drop a comment below! 👇
