Beck_Moulton

Posted on Jun 16

Revolutionizing Dermatology: Building an Offline Skin Lesion Segmenter with Med-SAM and CoreML

#ai #webdev #devops #security

In the world of digital health, the gap between "research-grade AI" and "production-ready mobile apps" is often a chasm. When it comes to dermatology AI, precision is non-negotiable. Identifying a suspicious mole or a flare-up of dermatitis requires more than just a bounding box; it requires pixel-accurate segmentation of the lesion boundary.

Today, we are diving deep into Skin Lesion Screening Engineering. We will explore how to bridge the gap by taking the powerful Med-SAM (Medical Segment Anything Model) and optimizing it for real-time, offline performance on mobile devices using CoreML and ONNX Runtime. By implementing mobile image segmentation and edge-side inference, we ensure user privacy while maintaining high-fidelity diagnostic assistance.

The Architecture: From High-Res Images to Precise Masks

Scaling a transformer-based model like Med-SAM for a mobile environment requires a strategic pipeline. We can't just throw a 1GB model at a smartphone and expect 60FPS. We need a hybrid approach: distillation, quantization, and efficient hardware acceleration.
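To make the quantization part of that pipeline concrete, here is a minimal sketch of symmetric per-tensor int8 quantization. It only illustrates the idea: Core ML and ONNX Runtime ship their own quantization tooling, and the names `quantizeInt8` and `dequantizeInt8` are hypothetical helpers invented for this example.

```typescript
// Illustrative only: symmetric per-tensor int8 quantization of a weight tensor.
// Real toolchains typically add per-channel scales and calibration on top of this.
interface QuantizedTensor {
  values: Int8Array; // quantized weights in [-127, 127]
  scale: number;     // multiply by this to recover approximate float values
}

function quantizeInt8(weights: Float32Array): QuantizedTensor {
  // The largest magnitude decides how much float range one int8 step covers.
  let maxAbs = 0;
  for (let i = 0; i < weights.length; i++) {
    maxAbs = Math.max(maxAbs, Math.abs(weights[i]));
  }
  // Guard against division by zero for an all-zero tensor.
  const scale = maxAbs > 0 ? maxAbs / 127 : 1;

  const values = new Int8Array(weights.length);
  for (let i = 0; i < weights.length; i++) {
    const q = Math.round(weights[i] / scale);
    values[i] = Math.max(-127, Math.min(127, q));
  }
  return { values, scale };
}

function dequantizeInt8(q: QuantizedTensor): Float32Array {
  const out = new Float32Array(q.values.length);
  for (let i = 0; i < q.values.length; i++) {
    out[i] = q.values[i] * q.scale;
  }
  return out;
}
```

Each weight now takes one byte instead of four, at the cost of a rounding error of at most half a quantization step per value. Whether that error is acceptable for lesion boundaries is something to measure on your own validation set.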

graph TD
    A[User Camera / Gallery] -->|Raw Image| B[Preprocessing & Resizing]
    B --> C{Inference Engine}
    C -->|iOS/Neural Engine| D[CoreML Model]
    C -->|Cross-Platform| E[ONNX Runtime Mobile]
    D --> F[Med-SAM Encoder/Decoder]
    E --> F
    F -->|Logits| G[Post-processing & Thresholding]
    G -->|Segmentation Mask| H[UI Overlay: React Native]
    H --> I[Final Assessment & Heatmap]

Prerequisites 🛠️

To follow this advanced guide, you will need familiarity with the stack and access to the hardware below:

Tech Stack: Med-SAM (Python/PyTorch), CoreML, ONNX, and React Native.
Hardware: A Mac for CoreML conversion and a physical device (iOS/Android) for testing.

Step 1: Optimizing Med-SAM for the Edge

Med-SAM is a specialized variant of Meta’s Segment Anything Model, fine-tuned on a large collection of medical images. However, even a ViT-B (Vision Transformer) backbone is heavy for mobile, and the larger ViT-H variant of SAM is out of the question. We use a Mobile-SAM-style lightweight encoder and transfer Med-SAM's medical knowledge into it through knowledge distillation.

Model Export to ONNX / CoreML

First, we need to convert the PyTorch weights. Using coremltools, we can target the Apple Neural Engine (ANE).

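import torch
import coremltools as ct
# Assumption: as in the MedSAM repository, the model is built through the
# segment_anything registry rather than a separate med_sam package.
from segment_anything import sam_model_registry

# 1. Load the fine-tuned Med-SAM model
model = sam_model_registry["vit_b"](checkpoint="medsam_vit_b.pth")
model.eval()

# 2. Trace the image encoder with a dummy input.
#    The full model's forward() expects a list of dicts (image plus prompts),
#    so it cannot be traced with a single image tensor. The prompt encoder and
#    mask decoder need a separate export.
example_input = torch.rand(1, 3, 1024, 1024)
with torch.no_grad():
    traced_encoder = torch.jit.trace(model.image_encoder, example_input)

# 3. Convert to CoreML (Optimize for ANE)
mlmodel = ct.convert(
    traced_encoder,
    inputs=[ct.TensorType(name="image", shape=example_input.shape)],
    convert_to="mlprogram",  # ML Program format, saved as .mlpackage
    compute_units=ct.ComputeUnit.ALL,  # CPU, GPU, and Neural Engine
    minimum_deployment_target=ct.target.iOS16,
)

mlmodel.save("SkinMedSAM.mlpackage")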

Step 2: Integrating with React Native

While the "brain" of our app is the model, the "body" is React Native. For high-performance vision tasks, we use react-native-vision-camera combined with a custom frame processor or the ONNX Runtime React Native library.

Inference Logic (TypeScript)

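import { InferenceSession, Tensor } from 'onnxruntime-react-native';

// Hypothetical helper (not shown in this post): decodes the image, resizes it
// to 1024x1024, and returns a float32 tensor of shape [1, 3, 1024, 1024].
declare function preprocessImageToTensor(imagePath: string): Promise<Tensor>;

// Keep the session around so the model is loaded once, not on every call
let session: InferenceSession | null = null;

const runSegmentation = async (imagePath: string) => {
  try {
    // Load the model (first call only)
    session ??= await InferenceSession.create('skin_medsam_quantized.onnx');

    // Convert image to Tensor (Logic for resizing to 1024x1024)
    const inputTensor = await preprocessImageToTensor(imagePath);

    // Run Inference; "input" and "output" must match the names in the exported graph
    const outputs = await session.run({ input: inputTensor });
    const maskData = outputs['output'].data as Float32Array;

    // Process mask to display on UI
    return maskData;
  } catch (e) {
    console.error('Inference failed', e);
    return null; // make the failure explicit to callers
  }
};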
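The `preprocessImageToTensor` helper is left undefined above. As a rough sketch of its core step, the function below converts an already-resized RGBA pixel buffer (interleaved, as most image decoders return it) into the planar channel-first layout that vision models usually expect. The name `rgbaToChwFloat32` is hypothetical, and scaling to [0, 1] is an assumption: use whatever normalization your exported model was trained with.

```typescript
// Sketch: interleaved RGBA bytes (HWC) -> planar float32 RGB (CHW), scaled to [0, 1].
// Assumes the image has already been resized to the model's input size.
function rgbaToChwFloat32(
  rgba: Uint8Array | Uint8ClampedArray,
  width: number,
  height: number
): Float32Array {
  const pixelCount = width * height;
  if (rgba.length !== pixelCount * 4) {
    throw new Error("RGBA buffer length does not match width * height * 4");
  }

  const chw = new Float32Array(3 * pixelCount);
  for (let i = 0; i < pixelCount; i++) {
    chw[i] = rgba[i * 4] / 255;                      // R plane
    chw[pixelCount + i] = rgba[i * 4 + 1] / 255;     // G plane
    chw[2 * pixelCount + i] = rgba[i * 4 + 2] / 255; // B plane (alpha is dropped)
  }
  return chw;
}
```

The result can then be wrapped for ONNX Runtime with `new Tensor('float32', data, [1, 3, height, width])`.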

The "Official" Way: Production-Ready Patterns 🥑

Building a prototype is easy, but making it "medical-grade" requires rigorous attention to lighting conditions, skin tone diversity, and latency.
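As one small example of handling lighting conditions, an app can reject frames that are clearly too dark or washed out before running inference. The sketch below computes mean luma with the Rec. 601 weights; the function names and the thresholds of 40 and 220 are placeholder assumptions, not clinically validated values.

```typescript
// Sketch: a coarse exposure gate on an RGBA frame.
// Returns the mean luma in [0, 255] using Rec. 601 weights.
function meanLuma(rgba: Uint8Array | Uint8ClampedArray): number {
  const pixelCount = Math.floor(rgba.length / 4);
  if (pixelCount === 0) return 0;

  let sum = 0;
  for (let i = 0; i < pixelCount; i++) {
    const r = rgba[i * 4];
    const g = rgba[i * 4 + 1];
    const b = rgba[i * 4 + 2];
    sum += 0.299 * r + 0.587 * g + 0.114 * b;
  }
  return sum / pixelCount;
}

// Placeholder thresholds: tune them on real captures from your target devices.
function isExposureAcceptable(
  rgba: Uint8Array | Uint8ClampedArray,
  minLuma = 40,
  maxLuma = 220
): boolean {
  const luma = meanLuma(rgba);
  return luma >= minLuma && luma <= maxLuma;
}
```

A global brightness check says nothing about skin tone coverage or glare on the lesion itself, so treat it as a first filter rather than a quality guarantee.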

For advanced architectural patterns, such as implementing streaming inference or federated learning to improve your skin lesion models without compromising user data, I highly recommend checking out the technical deep-dives at WellAlly Blog. They provide extensive resources on productionizing vision models and navigating the complexities of healthcare AI engineering.

Step 3: Handling the Output (Visualizing Results)

Once we get the segmentation mask, we need to overlay it on the camera feed. This is where we calculate the area of the lesion and its regularity—key features for early screening.

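import React, { useMemo } from "react";
import { Canvas, Path, Skia } from "@shopify/react-native-skia";

type Point = { x: number; y: number };

// Drawing the mask over the lesion
const LesionMask = ({ points }: { points: Point[] }) => {
  // Rebuild the path only when the contour changes
  const path = useMemo(() => {
    const p = Skia.Path.Make();
    if (points.length === 0) return p; // nothing to draw yet
    // Simplified logic: connect contour points (already extracted from the
    // model output) into a closed Skia path
    p.moveTo(points[0].x, points[0].y);
    points.slice(1).forEach(pt => p.lineTo(pt.x, pt.y));
    p.close();
    return p;
  }, [points]);

  return (
    <Canvas style={{ flex: 1 }}>
      <Path path={path} color="rgba(255, 0, 0, 0.4)" style="fill" />
    </Canvas>
  );
};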
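The area and regularity mentioned at the start of this step can be computed directly from the mask. Here is a minimal sketch, assuming the model returns one logit per pixel: `logitsToMask` thresholds at 0 (equivalent to a sigmoid probability above 0.5), and `lesionMetrics` counts lesion pixels and exposed pixel edges, then derives the compactness ratio 4πA/P². Both function names are hypothetical.

```typescript
interface LesionMetrics {
  areaPx: number;      // number of lesion pixels
  perimeterPx: number; // number of pixel edges bordering background
  compactness: number; // 4 * PI * area / perimeter^2, lower means more irregular
}

// A logit above 0 corresponds to a sigmoid probability above 0.5.
function logitsToMask(logits: Float32Array, threshold = 0): Uint8Array {
  const mask = new Uint8Array(logits.length);
  for (let i = 0; i < logits.length; i++) {
    mask[i] = logits[i] > threshold ? 1 : 0;
  }
  return mask;
}

function lesionMetrics(mask: Uint8Array, width: number, height: number): LesionMetrics {
  if (mask.length !== width * height) {
    throw new Error("mask length does not match width * height");
  }
  // Pixels outside the image count as background.
  const at = (x: number, y: number): number =>
    x < 0 || y < 0 || x >= width || y >= height ? 0 : mask[y * width + x];

  let areaPx = 0;
  let perimeterPx = 0;
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      if (at(x, y) === 0) continue;
      areaPx++;
      // Each side that touches background adds one unit of perimeter.
      if (at(x - 1, y) === 0) perimeterPx++;
      if (at(x + 1, y) === 0) perimeterPx++;
      if (at(x, y - 1) === 0) perimeterPx++;
      if (at(x, y + 1) === 0) perimeterPx++;
    }
  }
  const compactness =
    perimeterPx > 0 ? (4 * Math.PI * areaPx) / (perimeterPx * perimeterPx) : 0;
  return { areaPx, perimeterPx, compactness };
}
```

Counting pixel edges overestimates the length of curved borders, so even a clean digital circle scores well below 1. Use the value to compare masks captured at the same resolution, not as an absolute measure, and convert pixel area to physical units only if you know the image scale.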

Conclusion 🏁

Engineering a skin lesion screening tool is a perfect example of "Learning in Public." By combining Med-SAM's specialized medical knowledge with CoreML's hardware acceleration, we can create life-saving tools that live right in our pockets.

Key Takeaways:

Domain Matters: Standard SAM often struggles with subtle medical textures; prefer Med-SAM or another variant fine-tuned on medical images.
Quantization is King: Moving from Float32 to Int8/Float16 is essential for keeping the device cool and the UI responsive.
Privacy First: Offline inference with ONNX/CoreML means sensitive medical images can be analyzed without ever leaving the user's device.

Have you tried deploying Vision Transformers to mobile? What was your biggest hurdle? Let's discuss in the comments! 👇

If you're looking for more production-ready examples and deep architectural insights, don't forget to visit wellally.tech/blog for the latest in AI and Healthcare engineering. 💻✨
