For a while I'd wanted to build a Swift Package that runs Meta's Segment Anything Model (SAM) on-device on iOS.
- Cut out the object you tap
- Cut out the object you box in
- Cut out the object you specify by text
Each of these segments instantly, and all inference runs on-device. It also ships with ready-to-use UI components.
So I built it.
GitHub: https://github.com/john-rocky/SamKit
What it can do
| Feature | Description |
|---|---|
| Point & Box | Tap for a point, drag for a box, then segment |
| Text Prompt | Type text like "dog" or "red cup" to detect and segment |
| Subject Lift | Long-press to lift an object out, Apple Photos–style; copy/save/share |
| Two backbones | MobileSAM (fast, 23MB) and SAM2 Tiny (accurate, 76MB) |
| Drop-in UI | Just embed the SwiftUI views as-is |
Architecture
SAMKit/
├── SAMKit # core inference engine (point/box)
├── SAMKitGrounding # text detection (YOLO-World + CLIP)
└── SAMKitUI # SwiftUI views (SamView / TextPromptView)
The package is split into three Swift Package products, so you import only what you need.
Setup
1. Add the Swift Package
dependencies: [
    .package(url: "https://github.com/john-rocky/SamKit.git", from: "1.0.0")
]
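To link only some of the three products, a target can list them one by one. The manifest below is a minimal sketch, not taken from the repository: MyApp is a placeholder name, the iOS version is an assumption you should check against the repository, and the package identity SamKit is inferred from the repository URL. The product names are the ones listed under Architecture.

```swift
// swift-tools-version:5.9
import PackageDescription

let package = Package(
    name: "MyApp", // placeholder name
    platforms: [.iOS(.v16)], // assumption: check the repository for the real minimum
    dependencies: [
        .package(url: "https://github.com/john-rocky/SamKit.git", from: "1.0.0")
    ],
    targets: [
        .target(
            name: "MyApp",
            dependencies: [
                // Core point/box inference
                .product(name: "SAMKit", package: "SamKit"),
                // Optional: only if you use text prompts or the bundled views
                .product(name: "SAMKitGrounding", package: "SamKit"),
                .product(name: "SAMKitUI", package: "SamKit")
            ]
        )
    ]
)
```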
2. Download the models
Get the .mlpackage files from Releases and add them to your Xcode project.
| Model | Size | Use |
|---|---|---|
| MobileSAM | 23 MB | point/box segmentation (required) |
| SAM2 Tiny | 76 MB | higher-accuracy segmentation (optional) |
| Grounding (YOLO-World + CLIP) | 148 MB | text detection (optional) |
Usage
Point/box segmentation
The most basic use. Set an image and specify a point.
import SAMKit
// create a session (the model auto-loads from a bundled resource)
let session = try SamSession(
    model: .bundled(.mobileSam),
    config: .bestAvailable // priority: Neural Engine > GPU > CPU
)
// encode the image (once; later predicts use the cache)
try session.setImage(cgImage)
// segment by point
let result = try session.predict(
    points: [SamPoint(x: 100, y: 200, label: .positive)]
)
// results
let mask = result.masks.first!
mask.cgImage // segmentation mask image
mask.score // IoU confidence score
mask.alpha // alpha-channel data
You can also specify negative points (regions to exclude) and a bounding box:
let result = try session.predict(
    points: [
        SamPoint(x: 100, y: 200, label: .positive), // point to include
        SamPoint(x: 300, y: 400, label: .negative)  // point to exclude
    ],
    box: SamBox(x0: 50, y0: 50, x1: 400, y1: 400) // bounding box
)
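If you build your own UI instead of using SamView, a tap arrives in view coordinates and has to be converted before it becomes a prompt. The sketch below assumes the image is displayed aspect-fit and that prompts are expected in image pixel coordinates; the article does not state the coordinate space, so treat that as an assumption. imagePoint(forTap:viewSize:imageSize:) is a hypothetical helper, not part of SAMKit.

```swift
import Foundation

/// Maps a tap in an aspect-fit view to pixel coordinates in the displayed image.
/// Returns nil when the tap lands in the letterbox area outside the image.
func imagePoint(forTap tap: CGPoint, viewSize: CGSize, imageSize: CGSize) -> CGPoint? {
    guard viewSize.width > 0, viewSize.height > 0,
          imageSize.width > 0, imageSize.height > 0 else { return nil }
    // Aspect-fit uses the smaller of the two scale factors.
    let scale = min(viewSize.width / imageSize.width, viewSize.height / imageSize.height)
    let shown = CGSize(width: imageSize.width * scale, height: imageSize.height * scale)
    // The image is centered, so the leftover space is split evenly.
    let origin = CGPoint(x: (viewSize.width - shown.width) / 2,
                         y: (viewSize.height - shown.height) / 2)
    let x = (tap.x - origin.x) / scale
    let y = (tap.y - origin.y) / scale
    guard x >= 0, y >= 0, x <= imageSize.width, y <= imageSize.height else { return nil }
    return CGPoint(x: x, y: y)
}
```

For a 1000×1000 image shown in a 200×400 view, a tap at the view's center (100, 200) maps to the image's center (500, 500), and a tap at (100, 50) falls in the top letterbox and is rejected.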
Segment by text prompt
Combine SAM with text detection by YOLO-World + CLIP.
import SAMKit
import SAMKitGrounding
let session = try TextSegmentationSession(
    groundingModel: .bundled(),
    samModel: .bundled(.mobileSam)
)
try session.setImage(cgImage)
// search by text and segment
let result = try session.segment(query: "dog, cat")
result.detections // detections (bounding box + label)
result.masks // segmentation mask for each detection
result.scores // confidence scores
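Conceptually this is a two-stage pipeline: a detector turns the text into labeled boxes, and each box then prompts the segmenter. The sketch below shows that shape with stand-in types. Box, Detection, TextDetector, BoxSegmenter, and the 0.3 score cutoff are all hypothetical and are not SAMKit's actual types or defaults.

```swift
// Hypothetical stand-ins for illustration; these are not SAMKit types.
struct Box { var x0, y0, x1, y1: Double }
struct Detection { var label: String; var box: Box; var score: Double }

protocol TextDetector { func detect(query: String) -> [Detection] }
protocol BoxSegmenter { func mask(for box: Box) -> [UInt8] }

/// Stage 1 finds boxes for the text query; stage 2 segments inside each box.
/// Detections scoring below `minScore` (an assumed cutoff) are dropped first.
func segment(query: String, detector: TextDetector, segmenter: BoxSegmenter,
             minScore: Double = 0.3) -> [(detection: Detection, mask: [UInt8])] {
    return detector.detect(query: query)
        .filter { $0.score >= minScore }
        .map { (detection: $0, mask: segmenter.mask(for: $0.box)) }
}
```

Keeping the two stages behind separate protocols mirrors the package split: the grounding model is optional, and the segmenter works with any source of boxes.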
Cutting out the object
You can generate a transparent PNG from the segmentation result.
// cut out from a single mask
let extracted = result.masks[0].extractObject(from: cgImage)
// → a CGImage with a transparent background
// composite cut-out from multiple masks
let combined = SamMask.extractObject(from: cgImage, masks: result.masks)
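The idea behind a cut-out is to use the mask as the image's alpha channel. The sketch below shows that on raw buffers, assuming RGBA8 pixels with straight (non-premultiplied) alpha and one mask byte per pixel. cutOut(rgba:mask:) is a hypothetical helper and is not how extractObject is necessarily implemented.

```swift
/// Applies a single-channel mask to RGBA8 pixels (straight, non-premultiplied alpha).
/// Pixels where the mask is 0 become fully transparent; color channels are untouched.
func cutOut(rgba: [UInt8], mask: [UInt8]) -> [UInt8] {
    precondition(rgba.count == mask.count * 4, "expected one mask byte per RGBA pixel")
    var out = rgba
    for i in mask.indices {
        // Scale the existing alpha by the mask value (both in 0...255).
        let alpha = Int(rgba[i * 4 + 3]) * Int(mask[i]) / 255
        out[i * 4 + 3] = UInt8(alpha)
    }
    return out
}
```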
Embedding the SwiftUI views
You don't need to build the UI yourself. SAMKitUI includes ready-to-use views.
import SAMKitUI
// interactive segmentation by point/box
SamView(image: uiImage, model: try .bundled(.mobileSam))
// segmentation by text search
TextPromptView(image: uiImage, session: textSession)
These views include:
- subject highlight after segmentation (dim background + subject at full brightness)
- an animated glowing outline
- long-press to lift the object → drag → Copy/Save/Share menu
How Subject Lift is implemented
A technical walkthrough of recreating Apple Photos' "lift the subject" feature.
1. Binarizing the mask
SAM's mask output is continuous sigmoid values, so convert it to a clean binary mask for display.
func binarizeMask(_ maskImage: CGImage) -> CGImage? {
    let width = maskImage.width, height = maskImage.height
    let rect = CGRect(x: 0, y: 0, width: width, height: height)
    // get pixel data via CGContext (4 bytes per pixel, alpha last, no row padding)
    guard let ctx = CGContext(data: nil, width: width, height: height, ...),
          let data = ctx.data else { return nil }
    ctx.draw(maskImage, in: rect)
    let pixels = data.bindMemory(to: UInt8.self, capacity: width * height * 4)
    let threshold: UInt8 = 128 // 50%, SAM's standard threshold
    for i in 0..<(width * height) {
        let o = i * 4
        if pixels[o + 3] >= threshold {
            // fully opaque white
            pixels[o] = 255; pixels[o+1] = 255; pixels[o+2] = 255; pixels[o+3] = 255
        } else {
            // fully transparent
            pixels[o] = 0; pixels[o+1] = 0; pixels[o+2] = 0; pixels[o+3] = 0
        }
    }
    return ctx.makeImage()
}
At threshold 0 it picks up mask noise and cuts out most of the image. 128 (50%) is stable.
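The effect of the threshold is easy to see on a handful of values. The sketch below is a standalone illustration on a plain byte array; binarize(_:threshold:) and the sample values are hypothetical and separate from the CGContext code.

```swift
/// Thresholds mask values (0...255): values at or above `threshold` become 255, the rest 0.
func binarize(_ values: [UInt8], threshold: UInt8 = 128) -> [UInt8] {
    return values.map { $0 >= threshold ? 255 : 0 }
}

let soft: [UInt8] = [0, 3, 127, 128, 250] // faint noise at 3, subject near 250
let atHalf = binarize(soft)               // [0, 0, 0, 255, 255]
let atZero = binarize(soft, threshold: 0) // [255, 255, 255, 255, 255]
```

With a threshold of 0 every value passes the >= comparison, which is why the whole image ends up selected.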
2. Generating the glowing outline
Extract the mask's contour with CGContext's shadow feature. Far faster than per-pixel dilation.
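func generateOutline(from maskImage: CGImage) -> CGImage? {
    // ctx and outCtx are bitmap contexts the size of the mask, rect covers the whole
    // image, and glowRadius is the blur radius (setup omitted for brevity)
    // Step 1: turn the mask into a solid-white silhouette
    ctx.draw(maskImage, in: rect)
    ctx.setBlendMode(.sourceIn)
    ctx.setFillColor(UIColor.white.cgColor)
    ctx.fill(rect) // → white silhouette
    guard let whiteSilhouette = ctx.makeImage() else { return nil }
    // Step 2: draw with a shadow, then erase the interior → only the contour remains
    outCtx.setShadow(offset: .zero, blur: glowRadius, color: UIColor.white.cgColor)
    outCtx.draw(whiteSilhouette, in: rect) // shadow = the contour's glow
    outCtx.setShadow(offset: .zero, blur: 0, color: nil) // shadow off for the erase pass
    outCtx.setBlendMode(.destinationOut)
    outCtx.draw(whiteSilhouette, in: rect) // erase the interior → contour only
    return outCtx.makeImage()
}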
Key points:
- setShadow makes the white glow (only two draws)
- .destinationOut erases the interior, leaving only the outer glow
- Far faster than a dilation loop (O(thickness² × pixels))
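For comparison, here is roughly what the per-pixel alternative looks like. This is a deliberately naive sketch of square dilation on a row-major binary mask, written only to make the cost visible; dilate(_:width:height:radius:) is a hypothetical helper that also counts how many neighbor reads it performs.

```swift
/// Naive square dilation of a binary mask (row-major, nonzero = inside).
/// Returns the dilated mask and the number of neighbor reads performed.
func dilate(_ mask: [UInt8], width: Int, height: Int, radius: Int) -> (mask: [UInt8], reads: Int) {
    precondition(radius >= 0 && mask.count == width * height)
    var out = [UInt8](repeating: 0, count: mask.count)
    var reads = 0
    for y in 0..<height {
        for x in 0..<width {
            var hit = false
            // Visit the (2 * radius + 1)² window around every pixel.
            for dy in -radius...radius {
                for dx in -radius...radius {
                    let nx = x + dx, ny = y + dy
                    guard nx >= 0, nx < width, ny >= 0, ny < height else { continue }
                    reads += 1
                    if mask[ny * width + nx] != 0 { hit = true }
                }
            }
            out[y * width + x] = hit ? 1 : 0
        }
    }
    return (out, reads)
}
```

An outline would be the dilated mask minus the original. Even a 3×3 mask with radius 1 takes 49 reads, and the count grows with the square of the radius times the pixel count.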
3. Shimmer animation
Use TimelineView and AngularGradient to make light travel around the contour.
TimelineView(.animation(minimumInterval: 1.0 / 30)) { timeline in
    let phase = timeline.date.timeIntervalSinceReferenceDate
        .truncatingRemainder(dividingBy: 2.5) / 2.5 // one lap in 2.5s
    ZStack {
        // soft glow (blurred cyan)
        outlineImage.colorMultiply(Color(red: 0.5, green: 0.85, blue: 1.0))
            .blur(radius: 5).opacity(0.8)
        // sharp outline
        outlineImage.colorMultiply(.white)
        // moving highlight
        outlineImage.colorMultiply(.white)
            .mask(
                AngularGradient(
                    colors: [.white, .white.opacity(0.5), .clear, .clear, ...],
                    center: .center,
                    startAngle: .degrees(phase * 360),
                    endAngle: .degrees(phase * 360 + 360)
                )
            )
    }
}
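The phase arithmetic is the only moving part, so it can be checked in isolation. The sketch below pulls it into a hypothetical shimmerPhase(at:period:) function that mirrors the expression in the view.

```swift
import Foundation

/// Fraction (0..<1) of the current lap, for a lap that takes `period` seconds.
func shimmerPhase(at time: TimeInterval, period: TimeInterval = 2.5) -> Double {
    return time.truncatingRemainder(dividingBy: period) / period
}

// 6.25s is two full laps plus 1.25s, so the highlight is halfway around: 180°.
let startAngle = shimmerPhase(at: 6.25) * 360
```

Because the phase derives from the clock rather than from accumulated state, the highlight stays in sync across redraws and needs no timer of its own.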
4. Unified gesture handler
Manage tap (add point), box drawing, and long-press lift all with a single DragGesture(minimumDistance: 0).
SwiftUI's onTapGesture + onLongPressGesture block each other, so I receive all touches in one gesture and classify them by time and movement.
DragGesture(minimumDistance: 0)
    .onChanged { value in
        lastTranslation = value.translation // remembered for the long-press check
        // schedule a timer on the first touch
        if gestureStartTime == nil {
            gestureStartTime = Date()
            // decide long-press after 0.3s
            DispatchQueue.main.asyncAfter(deadline: .now() + 0.3) {
                guard gestureStartTime != nil, !isLifted, hasVisibleMasks else { return }
                let moved = hypot(lastTranslation.width, lastTranslation.height)
                guard moved < 15 else { return } // if it moved, it's not a long-press
                handleLiftObject() // start the lift!
            }
        }
        if isLifted {
            liftDragOffset = value.translation // follow the drag
        }
    }
    .onEnded { value in
        let elapsed = gestureStartTime.map { Date().timeIntervalSince($0) } ?? 0
        let moved = hypot(value.translation.width, value.translation.height)
        gestureStartTime = nil // a pending long-press check now bails out
        if isLifted {
            // released → show menu
            showLiftMenu = true
        } else if elapsed < 0.3 && moved < 15 {
            // quick touch → add point
            addPoint(at: value.startLocation)
        }
    }
Classification logic:
| Condition | Verdict |
|---|---|
| < 0.3s, < 15pt moved | tap → add point |
| ≥ 0.3s, < 15pt moved | long-press → start lift |
| ≥ 10pt moved (box mode) | drag → draw box |
| movement while lifted | lift-drag → move object |
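The table can be expressed as a pure function, which keeps the thresholds testable outside the view. This is a sketch: TouchVerdict and classify are hypothetical names, the numbers come from the table, and the order in which the rules are checked is an assumption.

```swift
enum TouchVerdict: Equatable { case tap, longPress, boxDrag, liftDrag, ignored }

/// Classifies a touch from its duration (seconds) and travel distance (points).
func classify(elapsed: Double, moved: Double, isLifted: Bool, boxMode: Bool) -> TouchVerdict {
    // While an object is lifted, any movement drags it.
    if isLifted { return moved > 0 ? .liftDrag : .ignored }
    // In box mode, 10pt of travel is enough to start a box.
    if boxMode && moved >= 10 { return .boxDrag }
    // Nearly stationary touches are split by duration.
    if moved < 15 { return elapsed < 0.3 ? .tap : .longPress }
    return .ignored
}
```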
5. Subject highlight
Rather than overlaying a colored mask after segmentation, dim the background and show only the subject at its original brightness.
// darken the background
Color.black.opacity(0.25)
// show only the subject at the original image's brightness
Image(uiImage: image)
    .mask(Image(uiImage: UIImage(cgImage: binaryMask)))
This makes the transition to long-press lift natural (the dimming just deepens 0.25 → 0.4).
Performance
- Image encoding: once per image; later predicts reuse the cache
- Inference: accelerated on Neural Engine / GPU (FP16)
- Outline generation: only two CGContext-shadow draws; no pixel loop
- Networking: none. Fully on-device
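The encode-once pattern in the first bullet can be sketched with a stand-in class. CachedSession below is hypothetical and is not SAMKit's implementation; the "encoder" and "decoder" are placeholders that only show where the expensive and cheap steps sit.

```swift
/// Hypothetical stand-in for an encoder-backed session; not SAMKit's implementation.
final class CachedSession {
    private(set) var encodeCount = 0
    private var embedding: [Float]?

    /// The expensive step: runs once per image and caches its result.
    func setImage(_ pixels: [UInt8]) {
        encodeCount += 1
        embedding = pixels.map { Float($0) / 255 } // placeholder for the image encoder
    }

    /// The cheap step: every prompt reuses the cached embedding.
    func predict(x: Int, y: Int) -> Float? {
        guard let embedding = embedding, !embedding.isEmpty else { return nil }
        return embedding[abs(x + y) % embedding.count] // placeholder for the mask decoder
    }
}
```

Calling predict several times after a single setImage leaves encodeCount at 1, which is why adding or moving prompt points on the same image feels instant.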
Summary
With SAMKit you can add segmentation to an iOS app in a few lines.
// an interactive segmentation UI in one line
SamView(image: uiImage, model: try .bundled(.mobileSam))
Experiences like Subject Lift are built in too, so you can bring an Apple Photos–like UX into your own app immediately.
GitHub: https://github.com/john-rocky/SamKit
Feedback and issues welcome!
