What is YOLO26
YOLO26 is the latest object detection model from Ultralytics, released in January 2026. Compared to the previous-generation YOLO11, CPU inference is up to 43% faster, which makes it considerably more practical on edge devices.
Its biggest feature is NMS-free, end-to-end inference. The Non-Maximum Suppression (NMS) post-processing step that earlier YOLO models required is gone: the model outputs the final detections directly.
| Model | mAP | CPU inference | Params |
|---|---|---|---|
| YOLO26n | 40.9 | 38.9ms | 2.5M |
| YOLO26s | 48.6 | 63.3ms | 9.2M |
| YOLO26m | 53.1 | 155ms | 18.7M |
Why run it on iPhone
- Real-time inference: 30+ FPS using the Neural Engine
- Privacy: data never leaves the device
- Offline: works without a network
- Low latency: no server round-trip, so results come back instantly
Preparing the CoreML model
Option 1: Download a converted model
You can grab a converted model from the CoreML-Models repository.
Option 2: Convert it yourself
pip install ultralytics coremltools==8.1 "numpy<2"
python -c "
from ultralytics import YOLO
model = YOLO('yolo26s.pt')
model.export(format='coreml', nms=False)
"
Note: coremltools 9.0 has a _cast bug, so 8.1 is recommended.
Implementing the iOS app
Loading the model
import CoreML
import Vision
let config = MLModelConfiguration()
config.computeUnits = .all // Neural Engine + GPU + CPU
let mlModel = try MLModel(contentsOf: modelURL, configuration: config)
let vnModel = try VNCoreMLModel(for: mlModel)
Running inference on camera frames
func captureOutput(_ output: AVCaptureOutput,
                   didOutput sampleBuffer: CMSampleBuffer,
                   from connection: AVCaptureConnection) {
    guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
    let request = VNCoreMLRequest(model: vnModel) { request, _ in
        self.handleDetections(request)
    }
    request.imageCropAndScaleOption = .scaleFill
    try? VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: .up)
        .perform([request])
}
Decoding the NMS-free output
YOLO26's output is a [1, 300, 6] tensor. Each row is [x1, y1, x2, y2, confidence, class_id] — already-filtered final results.
func handleDetections(_ request: VNRequest) {
    guard let results = request.results as? [VNCoreMLFeatureValueObservation],
          let array = results.first?.featureValue.multiArrayValue else { return }
    let shape = array.shape.map { $0.intValue } // [1, 300, 6]
    for i in 0..<shape[1] {
        let confidence = array[[0, i, 4] as [NSNumber]].floatValue
        guard confidence >= 0.25 else { continue }
        // Box corners are in pixels of the 640x640 model input; divide by 640 to normalize
        let x1 = CGFloat(array[[0, i, 0] as [NSNumber]].floatValue) / 640
        let y1 = CGFloat(array[[0, i, 1] as [NSNumber]].floatValue) / 640
        let x2 = CGFloat(array[[0, i, 2] as [NSNumber]].floatValue) / 640
        let y2 = CGFloat(array[[0, i, 3] as [NSNumber]].floatValue) / 640
        let classId = Int(array[[0, i, 5] as [NSNumber]].floatValue)
        // x1, y1, x2, y2 are now normalized coordinates in [0, 1]
        // convert them to screen coordinates and draw
    }
}
With conventional YOLO (v5, v8, v9, v11), an NMS step was required here. In YOLO26, duplicate removal via Dual Assignment is already done inside the model, so you get the final results by just filtering on a threshold.
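To sanity-check the decoding logic off-device, the same threshold filter can be prototyped in plain Python. The sketch below is illustrative only: `Detection`, `decode_detections`, and the sample rows are names and data made up for this example, and it assumes rows follow the [x1, y1, x2, y2, confidence, class_id] layout described above with a 640-pixel model input.

```python
from typing import List, NamedTuple, Sequence


class Detection(NamedTuple):
    """One detection with box corners normalized to [0, 1]."""
    x1: float
    y1: float
    x2: float
    y2: float
    confidence: float
    class_id: int


def decode_detections(rows: Sequence[Sequence[float]],
                      conf_threshold: float = 0.25,
                      input_size: float = 640.0) -> List[Detection]:
    """Keep rows at or above the threshold and normalize pixel coordinates.

    Assumes each row is [x1, y1, x2, y2, confidence, class_id], with the box
    in pixels of an input_size x input_size model input. No NMS is applied.
    """
    detections = []
    for x1, y1, x2, y2, confidence, class_id in rows:
        if confidence < conf_threshold:
            continue
        detections.append(Detection(x1 / input_size, y1 / input_size,
                                    x2 / input_size, y2 / input_size,
                                    confidence, int(class_id)))
    return detections


# Hypothetical rows standing in for a slice of the [1, 300, 6] output.
sample_rows = [
    [64.0, 128.0, 320.0, 480.0, 0.91, 0.0],
    [10.0, 10.0, 50.0, 50.0, 0.10, 16.0],  # below the threshold, dropped
]
kept = decode_detections(sample_rows)
```

Only the first sample row survives the filter. The whole post-processing step is a single loop with one comparison, which is what makes the Swift version above so short.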
Comparison with NMS-based YOLO
| | YOLO26 (no NMS) | YOLO11 (NMS required) |
|---|---|---|
| Output | [1, 300, 6]: direct results | [1, 84, 8400]: needs decode + NMS |
| Post-processing | threshold filter only | box decode → NMS → filter |
| CoreML conversion | simple, nms=False | needs a pipeline, nms=True |
| Inference speed | up to 43% faster (CPU) | baseline |
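For contrast, the NMS step that NMS-based models need after box decoding looks roughly like the greedy procedure below. This is a generic, class-agnostic textbook sketch, not the Ultralytics or CoreML pipeline implementation; `iou`, `greedy_nms`, and the 0.45 IoU threshold are assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: Sequence[Box], scores: Sequence[float],
               iou_threshold: float = 0.45) -> List[int]:
    """Return indices of the kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in kept):
            kept.append(i)
    return kept


# Two heavily overlapping boxes plus one separate box (made-up values).
boxes = [(0.0, 0.0, 10.0, 10.0), (1.0, 1.0, 11.0, 11.0), (20.0, 20.0, 30.0, 30.0)]
scores = [0.9, 0.8, 0.7]
kept_indices = greedy_nms(boxes, scores)
```

Here the second box overlaps the first with an IoU of 81/119, so it is suppressed and indices 0 and 2 remain. With YOLO26 none of this runs in the app.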
Drawing bounding boxes
Use CAShapeLayer for fast drawing. Drawing with SwiftUI's ForEach regenerates the views every frame and gets slow.
import UIKit

class BoundingBoxView {
    let shapeLayer = CAShapeLayer()
    let textLayer = CATextLayer()

    init() {
        // CAShapeLayer fills with opaque black by default; draw the outline only
        shapeLayer.fillColor = UIColor.clear.cgColor
        shapeLayer.lineWidth = 3
    }

    func show(frame: CGRect, label: String, color: UIColor) {
        CATransaction.begin()
        CATransaction.setDisableActions(true) // disable implicit animation
        shapeLayer.path = UIBezierPath(roundedRect: frame, cornerRadius: 10).cgPath
        shapeLayer.strokeColor = color.cgColor
        textLayer.string = label
        CATransaction.commit()
    }
}
Key points:
- Disable implicit animation with CATransaction.setDisableActions(true). Without it, labels lag one frame behind the box.
- Pool and reuse ~100 layers to avoid per-frame alloc/dealloc.
Aligning coordinates with the camera preview
This is the step where people get stuck most often.
// Set the camera output's videoOrientation to .portrait
// so the pixelBuffer arrives already rotated to portrait
let connection = videoOutput.connection(with: .video)
connection?.videoOrientation = .portrait
// Pass .up to VNImageRequestHandler (it's already rotated)
VNImageRequestHandler(cvPixelBuffer: pb, orientation: .up)
Because the preview is cropped with resizeAspectFill, you have to correct for the difference between the camera's aspect ratio and the screen's aspect ratio.
let cameraRatio = shortSide / longSide // e.g., 1080/1920
let displayRatio = screenWidth / screenHeight
let ratio = (screenHeight / screenWidth) / (longSide / shortSide)
if ratio >= 1 {
    // screen is taller than the camera → scale-correct horizontally
    let offset = (1 - ratio) * (0.5 - rect.minX)
    // ... correct with an affine transform
}
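The geometry behind this correction can be checked with a few lines of arithmetic. The sketch below is an alternative formulation rather than the exact affine transform used in the app: it assumes the preview is centered and scaled aspect-fill, that normalized boxes refer to the full camera frame with a top-left origin, and the names `Rect` and `aspect_fill_rect` are made up for this example.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x: float
    y: float
    width: float
    height: float


def aspect_fill_rect(norm: Rect, image_w: float, image_h: float,
                     screen_w: float, screen_h: float) -> Rect:
    """Map a normalized [0, 1] rect in the camera frame to screen points.

    Aspect-fill scales the frame until it covers the screen, then centers it,
    so one axis overflows and is cropped equally on both sides.
    """
    scale = max(screen_w / image_w, screen_h / image_h)
    scaled_w = image_w * scale
    scaled_h = image_h * scale
    # Negative offset on the axis that overflows the screen.
    offset_x = (screen_w - scaled_w) / 2.0
    offset_y = (screen_h - scaled_h) / 2.0
    return Rect(norm.x * scaled_w + offset_x,
                norm.y * scaled_h + offset_y,
                norm.width * scaled_w,
                norm.height * scaled_h)


# A 1080x1920 portrait frame shown on a 390x844 point screen (example sizes).
full_frame = aspect_fill_rect(Rect(0.0, 0.0, 1.0, 1.0), 1080, 1920, 390, 844)
center = aspect_fill_rect(Rect(0.5, 0.5, 0.0, 0.0), 1080, 1920, 390, 844)
```

With these example sizes the screen is taller than the camera frame, so the frame is scaled to the screen height and overflows horizontally by about 42 points on each side. The frame center still lands on the screen center.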
Sample app
There's a complete sample app in the CoreML-Models repository.
- YOLO26Demo (sample_apps/YOLO26Demo/): for NMS-free models
  - Real-time camera inference + FPS/latency display
  - Inference on images from the photo library
  - Per-frame inference on video
Setup:
- Download and unzip the model
- Drag the .mlpackage into your Xcode project
- Build & run on a real device
Any model with output shape [1, N, 6] is loaded automatically, regardless of file name.
Conversion tips
What I learned doing this conversion:
- coremltools 9.0 + numpy 2.x crashes on _cast → use coremltools 8.1 + numpy<2
- ultralytics 8.4.31's nms=True CoreML export fails because of a pipeline_coreml bug → with NMS-free YOLO26 you just use nms=False, so it's a non-issue
- Python 3.14 isn't supported by coremltools → use Python 3.12
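These version constraints can be checked before starting a conversion. The sketch below is a hypothetical helper, not part of ultralytics or coremltools: `major_minor`, `installed_version`, and `check_environment` are names made up here, and the thresholds simply encode the tips above rather than any official compatibility matrix.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import List, Optional, Tuple


def major_minor(version_string: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings such as '8.1' or '3.12.4'."""
    numbers = re.findall(r"\d+", version_string)
    major = int(numbers[0]) if numbers else 0
    minor = int(numbers[1]) if len(numbers) > 1 else 0
    return major, minor


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def check_environment(python_version: str, coremltools_version: str,
                      numpy_version: str) -> List[str]:
    """Return one warning per constraint from the tips that is violated."""
    warnings = []
    if major_minor(python_version) >= (3, 14):
        warnings.append("Python 3.14+ is not supported by coremltools; use 3.12")
    if major_minor(coremltools_version) >= (9, 0):
        warnings.append("coremltools 9.0 hit a _cast crash; use 8.1")
    if major_minor(numpy_version)[0] >= 2:
        warnings.append("numpy 2.x hit a _cast crash; use numpy<2")
    return warnings


ok = check_environment("3.12.4", "8.1", "1.26.4")
bad = check_environment("3.14.0", "9.0", "2.1.0")
```

In a real environment the version strings could come from `platform.python_version()` and `installed_version("coremltools")` instead of the literals used here.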
Summary
Thanks to its NMS-free design, YOLO26 makes both CoreML conversion and app implementation simpler. Conventional YOLO needed an NMS pipeline and decoding logic; YOLO26 needs only threshold filtering.
With the iPhone's Neural Engine you can hit real-time detection at 30+ FPS. Edge AI feels one step closer to being practical.
References
Originally published in Japanese on Qiita.