I built a livestock counting app for smallholders. No internet required, no subscription, no server. You take a photo of your chickens, sheep, or cattle, and it counts them — entirely on-device. Here's how it actually works.
The Problem
Smallholders regularly need to count animals. In a field. In a barn. Where there's no signal. The apps that exist are either generic (bad accuracy for farm animals), require a server round-trip, or charge you monthly to count your own chickens. None of that made sense to me.
So I built Muster.
The Stack
- iOS 17, SwiftUI, SwiftData — no third-party dependencies, ships as a one-time purchase
- YOLOv8n — the nano variant, exported to CoreML format
- Apple's Vision framework — handles the ML request lifecycle, orientation correction, and bounding box coordinate normalisation
- Zero backend — no server, no account, no ongoing cost
The model is small enough to run on-device without breaking a sweat. YOLOv8n sits at about 6MB in CoreML format. On an iPhone 13 it processes a typical farm photo in under 400ms. That's fast enough that it feels instant.
How Inference Works
The VisionService wraps a VNCoreMLModel and runs a VNCoreMLRequest against the input image. The key detail is orientation: photos from iOS cameras carry EXIF orientation metadata, and if you don't account for it before handing the image to Vision, your bounding boxes come back in the wrong coordinate space.
// Bake the EXIF orientation into the CIImage before handing it to Vision,
// otherwise the bounding boxes come back in the wrong coordinate space.
let ciImage = CIImage(image: uiImage)! // camera and library photos are CGImage-backed, so this is safe
    .oriented(forExifOrientation: imageOrientationToExifOrientation(uiImage.imageOrientation))
let handler = VNImageRequestHandler(ciImage: ciImage, options: [:])
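For context, the request side looks roughly like this. It's a sketch rather than Muster's actual code: the Xcode-generated model class name (yolov8n), the crop/scale choice, and the handleDetections helper are all assumptions.

```swift
import CoreML
import Vision

// Wrap the compiled CoreML model and run a VNCoreMLRequest through the
// orientation-corrected handler built above.
let mlModel = try yolov8n(configuration: MLModelConfiguration()).model
let visionModel = try VNCoreMLModel(for: mlModel)

let request = VNCoreMLRequest(model: visionModel) { request, _ in
    // With NMS baked into the exported model, Vision returns
    // VNRecognizedObjectObservation values with normalised boxes and class labels.
    let observations = request.results as? [VNRecognizedObjectObservation] ?? []
    handleDetections(observations) // hypothetical hand-off to the mapping step below
}
// Should match how the exported model expects its input to be resized (assumption).
request.imageCropAndScaleOption = .scaleFill

try handler.perform([request])
```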
After inference, each detection gets mapped to a DetectedObject with a normalised bounding box and a confidence score. The UI overlays dot markers on the image — one per detection — and lets the user tap any marker to dismiss false positives before saving.
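A minimal sketch of that mapping step, assuming the top-ranked label is what gets stored; field names beyond the normalised box and confidence are mine, not Muster's.

```swift
import Foundation
import Vision

// Sketch of the mapping step. DetectedObject is named in the post; any fields
// beyond a normalised bounding box and a confidence score are assumptions.
struct DetectedObject: Identifiable {
    let id = UUID()
    let label: String
    let confidence: Float
    let boundingBox: CGRect   // normalised 0...1, Vision's lower-left origin
    var dismissed = false     // set when the user taps the marker away
}

func mapDetections(_ observations: [VNRecognizedObjectObservation]) -> [DetectedObject] {
    observations.compactMap { obs in
        guard let top = obs.labels.first else { return nil }   // labels are ranked by confidence
        return DetectedObject(label: top.identifier,
                              confidence: top.confidence,
                              boundingBox: obs.boundingBox)
    }
}
```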
Preset Categories vs. Tap-to-Select
The tricky UX question was: how does the user tell the app what to count? I landed on two modes:
- Preset categories — bird/poultry, sheep, cattle, plants — each mapped to specific COCO class IDs. The detection filter is applied post-inference, so the model still runs once regardless (see the sketch after this list).
- Tap-to-select — the user taps one example item in the photo, and the app counts all detections with the nearest matching class. Good for "other" categories the presets don't cover.
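Here is a sketch of both modes, building on the DetectedObject type above. The category-to-label groupings and helper names are my assumptions; the exported model surfaces COCO class names as label strings, so the filter below matches on labels rather than raw class IDs.

```swift
import Foundation
import CoreGraphics

// Preset categories mapped to COCO class labels (groupings are illustrative).
enum CountCategory: Hashable {
    case poultry, sheep, cattle, plants

    var cocoLabels: Set<String> {
        switch self {
        case .poultry: return ["bird"]
        case .sheep:   return ["sheep"]
        case .cattle:  return ["cow"]
        case .plants:  return ["potted plant"]
        }
    }
}

// Preset mode: the model runs once, then detections are filtered by label.
func filter(_ detections: [DetectedObject], for category: CountCategory) -> [DetectedObject] {
    detections.filter { category.cocoLabels.contains($0.label) }
}

// Tap-to-select mode: find the detection nearest the tap (in normalised image
// coordinates), then count everything that shares its label.
func countMatching(tap: CGPoint, in detections: [DetectedObject]) -> [DetectedObject] {
    guard let tapped = detections.min(by: {
        hypot($0.boundingBox.midX - tap.x, $0.boundingBox.midY - tap.y) <
        hypot($1.boundingBox.midX - tap.x, $1.boundingBox.midY - tap.y)
    }) else { return [] }
    return detections.filter { $0.label == tapped.label }
}
```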
The confidence thresholds needed tuning. Out of the box, YOLOv8n is conservative — I loosened the threshold for the farming categories because the cost of missing a sheep is higher than the cost of an occasional false positive that the user can tap away.
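The numbers below are illustrative rather than Muster's shipped values; they just show the shape of a per-category threshold table.

```swift
// Per-category confidence thresholds (illustrative values, not the shipped ones).
// Farming categories run looser: a missed animal costs more than an extra
// dot the user can tap away.
let confidenceThresholds: [CountCategory: Float] = [
    .poultry: 0.25,
    .sheep:   0.25,
    .cattle:  0.30,
    .plants:  0.40
]

func passesThreshold(_ detection: DetectedObject, for category: CountCategory) -> Bool {
    detection.confidence >= (confidenceThresholds[category] ?? 0.35)
}
```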
The Proof-of-Count Card
The feature I shipped last was the shareable count card — a rendered image showing the annotated photo, count total, category, timestamp, and app branding. Smallholders sometimes need to show a headcount to a vet, insurer, or land agent. A screenshot is clunky. A clean card with metadata looks like a document.
This was a SwiftUI view rendered to a UIImage with UIGraphicsImageRenderer — no external libraries, no server-side rendering.
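Roughly, the rendering path looks like this. The CountCardView layout and the export size are placeholders, and the hosting-controller dance is one common way to rasterise a SwiftUI view off-screen, not necessarily exactly what Muster does.

```swift
import SwiftUI
import UIKit

// Placeholder card view; the real one also shows the annotated photo,
// category, and app branding.
struct CountCardView: View {
    let count: Int
    let category: String
    let date: Date

    var body: some View {
        VStack(spacing: 8) {
            Text("\(count) \(category)").font(.largeTitle.bold())
            Text(date.formatted(date: .abbreviated, time: .shortened)).font(.footnote)
        }
        .padding()
    }
}

// Rasterise the SwiftUI card into a shareable UIImage by hosting it in a
// UIHostingController and drawing its layer with UIGraphicsImageRenderer.
func renderCountCard(count: Int, category: String, date: Date) -> UIImage {
    let host = UIHostingController(rootView: CountCardView(count: count, category: category, date: date))
    let size = CGSize(width: 1080, height: 1350)   // arbitrary export size for sharing
    host.view.bounds = CGRect(origin: .zero, size: size)
    host.view.backgroundColor = .white
    host.view.layoutIfNeeded()

    return UIGraphicsImageRenderer(size: size).image { context in
        host.view.layer.render(in: context.cgContext)
    }
}
```

On iOS 16 and later, SwiftUI's ImageRenderer can do the same job in a couple of lines, so it's worth considering if you don't need UIKit in the loop.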
What I Learned
Running ML inference at the edge is surprisingly painless on modern Apple hardware. CoreML and Vision do the heavy lifting. The hard part isn't the inference — it's the UX around confidence thresholds, false positive handling, and giving users enough control without overwhelming them.
If you're building anything that involves counting, detecting, or classifying on-device: the YOLOv8n → CoreML pipeline is mature, well-documented, and genuinely fast enough for production use.
Muster is heading to the App Store soon. One-time purchase. No subscription. Count your flock. No signal needed.
Top comments (1)
Todd
Thanks for your article! I really enjoyed it. The lower confidence threshold makes sense for this use case. Missed sheep cost more than extra taps.
But what NMS strategy are you using for clustered animals? Default IoU merges overlapping detections into one.