On-Device AI for Construction Safety: Why I'm Skipping the Cloud Entirely

#ios #machinelearning #javascript #showdev

I've been building a construction safety inspection app — GroundCheck — and from day one I made a decision that surprised a few people: no cloud AI. Every hazard detection runs on-device, offline, in under 50 MB.

Here's why that wasn't just a cost call — it was an engineering call.

The problem with cloud AI on a construction site

Construction sites are not Silicon Valley offices. Mid-size commercial builds — the beachhead market I'm targeting — often have patchy LTE at best, and active floors can be dead zones. A safety inspector can't pause a walkthrough because their hazard detection app is waiting on a round-trip to an API.

When I looked at the alternatives:

Cloud vision APIs: fast to build and $0.001–0.003/image at scale, but useless offline, and they raise a real liability question about who holds footage of an active construction site
On-device ML: more upfront work, but deterministic latency, zero connectivity dependency, and no data leaves the device

For safety tooling, determinism matters. If an inspector gets a false negative because the API timed out, that's not a UX bug — it's potentially a serious incident.

The model stack: keeping it under 50 MB

The target is YOLOv8s + MobileNetV3, combined under 50 MB. Here's the reasoning behind each choice:

YOLOv8s: the small variant sits around 22 MB as a CoreML model, fast enough to run on-device without throttling the camera feed. Compared with the nano variant, the 's' variant gives up some speed in exchange for higher mAP. The nano would be faster but misses smaller objects, a real problem when you're detecting things like exposed rebar or missing PPE at distance.

MobileNetV3: the classification backbone for finer-grained scene understanding. Once YOLOv8s has found a detection region, MobileNetV3 classifies that region and handles distinctions like "is this person wearing a hard hat" vs. "is this person holding a hard hat." It's a two-stage pipeline, with both stages on-device.
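To make the two-stage flow concrete, here is a minimal TypeScript sketch. The type names, the `inspectFrame` function, and the 0.5 detector cutoff are all hypothetical, and the two models are passed in as plain functions so the control flow can be run with stubs. It is not GroundCheck's actual code.

```typescript
// Hypothetical shapes for illustration; the real app's interfaces may differ.
interface Box { x: number; y: number; width: number; height: number }
interface Detection { label: string; confidence: number; box: Box }
interface Finding { box: Box; detectorLabel: string; verdict: string; confidence: number }

// Stage 1 (detector) and stage 2 (classifier) are injected as functions,
// standing in for the YOLOv8s and MobileNetV3 models.
type Detector<Frame> = (frame: Frame) => Detection[];
type Classifier<Frame> = (frame: Frame, region: Box) => { label: string; confidence: number };

function inspectFrame<Frame>(
  frame: Frame,
  detect: Detector<Frame>,
  classify: Classifier<Frame>,
  minDetectorConfidence: number = 0.5 // assumed cutoff, not a tuned value
): Finding[] {
  return detect(frame)
    // Drop weak detections before paying for the second model.
    .filter((d) => d.confidence >= minDetectorConfidence)
    // Run the classifier only on the regions the detector proposed.
    .map((d) => {
      const result = classify(frame, d.box);
      return {
        box: d.box,
        detectorLabel: d.label,
        verdict: result.label,
        confidence: result.confidence,
      };
    });
}

// Usage with stub models: one strong and one weak "person" detection.
const stubDetect: Detector<string> = () => [
  { label: "person", confidence: 0.9, box: { x: 10, y: 20, width: 50, height: 120 } },
  { label: "person", confidence: 0.3, box: { x: 200, y: 40, width: 30, height: 80 } },
];
const stubClassify: Classifier<string> = () => ({ label: "hard_hat_worn", confidence: 0.88 });

const findings = inspectFrame("frame-001", stubDetect, stubClassify);
console.log(findings.length); // 1: the 0.3 detection never reaches the classifier
```

Filtering before the second stage keeps the classifier's cost proportional to the number of plausible detections rather than to everything the detector emits.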

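The total bundle target is under 50 MB so the app stays quick to install over a weak cellular connection and sits far below the size at which iOS asks for confirmation before downloading over cellular (200 MB by default, last I checked). Installers and safety managers shouldn't have to think about it.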

Offline-first all the way down

The inspection flow is built on Drizzle ORM over expo-sqlite. Everything is captured locally first: photos, GPS coordinates, hazard detections, inspection notes. A sync queue handles Supabase replication when connectivity returns.

This means the app works the same with zero bars as it does on a solid connection. Photo uploads are deferred, sync state is visible in the UI, and no part of the core inspection loop has a network dependency.
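The queueing idea can be sketched as an in-memory outbox. This is an illustration only: the `Outbox` class and its method names are hypothetical, and the real app would persist entries in SQLite and do the actual upload between `pending()` and `markSynced()`.

```typescript
// Hypothetical in-memory outbox; a real one would live in a SQLite table.
type SyncStatus = "pending" | "synced";

interface OutboxEntry<T> {
  id: number;
  payload: T;
  status: SyncStatus;
  attempts: number; // failed upload attempts so far
}

class Outbox<T> {
  private entries: OutboxEntry<T>[] = [];
  private nextId = 1;

  // Called on every local write, whether or not there is connectivity.
  enqueue(payload: T): number {
    const id = this.nextId++;
    this.entries.push({ id, payload, status: "pending", attempts: 0 });
    return id;
  }

  // Oldest-first list of entries that still need to reach the server.
  // The UI can show pending().length as the visible sync state.
  pending(): OutboxEntry<T>[] {
    return this.entries.filter((e) => e.status === "pending");
  }

  // Called after the server confirms the upload.
  markSynced(id: number): void {
    for (const entry of this.entries) {
      if (entry.id === id) entry.status = "synced";
    }
  }

  // A failed upload stays pending and is retried on the next pass.
  markFailed(id: number): void {
    for (const entry of this.entries) {
      if (entry.id === id) entry.attempts += 1;
    }
  }
}

// Usage: two captures made offline, then one upload fails and one succeeds.
const outbox = new Outbox<{ kind: string; note: string }>();
const photoId = outbox.enqueue({ kind: "photo", note: "level 3 stairwell" });
const hazardId = outbox.enqueue({ kind: "hazard", note: "missing guardrail" });

outbox.markFailed(photoId);
outbox.markSynced(hazardId);

console.log(outbox.pending().map((e) => e.id)); // [1]
```

Because capture only ever calls `enqueue`, the inspection loop never waits on the network. Replication is a separate pass over `pending()` that runs when connectivity returns.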

It's more upfront architecture work. But for a product that's supposed to replace $600/month SafetyCulture contracts, "it doesn't work if you're underground" isn't acceptable.

What I've learned so far

On-device AI is actually quite accessible now. CoreML tooling has matured significantly. Converting a YOLOv8 model to CoreML is mostly straightforward via coremltools; the sharp edges are around input preprocessing and getting confidence thresholds tuned for your domain (construction hazard detection ≠ COCO defaults).
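As an illustration of the threshold tuning mentioned above, here is a small sketch of per-class confidence cutoffs applied after inference. The class names and all threshold values are made up for the example; real values would come from validating against site imagery.

```typescript
interface ScoredDetection { label: string; confidence: number }

// Hypothetical per-class cutoffs, not tuned values.
const classThresholds: Record<string, number> = {
  missing_hard_hat: 0.35, // lower cutoff: a missed detection is costly
  exposed_rebar: 0.45,
};
const fallbackThreshold = 0.25; // assumed default for classes without an entry

function thresholdFor(label: string, perClass: Record<string, number>, fallback: number): number {
  const value = perClass[label];
  return typeof value === "number" ? value : fallback;
}

function applyThresholds(
  detections: ScoredDetection[],
  perClass: Record<string, number>,
  fallback: number
): ScoredDetection[] {
  return detections.filter((d) => d.confidence >= thresholdFor(d.label, perClass, fallback));
}

// Usage: the same 0.40 score passes for one class and fails for another.
const raw: ScoredDetection[] = [
  { label: "missing_hard_hat", confidence: 0.4 },
  { label: "exposed_rebar", confidence: 0.4 },
  { label: "ladder", confidence: 0.3 },
];
const kept = applyThresholds(raw, classThresholds, fallbackThreshold);
console.log(kept.map((d) => d.label)); // ["missing_hard_hat", "ladder"]
```

A single global threshold forces one trade-off between false alarms and misses for every hazard type. Per-class cutoffs let the costly-to-miss classes be more sensitive.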

Offline-first is a discipline, not a feature. It touches every layer — schema design, UI state, sync conflict resolution. You can't bolt it on after the fact. I scaffolded the sync queue before I wrote a single inspection screen.
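For a sense of what conflict resolution involves at the record level, here is a sketch of last-write-wins, one common strategy. Nothing above says which strategy GroundCheck uses, so treat the `Versioned` shape and the `resolveLastWriteWins` function as hypothetical.

```typescript
// Hypothetical record shape: every syncable row carries a last-modified time.
interface Versioned {
  id: string;
  updatedAt: number; // epoch milliseconds
}

interface InspectionNote extends Versioned {
  text: string;
}

// Last-write-wins: keep whichever copy was modified most recently.
// Ties favor the local copy, so an unsynced edit is never silently dropped.
function resolveLastWriteWins<T extends Versioned>(local: T, remote: T): T {
  return remote.updatedAt > local.updatedAt ? remote : local;
}

// Usage: the inspector edited the note offline after the server copy was written.
const localNote: InspectionNote = { id: "note-1", updatedAt: 2000, text: "Guardrail missing on east edge" };
const remoteNote: InspectionNote = { id: "note-1", updatedAt: 1000, text: "Guardrail missing" };

console.log(resolveLastWriteWins(localNote, remoteNote).text); // "Guardrail missing on east edge"
```

Last-write-wins is simple, but it depends on device clocks and discards the losing edit. That is part of why the choice has to be made at schema-design time: the `updatedAt` column (or a version counter) must exist on every table from the start.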

The 50 MB budget forces good decisions: model quantization, INT8 where it matters, careful layer pruning. A hard size constraint pushes you toward leaner models than an unlimited compute budget would.
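A back-of-the-envelope check shows why precision matters for the budget. The parameter counts below are approximate published figures (about 11.2M for YOLOv8s, about 5.4M for MobileNetV3-Large). Which MobileNetV3 variant to use is my assumption, and the estimate ignores model metadata and the rest of the app bundle.

```typescript
type Precision = "fp32" | "fp16" | "int8";

const bytesPerWeight: Record<Precision, number> = { fp32: 4, fp16: 2, int8: 1 };

// Weights-only estimate: parameters x bytes per weight, in MB (10^6 bytes).
function estimateSizeMB(paramCount: number, precision: Precision): number {
  return (paramCount * bytesPerWeight[precision]) / 1e6;
}

// Approximate parameter counts (assumptions, see the note above).
const yolov8sParams = 11.2e6;
const mobileNetV3Params = 5.4e6; // assuming the Large variant

const budgetMB = 50;

function bundleEstimateMB(precision: Precision): number {
  return estimateSizeMB(yolov8sParams, precision) + estimateSizeMB(mobileNetV3Params, precision);
}

function fitsBudget(precision: Precision): boolean {
  return bundleEstimateMB(precision) <= budgetMB;
}

const precisions: Precision[] = ["fp32", "fp16", "int8"];
precisions.forEach((p) => {
  console.log(p, bundleEstimateMB(p).toFixed(1) + " MB", fitsBudget(p) ? "fits" : "over budget");
});
// fp32 is roughly 66 MB (over budget); fp16 is roughly 33 MB and int8 roughly 17 MB (both fit)
```

Under these assumptions, the 22 MB figure for YOLOv8s lines up with 16-bit weights, and full 32-bit weights for both models would blow the budget on their own.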

If you're building for any domain where connectivity isn't guaranteed — field work, logistics, healthcare, agriculture — the on-device AI stack has never been more viable. The cloud-first assumption deserves to be challenged.