Distributing firmware updates to thousands of GPS trackers and BLE beacons installed in warehouses, vehicles, and other installations is one of the hardest operational problems in IoT. Do it wrong, and you brick devices in the field. Do it right, and you deliver new features, security patches, and other improvements to your entire fleet while the devices keep working normally. This is how it works in detail.
Why OTA updates are mandatory once you have a fleet
Once you deploy more than a few dozen IoT devices in the field, manual updates become operationally impossible. Imagine manually updating a fleet of 500 GPS trackers spread across 10 sites: it will take you weeks. During those weeks your fleet runs a mix of firmware versions, behaves inconsistently, and stays exposed to security threats.
OTA updates solve this problem by delivering the firmware update over the air, through the connection the device already has. That could be MQTT over cellular or Wi-Fi, or even LoRaWAN.
OTA update process: four steps for safety
Package
Create, sign, and version the firmware binary. Distribute it over a CDN with a checksum.
Notify
Tell the device over MQTT that a newer version is available. The device checks whether it is eligible for the update.
Download and verify
The device downloads the binary in chunks, then verifies the SHA-256 hash and the digital signature.
Apply and roll back
The device flashes the firmware binary to the inactive partition and boots into the new version. It rolls back if the health check fails.
Dual-partition design (A/B)
The safest OTA update pattern uses two partitions: the active partition holds the running firmware, and the inactive partition receives the new firmware version.
Step 1
Device runs from Partition A
Normal operation. Partition B is empty or holds the previous version.
Step 2
OTA notification received
Device downloads new firmware into Partition B in chunks, verifies hash on completion.
Step 3
Reboot into Partition B
Bootloader switches boot target to Partition B. Device boots new firmware.
Step 4a
Health check passes -> commit
Device reports successful boot. Partition B becomes the new active partition.
Step 4b
Health check fails -> rollback
Bootloader detects failed boot attempts. Reverts to Partition A automatically. Device stays online.
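The four steps above can be modelled as a small state machine. The following is a minimal Python simulation, not real bootloader code: the `BootState` fields, the function names, and the limit of three boot attempts are all assumptions made for illustration, and a real device would keep this state in flash or in bootloader environment variables.

```python
from dataclasses import dataclass
from typing import Optional

MAX_BOOT_ATTEMPTS = 3  # assumed limit; real bootloaders usually make this configurable


@dataclass
class BootState:
    active: str = "A"              # slot known to be good
    pending: Optional[str] = None  # slot under trial, if any
    attempts: int = 0              # boots tried on the pending slot


def other_slot(slot: str) -> str:
    return "B" if slot == "A" else "A"


def stage_update(state: BootState) -> None:
    """Steps 2-3: the new image is in the inactive slot; mark it for a trial boot."""
    state.pending = other_slot(state.active)
    state.attempts = 0


def select_boot_slot(state: BootState) -> str:
    """Run on every boot; returns the slot the bootloader should start."""
    if state.pending is None:
        return state.active
    if state.attempts >= MAX_BOOT_ATTEMPTS:
        # Step 4b: the trial slot was never confirmed, so revert to the good slot
        state.pending = None
        state.attempts = 0
        return state.active
    state.attempts += 1
    return state.pending


def confirm_boot(state: BootState) -> None:
    """Step 4a: called by the new firmware once its health check has passed."""
    if state.pending is not None:
        state.active = state.pending
        state.pending = None
        state.attempts = 0
```

The key design choice in this sketch is that the new slot only becomes `active` after an explicit `confirm_boot()`. A firmware image that crashes before confirming simply uses up its attempts and the device falls back on its own.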
Step 1 - Back End: Serve and notify
The OTA back end does two things: it serves the firmware binary and notifies eligible devices. Notification goes over MQTT, since the devices already hold a persistent connection:
// Node.js OTA notification: push to eligible devices via MQTT
// Assumes mqttClient is a connected MQTT.js client (publishAsync needs MQTT.js v5+)
// and that getFileHash / getFileSize are helpers defined elsewhere.
async function pushOTANotification(firmwareVersion, targetFleet) {
  const payload = {
    version: firmwareVersion,
    url: `https://cdn.yourdomain.com/firmware/${firmwareVersion}.bin`,
    sha256: await getFileHash(firmwareVersion),
    size: await getFileSize(firmwareVersion),
    releaseNotes: 'Fix GPS drift bug + improved cold start'
  }
  // Publish to the fleet topic: only devices subscribed to it receive the message
  await mqttClient.publishAsync(
    `fleet/${targetFleet}/ota/available`,
    JSON.stringify(payload),
    { qos: 1, retain: true } // retain so offline devices get it on reconnect
  )
}
Step 2 - Device: Receive & Download
On the device, subscribe to the OTA topic and handle the update in Python (Raspberry Pi tracker):
import hashlib, json
import requests
import paho.mqtt.client as mqtt

CURRENT_VERSION = '1.4.2'
FIRMWARE_PATH = '/firmware/update.bin'

# report_failure() and apply_firmware() are device-specific helpers defined elsewhere.
# The paho client setup and the topic subscription are omitted here.

def on_ota_message(client, userdata, msg):
    update = json.loads(msg.payload)
    # Skip if already on this version
    if update['version'] == CURRENT_VERSION:
        return
    # Download firmware in chunks
    print(f"Downloading firmware {update['version']}...")
    try:
        r = requests.get(update['url'], stream=True, timeout=30)
        r.raise_for_status()  # never write an HTTP error page to disk as firmware
        with open(FIRMWARE_PATH, 'wb') as f:
            for chunk in r.iter_content(4096):
                f.write(chunk)
    except requests.RequestException:
        report_failure(client, 'DOWNLOAD_FAILED')
        return
    # Verify SHA-256 hash before applying
    if not verify_hash(FIRMWARE_PATH, update['sha256']):
        report_failure(client, 'HASH_MISMATCH')
        return
    # Hash verified: apply and reboot
    apply_firmware()

def verify_hash(path, expected):
    # Hash the file in 4 KB blocks so large images never sit fully in memory
    sha256 = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(4096), b''):
            sha256.update(chunk)
    return sha256.hexdigest() == expected
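The handler above skips an update only when the offered version string is identical to the running one, so a retained message for an older release would be treated as an update. One possible refinement is to compare versions numerically. This sketch assumes versions are dotted integers such as 1.4.2; `parse_version` and `is_newer` are hypothetical helper names, not part of the original code.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '1.4.2' (or 'v1.4.2') into (1, 4, 2) so parts compare as numbers."""
    return tuple(int(part) for part in version.strip().lstrip("v").split("."))


def is_newer(current: str, offered: str) -> bool:
    """True only if the offered firmware version is strictly newer than the current one."""
    return parse_version(offered) > parse_version(current)
```

Comparing tuples of integers rather than raw strings matters once a component reaches two digits: as strings, '1.10.0' sorts before '1.9.0'. In the handler, the early return could then become `if not is_newer(CURRENT_VERSION, update['version']): return`.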
Step 3 - Sign firmware for security purposes
Hash verification detects corruption; signature verification blocks malicious updates. With both in place, the device installs only firmware that you have signed:
# Build pipeline: sign the firmware binary before uploading
# Generate signing key (once, store private key securely in CI secrets)
openssl genrsa -out firmware_signing.key 4096
openssl rsa -in firmware_signing.key -pubout -out firmware_signing.pub
# Sign the firmware binary
openssl dgst -sha256 -sign firmware_signing.key \
  -out firmware_v1.5.0.bin.sig firmware_v1.5.0.bin
# On device: verify signature before applying
openssl dgst -sha256 -verify firmware_signing.pub \
  -signature firmware_v1.5.0.bin.sig firmware_v1.5.0.bin
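On a Python-based device, the verification command above can be wrapped so the update handler gets a simple yes or no. This is a minimal sketch that assumes the `openssl` binary is installed on the device and that the public key and signature files are already on disk; the function names are hypothetical.

```python
import subprocess
from typing import List


def build_verify_command(pubkey_path: str, sig_path: str, firmware_path: str) -> List[str]:
    """Same arguments as the shell command: openssl dgst -sha256 -verify ..."""
    return [
        "openssl", "dgst", "-sha256",
        "-verify", pubkey_path,
        "-signature", sig_path,
        firmware_path,
    ]


def signature_is_valid(pubkey_path: str, sig_path: str, firmware_path: str) -> bool:
    """Return True only if openssl ran and exited with status 0."""
    try:
        result = subprocess.run(
            build_verify_command(pubkey_path, sig_path, firmware_path),
            capture_output=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Fail closed: a missing binary or a hung process counts as "not verified"
        return False
    return result.returncode == 0
```

The sketch fails closed on purpose: any error while checking is treated the same as a bad signature, so the device never flashes an image it could not verify.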
Step 4 - Health checks and Rollback mechanism
Once the device reboots into the new firmware, it must report a successful health check within a time limit. If it does not, because the new firmware crashes or cannot connect, the bootloader rolls back to the previous version:
// Post-boot health check: report to backend within 60 seconds
async function postBootHealthCheck() {
  const checks = {
    mqttConnected: await checkMQTTConnection(),
    gpsLock: await checkGPSLock(),
    sensorReadings: await checkSensors(),
    memoryOk: process.memoryUsage().heapUsed < 50_000_000
  }
  const passed = Object.values(checks).every(Boolean)
  // Report result to backend
  mqttClient.publish(`devices/${DEVICE_ID}/ota/status`, JSON.stringify({
    version: CURRENT_VERSION,
    status: passed ? 'SUCCESS' : 'FAILED',
    checks,
    ts: new Date().toISOString()
  }))
  // If failed: trigger rollback to previous partition
  if (!passed) triggerRollback()
}
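The check above runs each probe once. Right after boot, a GPS lock or an MQTT connection can take a few seconds, so a single-shot check may roll back firmware that is actually fine. One way to use the 60-second window is to retry failing checks until the deadline. The following Python sketch illustrates the idea; the function name, its parameters, and the one-second retry delay are assumptions, and the clock and sleep functions are injectable only so the logic can be tested without waiting.

```python
import time
from typing import Callable, Dict


def run_health_checks(
    checks: Dict[str, Callable[[], bool]],
    deadline_s: float = 60.0,
    retry_delay_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, bool]:
    """Retry failing checks until all pass or the deadline expires."""
    deadline = clock() + deadline_s
    results = {name: False for name in checks}
    while True:
        for name, check in checks.items():
            if not results[name]:
                try:
                    results[name] = bool(check())
                except Exception:
                    # A check that raises is treated as a failed check
                    results[name] = False
        if all(results.values()) or clock() >= deadline:
            return results
        sleep(retry_delay_s)
```

The caller would then commit if `all(results.values())` is true and trigger the rollback otherwise.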
Never push an OTA update to your whole fleet at once. If something goes wrong and all 10,000 devices are bricked, there is no recovering from it. Use staged rollouts of 1% -> 10% -> 50% -> 100%, confirming the health checks after each stage.
Staged rollout strategy
// Staged rollout controller
// Assumes sampleFleet, pushOTAToDevices, getSuccessRate and haltRollout are defined elsewhere.
const ROLLOUT_STAGES = [
  { percent: 1, waitHours: 2, label: 'canary' },
  { percent: 10, waitHours: 12, label: 'early' },
  { percent: 50, waitHours: 24, label: 'majority' },
  { percent: 100, waitHours: 0, label: 'full' }
]
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

async function runStagedRollout(version, fleet) {
  for (const stage of ROLLOUT_STAGES) {
    const devices = sampleFleet(fleet, stage.percent)
    await pushOTAToDevices(devices, version)
    // The final stage has no wait and gates nothing, so skip the check;
    // checking right after the push would see almost no reports yet
    if (stage.waitHours === 0) continue
    // Wait and check success rate before next stage
    await sleep(stage.waitHours * 3600000)
    const successRate = await getSuccessRate(version, devices)
    if (successRate < 0.95) {
      // Less than 95% success: halt and investigate
      await haltRollout(version)
      throw new Error(`Rollout halted at ${stage.label}, success rate: ${successRate}`)
    }
  }
}
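The controller leaves `sampleFleet` undefined. If it sampled randomly at each stage, the 10% group would not necessarily contain the 1% canary group. One common approach, sketched here in Python as an assumption rather than as the article's implementation, is to hash each device ID into a bucket from 0 to 99 and include every device whose bucket is below the stage percentage. The function names and the extra `version` parameter are hypothetical.

```python
import hashlib
from typing import Iterable, List


def rollout_bucket(device_id: str, version: str) -> int:
    """Stable bucket in [0, 100) derived from the device ID and the release version."""
    digest = hashlib.sha256(f"{version}:{device_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def in_stage(device_id: str, version: str, percent: int) -> bool:
    """A device is in a stage when its bucket falls below the stage percentage."""
    return rollout_bucket(device_id, version) < percent


def sample_fleet(device_ids: Iterable[str], version: str, percent: int) -> List[str]:
    """Deterministic cohort: the same inputs always select the same devices."""
    return [d for d in device_ids if in_stage(d, version, percent)]
```

Because a device's bucket never changes for a given version, each stage is a superset of the previous one, and restarting the controller does not reshuffle the cohorts. Mixing the version into the hash means a different set of devices acts as canaries for each release.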
Scheduling tip: schedule OTA pushes for a known period of low activity in your fleet, such as nighttime for warehouse devices or off-shift hours for manufacturing. Avoid peak operational hours, when a reboot cycle costs far more.
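A quiet-hours rule like this reduces to a small time-window check. The sketch below assumes the window is expressed in the site's local time and may wrap past midnight (for example 22:00 to 05:00); the function name is hypothetical.

```python
from datetime import time


def in_maintenance_window(now: time, start: time, end: time) -> bool:
    """True if `now` falls inside [start, end), including windows that cross midnight."""
    if start <= end:
        return start <= now < end
    # Window wraps past midnight, e.g. 22:00 -> 05:00
    return now >= start or now < end
```

A rollout controller could call this before each push and defer devices whose site is currently outside its window.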
OTA Safety Checklist
- Dual partition (A/B): the update image is flashed to the inactive partition, so rollback is possible at any moment
- SHA-256 hash check: verify before flashing and reject corrupt images
- Cryptographic signature: use RSA/ECDSA and keep private keys in CI system secrets only
- Auto-rollback when the health check fails or times out
- Staged deployment: canary -> 10% -> 50% -> 100%, gated on success rate
- MQTT retained message: a device that is offline during the push still receives the notification when it reconnects
- Version pinning: the ability to push a specific version to a particular device or fleet
- Update logs: keep a record of the update process for every device
- NEVER push an update to the whole fleet at once: one bad update can brick thousands of units
- NEVER skip hash verification: data corruption during download happens far more often than you think on cellular
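The update-log item also feeds the staged rollout, because the success rate has to come from somewhere. As a rough illustration, the sketch below uses Python's built-in `sqlite3` as a stand-in for the PostgreSQL audit log; the table layout and function names are assumptions, not a schema from this article.

```python
import sqlite3
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE IF NOT EXISTS ota_events (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    device_id   TEXT NOT NULL,
    version     TEXT NOT NULL,
    status      TEXT NOT NULL,
    detail      TEXT,
    recorded_at TEXT NOT NULL
)
"""


def open_log(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_event(conn, device_id: str, version: str, status: str, detail: str = "") -> None:
    """Append one status report (e.g. SUCCESS, FAILED, HASH_MISMATCH) to the log."""
    conn.execute(
        "INSERT INTO ota_events (device_id, version, status, detail, recorded_at) "
        "VALUES (?, ?, ?, ?, ?)",
        (device_id, version, status, detail, datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()


def success_rate(conn, version: str) -> float:
    """Share of devices whose latest event for this version is SUCCESS."""
    row = conn.execute(
        "SELECT AVG(e.status = 'SUCCESS') FROM ota_events AS e "
        "WHERE e.version = ? AND e.id = ("
        "  SELECT MAX(id) FROM ota_events "
        "  WHERE device_id = e.device_id AND version = e.version)",
        (version,),
    ).fetchone()
    return float(row[0]) if row[0] is not None else 0.0
```

Counting only each device's latest event means a device that failed once and then succeeded on retry is counted as a success, while the full history stays available for investigation.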
Recommended stack
- MQTT (Mosquitto)
- AWS S3 / Cloudflare R2
- Node.js backend
- Python (device)
- OpenSSL (signing)
- PostgreSQL (audit log)
For managed large-scale OTA, AWS IoT Jobs or Azure IoT Hub take care of orchestrating the OTA process. If you're hosting your own systems, the stack above is sufficient to support OTA for thousands of devices.
Our own IoT device fleet uses the dual-partition OTA model, rolling out to our GPS and Bluetooth Low Energy tracking devices through a fully automated process, so we don't have to touch any of the devices in the field manually. Check out our platform.
Are you working with a device fleet of IoT trackers? Then AssetTrackPro will provide you with OTA functionality, device management, and firmware updates. Discover AssetTrackPro.