Simple Memo

Posted on May 26

An offline-first Outbox in Swift: 7 steps, no third-party libs

#swift #iosdev #mobile #tutorial

How to reproduce this in your own iOS project, in seven steps.

I have been running this Outbox in my note-to-email app for ten months. It survives a four-floor subway descent, a phone reboot mid-send, and the occasional iCloud Drive hang. It is roughly 240 lines of Swift, with zero third-party packages.

Below is the recipe. I will write the code straight, the way it sits in my repo, and call out the failure modes I hit at each step. If you are building anything that needs to "send-and-forget" (analytics, message drafts, telemetry, optimistic UI mutations), most of this will transplant directly.

What an Outbox actually is

An Outbox is a durable queue that sits between your UI and the network. The UI hands it an Operation. The Outbox guarantees that Operation will be executed at least once, eventually, even if the user puts the phone in airplane mode, force-quits the app, and reopens it on a different cellular network six hours later.

A short list of requirements drove every decision I made:

The user's tap must feel instant. Network latency is the Outbox's problem, not the UI's.
A killed app, a rebooted phone, or a low-memory eviction must not lose work.
I do not want to ship 4 MB of dependencies for a feature this conceptually small.

Everything below follows from those three.

Step 1: Define a single envelope

Everything that enters the Outbox is wrapped in one envelope. Plain Codable struct, all the bookkeeping a retry loop needs, no payload coupling.

import Foundation

struct OutboxOperation: Codable, Identifiable {
    let id: UUID
    let kind: String          // "sendEmail", "uploadAnalytics", etc.
    let payload: Data         // opaque to the Outbox
    let createdAt: Date
    var attemptCount: Int     // mutates on retry
    var nextEligibleAt: Date  // backoff target
    var lastError: String?    // diagnostic only
}

I keep payload as opaque Data. The Outbox never deserialises it. Each registered OperationHandler knows how to decode its own payload type. This decoupling means I can add a new operation kind without touching the queue.
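The post uses OperationHandler without showing it. A minimal sketch of what the protocol and the registry could look like, assuming the single run(_:) requirement that the Step 6 handler implements. decodePayload, SaveDraftPayload and SaveDraftHandler are hypothetical names for illustration only.

```swift
import Foundation

// Assumed shape: the one requirement the handlers in this post rely on.
protocol OperationHandler {
    func run(_ payload: Data) async throws
}

// Hypothetical helper: each handler decodes its own payload type,
// so the queue never needs to know what is inside `Data`.
func decodePayload<T: Decodable>(_ type: T.Type, from payload: Data) throws -> T {
    try JSONDecoder().decode(type, from: payload)
}

// Hypothetical payload and handler for a new operation kind.
struct SaveDraftPayload: Codable, Equatable {
    let title: String
}

struct SaveDraftHandler: OperationHandler {
    func run(_ payload: Data) async throws {
        let draft = try decodePayload(SaveDraftPayload.self, from: payload)
        // A real handler would perform its network call here.
        _ = draft
    }
}

// Adding a kind is one new dictionary entry; the queue code is untouched.
let handlers: [String: OperationHandler] = [
    "saveDraft": SaveDraftHandler()
]
```

The dictionary is what gets passed to the drainer in Step 4, keyed by the same string that is stored in the envelope's kind field.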

Failure I hit. My first version used a Swift enum with associated values for kind. It read beautifully and broke the first time I added a case in an app update. Every envelope written by the previous build failed to decode on launch, and I lost three hours of users' queued sends. A String kind plus an opaque Data payload is uglier but forward-compatible.
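To make the compatibility argument concrete, here is a small sketch, assuming a trimmed copy of the Step 1 envelope. The kind "archiveNote", the knownKinds list and the helper names are made up for the example. The point is only that an unrecognised kind still decodes, so the file can be kept and skipped rather than lost.

```swift
import Foundation

// Trimmed copy of the Step 1 envelope so the snippet runs on its own.
struct Envelope: Codable {
    let id: UUID
    let kind: String
    let payload: Data
}

// A file whose kind this build has no handler for.
// "archiveNote" is a made-up kind; "e30=" is base64 for "{}".
let sampleJSON = Data("""
{"id":"7F0C2B0E-6B1D-4C59-9C1B-3D2A4E5F6A7B","kind":"archiveNote","payload":"e30="}
""".utf8)

func decodeEnvelope(_ data: Data) throws -> Envelope {
    try JSONDecoder().decode(Envelope.self, from: data)
}

// Kinds the running build can execute (illustrative list).
let knownKinds: Set<String> = ["sendEmail", "uploadAnalytics"]

// An unknown kind is a reason to skip the op, not to fail the decode.
func hasHandler(for envelope: Envelope) -> Bool {
    knownKinds.contains(envelope.kind)
}
```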

Step 2: Persist each operation as its own file

I store the queue as a directory of JSON files inside the app's Application Support folder.

final class OutboxStore {
    private let dir: URL
    private let encoder = JSONEncoder()
    private let decoder = JSONDecoder()
    private let lock = NSLock()

    init() throws {
        let base = try FileManager.default.url(
            for: .applicationSupportDirectory,
            in: .userDomainMask,
            appropriateFor: nil,
            create: true
        )
        dir = base.appendingPathComponent("Outbox", isDirectory: true)
        try FileManager.default.createDirectory(
            at: dir, withIntermediateDirectories: true
        )
    }

    func enqueue(_ op: OutboxOperation) throws {
        lock.lock(); defer { lock.unlock() }
        let url = dir.appendingPathComponent("\(op.id.uuidString).json")
        let data = try encoder.encode(op)
        try data.write(to: url, options: [.atomic])
    }

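    func all() throws -> [OutboxOperation] {
        lock.lock(); defer { lock.unlock() }
        let urls = try FileManager.default.contentsOfDirectory(
            at: dir, includingPropertiesForKeys: nil
        )
        return urls.compactMap { url -> OutboxOperation? in
            do {
                return try decoder.decode(OutboxOperation.self,
                                          from: Data(contentsOf: url))
            } catch {
                // Never drop an envelope silently: surface the failure.
                print("Outbox: failed to decode \(url.lastPathComponent): \(error)")
                return nil
            }
        }.sorted { $0.createdAt < $1.createdAt }
    }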

    func remove(_ id: UUID) throws {
        lock.lock(); defer { lock.unlock() }
        try? FileManager.default.removeItem(
            at: dir.appendingPathComponent("\(id.uuidString).json")
        )
    }

    func update(_ op: OutboxOperation) throws {
        try enqueue(op) // same file path; atomic overwrite
    }
}

Why one file per operation, not one big queue file? Two reasons. The first is crash safety. An atomic write of a 1 KB envelope finishes in microseconds. An atomic rewrite of a 10 MB queue file is a much wider window for the OS to kill you mid-write. The second is the iOS memory eviction model. With one file per op, only the envelopes the drain loop is currently reading sit in memory.

Failure I hit. I forgot the .atomic write option in the first version. A backgrounded app got killed mid-write to one envelope, which left a half-written JSON file on disk. On next launch, JSONDecoder threw on that one file. try? swallowed the error and silently dropped the operation. The user lost a memo. The fix was twofold: write with .atomic, and never use try? on a decode without surfacing the failure to a diagnostic log.
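One way to surface a corrupt envelope instead of dropping it is to move the file aside and report it. The sketch below is an illustration, not the code from my repo: StoredOp is a cut-down stand-in for the envelope, and loadAll and the quarantine folder are hypothetical.

```swift
import Foundation

// Minimal stand-in for the Step 1 envelope so the snippet is self-contained.
struct StoredOp: Codable {
    let id: UUID
    let kind: String
}

// Loads every .json envelope in `dir`. Files that fail to read or decode are
// moved to `quarantineDir` and reported to the caller instead of vanishing.
func loadAll(from dir: URL, quarantineDir: URL) throws
    -> (ops: [StoredOp], corrupt: [(file: String, error: Error)]) {
    let fm = FileManager.default
    try fm.createDirectory(at: quarantineDir, withIntermediateDirectories: true)
    var ops: [StoredOp] = []
    var corrupt: [(file: String, error: Error)] = []
    let urls = try fm.contentsOfDirectory(at: dir, includingPropertiesForKeys: nil)
    for url in urls where url.pathExtension == "json" {
        do {
            let data = try Data(contentsOf: url)
            ops.append(try JSONDecoder().decode(StoredOp.self, from: data))
        } catch {
            corrupt.append((file: url.lastPathComponent, error: error))
            // Keep the bytes around for a post-mortem.
            try? fm.moveItem(
                at: url,
                to: quarantineDir.appendingPathComponent(url.lastPathComponent)
            )
        }
    }
    return (ops, corrupt)
}
```

The caller can log the corrupt list, and a debug screen can show how many files are sitting in quarantine.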

Step 3: Watch the network with NWPathMonitor

Apple gives you this for free. There is no reason to install a reachability library in 2026.

import Network

final class NetworkWatcher {
    private let monitor = NWPathMonitor()
    private let queue = DispatchQueue(label: "outbox.network")
    // Written only on `queue`; read it from inside onChange, not from other threads.
    private(set) var isOnline = false
    // Called on `queue`, not the main thread. Set it before calling start().
    var onChange: ((Bool) -> Void)?

    func start() {
        monitor.pathUpdateHandler = { [weak self] path in
            let online = path.status == .satisfied
            guard let self, self.isOnline != online else { return }
            self.isOnline = online
            self.onChange?(online)
        }
        monitor.start(queue: queue)
    }

    deinit { monitor.cancel() }
}

Two opinions worth defending here. First, I treat "connected to a network" as "online" and never as "can reach my server". The reachability of a specific endpoint is a question I let the drain loop answer the hard way, by trying. Second, I do not debounce. If the user flips from Wi-Fi to cellular twice in a second, the drain loop will fire twice, and the idempotency in Step 6 makes that safe.

Failure I hit. My early version queried path.isExpensive and refused to drain on cellular. I thought I was being polite to the user's data plan. Then I noticed that the only feature in my app using the Outbox is the user's own action of sending their own note. They very much want it to go even on LTE. Letting the user's explicit intent override a cost heuristic was the right call.

Step 4: Drain with bounded concurrency

The drain loop wakes on three triggers: a new enqueue, the network coming back, and a scheduled retry timer firing. It pulls eligible operations and runs them through their handlers.

actor OutboxDrainer {
    private let store: OutboxStore
    private let handlers: [String: OperationHandler]
    private let maxInFlight: Int = 3

    init(store: OutboxStore, handlers: [String: OperationHandler]) {
        self.store = store
        self.handlers = handlers
    }

    func drain() async {
        guard let ops = try? store.all() else { return }
        let now = Date()
        let eligible = ops.filter { $0.nextEligibleAt <= now }

        try? await withThrowingTaskGroup(of: Void.self) { group in
            var slots = 0
            for op in eligible {
                if slots >= maxInFlight {
                    try await group.next()
                    slots -= 1
                }
                group.addTask { [self] in
                    await execute(op)
                }
                slots += 1
            }
            for try await _ in group { }
        }
    }

    private func execute(_ op: OutboxOperation) async {
        guard let handler = handlers[op.kind] else { return }
        do {
            try await handler.run(op.payload)
            try? store.remove(op.id)
        } catch {
            try? backoff(op, error: error)
        }
    }
}

I cap concurrency at three. Higher numbers used to win me 200–300 ms on initial drains of large queues, then I noticed those wins evaporated under any real radio condition. Three is enough to mask the latency of one slow request without saturating the cellular link.

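Failure I hit. My first drainer was not an actor. It was a class wrapping an OperationQueue. Two concurrent triggers (network-up plus a new enqueue arriving in the same 50 ms window) would each schedule a drain, and the same operation would execute twice. Making the drainer an actor removed the data race on the drainer's own state. It does not, by itself, serialise whole drain() calls: actors are reentrant, so a second drain() can start while the first is suspended at an await and pick up the same operation. Claiming each operation ID in an in-flight set before the first await is what closes that gap. With that caveat, this is one of the cleanest wins Swift Concurrency gave me.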
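Because actors are reentrant, two drain() calls can interleave at any await, so the actor alone does not rule out running the same operation twice. A minimal sketch of an in-flight guard follows. InFlightSet and GuardedDrainer are hypothetical names, and the closure parameter stands in for the handler call.

```swift
import Foundation

// Tracks which operation IDs are currently being executed.
struct InFlightSet {
    private var ids: Set<UUID> = []

    // Returns true if the caller now owns `id`, false if it is already running.
    mutating func claim(_ id: UUID) -> Bool {
        ids.insert(id).inserted
    }

    mutating func release(_ id: UUID) {
        ids.remove(id)
    }
}

// Hypothetical drainer fragment showing where the claim has to happen.
actor GuardedDrainer {
    private var inFlight = InFlightSet()
    private(set) var runCount = 0

    func execute(id: UUID, work: @Sendable () async throws -> Void) async {
        // Synchronous section: nothing else on this actor can interleave
        // between the check and the insert.
        guard inFlight.claim(id) else { return }
        defer { inFlight.release(id) }
        runCount += 1
        try? await work()   // suspension point; other drains may run here
    }
}
```

The claim must come before the first await. Once the function suspends, another drain can read the same file from disk.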

Step 5: Exponential backoff with a hard ceiling

The retry policy is a single function. It mutates the envelope, persists it, and lets the next drain pick it up.

private func backoff(_ op: OutboxOperation, error: Error) throws {
    var updated = op
    updated.attemptCount += 1
    let base = min(600.0, pow(2.0, Double(updated.attemptCount)))
    let jitter = Double.random(in: 0..<(0.3 * base))
    updated.nextEligibleAt = Date().addingTimeInterval(base + jitter)
    updated.lastError = String(describing: error)
    try store.update(updated)
}

Doubling with a 10-minute ceiling means the base delay reaches the cap on the tenth consecutive failure, roughly 17 minutes after the first one. From then on the operation is retried every 10 to 13 minutes, because the jitter is added on top of the capped base. I deliberately do not give up. There is no "drop after N failures" policy because in my app every queued operation is something the user explicitly typed and pressed send on. The right time to give up is when the user clears the queue manually from a settings screen.

Failure I hit. My first backoff used a fixed 30-second retry. The first time my server had a 90-minute outage, every device in the wild was hammering it once every thirty seconds, and the post-recovery thundering herd took down my single Postgres instance for another twenty minutes. Exponential backoff with jitter solved both problems with the short function above. Jitter costs nothing and saves you the day a thousand phones come back online at the same airport gate.
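Working the formula from backoff(_:error:) through by hand shows when the cap kicks in. The helpers below are illustrative names, not part of the Outbox. They reuse the same constants: base 2, a 600-second cap and up to 30% jitter.

```swift
import Foundation

// Base delay before jitter: 2^attempt seconds, capped at 600.
func baseDelay(forAttempt attempt: Int) -> Double {
    min(600.0, pow(2.0, Double(attempt)))
}

// With jitter added, the actual delay lands in [base, 1.3 * base).
func delayRange(forAttempt attempt: Int) -> Range<Double> {
    let base = baseDelay(forAttempt: attempt)
    return base..<(1.3 * base)
}

// First attempt count at which the cap applies.
func firstCappedAttempt() -> Int {
    var attempt = 1
    while pow(2.0, Double(attempt)) < 600.0 { attempt += 1 }
    return attempt
}

// Total base delay accumulated before the first capped wait.
func secondsUntilCap() -> Double {
    (1..<firstCappedAttempt()).reduce(0.0) { $0 + baseDelay(forAttempt: $1) }
}
```

The base delays run 2, 4, 8 and so on up to 512 seconds, which sum to 1022 seconds before jitter. The tenth failure is the first to wait the full 600 seconds.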

Step 6: Make every handler idempotent

A queue that promises at-least-once delivery is a queue that will, sooner or later, deliver something twice. Build for that on day one.

Every operation already has a client-generated UUID. The handler passes that UUID to the server in an Idempotency-Key header. The server stores a row keyed by the UUID and the user, and the second call returns the first response from a small cache.

struct SendEmailHandler: OperationHandler {
    let api: APIClient
    func run(_ payload: Data) async throws {
        let req = try JSONDecoder().decode(SendEmailPayload.self, from: payload)
        try await api.post(
            "/v1/send",
            body: req,
            headers: ["Idempotency-Key": req.id.uuidString]
        )
    }
}

I keep the UUID inside the payload as well, not just on the envelope. That way the handler can be tested without an envelope, and the wire format is self-contained.
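SendEmailPayload itself is not shown in this post. Here is a plausible shape, inferred from the initializer call in Step 7 (id, to, body). The field types and the idempotencyHeaders helper are assumptions for illustration.

```swift
import Foundation

// Assumed shape; only the three field names come from the Step 7 call site.
struct SendEmailPayload: Codable, Equatable {
    let id: UUID      // doubles as the Idempotency-Key
    let to: String
    let body: String
}

// The header the handler sends, derived from the payload alone,
// which is what lets the handler be tested without an envelope.
func idempotencyHeaders(for payload: SendEmailPayload) -> [String: String] {
    ["Idempotency-Key": payload.id.uuidString]
}
```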

Failure I hit. I forgot to make my analytics endpoint idempotent and reused the same Outbox for it. After a server hiccup, I had a user whose "opened settings" event was counted four times. Funnels read as if the app had gone viral overnight. Lesson: idempotency is not a server-only concern, but the server has to enforce it. The client only proposes; the server disposes.
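The server side of that contract can be modelled in a few lines. This is a hypothetical in-memory sketch written in Swift for consistency with the rest of the post, not my actual backend. A real server would use a database row with a unique constraint on (user, key) and some expiry policy.

```swift
import Foundation

// Hypothetical in-memory model of the server's idempotency table.
struct IdempotencyStore<Response> {
    private struct Key: Hashable {
        let userID: String
        let idempotencyKey: UUID
    }
    private var responses: [Key: Response] = [:]

    // Runs `work` only the first time a (user, key) pair is seen;
    // later calls get the stored response back without side effects.
    mutating func perform(userID: String,
                          idempotencyKey: UUID,
                          work: () throws -> Response) rethrows -> Response {
        let key = Key(userID: userID, idempotencyKey: idempotencyKey)
        if let cached = responses[key] { return cached }
        let response = try work()
        responses[key] = response
        return response
    }
}
```

Keying on the user as well as the UUID keeps one client's keys from colliding with, or being replayed against, another user's requests.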

Step 7: Hand the UI an optimistic state

The UI does not wait for the network. It writes to a local store and shows the user the "sent" state immediately. The Outbox is a background fact.

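@MainActor
final class ComposeViewModel: ObservableObject {
    @Published private(set) var sentLocally: [LocalEmail] = []
    @Published private(set) var enqueueError: Error?  // drives a one-time alert
    let outbox: Outbox

    init(outbox: Outbox) { self.outbox = outbox }

    func send(_ draft: Draft) async {
        // Optimistic: the row shows as sent before any I/O happens.
        let local = LocalEmail(id: UUID(), draft: draft, sentAt: .now)
        sentLocally.append(local)
        do {
            let payload = try JSONEncoder().encode(
                SendEmailPayload(id: local.id, to: draft.to, body: draft.body)
            )
            try await outbox.enqueue(kind: "sendEmail", payload: payload)
        } catch {
            // Not durable yet (disk full, encode failure): keep the
            // in-memory copy and tell the user instead of dropping it.
            enqueueError = error
        }
    }
}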

I do not surface a "queued" indicator. The user pressed send. Their mental model is "it sent." Showing "queued, will retry" in the UI is a tax I refuse to charge the user.

Failure I hit. My first version did show a spinner per queued message. Users hated it. The spinner was honest and useless: "queued, retrying" is information the user can do nothing with. Removing the spinner did not change a single delivery outcome and improved the perceived speed of the app noticeably. The Outbox should be invisible until it fails for so long that a user-visible warning is warranted, which in my app is an hour and which I have triggered exactly once in ten months.
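The "invisible until it has failed for an hour" rule reduces to one comparison against the oldest envelope. A minimal sketch, assuming the dates come from the createdAt field of the stored operations. shouldWarnUser is a hypothetical name.

```swift
import Foundation

// Returns true once the oldest queued operation has been waiting longer
// than `threshold` (one hour by default, matching the rule in the post).
func shouldWarnUser(createdAtDates: [Date],
                    now: Date = Date(),
                    threshold: TimeInterval = 3600) -> Bool {
    guard let oldest = createdAtDates.min() else { return false }
    return now.timeIntervalSince(oldest) > threshold
}
```

Passing now as a parameter keeps the check testable without waiting an hour.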

Common failures sidebar

A short list of things I have seen go wrong with variants of this design, gathered from my own commits and from two friends who shipped their own versions:

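Filesystem case sensitivity. UUID().uuidString is always uppercase, and the iOS data volume is case-sensitive while a default macOS volume is not. Code that rebuilds a filename from a lowercased ID (one that round-tripped through a server or a log, say) will find the file on a Mac during testing and miss it on a device. Pick one case for filenames and normalise every lookup, especially if you ever read these files outside the app sandbox.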
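Codable evolution. Adding a new non-optional field to OutboxOperation will break decode for every envelope written by a previous app version. A default value on the property does not help, because the synthesized decoder still requires the key. New fields are either Optional or decoded with decodeIfPresent and a fallback in a custom init(from:).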
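Background time. If the app is in the background when the drain runs, it gets roughly thirty seconds before iOS suspends it, and only when the work is wrapped in a background task (UIApplication.beginBackgroundTask). Long uploads need a background URLSession created from URLSessionConfiguration.background(withIdentifier:), which is a different story I am leaving out here.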
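Clock drift. nextEligibleAt is wall-clock. A user setting their phone's clock forward six hours will make every queued op eligible on the next drain; setting it back delays retries by the same amount. In practice this has never happened to me. In paranoid mode I would use ContinuousClock for the in-process comparison, keeping in mind that its instants mean nothing after a reboot, so the persisted field still has to be a wall-clock Date.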
Disk full. data.write(to:options:) with .atomic throws on a full disk. Handle the throw; do not silently lose the user's input. I show a one-time alert and keep the in-memory copy.
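The Codable evolution item is the one that is easiest to get wrong, so here is a sketch of the decodeIfPresent approach. EnvelopeV2 and its priority field are hypothetical; the pattern is what matters.

```swift
import Foundation

// Hypothetical next version of the envelope with one added field.
struct EnvelopeV2: Codable {
    let id: UUID
    let kind: String
    var priority: Int   // new in this build; older files do not have it

    enum CodingKeys: String, CodingKey { case id, kind, priority }

    init(from decoder: Decoder) throws {
        let c = try decoder.container(keyedBy: CodingKeys.self)
        id = try c.decode(UUID.self, forKey: .id)
        kind = try c.decode(String.self, forKey: .kind)
        // A missing key falls back to a default instead of throwing.
        priority = try c.decodeIfPresent(Int.self, forKey: .priority) ?? 0
    }
}
```

Encoding is still synthesized, so new files carry the field while old files keep decoding.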

What I would do differently if I started today

I have now shipped this design across two apps. A few small changes I would make on a clean slate:

First, I would use SwiftData for the store from day one. When I wrote the original, SwiftData was not stable enough for me to trust on the critical path. It is now, and it gives you a real query language for diagnostics, which the file-per-op approach does not.

Second, I would expose a read-only AsyncStream<OutboxState> (or, on Swift 6, some AsyncSequence<OutboxState, Never>) from the Outbox so the UI could subscribe to overall queue health without polling. Today I poll from a hidden settings screen, which works, but a SwiftUI integration would be cleaner.

Last, I would write a fuzz test that randomly enqueues, drains, kills the app mid-drain, and replays the queue. Most of the bugs I shipped in this code would have been caught by one weekend of fuzzing.
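A first cut of that fuzz test does not need a device. The sketch below fuzzes an in-memory model rather than the real store: disk is the state that survives a kill, delivered stands in for an idempotent server, and a seeded generator makes any failing run replayable. All names are hypothetical.

```swift
import Foundation

// Tiny deterministic RNG (SplitMix64) so a failing seed can be replayed.
struct SeededRNG: RandomNumberGenerator {
    var state: UInt64
    mutating func next() -> UInt64 {
        state &+= 0x9E3779B97F4A7C15
        var z = state
        z = (z ^ (z >> 30)) &* 0xBF58476D1CE4E5B9
        z = (z ^ (z >> 27)) &* 0x94D049BB133111EB
        return z ^ (z >> 31)
    }
}

// In-memory model of the queue.
struct QueueModel {
    var disk: Set<Int> = []        // survives a simulated kill
    var delivered: Set<Int> = []   // idempotent server: duplicates collapse

    mutating func enqueue(_ id: Int) { disk.insert(id) }

    // A "kill" can land between the send and the file removal,
    // which is exactly the at-least-once window.
    mutating func drain(using rng: inout SeededRNG) {
        for id in disk.sorted() {
            if Int.random(in: 0..<4, using: &rng) == 0 { continue }  // send failed
            delivered.insert(id)                                     // send succeeded
            if Int.random(in: 0..<4, using: &rng) == 0 { return }    // killed before remove
            disk.remove(id)
        }
    }
}

// Runs one randomized scenario and reports whether every enqueued op
// was delivered and the disk ended up empty.
func runFuzz(seed: UInt64, steps: Int = 200) -> Bool {
    var rng = SeededRNG(state: seed)
    var model = QueueModel()
    var enqueued: Set<Int> = []
    for step in 0..<steps {
        if Bool.random(using: &rng) {
            model.enqueue(step)
            enqueued.insert(step)
        } else {
            model.drain(using: &rng)
        }
    }
    // Keep draining until the queue settles.
    var rounds = 0
    while !model.disk.isEmpty && rounds < 1_000 {
        model.drain(using: &rng)
        rounds += 1
    }
    return model.disk.isEmpty && model.delivered == enqueued
}
```

Swapping the model for the real OutboxStore pointed at a temporary directory would be the next step. The invariant stays the same: nothing enqueued is ever missing from the delivered set.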

If you want to take this further

Three things are worth your next afternoon, in order:

Add a hidden debug screen with a "retry now" button and a list of currently queued operations. You will use it more than you think when triaging real user reports.
Wire os_log (or Logger) with a category of "outbox" on every state transition, and wrap each drain in an os_signpost interval. The log lines show up in Console, and the signpost output in Instruments is shockingly informative once a queue starts misbehaving in the field.
Read the actor reentrancy section of the Swift actors proposal (SE-0306) one more time. The Outbox is the place in my codebase where I most often regret not having read it more carefully.

Captio-style Simple Memo is an iOS app I maintain on weekends. It turns whatever I type into an email and sends it before I can second-guess. I write here when a piece of code surprises me.
