DEV Community

iCe Gaming
Building an Offline-First Retail Hub in Rust: How ApexEdge Keeps Stores Selling When the Internet Dies

If you’ve ever built retail software, you know the happy-path demo is the easy part.

The hard part is everything around it:

  • internet goes down,
  • HQ APIs lag or fail,
  • terminals still need to sell,
  • receipts still need to print,
  • and nothing can be lost.

I built ApexEdge, a Rust-powered store hub orchestrator that sits between POS/mPOS clients and HQ systems:

POS/mPOS <-> ApexEdge <-> HQ

Repo: https://github.com/AncientiCe/apex-edge
This post is about the actual engineering problems I had to solve, and the architecture patterns that made the system reliable in production-like conditions.


The Core Constraint: Stores Must Keep Selling

Retail can’t block on cloud availability. That drove my first principle:

The store hub is the source of operational truth during a transaction.

In practice, that means:

  • local persistence for catalog/prices/promos/customers/config,
  • local cart + checkout orchestration,
  • local document generation (receipt, merchant copy, kitchen chit, etc.),
  • async sync with HQ, not inline dependency.

If HQ is unavailable, checkout should still complete locally.

Synchronization is eventually consistent, but sales flow is immediate.


Why a Hub Instead of POS Calling HQ Directly?

Direct POS -> HQ can work for tiny setups, but at scale it creates fragile coupling:

  • every terminal becomes an integration client,
  • token/session handling is duplicated per app/device,
  • every command depends on WAN quality,
  • retries/idempotency become inconsistent across clients.

The hub model centralizes this:

  • one northbound contract for POS commands,
  • one southbound contract for HQ submission/sync,
  • one place to enforce idempotency, retries, conflict handling, and observability.

Contract-Driven Commands Instead of “Random Endpoints”

Instead of exposing many ad-hoc mutable endpoints, I route checkout behavior through a command envelope:

POST /pos/command

Examples:

  • create_cart
  • add_line_item
  • set_customer
  • set_tendering
  • add_payment
  • finalize_order

This gave me major wins:

  1. Compatibility discipline: versioned command envelope.
  2. Idempotency at the boundary: safe retries from POS.
  3. Unified observability: command metrics by operation/outcome.
  4. Deterministic testing: journey tests mirror real checkout flows.
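To make that concrete, here is a minimal sketch of what such a command envelope could look like. All names and shapes here are illustrative, not the actual ApexEdge contract:

```rust
// Hypothetical command envelope sketch: names and shapes are illustrative.

#[derive(Debug, Clone, PartialEq)]
enum PosCommand {
    CreateCart,
    AddLineItem { sku: String, qty: u32 },
    SetCustomer { customer_id: String },
    FinalizeOrder,
}

struct CommandEnvelope {
    version: u8,             // contract version: compatibility discipline
    idempotency_key: String, // supplied by the POS client for safe retries
    command: PosCommand,
}

fn handle(envelope: &CommandEnvelope) -> Result<String, String> {
    // Reject envelope versions we don't speak instead of guessing.
    if envelope.version != 1 {
        return Err(format!("unsupported envelope version {}", envelope.version));
    }
    match &envelope.command {
        PosCommand::CreateCart => Ok("cart_created".to_string()),
        PosCommand::AddLineItem { sku, qty } => Ok(format!("added {}x {}", qty, sku)),
        PosCommand::SetCustomer { customer_id } => Ok(format!("customer set: {}", customer_id)),
        PosCommand::FinalizeOrder => Ok("order_finalized".to_string()),
    }
}
```

One enum, one handler: every checkout mutation flows through a single dispatch point, which is exactly where idempotency, metrics, and versioning hooks belong.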

Problem #1: “Exactly Once” Is a Lie (So I Designed for At-Least-Once)

Networks duplicate requests. Clients retry. Humans double-click.

So I assume at-least-once delivery and make handlers idempotent:

  • command envelope includes idempotency_key,
  • server stores command results keyed by scope,
  • repeated command with same key returns same logical response.

This prevents duplicate line items, duplicate payments, and duplicate order submission side effects.
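The storage side of this can be sketched in a few lines. A HashMap stands in for what is really a durable table, and all names are illustrative:

```rust
use std::collections::HashMap;

// Idempotency store sketch: results keyed by (scope, idempotency_key).
// A real implementation persists this; an in-memory map stands in here.
struct IdempotencyStore {
    results: HashMap<(String, String), String>,
}

impl IdempotencyStore {
    fn new() -> Self {
        Self { results: HashMap::new() }
    }

    // Run `op` at most once per (scope, key); replays return the stored result.
    fn execute<F>(&mut self, scope: &str, key: &str, op: F) -> String
    where
        F: FnOnce() -> String,
    {
        let k = (scope.to_string(), key.to_string());
        if let Some(prev) = self.results.get(&k) {
            return prev.clone(); // duplicate delivery: same logical response
        }
        let result = op();
        self.results.insert(k, result.clone());
        result
    }
}
```

The key point is that a retried command never re-executes the side effect; it replays the recorded outcome.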


Problem #2: Reliable HQ Submission Without Blocking Checkout

Inline finalize -> call HQ is brittle. If HQ times out, do you fail the sale?

I don’t.

I use a durable outbox:

  1. finalize order locally,
  2. atomically write HQ submission payload into outbox table,
  3. background dispatcher posts to HQ with retry/backoff,
  4. mark accepted/retry/dead-letter based on result.

This separates customer-facing latency from external dependency reliability.

Operationally, this is huge:

  • cashiers are not blocked by HQ instability,
  • submissions are durable across process restarts,
  • dead-letter rows are inspectable and replayable.
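A single dispatcher pass over that outbox can be sketched like this. The rows live in a database table in practice; the in-memory version below is purely illustrative, including the retry limit:

```rust
// Durable-outbox dispatcher sketch (in-memory for illustration).

#[derive(Debug, PartialEq)]
enum OutboxState {
    Pending { attempts: u32 },
    Accepted,
    DeadLetter,
}

struct OutboxRow {
    payload: String,
    state: OutboxState,
}

const MAX_ATTEMPTS: u32 = 5; // illustrative policy, not the real one

// One dispatcher pass: try each pending row against HQ, record the outcome.
fn dispatch_once<F>(rows: &mut Vec<OutboxRow>, post_to_hq: F)
where
    F: Fn(&str) -> bool,
{
    for row in rows.iter_mut() {
        if let OutboxState::Pending { attempts } = row.state {
            if post_to_hq(&row.payload) {
                row.state = OutboxState::Accepted;
            } else if attempts + 1 >= MAX_ATTEMPTS {
                row.state = OutboxState::DeadLetter; // inspectable and replayable later
            } else {
                row.state = OutboxState::Pending { attempts: attempts + 1 };
            }
        }
    }
}
```

Because the row is written in the same transaction as the finalized order, a crash between finalize and dispatch loses nothing; the dispatcher simply picks the row up on the next pass.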

Problem #3: Keeping Catalog/Pricing Fresh Without Breaking Ongoing Sales

HQ sync is an asynchronous NDJSON ingest with per-entity checkpoints:

  • catalog
  • categories
  • price book
  • tax rules
  • promotions
  • customers
  • coupons
  • inventory
  • print templates

Design goals:

  • ingest in batches,
  • move checkpoints only on successful entity application,
  • tolerate unknown entities for forward compatibility,
  • surface sync state via API for UI visibility.

This avoids all-or-nothing fragility and makes partial progress explicit.
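A sketch of the checkpointing logic, assuming records arrive as one NDJSON line each (the record validation and the "apply" step are stubbed out; everything here is illustrative):

```rust
use std::collections::HashMap;

// Checkpointed ingest sketch: the per-entity checkpoint advances only as
// records apply cleanly, so partial progress is explicit and resumable.
fn ingest_batch(
    checkpoints: &mut HashMap<String, usize>,
    entity: &str,
    lines: &[&str],
    known_entities: &[&str],
) -> Result<(), String> {
    if !known_entities.contains(&entity) {
        return Ok(()); // tolerate unknown entities for forward compatibility
    }
    let start = *checkpoints.get(entity).unwrap_or(&0);
    for (i, line) in lines.iter().enumerate().skip(start) {
        if line.trim().is_empty() {
            // Bad record: stop here. The checkpoint already reflects every
            // record applied so far, so a retry resumes from this offset.
            return Err(format!("bad record for {} at offset {}", entity, i));
        }
        // ...apply the record to local storage here...
        checkpoints.insert(entity.to_string(), i + 1);
    }
    Ok(())
}
```

Each entity stream fails or progresses independently, so a poisoned promotions record does not stall the catalog.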


Problem #4: Inventory Truth vs. Checkout Experience

Stock rules are trickier than “if qty > 0”.

I enforce availability at add-to-cart time:

  • inactive items are blocked,
  • out-of-stock items are blocked,
  • quantity checks can return insufficient stock errors.

But I also handle incomplete sync states pragmatically:

  • if inventory is not yet synced for an item, I allow add-to-cart (configurable policy at architecture level).

That tradeoff favors operational continuity while still applying strict checks where data exists.
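The whole policy fits in one small function. This is a sketch with invented names; the "not yet synced" case is modeled as a missing stock value:

```rust
// Add-to-cart availability sketch. `stock: None` models "inventory not yet
// synced for this item". All names are illustrative.

#[derive(Debug, PartialEq)]
enum AddToCartError {
    ItemInactive,
    OutOfStock,
    InsufficientStock { available: u32 },
}

struct Item {
    active: bool,
    stock: Option<u32>,
}

fn check_availability(item: &Item, qty: u32) -> Result<(), AddToCartError> {
    if !item.active {
        return Err(AddToCartError::ItemInactive);
    }
    match item.stock {
        None => Ok(()), // policy: allow the sale when inventory hasn't synced yet
        Some(0) => Err(AddToCartError::OutOfStock),
        Some(available) if available < qty => {
            Err(AddToCartError::InsufficientStock { available })
        }
        Some(_) => Ok(()),
    }
}
```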


Problem #5: Document Generation Should Be Local and Deterministic

Receipts are not optional, and they can’t depend on round-tripping to HQ.

I generate documents in the hub:

  • render from synced templates + order/cart view models,
  • persist document artifacts,
  • expose retrieval endpoints for POS clients.

POS remains responsible for printer/device UX, but generation is centralized and reproducible.

Bonus: template updates can be distributed through sync instead of app redeploys.
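At its core, deterministic rendering is just a pure function from template plus view model to document. The placeholder syntax below is a deliberately naive stand-in for the real template format:

```rust
use std::collections::HashMap;

// Deterministic receipt rendering sketch: a synced template with `{name}`
// slots filled from an order view model. Same inputs, same document.
fn render_document(template: &str, model: &HashMap<&str, String>) -> String {
    let mut out = template.to_string();
    for (key, value) in model {
        out = out.replace(&format!("{{{}}}", key), value);
    }
    out
}
```

Because rendering has no I/O and no clock, the same order always produces byte-identical artifacts, which makes reprints and audits trivial.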


Problem #6: Security for Shared, Real-World Devices

mPOS fleets need practical trust bootstrap:

  • generate pairing codes,
  • pair device once,
  • exchange external identity token + device proof,
  • receive local hub access/refresh tokens,
  • protect operational routes behind hub auth.

This model gives controlled device onboarding without hardcoding secrets into POS binaries.
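The pairing flow above can be sketched as a one-shot code exchange. The token formats and validation are placeholders; a real implementation verifies the identity token and device proof cryptographically:

```rust
use std::collections::HashMap;

// Pairing sketch: a code is issued once, redeemed once, and exchanged for
// local hub tokens. All values and checks here are illustrative.
struct PairingService {
    codes: HashMap<String, bool>, // code -> already_redeemed
}

struct HubTokens {
    access: String,
    refresh: String,
}

impl PairingService {
    fn new() -> Self {
        Self { codes: HashMap::new() }
    }

    fn issue_code(&mut self, code: &str) {
        self.codes.insert(code.to_string(), false);
    }

    // Redeem a pairing code with an external identity token and device proof.
    fn pair(&mut self, code: &str, identity_token: &str, device_proof: &str) -> Result<HubTokens, String> {
        match self.codes.get_mut(code) {
            None => Err("unknown pairing code".to_string()),
            Some(redeemed) if *redeemed => Err("pairing code already used".to_string()),
            Some(redeemed) => {
                if identity_token.is_empty() || device_proof.is_empty() {
                    return Err("missing identity token or device proof".to_string());
                }
                *redeemed = true;
                Ok(HubTokens {
                    access: format!("access-{}", code),
                    refresh: format!("refresh-{}", code),
                })
            }
        }
    }
}
```

The single-use code is what keeps onboarding controlled: a leaked code that was already redeemed is worthless.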


Problem #7: You Can’t Operate What You Can’t See

I instrumented behavior ownership across routes and background flows:

  • command counts + latencies + outcomes,
  • outbox dispatch attempts/duration/DLQ,
  • sync ingest batches + durations + outcomes,
  • DB operation outcomes,
  • HTTP-level request metrics.

In an outage, these metrics answer:

  • is checkout still flowing?
  • are HQ submissions backlogged?
  • is sync stuck on a specific entity?
  • are failures concentrated at DB, network, or domain validation?

Without this, “it feels slow” becomes your only signal.
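The command metrics boil down to counting by (operation, outcome). A toy version of that counter, with invented names, looks like this:

```rust
use std::collections::HashMap;

// Command-metrics sketch: counts by (operation, outcome) so an operator can
// see whether checkout is flowing and where failures cluster.
#[derive(Default)]
struct CommandMetrics {
    counts: HashMap<(String, String), u64>,
}

impl CommandMetrics {
    fn record(&mut self, operation: &str, outcome: &str) {
        let key = (operation.to_string(), outcome.to_string());
        *self.counts.entry(key).or_insert(0) += 1;
    }

    fn count(&self, operation: &str, outcome: &str) -> u64 {
        let key = (operation.to_string(), outcome.to_string());
        *self.counts.get(&key).unwrap_or(&0)
    }
}
```

In practice this would be a real metrics library with latency histograms, but the labeling scheme is the part that matters: operation and outcome together answer the outage questions above.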


Testing Strategy That Actually Catches Regressions

I leaned hard on behavior-level tests:

  • in-process smoke tests for health/ready/basic command flows,
  • full journey tests from cart creation to finalized order + document generation + outbox payload assertions,
  • crate-level tests for domain, storage, sync, outbox, printing, and contracts.

This matters because distributed correctness is mostly integration correctness.


Architecture Pattern Summary

What worked for me:

  1. Local-first transaction boundary
  2. Command envelope + idempotency
  3. Durable outbox for external submission
  4. Checkpointed async ingest for HQ sync
  5. Local document generation
  6. Device trust + token exchange
  7. First-class metrics on all critical paths

If you’re building store, warehouse, or edge-heavy systems, this combination gives resilience without requiring “perfect network” fantasies.


Tradeoffs (No Silver Bullets)

What you pay for this architecture:

  • more moving parts than direct API calls,
  • stronger schema/contract discipline required,
  • eventual consistency complexity,
  • operational tooling for DLQ replay and sync introspection.

Still worth it for domains where downtime equals lost revenue.


Practical Advice If You’re Starting Today

  • Start with idempotency keys on day one.
  • Model outbox as a product feature, not a reliability patch.
  • Keep sync entity-specific with independent checkpoints.
  • Treat metrics as acceptance criteria.
  • Run end-to-end journey tests before adding more endpoints.
  • Document behavior ownership explicitly (who owns which route/flow/metric).

Closing

The biggest shift for me was this:

I stopped optimizing for request/response elegance and started optimizing for business continuity under failure.

That changes everything from API shape to data model to test strategy.

If you’re designing offline-capable systems and want to compare patterns, I’m happy to share a deeper follow-up on:

  • idempotency schema design,
  • outbox retry and DLQ policy,
  • sync conflict handling,
  • or observability dashboards that worked in practice.
