Executive summary (what you’ll learn)
You’ll get:
- A clear set of functional and non-functional requirements for a ride-hailing tracking/dispatch system.
- The core entities and a suggested schema for them.
- A complete high-level architecture with component responsibilities.
- A story that walks through every event from “Request” → “Match” → “Accept” → “Live tracking”.
- Detailed deep dives on: location ingestion flow, spatial indexing & proximity search, map matching, locking/consistency for offers, streaming pipelines for features, push delivery reliability, ETAs, disaster recovery and operational concerns.
- Two machine-readable diagrams (sequence + data-flow) you can paste into tooling that supports Mermaid.
- Five challenging follow-up questions and full answers derived from the design.
I take a streaming-first approach: treat the location stream as the canonical log (Kafka), process near-real-time features with Flink, use an in-memory hot store for sub-second queries (Redis/cluster), and orchestrate offers with short reservations + durable workflow when needed. This is the pattern Uber itself uses in public writeups. (Uber)
0 — Constraints and assumptions (scope)
This doc focuses on the core real-time tracking and dispatch workflow (matching, ETA, live location). Out-of-scope: payments, full driver onboarding flows, rating UI, full GDPR legal text, and per-country regulatory minutiae. Where implementation choices vary (e.g., exactly how many H3 rings to search), I describe tradeoffs rather than prescriptively choose a single number.
1 — Requirements
Functional requirements (must-have)
- Rider can request a ride by providing pickup & destination; system returns an estimated fare and ETA.
- Rider can confirm a ride; system must match them to a nearby available driver.
- System delivers the offer to candidate drivers and receives accept/decline decisions.
- Rider and driver receive continuous, low-latency updates about trip state and driver location (map + ETA).
- System persists full trip events for billing, audit, ML and dispute resolution.
Non-functional requirements (system properties)
- Low end-to-end latency: driver GPS → rider UI updates within a few seconds in the typical case; matching decision within a target of under 1 minute.
- High throughput: millions of location updates per minute; bursty peaks near events/cities.
- Consistency for offers: a driver should not receive two simultaneous conflicting offers; a ride should not be double-assigned.
- Durability & replayability: event stream must be persisted to enable replays/backfills for features and debugging.
- Cost vs freshness tradeoffs: prioritize low latency for current location and low cost long-term retention for raw traces.
- Resilience & recoverability: failover to backup region, auto-expire reservations to avoid resource locks, safe recovery from outages.
2 — Core entities (conceptual model)
Below are the core entities you will persist/serve. Each can map to a microservice table/document depending on your platform.
- `DriverState`
  - driverId, status (OFFLINE/AVAILABLE/EN_ROUTE/ON_TRIP), vehicleId, lastSeenTimestamp, currentH3Cell, currentRoadSegmentId
- `LocationEvent` (immutable stream record)
  - eventId, driverId, timestamp, rawLat, rawLng, speed, bearing, accuracy, seqNo, clientTs
- `MapMatchedPoint` (derived)
  - driverId, timestamp, roadSegmentId, matchedLat, matchedLng, confidenceScore
- `Ride`
  - rideId, riderId, pickupLatLng, destLatLng, requestedProduct, fareEstimate, state (REQUESTED/OFFERED/ACCEPTED/ONGOING/COMPLETED/CANCELLED), assignedDriverId, createdAt, updatedAt
- `OfferReservation`
  - driverId, rideId, reservationState, reservedAt, expiresAt (TTL-backed)
- `H3CellAggregate`
  - h3CellId, timestampWindow, supplyCount, demandCount, smoothedSupply, smoothedDemand, computedFeatures[]
- `PushMessageMeta`
  - clientId, seqNo, TTL, priority, lastAckedSeq
You will append LocationEvent records to a streaming system (e.g., Kafka) and maintain hot DriverState and H3 mappings in an in-memory store.
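As one illustration, here is a minimal sketch of the hot-path records in Python. Field names follow the entity list above; the types, units and defaults are assumptions, not a prescriptive schema:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DriverStatus(Enum):
    OFFLINE = "OFFLINE"
    AVAILABLE = "AVAILABLE"
    EN_ROUTE = "EN_ROUTE"
    ON_TRIP = "ON_TRIP"


@dataclass(frozen=True)
class LocationEvent:
    """Immutable stream record appended to the Kafka location topic."""
    event_id: str
    driver_id: str
    timestamp: float        # server-side ingest time (epoch seconds)
    raw_lat: float
    raw_lng: float
    speed: float            # m/s
    bearing: float          # degrees
    accuracy: float         # GPS accuracy radius in meters
    seq_no: int             # monotonic per-driver sequence number
    client_ts: float        # device clock; not trusted for ordering


@dataclass
class DriverState:
    """Mutable hot-store view, rebuildable from the event log on recovery."""
    driver_id: str
    status: DriverStatus
    vehicle_id: str
    last_seen_ts: float
    current_h3_cell: str
    current_road_segment_id: Optional[str] = None
```

The immutable/mutable split mirrors the Kafka-vs-hot-store split below: `LocationEvent` is append-only, `DriverState` is a derived view.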
3 — High level architecture (component list and responsibilities)
Key technology roles (example mapping):
- Kafka: canonical event log for durability & replay. Use partitions by geography / cell for locality. (Uber)
- Redis (or clustered in-memory store): hot current locations, geo indices (GEOADD / GEOSEARCH), ephemeral locks/reservations (SETNX + TTL), and per-driver connection state.
- Flink / streaming jobs: compute per-H3-cell features, smoothing (k-ring) and multi-window aggregates for pricing and ETAs. Uber built large Flink pipelines for near-real-time features. (Uber)
- Map matching service: fast, low-latency HMM map matcher for live updates + offline reprocessing for accuracy (CatchME is Uber’s map-matching/accuracy work). (Uber)
- Push delivery (RAMEN): persistent streaming connection infrastructure for low-latency delivery with sequencing, TTL and retries — designed to replace heavy polling. Uber’s RAMEN and its later gRPC migration are core references. (Uber)
4 — End-to-end event narrative (step-by-step story)
Below I walk the system through the chronological events that happen in a typical request cycle. Think of this as the runtime story of the system.
Scene 0 — Background activity: drivers sending location
Continuous background flow: every driver that is online runs a background loop in their Driver App:
- The OS location stack (Android fused provider / iOS Core Location) emits a sample: lat/lng/accuracy/speed/bearing. The app attaches driverId (via JWT), currentTripId (if any), a monotonic seqNo, and packages the payload as a compact protobuf.
- The app applies adaptive sampling: when the driver is `ON_TRIP` or moving quickly, samples are frequent (sub-second to few-second cadence). When idle, cadence drops to save battery/data. This keeps traffic reasonable while preserving required fidelity. Uber engineering emphasizes push efficiency to reduce polling and battery use. (Uber)
- The payload is sent to the API Gateway over TLS (or over the persistent gRPC streaming connection if available). If the network is flaky, the driver app queues and retries; delivery semantics are at-least-once, with sequence numbers to handle replays and out-of-order events.
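A minimal sketch of the adaptive-sampling decision. The status values follow `DriverState` above; the speed thresholds and cadences are illustrative assumptions, not Uber's actual values:

```python
def sampling_interval_s(status: str, speed_mps: float) -> float:
    """Pick a location-sampling cadence from driver state and speed.

    Frequent samples when on trip or repositioning quickly; slow cadence
    when idle to save battery and data. All numbers are illustrative.
    """
    if status == "ON_TRIP":
        return 1.0            # highest fidelity while a rider is aboard
    if speed_mps > 8.0:       # ~29 km/h: driver is moving quickly
        return 2.0
    if speed_mps > 1.0:       # slow movement, e.g. circling for parking
        return 5.0
    return 30.0               # idle: minimal cadence
```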
Ingress: the gateway validates the token & payload and appends the LocationEvent to Kafka’s location topic (partitioned by geography/cell). Kafka gives durability and replayability; downstream consumers can reprocess the stream to rebuild state or recompute features later. (Uber)
Hot view update: a hot-index updater service consumes the location event, calls the map-matching service to get a map-snapped point (or a fast inline heuristic), converts it to an H3 cell, and writes:
- `DriverState` (driverId → current location, status, lastSeen)
- H3 cell membership: add driverId to the cell's live list (with TTL).
This hot view is what the dispatcher queries for near-real-time matching.
Why both Kafka and a hot store? Kafka persists the full raw stream for analytics; the hot store serves low-latency neighbor queries. This split gives both durability and speed. (Uber)
Scene 1 — Rider taps Request
- Rider client builds the request: pickup lat/lng (or drop pin), destination, product option. Critical fields (fare, eta) are not trusted from clients — server recomputes them. The rider UI opens/maintains a persistent streaming channel (RAMEN/gRPC) to receive assignment and live driver updates. (Uber)
- The gateway authenticates and writes the ride request to the canonical log (Kafka). The Ride Service consumes the request event and creates a persistent `Ride` object: `rideId`, `state=REQUESTED`, pickup/destination, createdAt. Persisting early ensures crash recovery and an audit trail.
- The Ride Service triggers the matching workflow (either via a queue or by directly invoking the matching fleet). Matching is partitioned by geography: convert the pickup point → H3 cell (chosen resolution), then compute a k-ring (neighbor cells) to form the initial candidate set. Using H3 reduces the candidate set dramatically versus a global scan. (Uber)
Scene 2 — Candidate selection and ranking
The matching pipeline does the following:
- Query the hot index for live drivers inside the k-ring cells. Each candidate has metadata: lastSeen, status, estimated time to reposition, recent acceptance probability (ML score), vehicle attributes, and current ETA to pickup (estimated via a fast routing heuristic).
- Score & rank candidates by a multi-objective function: minimize rider wait time, minimize driver repositioning cost (fuel/idle), maximize acceptance probability, respect driver preferences and fairness constraints. This is the core of a dispatch optimizer (DISCO). Selecting the “best” candidate is not just nearest-first — acceptance probability and marketplace balance matter. (System Design Newsletter)
Scene 3 — Reserving a driver (safety & consistency)
Before sending an offer to Driver A, you must ensure another matching instance doesn’t simultaneously offer the same driver.
Reservation pattern (fast, pragmatic):
- Run `SETNX reservation:{driverId} => rideId` in Redis with TTL = acceptance window (e.g., 10s). `SETNX` is atomic; success means this instance reserved the driver, which prevents other matchers from using the same driver while the TTL is active.
- If `SETNX` fails, skip this driver (someone else reserved them).
Why TTL? If the matching instance crashes or the driver device never responds, the TTL auto-expires, preventing forever-held reservations. For stronger guarantees use a durable workflow (next section). The ephemeral lock + TTL pattern is widely used for short windows where speed is essential.
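A sketch of the reservation semantics, using an in-memory dict as a stand-in for Redis. Production code would issue a shared Redis command (modern Redis expresses SETNX-with-TTL as `SET reservation:{driverId} rideId NX EX ttl`) so the lock is visible across matcher instances:

```python
import time


class ReservationStore:
    """In-memory stand-in for Redis SET-NX-with-TTL reservation locks.

    Demonstrates the semantics only: one live reservation per driver,
    auto-expiring after its TTL so crashes cannot hold a driver forever.
    """

    def __init__(self):
        self._data = {}  # "reservation:{driverId}" -> (ride_id, expires_at)

    def try_reserve(self, driver_id: str, ride_id: str, ttl_s: float) -> bool:
        now = time.monotonic()
        key = f"reservation:{driver_id}"
        current = self._data.get(key)
        if current is not None and current[1] > now:
            return False                       # live reservation held elsewhere
        self._data[key] = (ride_id, now + ttl_s)
        return True

    def holder(self, driver_id: str):
        """Return the rideId holding the lock, or None if expired/absent."""
        current = self._data.get(f"reservation:{driver_id}")
        if current is None or current[1] <= time.monotonic():
            return None                        # TTL elapsed: auto-released
        return current[0]

    def release(self, driver_id: str) -> None:
        self._data.pop(f"reservation:{driver_id}", None)
```

On acceptance, the dispatcher checks `holder(driver_id) == ride_id` before the durable compare-and-set, then calls `release`.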
Scene 4 — Offer delivery (RAMEN + push semantics)
- Dispatch tells the push decision system (Fireball → RAMEN) to send an offer to Driver A. The offer message includes rideId, pickup coords, estimated ETA to pickup, estimated payout, and a sequence number and TTL. Uber’s RAMEN platform maintains persistent streams to clients, supports sequencing, TTL and priorities, and moved from SSE to gRPC streaming in later iterations for improved performance and acknowledgements. (Uber)
- RAMEN delivers the message over the driver’s open stream (or via APN/FCM fallbacks). The message is given a short TTL and priority (offers are high priority). Delivery attempts continue until the message is acknowledged or TTL expires.
- Driver app shows accept/decline UI. If the driver is offline, RAMEN will retry according to TTL/retry policy; if still unreachable, TTL expires and the reservation lock will auto-expire, allowing the dispatcher to try the next candidate.
Scene 5 — Driver accepts; atomic assignment
Driver taps Accept and client sends an acceptance event.
Server side acceptance flow (atomically):
- The acceptance is appended to Kafka (durable event).
- Ride Service (or the orchestration workflow) verifies the reservation: check that `reservation:{driverId}` equals this `rideId` (or confirm the lock is still present). If yes: set `Ride.assignedDriverId = driverId`, `Ride.state = ACCEPTED`, and persist to the DB.
- Release the reservation (delete the key) and commit the acceptance.
- Notify rider (via RAMEN) with driver details and ETA, and notify other subsystems (billing, trip telemetry). The event is visible in the canonical log for downstream consumers. This sequence ensures only one driver becomes assigned. Durable logs + simple atomics on reservations + idempotent updates handle races and retries robustly.
Scene 6 — Live tracking & ETA updates while driver approaches
Ongoing live loop:
- Driver app continues to push frequent location events as the driver moves to pickup. Each event flows through the same ingestion pipeline: gateway → Kafka → hot index updater → map matcher → push triggers.
- Map-matching snaps points to roads (reducing jitter on maps and improving routing accuracy). Uber’s CatchME and other map projects describe HMM-style map-matching and map quality detection; production systems often have a fast online matcher and a heavier offline reprocessing pipeline for accuracy. (Uber)
ETA recomputation:
- The routing engine uses the driver’s current map-matched position + live traffic (from aggregated per-road segment features) to recompute ETA to pickup. Streaming pipelines (Flink) maintain per-cell and per-segment recent traversal times and other features (smoothing across neighbors and multiple time windows) that the routing/ETA model consumes. Uber’s large Flink pipelines produce a forest of features used by pricing, dispatch and ETA. (Uber)
Push policy:
- Not every location event is pushed to the rider; the push decision uses heuristics and priority levels to avoid flooding the client (e.g., push on significant position change, on ETA change beyond a threshold, or on a state transition). RAMEN's sequencing and TTL ensure the rider sees the latest meaningful state and can recover missed updates on reconnect. (Uber)
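The push-on-meaningful-change heuristic can be sketched as follows. The thresholds are illustrative assumptions; real systems tune them per product and map zoom level:

```python
import math


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two lat/lng points."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lng2 - lng1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def should_push(prev, curr, min_move_m=25.0, min_eta_delta_s=30.0):
    """Decide whether a new driver update is worth pushing to the rider.

    prev/curr are dicts with keys lat, lng, eta_s, state; prev is None
    for the first update of a session.
    """
    if prev is None or curr["state"] != prev["state"]:
        return True                 # state transitions always push
    if haversine_m(prev["lat"], prev["lng"], curr["lat"], curr["lng"]) >= min_move_m:
        return True                 # significant position change
    return abs(curr["eta_s"] - prev["eta_s"]) >= min_eta_delta_s
```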
Scene 7 — Trip start, progress, and completion
- When the driver picks up the rider, the driver app sends an `ON_TRIP` event; the Ride Service transitions `state=ONGOING`. Events are appended to Kafka for audit/analytics.
- During the trip, the driver continues to send location data; the map-matched trace is persisted for billing/ML and streamed into analytics. Offline reprocessing improves trip trace quality and feeds ETA models.
- Upon drop-off, `state=COMPLETED`: the final fare is computed (with surge/adjustments), billing is triggered, and the trip record is stored. All events remain in the canonical log for future replay. This durability is essential for disputes, analytics and model training. (Uber)
5 — Deep dives (technical texture & tradeoffs)
Below are detailed explanations of the most important technical challenges and the engineering patterns that address them.
Deep dive A — Ingesting millions of location events per second
Problem: millions of online drivers emitting frequent updates — naive writes to a primary DB will not scale.
Pattern: streaming-first ingestion with hot cache + durable event log.
- Step 1: client → API Gateway → append to Kafka location topic (partition by geographic shard). Kafka is durable, partitioned, and allows consumer groups to scale processing. Uber uses a Kappa-style approach and Kafka as the central log. (Uber)
- Step 2: multiple consumers read from Kafka:
  - a hot-index updater (low-latency) writes to a Redis cluster / in-memory store for fast neighbor queries, with TTL semantics;
  - a map-matching service consumes a parallel stream to create map-matched points and writes them to persistent storage;
  - streaming analytics (Flink) consumes the stream to compute aggregates/ML features. (Uber)
Tradeoffs: Kafka adds a small delivery latency (milliseconds to low hundreds of milliseconds) but enables replay and decouples producers from consumers. The hot store gives sub-second reads but is ephemeral; combine both.
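A sketch of geographic partition keying for Step 1. A coarse square grid stands in for H3 cells here, and `cell_deg` and the hashing scheme are illustrative assumptions; the point is that events from the same area land on the same partition, preserving per-area ordering and consumer locality:

```python
def geo_partition(lat: float, lng: float, num_partitions: int, cell_deg: float = 1.0) -> int:
    """Map a location to a Kafka partition via a coarse geographic cell.

    All events from the same ~cell_deg-degree cell hash to the same
    partition, so one consumer sees a city's stream in order.
    """
    cell = (int(lat // cell_deg), int(lng // cell_deg))
    return hash(cell) % num_partitions
```

In a real deployment the key would be the H3 cell id passed as the Kafka message key, letting the broker's partitioner do the hashing.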
Deep dive B — Efficient proximity search with H3 (hexagons)
Problem: find near drivers without scanning all drivers.
Pattern: quantize space into hierarchical cells (H3); search k-rings.
- Convert pickup lat/lng → H3 cell (chosen resolution). H3 provides `geoToH3` and `kRing` functions to enumerate neighbor cells efficiently. Using hexagons means neighbor distances are uniform and smoothing is easier than with squares. H3 is Uber's open-source spatial index, used exactly for this purpose. (Uber)
Workflow:
- `CandidateCells = kRing(pickupCell, k)`
- For each cell in CandidateCells, read live driver list in hot store (these are driverIds with lastSeen and confidence).
- Merge lists, filter by status/vehicle, compute routing ETA to pickup (approx), and rank.
Tradeoffs: H3 cell resolution selection is critical: coarse cells reduce lookup count but increase candidate set; fine cells reduce candidates but increase boundary cases requiring extra k-rings. Also need to handle search across cell boundaries (scatter-gather small set).
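The workflow above can be sketched with a toy square grid standing in for H3. Here `to_cell` and `k_ring` approximate `geoToH3` and `kRing` (a real system would call the h3 library); all names are illustrative:

```python
def to_cell(lat, lng, res=0.01):
    """Toy square-grid cell id; a real system would use h3.geoToH3."""
    return (int(lat // res), int(lng // res))


def k_ring(cell, k):
    """All cells within Chebyshev distance k; stands in for h3.kRing."""
    x, y = cell
    return [(x + dx, y + dy) for dx in range(-k, k + 1) for dy in range(-k, k + 1)]


def nearby_drivers(hot_index, pickup_lat, pickup_lng, k=1):
    """Scatter-gather the live driver lists of the pickup cell's k-ring.

    hot_index: dict cell -> list of (driver_id, status) tuples.
    Only AVAILABLE drivers survive the filter; routing-based ETA
    estimation and ranking happen downstream.
    """
    candidates = []
    for cell in k_ring(to_cell(pickup_lat, pickup_lng), k):
        for driver_id, status in hot_index.get(cell, []):
            if status == "AVAILABLE":
                candidates.append(driver_id)
    return candidates
```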
Deep dive C — Map matching & good-quality traces
Problem: raw GPS is noisy (urban canyon, multipath) and unsuitable for ETA or visual UX.
Pattern: two-tier map matching (fast online + offline reprocess).
- Fast online matcher: low-latency HMM or deterministic snapping to nearby road segments using a short sliding window stored in Redis. Used for immediate decisions and push.
- Offline high-accuracy reprocessing: consume raw location stream and run heavier HMM/graph algorithms to create audit-grade map-matched traces and update road statistics. Uber’s CatchME and mapping projects describe how map matching is done and how map data quality is maintained. (Uber)
Tradeoffs: online matcher must be cheap and fast (some noise tolerated), offline reprocessing fixes the noise for analytics and ML.
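A minimal sketch of the fast online tier: deterministic snapping of a fix to the nearest candidate segment. Planar coordinates are used for simplicity; a real matcher works on lat/lng and scores candidates with an HMM over the recent sliding window rather than pure distance:

```python
def _project(px, py, ax, ay, bx, by):
    """Project point P onto segment AB; return (squared_dist, snapped_point)."""
    abx, aby = bx - ax, by - ay
    denom = abx * abx + aby * aby
    t = 0.0 if denom == 0 else max(0.0, min(1.0, ((px - ax) * abx + (py - ay) * aby) / denom))
    sx, sy = ax + t * abx, ay + t * aby
    return (px - sx) ** 2 + (py - sy) ** 2, (sx, sy)


def snap_to_road(point, segments):
    """Snap a noisy (x, y) fix to the closest of the candidate road segments.

    segments: dict segment_id -> ((ax, ay), (bx, by)) in a local planar frame.
    Returns (segment_id, snapped_point).
    """
    best = None
    for seg_id, (a, b) in segments.items():
        d2, snapped = _project(point[0], point[1], a[0], a[1], b[0], b[1])
        if best is None or d2 < best[0]:
            best = (d2, seg_id, snapped)
    return best[1], best[2]
```

The candidate `segments` would come from a small spatial lookup around the fix (e.g., the segments indexed under the driver's current H3 cell).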
Deep dive D — Preventing double offers & implementing reservations
Problem: ensure a driver only gets one outstanding offer and a ride is not double-assigned.
Patterns:
- Fast reservation (ephemeral locks): `SETNX reservation:{driverId} => rideId` with TTL. Cheap, atomic, good for acceptance windows (e.g., 10s). The TTL guards against stuck reservations.
- Durable workflow: implement the offer lifecycle in a workflow engine (Temporal / Step Functions / custom) with persisted timers and deterministic retries. Use the workflow as the single source of truth for the offer state; ephemeral locks are used for instantaneous coordination. Uber uses durable orchestration concepts and internal workflow platforms for robust business logic. (Uber)
Edge cases & mitigation:
- If Redis cluster partitions or fails, a fallback check must exist: confirm assignment with DB/canonical log before finalizing. Use idempotent updates to ride state and commit to Kafka for audit.
Deep dive E — Streaming features for ETA & pricing (Flink + smoothing)
Problem: ETA and surge require per-cell supply/demand and smoothed temporal features.
Pattern: streaming jobs compute per-H3 cell counts and apply k-ring smoothing across neighbors and multiple window sizes. Uber runs Flink jobs to compute multi-window features and smooth across neighbors; these features feed ETA models and marketplace pricing. (Uber)
Implementation notes:
- Input: LocationEvent + RideRequestEvent topics.
- Operations: assign event to H3 cell; maintain counts per cell per sliding window; apply k-ring smoothing (broadcast counts to neighbor cells); combine multiple window sizes (1,2,4,8… minutes).
- Output: per-cell feature tables served to online decision services (Pinot / real-time store).
Tradeoffs: Flink state size can be large — shard and partition carefully; use state TTLs and compaction.
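A batch-shaped sketch of the same logic in plain Python (Flink would use keyed state and event-time windows; the square-grid k-ring is an illustrative stand-in for H3 neighbors):

```python
from collections import Counter


def cell_counts(events, window_start, window_len_s):
    """Count events per cell inside one tumbling window.

    events: iterable of (cell, timestamp) pairs. This is the batch
    equivalent of a keyed, windowed count in a streaming job.
    """
    counts = Counter()
    for cell, ts in events:
        if window_start <= ts < window_start + window_len_s:
            counts[cell] += 1
    return counts


def smooth_k_ring(counts, k=1):
    """Average each cell's count with its square-grid k-ring neighbors.

    Stands in for H3 k-ring smoothing: neighbors contribute equally,
    which damps noisy per-cell supply/demand spikes before the features
    reach pricing and ETA models.
    """
    smoothed = {}
    for (x, y) in counts:
        ring = [(x + dx, y + dy) for dx in range(-k, k + 1) for dy in range(-k, k + 1)]
        smoothed[(x, y)] = sum(counts.get(c, 0) for c in ring) / len(ring)
    return smoothed
```

Running this per window size (1, 2, 4, 8 minutes) and unioning the outputs gives the multi-window feature vector described above.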
Deep dive F — RAMEN: delivery & reconnect semantics
Problem: avoid gateway overload from polling and deliver reliable updates to millions of clients.
Pattern: server-driven persistent streaming (RAMEN), sequence numbers, TTLs, priority queues.
- RAMEN maintains persistent client sessions and streams messages with a monotonic `seqNo`. Clients reconnect with the last acked `seqNo` to resume. RAMEN supports a TTL per message (drop after TTL), priority buckets and retries. Uber's public posts discuss RAMEN's design and its migration to gRPC streams for better acknowledgements. (Uber)
Why sequence numbers? They ensure at-least-once delivery and allow the client to request missing ranges upon reconnect; the server can trim messages once they are acked beyond some watermark.
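A sketch of the sequencing and trim-on-ack semantics (per-message TTL and priority buckets are omitted for brevity; class and method names are illustrative):

```python
class ClientStream:
    """Per-client outbound buffer with RAMEN-style sequencing.

    Messages carry a monotonic seq_no; the server trims everything at or
    below the acked watermark and replays the rest on reconnect.
    """

    def __init__(self):
        self._next_seq = 1
        self._pending = {}   # seq_no -> message body

    def send(self, message) -> int:
        """Assign the next seq_no and buffer the message until acked."""
        seq = self._next_seq
        self._next_seq += 1
        self._pending[seq] = message
        return seq

    def ack(self, seq_no: int) -> None:
        """Client acked everything up to seq_no; trim the buffer."""
        for seq in [s for s in self._pending if s <= seq_no]:
            del self._pending[seq]

    def resume_from(self, last_acked: int):
        """On reconnect, replay unacked messages after the client's watermark."""
        return [self._pending[s] for s in sorted(self._pending) if s > last_acked]
```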
Deep dive G — Disaster recovery & client-assisted reconciliation
Problem: data center failure while trips are active.
Pattern: multi-region replication of canonical logs + client state digest:
- Maintain Kafka replication or mirror; standby region replays canonical log to rebuild ephemeral state.
- For extreme cases, driver devices hold a compact encrypted state digest of active trip state (recent event watermark, assigned ride id, last seen seq). A recovering region can query devices for state to reconcile active trips. Uber has described driver-phone assisted recovery patterns in public posts. (System Design Newsletter)
Tradeoffs: some recovery paths involve more latency for users but prevent complete data loss.
7 — Developer checklist — implementation priorities
- Canonical event log first: produce LocationEvent and RideEvent to Kafka before any derived writes. Enables replay. (Uber)
- Hot index: implement a geo index with TTL per driver; ensure cheap reads by partitioning by H3 cell.
- Reservation primitive: atomic `SETNX` with TTL for offers; test race scenarios.
- Push platform: build or adopt a streaming push layer with sequence numbers and reconnect semantics. The RAMEN blog is a useful reference. (Uber)
- Streaming pipelines: implement Flink jobs for per-cell aggregates and supply/demand smoothing. (Uber)
- Map matching: integrate a fast online matcher and run offline jobs to reprocess traces for accuracy (CatchME-inspired). (Uber)
8 — Two diagrams (Mermaid syntax you can paste into mermaid.live)
1) Sequence diagram: Request → Match → Accept → Live updates
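A sketch of that flow in Mermaid (component names follow this document's architecture, not Uber's exact internal service names):

```mermaid
sequenceDiagram
    participant R as Rider App
    participant G as API Gateway
    participant K as Kafka
    participant RS as Ride Service
    participant M as Matcher
    participant H as Redis hot index
    participant P as RAMEN Push
    participant D as Driver App
    R->>G: Request ride with pickup and destination
    G->>K: Append RideRequestEvent
    K->>RS: Consume request, create Ride REQUESTED
    RS->>M: Trigger matching
    M->>H: kRing candidate query
    M->>H: SETNX reservation with TTL
    M->>P: Dispatch offer to best driver
    P->>D: Offer message with seqNo and TTL
    D->>G: Accept
    G->>K: Append acceptance event
    K->>RS: Verify reservation, CAS assign, state ACCEPTED
    RS->>P: Notify rider
    P->>R: Assignment details and live driver updates
```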
2) Data-flow diagram: ingestion → streams → hot store → push/analytics
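And the ingestion/data-flow view, also as a sketch:

```mermaid
flowchart LR
    DA[Driver App] -->|LocationEvent| GW[API Gateway]
    GW --> K[(Kafka location topic)]
    K --> HI[Hot index updater]
    HI --> MM[Map matcher]
    MM --> HI
    HI --> RD[(Redis hot store)]
    K --> FL[Flink feature jobs]
    FL --> FS[(Real-time feature store)]
    RD --> M[Matcher and dispatch]
    FS --> ETA[ETA and pricing models]
    M --> PU[RAMEN push]
    PU --> RA[Rider App]
    PU --> DA
```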
9 — Operational & safety concerns
- Monitoring & SLOs: ingest latency, Kafka consumer lag, Redis latency, push delivery RTT, percent of offers failing due to lock contention, end-to-end user observed latency.
- Chaos testing: simulate Redis partitions, Kafka outages, RAMEN backpressure. Verify offer TTL behavior and replay recovery.
- Cost containment: tune sampling rate, hot store TTLs, and feature aggregation windows to control memory and compute cost.
- Privacy & security: always authenticate tokens at gateway, never trust client time/fare fields, and enforce per-user data retention policies.
10 — Five hard follow-up questions (and answers)
Q1 — How do you choose H3 resolution and k for kRing to find drivers?
Answer: choose a resolution such that a single H3 cell roughly matches the expected driver density vs search radius tradeoff. In dense urban areas, prefer a higher resolution (smaller cells) to reduce the candidate set. Start with a target cell size of ~100–200m for pickup proximity in cities: compute the expected number of drivers per cell empirically and tune k to cover the desired radius (a k-ring of radius k contains 3k² + 3k + 1 cells, so the candidate set grows quadratically with k). The decision must consider average driver density, the desired max candidate count per query, and boundary cases that require scatter-gather across adjacent shards. Instrument and adapt resolution by city; Uber's public H3 docs are the best resource for understanding the indexing tradeoffs. (Uber)
Q2 — How do you guarantee a driver is not double-assigned when the reservation TTL expires at the same moment another instance tries to assign?
Answer: use a small, strictly ordered set of checks and durable writes:
- `SETNX reservation:{driverId}` with a robust TTL; if it succeeds, proceed.
- Persist the assignment to durable storage in a transaction, or via an idempotent write pattern that checks whether `assignedDriverId` is still null (compare-and-set).
- Append the acceptance event to the canonical log (Kafka) immediately after confirming.
- If a race occurs, the compare-and-set on the persistent Ride record resolves it: the other instance detects the non-null `assignedDriverId` and rolls back. For highest reliability, keep the assignment workflow in a durable workflow engine (Temporal) so timers and state survive restarts. This layered approach (ephemeral lock + durable CAS + canonical event) is pragmatic and used in production systems. (Uber)
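A sketch of the compare-and-set step, with an in-memory table and a lock standing in for the database's conditional update (in SQL this would be an `UPDATE ... WHERE assigned_driver_id IS NULL` whose affected-row count tells you whether you won):

```python
import threading


class RideStore:
    """In-memory stand-in for the durable Ride table with a compare-and-set.

    assign() only succeeds if assigned_driver_id is still None, so two
    matchers racing past an expired reservation cannot both win.
    """

    def __init__(self):
        self._rides = {}          # ride_id -> assigned driver_id or None
        self._lock = threading.Lock()

    def create(self, ride_id: str) -> None:
        self._rides[ride_id] = None

    def assign(self, ride_id: str, driver_id: str) -> bool:
        # The lock plays the role of the DB's row-level atomicity here.
        with self._lock:
            if self._rides.get(ride_id) is not None:
                return False      # lost the race; caller rolls back its offer
            self._rides[ride_id] = driver_id
            return True

    def assigned_driver(self, ride_id: str):
        return self._rides.get(ride_id)
```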
Q3 — How do you keep end-to-end latency low when computing ETAs requires heavy ML features?
Answer: compute heavy features in streaming jobs offline/near-real-time and prepopulate a low-latency feature store (Pinot / real-time DB). The online ETA model reads a compact feature vector from the store rather than computing heavy features synchronously. Use approximate, fast routing for immediate UI estimates and refresh the ETA when the richer model updates. This architecture — precompute features in Flink, serve from a real-time DB — is how large platforms keep inference latency small while using complex features. (Uber)
Q4 — What happens when Kafka is overloaded or a critical consumer group lags?
Answer: design for graceful degradation: (a) matching can keep serving from the hot store's last-known state (Redis entries with TTL) even while ingestion lags; (b) slow analytics consumers can lag without breaking the matching flow, since matching relies on the hot store rather than raw offline features; (c) apply backpressure to producers or buffer at the gateway if Kafka is saturated. Maintain operational dashboards for consumer lag and autoscale consumer groups. Consider cross-cluster replication and sharding Kafka topics by geography to limit blast radius. (Uber)
Q5 — How would you detect and mitigate location spoofing or fraudulent trips?
Answer: combine mobile sensor fusion (speed/accel patterns), trip trace anomaly detection (e.g., improbable speed jumps, teleportation), cross-validation with map matching (CatchME identifies map anomalies), and behavioral models (sudden surge in acceptance/creation patterns). Flag suspicious traces for manual review and automated throttling. Enforce device attestation where possible and monitor for abnormal payment/refund patterns. Uber has published fraud detection practices and mapping quality checks; incorporate those signals into a real-time fraud detector pipeline. (Uber)
11 — Practical next steps (for a team implementing this)
- Build the minimal vertical slice: driver location ingestion → Kafka → hot index (Redis) → simple matching → push offer to driver via a simple websocket. Validate correctness and race conditions.
- Add ephemeral reservations (SETNX + TTL) and idempotent ride assignment. Run chaos tests to check TTL expiry and takeover scenarios.
- Add map matching and verify UX smoothing.
- Add Flink streaming jobs for a small set of features and serve them to online ETA models.
- Replace websocket with a production push (RAMEN-like) layer with sequence numbers and reconnect logic.
- Iterate on tuning H3 resolution and kRing parameters by city.
12 — Primary sources & recommended reading (selected)
- H3: Hexagonal hierarchical geospatial indexing system (Uber blog + GitHub). (Uber)
- RAMEN: Uber’s Real-Time Push Platform (and gRPC migration article). (Uber)
- Building Scalable Streaming Pipelines (Flink at Uber). (Uber)
- CatchME and map matching at Uber. (Uber)
- Kappa / Kafka architecture at Uber (streaming as canonical log). (Uber)


