<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Madhur Banger</title>
    <description>The latest articles on DEV Community by Madhur Banger (@madhur_banger).</description>
    <link>https://dev.to/madhur_banger</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2437799%2Fa0d485d1-6fba-4d49-8201-6dc97581eb07.jpeg</url>
      <title>DEV Community: Madhur Banger</title>
      <link>https://dev.to/madhur_banger</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/madhur_banger"/>
    <language>en</language>
    <item>
      <title>Architecting an Uber-scale real-time tracking &amp; dispatch system</title>
      <dc:creator>Madhur Banger</dc:creator>
      <pubDate>Sun, 07 Dec 2025 22:17:09 +0000</pubDate>
      <link>https://dev.to/madhur_banger/architecting-an-uber-scale-real-time-tracking-dispatch-system-3a72</link>
      <guid>https://dev.to/madhur_banger/architecting-an-uber-scale-real-time-tracking-dispatch-system-3a72</guid>
      <description>&lt;h1&gt;
  
  
  Executive summary (what you’ll learn)
&lt;/h1&gt;

&lt;p&gt;You’ll get:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A clear set of &lt;strong&gt;functional&lt;/strong&gt; and &lt;strong&gt;non-functional&lt;/strong&gt; requirements for a ride-hailing tracking/dispatch system.&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;core entities&lt;/strong&gt; and a suggested schema for them.&lt;/li&gt;
&lt;li&gt;A complete &lt;strong&gt;high-level architecture&lt;/strong&gt; with component responsibilities.&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;story&lt;/strong&gt; that walks through every event from “Request” → “Match” → “Accept” → “Live tracking”.&lt;/li&gt;
&lt;li&gt;Detailed &lt;strong&gt;deep dives&lt;/strong&gt; on: location ingestion flow, spatial indexing &amp;amp; proximity search, map matching, locking/consistency for offers, streaming pipelines for features, push delivery reliability, ETAs, disaster recovery and operational concerns.&lt;/li&gt;
&lt;li&gt;Two machine-readable diagrams (sequence + data-flow) you can paste into tooling that supports Mermaid.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Five challenging follow-up questions&lt;/strong&gt; and full answers derived from the design.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I take a streaming-first approach: treat the location stream as the canonical log (Kafka), process near-real-time features with Flink, use an in-memory hot store for sub-second queries (Redis/cluster), and orchestrate offers with short reservations + durable workflow when needed. This is the pattern Uber itself uses in public writeups. (&lt;a href="https://www.uber.com/en-IN/blog/kappa-architecture-data-stream-processing/?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;Uber&lt;/a&gt;)&lt;/p&gt;




&lt;h1&gt;
  
  
  0 — Constraints and assumptions (scope)
&lt;/h1&gt;

&lt;p&gt;This doc focuses on the core real-time tracking and dispatch workflow (matching, ETA, live location). Out-of-scope: payments, full driver onboarding flows, rating UI, full GDPR legal text, and per-country regulatory minutiae. Where implementation choices vary (e.g., exactly how many H3 rings to search), I describe tradeoffs rather than prescriptively choose a single number.&lt;/p&gt;




&lt;h1&gt;
  
  
  1 — Requirements
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Functional requirements (must-have)
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Rider can request a ride by providing pickup &amp;amp; destination; system returns an estimated fare and ETA.&lt;/li&gt;
&lt;li&gt;Rider can confirm a ride; system must match them to a nearby available driver.&lt;/li&gt;
&lt;li&gt;System delivers the offer to candidate drivers and receives accept/decline decisions.&lt;/li&gt;
&lt;li&gt;Rider and driver receive continuous, low-latency updates about trip state and driver location (map + ETA).&lt;/li&gt;
&lt;li&gt;System persists full trip events for billing, audit, ML and dispute resolution.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Non-functional requirements (system properties)
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Low end-to-end latency&lt;/strong&gt;: driver GPS → rider UI updates typically within a few seconds; matching decisions within a target of under one minute.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;High throughput&lt;/strong&gt;: millions of location updates per minute; bursty peaks near events/cities.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Consistency for offers&lt;/strong&gt;: a driver should not receive two simultaneous conflicting offers; a ride should not be double-assigned.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Durability &amp;amp; replayability&lt;/strong&gt;: event stream must be persisted to enable replays/backfills for features and debugging.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost vs freshness tradeoffs&lt;/strong&gt;: prioritize low latency for current location and low cost long-term retention for raw traces.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resilience &amp;amp; recoverability&lt;/strong&gt;: failover to backup region, auto-expire reservations to avoid resource locks, safe recovery from outages.&lt;/li&gt;
&lt;/ol&gt;




&lt;h1&gt;
  
  
  2 — Core entities (conceptual model)
&lt;/h1&gt;

&lt;p&gt;Below are the core entities you will persist/serve. Each can map to a microservice table/document depending on your platform.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;DriverState&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;driverId, status (OFFLINE/AVAILABLE/EN_ROUTE/ON_TRIP), vehicleId, lastSeenTimestamp, currentH3Cell, currentRoadSegmentId&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;LocationEvent (immutable stream record)&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;eventId, driverId, timestamp, rawLat, rawLng, speed, bearing, accuracy, seqNo, clientTs&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;MapMatchedPoint (derived)&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;driverId, timestamp, roadSegmentId, matchedLat, matchedLng, confidenceScore&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Ride&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;rideId, riderId, pickupLatLng, destLatLng, requestedProduct, fareEstimate, state (REQUESTED/OFFERED/ACCEPTED/ONGOING/COMPLETED/CANCELLED), assignedDriverId, createdAt, updatedAt&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;OfferReservation&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;driverId, rideId, reservationState, reservedAt, expiresAt (TTL-backed)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;H3CellAggregate&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;h3CellId, timestampWindow, supplyCount, demandCount, smoothedSupply, smoothedDemand, computedFeatures[]&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;PushMessageMeta&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;clientId, seqNo, TTL, priority, lastAckedSeq&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;You will append LocationEvent records to a streaming system (e.g., Kafka) and maintain hot DriverState and H3 mappings in an in-memory store.&lt;/p&gt;
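&lt;p&gt;As a concrete sketch, the first two entities could look like the following (Python dataclasses; the field names follow the lists above, while the types are illustrative assumptions):&lt;/p&gt;

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class DriverStatus(Enum):
    OFFLINE = "OFFLINE"
    AVAILABLE = "AVAILABLE"
    EN_ROUTE = "EN_ROUTE"
    ON_TRIP = "ON_TRIP"

@dataclass
class DriverState:
    driver_id: str
    status: DriverStatus
    vehicle_id: str
    last_seen_ts: float
    current_h3_cell: str
    current_road_segment_id: Optional[str] = None

@dataclass(frozen=True)  # immutable, mirroring the append-only stream record
class LocationEvent:
    event_id: str
    driver_id: str
    timestamp: float
    raw_lat: float
    raw_lng: float
    speed: float
    bearing: float
    accuracy: float
    seq_no: int
    client_ts: float
```

The frozen dataclass mirrors the append-only nature of the stream: derived views (DriverState, H3 membership) mutate, but LocationEvent records never do.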




&lt;h1&gt;
  
  
  3 — High level architecture (component list and responsibilities)
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqum66kag876asn49shfg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqum66kag876asn49shfg.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key technology roles (example mapping):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Kafka&lt;/strong&gt;: canonical event log for durability &amp;amp; replay. Use partitions by geography / cell for locality. (&lt;a href="https://www.uber.com/en-IN/blog/kappa-architecture-data-stream-processing/?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;Uber&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Redis (or clustered in-memory store)&lt;/strong&gt;: hot current locations, geo indices (GEOADD / GEOSEARCH), ephemeral locks/reservations (atomic &lt;code&gt;SET ... NX EX&lt;/code&gt;), and per-driver connection state.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Flink / streaming jobs&lt;/strong&gt;: compute per-H3-cell features, smoothing (k-ring) and multi-window aggregates for pricing and ETAs. Uber built large Flink pipelines for near-real-time features. (&lt;a href="https://www.uber.com/en-IN/blog/building-scalable-streaming-pipelines/?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;Uber&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Map matching service&lt;/strong&gt;: fast, low-latency HMM map matcher for live updates + offline reprocessing for accuracy (CatchME is Uber’s map-matching/accuracy work). (&lt;a href="https://www.uber.com/en-IN/blog/mapping-accuracy-with-catchme/?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;Uber&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Push delivery (RAMEN)&lt;/strong&gt;: persistent streaming connection infrastructure for low-latency delivery with sequencing, TTL and retries — designed to replace heavy polling. Uber’s RAMEN and its later gRPC migration are core references. (&lt;a href="https://www.uber.com/en-IN/blog/real-time-push-platform/?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;Uber&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  4 — End-to-end event narrative (step-by-step story)
&lt;/h1&gt;

&lt;p&gt;Below I walk the system through the chronological events that happen in a typical request cycle. Think of this as the runtime story of the system.&lt;/p&gt;




&lt;h2&gt;
  
  
  Scene 0 — Background activity: drivers sending location
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Continuous background flow&lt;/strong&gt;: every driver that is online runs a background loop in their Driver App:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The OS location stack (Android fused provider / iOS Core Location) emits a sample: lat/lng/accuracy/speed/bearing. The app attaches driverId (via JWT), currentTripId (if any), a monotonic seqNo, and packages the payload as a compact protobuf.&lt;/li&gt;
&lt;li&gt;The app applies &lt;em&gt;adaptive sampling&lt;/em&gt;: when the driver is &lt;code&gt;ON_TRIP&lt;/code&gt; or moving quickly, samples are frequent (sub-second to few-second cadence). When idle, cadence drops to save battery/data. This keeps traffic reasonable while preserving required fidelity. Uber engineering emphasizes push efficiency to reduce polling and battery use. (&lt;a href="https://www.uber.com/en-IN/blog/real-time-push-platform/?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;Uber&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;The payload is sent to the API Gateway over TLS (or over the persistent gRPC streaming connection if available). If the network is flaky the driver queues and retries; delivery semantics are &lt;em&gt;at-least-once&lt;/em&gt; with sequence numbers to handle replays and out-of-order events.&lt;/li&gt;
&lt;/ol&gt;
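&lt;p&gt;The adaptive-sampling rule in step 2 can be sketched as a small cadence function (the thresholds and intervals below are illustrative assumptions, not Uber's actual values):&lt;/p&gt;

```python
def sample_interval_s(status: str, speed_mps: float) -> float:
    """Pick the next location-sampling interval in seconds.
    Thresholds and intervals are illustrative assumptions."""
    if status == "ON_TRIP":
        return 1.0        # on trip: high-fidelity, ~1 s cadence
    if speed_mps >= 8.0:  # moving quickly (roughly 30 km/h)
        return 2.0
    if speed_mps >= 1.0:  # slow movement
        return 5.0
    return 30.0           # idle: back off to save battery and data
```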

&lt;p&gt;&lt;strong&gt;Ingress&lt;/strong&gt;: the gateway validates the token &amp;amp; payload and appends the LocationEvent to Kafka’s location topic (partitioned by geography/cell). Kafka gives durability and replayability; downstream consumers can reprocess the stream to rebuild state or recompute features later. (&lt;a href="https://www.uber.com/en-IN/blog/kappa-architecture-data-stream-processing/?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;Uber&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hot view update&lt;/strong&gt;: a hot-index updater service consumes the location event, calls the map-matching service to get a map-snapped point (or fast inline heuristic), converts it to an H3 cell, and writes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;DriverState&lt;/code&gt; (driverId → current location, status, lastSeen)&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;H3 cell membership: add driverId to cell’s live list (with TTL).&lt;/p&gt;

&lt;p&gt;This hot view is what the dispatcher queries for near-real-time matching.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
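&lt;p&gt;A minimal in-memory stand-in for this hot view (production would use a Redis cluster with GEO commands and per-key TTLs; this sketch keeps the same two writes and the TTL eviction semantics):&lt;/p&gt;

```python
import time

class HotIndex:
    """In-memory stand-in for the Redis hot view: per-driver state plus
    per-H3-cell live membership, with lastSeen-based TTL eviction."""

    def __init__(self, ttl_s: float = 30.0):
        self.ttl_s = ttl_s
        self.driver_state = {}   # driver_id to (cell, lat, lng, last_seen)
        self.cell_members = {}   # h3 cell id to {driver_id: last_seen}

    def update(self, driver_id, cell, lat, lng, now=None):
        now = time.monotonic() if now is None else now
        old = self.driver_state.get(driver_id)
        if old is not None and old[0] != cell:
            # driver moved across a cell boundary: drop old membership
            self.cell_members.get(old[0], {}).pop(driver_id, None)
        self.driver_state[driver_id] = (cell, lat, lng, now)
        self.cell_members.setdefault(cell, {})[driver_id] = now

    def live_drivers(self, cell, now=None):
        now = time.monotonic() if now is None else now
        members = self.cell_members.get(cell, {})
        # evict entries whose lastSeen is older than the TTL
        fresh = {d: ts for d, ts in members.items() if self.ttl_s >= now - ts}
        self.cell_members[cell] = fresh
        return list(fresh)
```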

&lt;p&gt;&lt;strong&gt;Why both Kafka and a hot store?&lt;/strong&gt; Kafka persists the full raw stream for analytics; the hot store serves low-latency neighbor queries. This split gives both durability and speed. (&lt;a href="https://www.uber.com/en-IN/blog/kappa-architecture-data-stream-processing/?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;Uber&lt;/a&gt;)&lt;/p&gt;




&lt;h2&gt;
  
  
  Scene 1 — Rider taps &lt;strong&gt;Request&lt;/strong&gt;
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Rider client builds the request: pickup lat/lng (or drop pin), destination, product option. Critical fields (fare, eta) are not trusted from clients — server recomputes them. The rider UI opens/maintains a persistent streaming channel (RAMEN/gRPC) to receive assignment and live driver updates. (&lt;a href="https://www.uber.com/en-IN/blog/real-time-push-platform/?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;Uber&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;The gateway authenticates and writes the ride request to the canonical log (Kafka). The Ride Service consumes the request event and creates a persistent &lt;code&gt;Ride&lt;/code&gt; object: &lt;code&gt;rideId&lt;/code&gt;, &lt;code&gt;state=REQUESTED&lt;/code&gt;, pickup/destination, createdAt. Persisting early ensures crash recovery and audit trail.&lt;/li&gt;
&lt;li&gt;The Ride Service triggers the &lt;strong&gt;matching workflow&lt;/strong&gt; (either via a queue or directly invoking the matching fleet). Matching is partitioned by geography: convert pickup point → H3 cell (chosen resolution), then compute a k-ring (neighbor cells) to form the initial candidate set. Using H3 reduces the candidate set dramatically versus a global scan. (&lt;a href="https://www.uber.com/en-IN/blog/h3/?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;Uber&lt;/a&gt;)&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Scene 2 — Candidate selection and ranking
&lt;/h2&gt;

&lt;p&gt;The matching pipeline does the following:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Query the hot index for live drivers inside the k-ring cells. Each candidate has metadata: lastSeen, status, estimated time to reposition, recent acceptance probability (ML score), vehicle attributes, and current ETA to pickup (estimated via a fast routing heuristic).&lt;/li&gt;
&lt;li&gt;Score &amp;amp; rank candidates by a multi-objective function: minimize rider wait time, minimize driver repositioning cost (fuel/idle), maximize acceptance probability, respect driver preferences and fairness constraints. This is the core of a dispatch optimizer (DISCO). Selecting the “best” candidate is not just nearest-first — acceptance probability and marketplace balance matter. (&lt;a href="https://newsletter.systemdesign.one/p/how-does-uber-find-nearby-drivers?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;System Design Newsletter&lt;/a&gt;)&lt;/li&gt;
&lt;/ol&gt;
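&lt;p&gt;A toy version of the multi-objective ranking shows why nearest-first can lose. The weights here are illustrative assumptions; a real dispatch optimizer learns and tunes them and adds fairness and marketplace-balance terms:&lt;/p&gt;

```python
def score_candidate(eta_to_pickup_s, reposition_cost, accept_prob,
                    w_eta=1.0, w_cost=0.3, w_accept=120.0):
    """Multi-objective score, lower is better. Weights are illustrative."""
    return (w_eta * eta_to_pickup_s
            + w_cost * reposition_cost
            - w_accept * accept_prob)

def rank_candidates(candidates):
    """candidates: dicts with driver_id, eta_to_pickup_s, reposition_cost
    and accept_prob. Returns driver ids, best first."""
    return [c["driver_id"] for c in sorted(
        candidates,
        key=lambda c: score_candidate(c["eta_to_pickup_s"],
                                      c["reposition_cost"],
                                      c["accept_prob"]))]
```

With these weights, a slightly farther driver with a 90% acceptance probability outranks the nearest driver with a 20% one, which is exactly the "not just nearest-first" behavior described above.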




&lt;h2&gt;
  
  
  Scene 3 — Reserving a driver (safety &amp;amp; consistency)
&lt;/h2&gt;

&lt;p&gt;Before sending an offer to Driver A, you must ensure another matching instance doesn’t simultaneously offer the same driver.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reservation pattern (fast, pragmatic):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Run an atomic set-if-absent with TTL: &lt;code&gt;SET reservation:{driverId} rideId NX EX 10&lt;/code&gt;, where the TTL equals the acceptance window (e.g., 10s). The single command is atomic (a &lt;code&gt;SETNX&lt;/code&gt; followed by a separate &lt;code&gt;EXPIRE&lt;/code&gt; is not); success means this instance reserved the driver, which prevents other matchers from using the same driver while the TTL is active.&lt;/li&gt;
&lt;li&gt;If &lt;code&gt;SETNX&lt;/code&gt; fails, skip this driver (someone else reserved them).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why TTL?&lt;/strong&gt; If the matching instance crashes or the driver device never responds, the TTL auto-expires, preventing forever-held reservations. For stronger guarantees use a durable workflow (next section). The ephemeral lock + TTL pattern is widely used for short windows where speed is essential.&lt;/p&gt;
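&lt;p&gt;An in-memory sketch of the reservation primitive (in production this is a single atomic Redis command with the NX and EX flags; the class below reproduces the same semantics for illustration):&lt;/p&gt;

```python
import time

class ReservationStore:
    """In-memory sketch of the ephemeral reservation lock with TTL."""

    def __init__(self):
        self._locks = {}   # driver_id to (ride_id, expires_at)

    def try_reserve(self, driver_id, ride_id, ttl_s=10.0, now=None):
        now = time.monotonic() if now is None else now
        held = self._locks.get(driver_id)
        if held is not None and held[1] > now:
            return False   # an unexpired lock is already held: skip driver
        self._locks[driver_id] = (ride_id, now + ttl_s)
        return True

    def holder(self, driver_id, now=None):
        now = time.monotonic() if now is None else now
        held = self._locks.get(driver_id)
        if held is None or now >= held[1]:
            return None    # expired: the TTL prevents stuck reservations
        return held[0]

    def release(self, driver_id):
        self._locks.pop(driver_id, None)
```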




&lt;h2&gt;
  
  
  Scene 4 — Offer delivery (RAMEN + push semantics)
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Dispatch tells the push decision system (Fireball → RAMEN) to send an offer to Driver A. The offer message includes rideId, pickup coords, estimated ETA to pickup, estimated payout, and a sequence number and TTL. Uber’s RAMEN platform maintains persistent streams to clients, supports sequencing, TTL and priorities, and moved from SSE to gRPC streaming in later iterations for improved performance and acknowledgements. (&lt;a href="https://www.uber.com/en-IN/blog/real-time-push-platform/?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;Uber&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;RAMEN delivers the message over the driver’s open stream (or via APN/FCM fallbacks). The message is given a short TTL and priority (offers are high priority). Delivery attempts continue until the message is acknowledged or TTL expires.&lt;/li&gt;
&lt;li&gt;Driver app shows accept/decline UI. If the driver is offline, RAMEN will retry according to TTL/retry policy; if still unreachable, TTL expires and the reservation lock will auto-expire, allowing the dispatcher to try the next candidate.&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Scene 5 — Driver accepts; atomic assignment
&lt;/h2&gt;

&lt;p&gt;Driver taps &lt;strong&gt;Accept&lt;/strong&gt; and client sends an acceptance event.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Server side acceptance flow (atomically):&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The acceptance is appended to Kafka (durable event).&lt;/li&gt;
&lt;li&gt;Ride Service (or orchestration workflow) verifies the reservation: check &lt;code&gt;reservation:{driverId}&lt;/code&gt; equals this &lt;code&gt;rideId&lt;/code&gt; (or confirm lock still present). If yes: set &lt;code&gt;Ride.assignedDriverId = driverId&lt;/code&gt;, &lt;code&gt;Ride.state = ACCEPTED&lt;/code&gt;. Persist to DB.&lt;/li&gt;
&lt;li&gt;Release reservation (delete key) and commit acceptance.&lt;/li&gt;
&lt;li&gt;Notify rider (via RAMEN) with driver details and ETA, and notify other subsystems (billing, trip telemetry). The event is visible in the canonical log for downstream consumers. This sequence ensures only one driver becomes assigned. Durable logs + simple atomics on reservations + idempotent updates handle races and retries robustly.&lt;/li&gt;
&lt;/ol&gt;
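&lt;p&gt;The verify-then-assign step can be sketched as an idempotent handler (field and helper names here are assumptions for illustration, not the actual service API):&lt;/p&gt;

```python
def handle_acceptance(ride, reservations, driver_id):
    """Verify the reservation, then assign atomically and idempotently.
    reservations maps driver_id to the ride_id currently reserved."""
    # 1. Idempotency: a retry of an already-committed acceptance is a no-op.
    if ride["state"] == "ACCEPTED":
        return ride["assigned_driver_id"] == driver_id
    # 2. Verify the reservation still points at this ride.
    if reservations.get(driver_id) != ride["ride_id"]:
        return False   # lock expired or is held for another ride
    if ride["state"] != "REQUESTED" and ride["state"] != "OFFERED":
        return False   # ride already progressed or was cancelled
    # 3. Commit the assignment, then release the reservation.
    ride["assigned_driver_id"] = driver_id
    ride["state"] = "ACCEPTED"
    reservations.pop(driver_id, None)
    return True
```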




&lt;h2&gt;
  
  
  Scene 6 — Live tracking &amp;amp; ETA updates while driver approaches
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Ongoing live loop:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Driver app continues to push frequent location events as the driver moves to pickup. Each event flows through the same ingestion pipeline: gateway → Kafka → hot index updater → map matcher → push triggers.&lt;/li&gt;
&lt;li&gt;Map-matching snaps points to roads (reducing jitter on maps and improving routing accuracy). Uber’s CatchME and other map projects describe HMM-style map-matching and map quality detection; production systems often have a fast online matcher and a heavier offline reprocessing pipeline for accuracy. (&lt;a href="https://www.uber.com/en-IN/blog/mapping-accuracy-with-catchme/?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;Uber&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;ETA recomputation:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The routing engine uses the driver’s current map-matched position + live traffic (from aggregated per-road segment features) to recompute ETA to pickup. Streaming pipelines (Flink) maintain per-cell and per-segment recent traversal times and other features (smoothing across neighbors and multiple time windows) that the routing/ETA model consumes. Uber’s large Flink pipelines produce a forest of features used by pricing, dispatch and ETA. (&lt;a href="https://www.uber.com/en-IN/blog/building-scalable-streaming-pipelines/?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;Uber&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Push policy&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Not every location event is pushed to the rider; the push decision uses heuristics and priority levels to avoid flooding the client (e.g., push on significant position change, on ETA change beyond a threshold, or on state transition). RAMEN’s sequencing and TTL ensure the rider sees the latest meaningful state and can recover missed updates on reconnect. (&lt;a href="https://www.uber.com/en-IN/blog/real-time-push-platform/?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;Uber&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;
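&lt;p&gt;A sketch of such a push-decision heuristic (the thresholds are illustrative assumptions):&lt;/p&gt;

```python
def should_push(last_pushed, current, eta_delta_threshold_s=30.0,
                move_threshold_m=50.0):
    """Decide whether a driver update is worth pushing to the rider.
    last_pushed and current are dicts with keys state, eta_s and
    moved_m (distance travelled since the last pushed update)."""
    if last_pushed is None:
        return True                               # first update always goes out
    if current["state"] != last_pushed["state"]:
        return True                               # state transitions always push
    if abs(current["eta_s"] - last_pushed["eta_s"]) >= eta_delta_threshold_s:
        return True                               # ETA moved past the threshold
    return current["moved_m"] >= move_threshold_m  # significant position change
```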




&lt;h2&gt;
  
  
  Scene 7 — Trip start, progress, and completion
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;When driver picks up the rider, driver app sends an &lt;code&gt;ON_TRIP&lt;/code&gt; event; Ride Service transitions &lt;code&gt;state=ONGOING&lt;/code&gt;. Events are appended to Kafka for audit/analytics.&lt;/li&gt;
&lt;li&gt;During trip, the driver continues to send location data; the map-matched trace is persisted for billing/ML and streamed into analytics. Offline reprocessing improves trip trace quality and feeds ETA models.&lt;/li&gt;
&lt;li&gt;Upon drop-off, &lt;code&gt;state=COMPLETED&lt;/code&gt;, final fare is computed (with surge/adjustments), billing triggered, and trip record stored. All events remain in the canonical log for future replay. This durability is essential for disputes, analytics and model training. (&lt;a href="https://www.uber.com/en-IN/blog/kappa-architecture-data-stream-processing/?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;Uber&lt;/a&gt;)&lt;/li&gt;
&lt;/ol&gt;




&lt;h1&gt;
  
  
  5 — Deep dives (technical texture &amp;amp; tradeoffs)
&lt;/h1&gt;

&lt;p&gt;Below are detailed explanations of the most important technical challenges and the engineering patterns that address them.&lt;/p&gt;




&lt;h2&gt;
  
  
  Deep dive A — Ingesting millions of location events per second
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; millions of drivers emitting frequent location updates; naive writes to a primary DB will not scale.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; streaming-first ingestion with hot cache + durable event log.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Step 1:&lt;/strong&gt; client → API Gateway → append to Kafka location topic (partition by geographic shard). Kafka is durable, partitioned, and allows consumer groups to scale processing. Uber uses a Kappa-style approach and Kafka as the central log. (&lt;a href="https://www.uber.com/en-IN/blog/kappa-architecture-data-stream-processing/?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;Uber&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Step 2:&lt;/strong&gt; multiple consumers read Kafka:

&lt;ul&gt;
&lt;li&gt;a &lt;em&gt;hot index updater&lt;/em&gt; (low-latency) writes to Redis cluster / in-memory store for fast neighbor queries, with TTL semantics;&lt;/li&gt;
&lt;li&gt;a &lt;em&gt;map-matching service&lt;/em&gt; consumes a parallel stream to create map-matched points and writes to persistent storage;&lt;/li&gt;
&lt;li&gt;streaming analytics (Flink) consumes to compute aggregates/ML features. (&lt;a href="https://www.uber.com/en-IN/blog/building-scalable-streaming-pipelines/?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;Uber&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Tradeoffs:&lt;/strong&gt; Kafka adds a small delivery latency (milliseconds to low hundreds of milliseconds) but enables replay and decouples producers from consumers. The hot store gives sub-second reads but is ephemeral, so you combine both.&lt;/p&gt;
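&lt;p&gt;Partitioning by geographic shard can be as simple as a stable hash of the event's H3 cell, so events from the same area land on the same Kafka partition and downstream consumers get locality (a sketch; real deployments tune partition counts and keying strategy):&lt;/p&gt;

```python
import hashlib

def partition_for(cell_id: str, num_partitions: int = 64) -> int:
    """Map an H3 cell id to a partition number. A stable hash (sha1)
    is used rather than Python's per-process randomized hash()."""
    digest = hashlib.sha1(cell_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions
```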




&lt;h2&gt;
  
  
  Deep dive B — Efficient proximity search with H3 (hexagons)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; find nearby drivers without scanning the entire fleet.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; quantize space into hierarchical cells (H3); search k-rings.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Convert pickup lat/lng → H3 cell (chosen resolution). H3 provides &lt;code&gt;geoToH3&lt;/code&gt; and &lt;code&gt;kRing&lt;/code&gt; functions to enumerate neighbor cells efficiently. Using hexagons means neighbor distances are uniform and smoothing is easier than with squares. H3 is Uber’s open source spatial index used exactly for this purpose. (&lt;a href="https://www.uber.com/en-IN/blog/h3/?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;Uber&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Workflow:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;CandidateCells = kRing(pickupCell, k)&lt;/li&gt;
&lt;li&gt;For each cell in CandidateCells, read live driver list in hot store (these are driverIds with lastSeen and confidence).&lt;/li&gt;
&lt;li&gt;Merge lists, filter by status/vehicle, compute routing ETA to pickup (approx), and rank.&lt;/li&gt;
&lt;/ol&gt;
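&lt;p&gt;The workflow above, sketched with a square-grid stand-in for H3 to keep the example self-contained (the real system would call &lt;code&gt;h3.geoToH3&lt;/code&gt; and &lt;code&gt;h3.kRing&lt;/code&gt; from Uber's H3 library):&lt;/p&gt;

```python
def k_ring(cell, k):
    """Stand-in for H3's kRing using integer grid cells (x, y)."""
    x, y = cell
    return [(x + dx, y + dy)
            for dx in range(-k, k + 1)
            for dy in range(-k, k + 1)]

def candidates_near(pickup_cell, k, cell_members, driver_status):
    """Scatter-gather: union the live driver lists of the k-ring cells,
    then filter by status. Ranking happens in a later step."""
    seen = set()
    for cell in k_ring(pickup_cell, k):
        for driver_id in cell_members.get(cell, []):
            if driver_status.get(driver_id) == "AVAILABLE":
                seen.add(driver_id)
    return seen
```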

&lt;p&gt;&lt;strong&gt;Tradeoffs:&lt;/strong&gt; H3 cell resolution selection is critical: coarse cells reduce lookup count but increase candidate set; fine cells reduce candidates but increase boundary cases requiring extra k-rings. Also need to handle search across cell boundaries (scatter-gather small set).&lt;/p&gt;




&lt;h2&gt;
  
  
  Deep dive C — Map matching &amp;amp; good-quality traces
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; raw GPS is noisy (urban canyon, multipath) and unsuitable for ETA or visual UX.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; two-tier map matching (fast online + offline reprocess).&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Fast online matcher&lt;/strong&gt;: low-latency HMM or deterministic snapping to nearby road segments using a short sliding window stored in Redis. Used for immediate decisions and push.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Offline high-accuracy reprocessing&lt;/strong&gt;: consume raw location stream and run heavier HMM/graph algorithms to create audit-grade map-matched traces and update road statistics. Uber’s CatchME and mapping projects describe how map matching is done and how map data quality is maintained. (&lt;a href="https://www.uber.com/en-IN/blog/mapping-accuracy-with-catchme/?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;Uber&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Tradeoffs:&lt;/strong&gt; online matcher must be cheap and fast (some noise tolerated), offline reprocessing fixes the noise for analytics and ML.&lt;/p&gt;
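&lt;p&gt;A minimal "fast online matcher" sketch: snap each raw point to the nearest road segment by perpendicular projection. A real matcher adds an HMM over a sliding window of points, and the geometry here is planar for simplicity:&lt;/p&gt;

```python
def snap_to_segment(p, segments):
    """p is a raw (x, y) point; segments maps segment_id to a pair of
    endpoints ((x1, y1), (x2, y2)). Returns (segment_id, snapped_point)."""
    def project(point, a, b):
        # project point onto segment a-b, clamped to the segment
        ax, ay = a
        bx, by = b
        dx, dy = bx - ax, by - ay
        denom = dx * dx + dy * dy
        t = 0.0 if denom == 0 else max(0.0, min(1.0,
            ((point[0] - ax) * dx + (point[1] - ay) * dy) / denom))
        return (ax + t * dx, ay + t * dy)

    best = None
    for seg_id, (a, b) in segments.items():
        q = project(p, a, b)
        d2 = (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
        if best is None or best[2] > d2:
            best = (seg_id, q, d2)
    return best[0], best[1]
```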




&lt;h2&gt;
  
  
  Deep dive D — Preventing double offers &amp;amp; implementing reservations
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; ensure a driver only gets one outstanding offer and a ride is not double-assigned.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Patterns:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Fast reservation (ephemeral locks)&lt;/strong&gt;: &lt;code&gt;SET reservation:{driverId} rideId NX EX 10&lt;/code&gt;, an atomic set-if-absent with TTL. Cheap, atomic, good for acceptance windows (e.g., 10s). The TTL guards against stuck reservations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Durable workflow&lt;/strong&gt;: implement the offer lifecycle in a workflow engine (Temporal / Step Functions / custom) with persisted timers and deterministic retries. Use the workflow as the single source of truth for the offer state; ephemeral locks are used for instantaneous coordination. Uber uses durable orchestration concepts and their internal workflow platforms for robust business logic. (&lt;a href="https://www.uber.com/en-IN/blog/no-code-workflow-orchestrator/?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;Uber&lt;/a&gt;)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Edge cases &amp;amp; mitigation:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If Redis cluster partitions or fails, a fallback check must exist: confirm assignment with DB/canonical log before finalizing. Use idempotent updates to ride state and commit to Kafka for audit.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Deep dive E — Streaming features for ETA &amp;amp; pricing (Flink + smoothing)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; ETA and surge require per-cell supply/demand and smoothed temporal features.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; streaming jobs compute per-H3 cell counts and apply k-ring smoothing across neighbors and multiple window sizes. Uber runs Flink jobs to compute multi-window features and smooth across neighbors; these features feed ETA models and marketplace pricing. (&lt;a href="https://www.uber.com/en-IN/blog/building-scalable-streaming-pipelines/?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;Uber&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Implementation notes:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Input: LocationEvent + RideRequestEvent topics.&lt;/li&gt;
&lt;li&gt;Operations: assign event to H3 cell; maintain counts per cell per sliding window; apply k-ring smoothing (broadcast counts to neighbor cells); combine multiple window sizes (1,2,4,8… minutes).&lt;/li&gt;
&lt;li&gt;Output: per-cell feature tables served to online decision services (Pinot / real-time store).&lt;/li&gt;
&lt;/ul&gt;
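&lt;p&gt;The count-then-smooth shape of these jobs, sketched outside Flink (the window sizes and the mean-based smoothing are illustrative assumptions):&lt;/p&gt;

```python
from collections import defaultdict

def windowed_counts(events, window_s, now):
    """Count events per cell within the last window_s seconds.
    events: list of (cell_id, timestamp) pairs."""
    counts = defaultdict(int)
    for cell, ts in events:
        if window_s >= now - ts:
            counts[cell] += 1
    return counts

def smooth_k_ring(counts, neighbors):
    """k-ring smoothing: each cell's value becomes the mean of itself
    and its neighbors. neighbors maps a cell to its neighbor list
    (in production this comes from H3's kRing)."""
    smoothed = {}
    for cell in counts:
        ring = [cell] + neighbors.get(cell, [])
        smoothed[cell] = sum(counts.get(c, 0) for c in ring) / len(ring)
    return smoothed
```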

&lt;p&gt;&lt;strong&gt;Tradeoffs:&lt;/strong&gt; Flink state size can be large — shard and partition carefully; use state TTLs and compaction.&lt;/p&gt;




&lt;h2&gt;
  
  
  Deep dive F — RAMEN: delivery &amp;amp; reconnect semantics
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; avoid gateway overload from polling and deliver reliable updates to millions of clients.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; server-driven persistent streaming (RAMEN), sequence numbers, TTLs, priority queues.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;RAMEN maintains persistent client sessions and streams messages with monotonic &lt;code&gt;seqNo&lt;/code&gt;. Clients reconnect with last acked &lt;code&gt;seqNo&lt;/code&gt; to resume. RAMEN supports TTL per message (drop after TTL), priority buckets and retries. Uber public posts discuss RAMEN’s design and migration to gRPC streams for better acknowledgements. (&lt;a href="https://www.uber.com/en-IN/blog/real-time-push-platform/?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;Uber&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why sequence numbers?&lt;/strong&gt; They ensure at-least-once delivery and let the client request missing ranges on reconnect; the server can trim messages once they are acked beyond a watermark.&lt;/p&gt;
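&lt;p&gt;A sketch of the seqNo/ack/resume bookkeeping (illustrative, not RAMEN's actual API):&lt;/p&gt;

```python
class PushSession:
    """Per-client push session: buffer messages with monotonic seqNos,
    trim once acked, replay unacked messages on reconnect."""

    def __init__(self):
        self.next_seq = 1
        self.buffer = {}   # seq_no to message

    def send(self, message):
        seq = self.next_seq
        self.buffer[seq] = message
        self.next_seq += 1
        return seq

    def ack(self, seq_no):
        # trim everything at or below the acked watermark
        for s in [s for s in self.buffer if seq_no >= s]:
            del self.buffer[s]

    def resume_from(self, last_acked_seq):
        # on reconnect, replay messages the client has not yet acked
        return [self.buffer[s] for s in sorted(self.buffer)
                if s > last_acked_seq]
```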




&lt;h2&gt;
  
  
  Deep dive G — Disaster recovery &amp;amp; client-assisted reconciliation
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; data center failure while trips are active.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pattern:&lt;/strong&gt; multi-region replication of canonical logs + client state digest:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Maintain Kafka replication or mirror; standby region replays canonical log to rebuild ephemeral state.&lt;/li&gt;
&lt;li&gt;For extreme cases, driver devices hold a compact encrypted &lt;em&gt;state digest&lt;/em&gt; of active trip state (recent event watermark, assigned ride id, last seen seq). A recovering region can query devices for state to reconcile active trips. Uber has described driver-phone assisted recovery patterns in public posts. (&lt;a href="https://newsletter.systemdesign.one/p/how-does-uber-find-nearby-drivers?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;System Design Newsletter&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Tradeoffs:&lt;/strong&gt; some recovery paths involve more latency for users but prevent complete data loss.&lt;/p&gt;







&lt;h1&gt;
  
  
  7 — Developer checklist — implementation priorities
&lt;/h1&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Canonical event log first&lt;/strong&gt;: produce LocationEvent and RideEvent to Kafka before any derived writes. Enables replay. (&lt;a href="https://www.uber.com/en-IN/blog/kappa-architecture-data-stream-processing/?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;Uber&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hot index&lt;/strong&gt;: implement a geo index with TTL per driver; ensure cheap reads by partitioning by H3 cell.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reservation primitive&lt;/strong&gt;: atomic &lt;code&gt;SETNX&lt;/code&gt; with TTL for offers; test race scenarios.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Push platform&lt;/strong&gt;: build or adopt a streaming push layer with sequence numbers and reconnect semantics. RAMEN blog is a useful reference. (&lt;a href="https://www.uber.com/en-IN/blog/real-time-push-platform/?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;Uber&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Streaming pipelines&lt;/strong&gt;: implement Flink jobs for per-cell aggregates and supply/demand smoothing. (&lt;a href="https://www.uber.com/en-IN/blog/building-scalable-streaming-pipelines/?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;Uber&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Map matching&lt;/strong&gt;: integrate a fast online matcher and run offline jobs to reprocess traces for accuracy (CatchME-inspired). (&lt;a href="https://www.uber.com/en-IN/blog/mapping-accuracy-with-catchme/?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;Uber&lt;/a&gt;)&lt;/li&gt;
&lt;/ol&gt;




&lt;h1&gt;
  
  
  8 — Two diagrams
&lt;/h1&gt;

&lt;h2&gt;
  
  
  1) Sequence diagram: Request → Match → Accept → Live updates
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo9274cd74nugk2j6nkou.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo9274cd74nugk2j6nkou.png" alt=" " width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  2) Data-flow diagram: ingestion → streams → hot store → push/analytics
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc6i4hnfvfced4gcg6k36.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc6i4hnfvfced4gcg6k36.png" alt=" " width="800" height="1200"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h1&gt;
  
  
  9 — Operational &amp;amp; safety concerns
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Monitoring &amp;amp; SLOs&lt;/strong&gt;: ingest latency, Kafka consumer lag, Redis latency, push delivery RTT, percent of offers failing due to lock contention, end-to-end user observed latency.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chaos testing&lt;/strong&gt;: simulate Redis partitions, Kafka outages, RAMEN backpressure. Verify offer TTL behavior and replay recovery.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost containment&lt;/strong&gt;: tune sampling rate, hot store TTLs, and feature aggregation windows to control memory and compute cost.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Privacy &amp;amp; security&lt;/strong&gt;: always authenticate tokens at gateway, never trust client time/fare fields, and enforce per-user data retention policies.&lt;/li&gt;
&lt;/ul&gt;




&lt;h1&gt;
  
  
  10 — Five hard follow-up questions (and answers)
&lt;/h1&gt;

&lt;h3&gt;
  
  
  Q1 — &lt;em&gt;How do you choose H3 resolution and k for kRing to find drivers?&lt;/em&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Answer:&lt;/strong&gt; choose a resolution such that a single H3 cell roughly balances expected driver density against the search radius. In dense urban areas, prefer higher resolution (smaller cells) to reduce the candidate set. Start with a target cell size of ~100–200m for pickup proximity in cities: measure the expected number of drivers per cell empirically and tune k to cover the desired radius (the number of cells in a kRing grows quadratically with k). The decision must consider average driver density, the desired maximum candidate count per query, and boundary cases that require scatter-gather across adjacent shards. Instrument and adapt resolution by city — Uber’s public H3 docs are the best resource for understanding the indexing tradeoffs. (&lt;a href="https://www.uber.com/en-IN/blog/h3/?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;Uber&lt;/a&gt;)&lt;/p&gt;
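&lt;p&gt;A back-of-envelope helper for this tuning might look like the following (the ring-size and spacing formulas are standard hexagon geometry; the function names are illustrative, not the H3 library API):&lt;/p&gt;

```javascript
// Back-of-envelope kRing tuning (illustrative names, not the H3 library API).
// A k-ring around a hexagonal cell contains 3k(k+1) + 1 cells.
function kRingCellCount(k) {
  return 3 * k * (k + 1) + 1;
}

// Smallest k whose ring covers searchRadiusM, given an approximate
// center-to-center cell spacing (edge length * sqrt(3) for regular hexagons).
function chooseK(searchRadiusM, cellEdgeM) {
  const spacingM = cellEdgeM * Math.sqrt(3);
  return Math.ceil(searchRadiusM / spacingM);
}

// Expected candidate set size for a proximity query.
function expectedCandidates(k, driversPerCell) {
  return kRingCellCount(k) * driversPerCell;
}
```

&lt;p&gt;For example, with ~150m cell edges a 500m search radius needs k = 2, i.e. 19 cells; multiply by measured drivers per cell to sanity-check the candidate count per query.&lt;/p&gt;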

&lt;h3&gt;
  
  
  Q2 — &lt;em&gt;How do you guarantee a driver is not double-assigned when the reservation TTL expires at the same moment another instance tries to assign?&lt;/em&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Answer:&lt;/strong&gt; use a small, strictly ordered set of checks and durable writes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;SETNX reservation:driverId&lt;/code&gt; with robust TTL; if success, proceed.&lt;/li&gt;
&lt;li&gt;Persist the assignment to durable storage in a transaction or via an idempotent write pattern that checks if &lt;code&gt;assignedDriverId&lt;/code&gt; is still null (compare-and-set).&lt;/li&gt;
&lt;li&gt;Append the acceptance event to the canonical log (Kafka) immediately after confirm.&lt;/li&gt;
&lt;li&gt;If race occurs, the compare-and-set on the persistent Ride record resolves it; other instance detects non-null &lt;code&gt;assignedDriverId&lt;/code&gt; and rolls back. For highest reliability, keep the assignment workflow in a durable workflow engine (Temporal) so timers and state survive restarts. This layered approach (ephemeral lock + durable CAS + canonical event) is pragmatic and used in production systems. (&lt;a href="https://www.uber.com/en-IN/blog/no-code-workflow-orchestrator/?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;Uber&lt;/a&gt;)&lt;/li&gt;
&lt;/ol&gt;
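&lt;p&gt;The layered checks above can be simulated in a few lines (an in-memory stand-in; a real system would use Redis &lt;code&gt;SET key value NX PX ttl&lt;/code&gt; and a durable compare-and-set on the Ride row):&lt;/p&gt;

```javascript
// In-memory stand-in for the layered assignment flow; a real deployment
// would use Redis SET NX PX for step 1 and a transactional/durable CAS for
// step 2. Names here are illustrative.
const reservations = new Map(); // driverId -> { rideId, expiresAt }
const rides = new Map();        // rideId   -> { assignedDriverId }

// Step 1: ephemeral lock with TTL; succeeds only if no live reservation.
function tryReserve(driverId, rideId, ttlMs, now = Date.now()) {
  const existing = reservations.get(driverId);
  if (existing && existing.expiresAt > now) return false; // SETNX fails
  reservations.set(driverId, { rideId, expiresAt: now + ttlMs });
  return true;
}

// Step 2: durable CAS; only one caller can flip assignedDriverId from null.
function confirmAssignment(rideId, driverId) {
  const ride = rides.get(rideId) || { assignedDriverId: null };
  if (ride.assignedDriverId !== null) return false; // lost the race: roll back
  ride.assignedDriverId = driverId;
  rides.set(rideId, ride);
  return true; // step 3 in production: append the acceptance event to Kafka
}
```

&lt;p&gt;Note how the TTL makes the lock self-cleaning: once it expires, another instance may reserve the driver, and the CAS on the Ride record is what finally arbitrates.&lt;/p&gt;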

&lt;h3&gt;
  
  
  Q3 — &lt;em&gt;How do you keep end-to-end latency low when computing ETAs requires heavy ML features?&lt;/em&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Answer:&lt;/strong&gt; compute heavy features in streaming jobs offline/near-real-time and prepopulate a low-latency feature store (Pinot / real-time DB). The online ETA model reads a compact feature vector from the store rather than computing heavy features synchronously. Use approximate, fast routing for immediate UI estimates and refresh the ETA when the richer model updates. This architecture — precompute features in Flink, serve from a real-time DB — is how large platforms keep inference latency small while using complex features. (&lt;a href="https://www.uber.com/en-IN/blog/building-scalable-streaming-pipelines/?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;Uber&lt;/a&gt;)&lt;/p&gt;

&lt;h3&gt;
  
  
  Q4 — &lt;em&gt;What happens when Kafka is overloaded or a critical consumer group lags?&lt;/em&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Answer:&lt;/strong&gt; design for graceful degradation: (a) matching can fall back to the last-known driver state in the hot store (Redis entries with TTLs), so it keeps working while ingestion lags; (b) slow analytics consumers can lag without breaking the matching flow, since matching relies on the hot store rather than raw offline features; and (c) apply backpressure to producers or buffer at the gateway if Kafka is saturated. Maintain operational dashboards for consumer lag and autoscale consumer groups. Consider cross-cluster replication and sharding Kafka topics by geography to limit blast radius. (&lt;a href="https://www.uber.com/en-IN/blog/kappa-architecture-data-stream-processing/?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;Uber&lt;/a&gt;)&lt;/p&gt;

&lt;h3&gt;
  
  
  Q5 — &lt;em&gt;How would you detect and mitigate location spoofing or fraudulent trips?&lt;/em&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Answer:&lt;/strong&gt; combine mobile sensor fusion (speed/accel patterns), trip trace anomaly detection (e.g., improbable speed jumps, teleportation), cross-validation with map matching (CatchME identifies map anomalies), and behavioral models (sudden surge in acceptance/creation patterns). Flag suspicious traces for manual review and automated throttling. Enforce device attestation where possible and monitor for abnormal payment/refund patterns. Uber has published fraud detection practices and mapping quality checks; incorporate those signals into a real-time fraud detector pipeline. (&lt;a href="https://www.uber.com/en-IN/blog/mapping-accuracy-with-catchme/?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;Uber&lt;/a&gt;)&lt;/p&gt;
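&lt;p&gt;The simplest of these signals, an improbable speed jump between consecutive GPS points, can be sketched as follows (illustrative thresholds only, not Uber’s production models):&lt;/p&gt;

```javascript
// Toy teleportation detector (illustrative thresholds, not Uber's models):
// flag points whose implied speed from the previous GPS fix is implausible.
function haversineMeters(a, b) {
  const R = 6371000; // mean Earth radius in meters
  const rad = (d) => (d * Math.PI) / 180;
  const dLat = rad(b.lat - a.lat);
  const dLon = rad(b.lon - a.lon);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(rad(a.lat)) * Math.cos(rad(b.lat)) * Math.sin(dLon / 2) ** 2;
  return 2 * R * Math.asin(Math.sqrt(h));
}

// trace: [{ lat, lon, ts }] with ts in milliseconds; returns indices of
// points exceeding maxSpeedMps (70 m/s is roughly 250 km/h).
function flagSpeedJumps(trace, maxSpeedMps = 70) {
  const flagged = [];
  for (let i = 1; i < trace.length; i++) {
    const dtSec = (trace[i].ts - trace[i - 1].ts) / 1000;
    if (dtSec <= 0) continue; // skip out-of-order or duplicate timestamps
    const speed = haversineMeters(trace[i - 1], trace[i]) / dtSec;
    if (speed > maxSpeedMps) flagged.push(i);
  }
  return flagged;
}
```

&lt;p&gt;In a real pipeline this runs as one cheap rule among many inside the streaming fraud detector; flagged indices feed review queues rather than triggering automatic bans.&lt;/p&gt;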




&lt;h1&gt;
  
  
  11 — Practical next steps (for a team implementing this)
&lt;/h1&gt;

&lt;ol&gt;
&lt;li&gt;Build the minimal vertical slice: driver location ingestion → Kafka → hot index (Redis) → simple matching → push offer to driver via a simple websocket. Validate correctness and race conditions.&lt;/li&gt;
&lt;li&gt;Add ephemeral reservations (SETNX + TTL) and idempotent ride assignment. Run chaos tests to check TTL expiry and takeover scenarios.&lt;/li&gt;
&lt;li&gt;Add map matching and verify UX smoothing.&lt;/li&gt;
&lt;li&gt;Add Flink streaming jobs for a small set of features and serve them to online ETA models.&lt;/li&gt;
&lt;li&gt;Replace websocket with a production push (RAMEN-like) layer with sequence numbers and reconnect logic.&lt;/li&gt;
&lt;li&gt;Iterate on tuning H3 resolution and kRing parameters by city.&lt;/li&gt;
&lt;/ol&gt;




&lt;h1&gt;
  
  
  12 — Primary sources &amp;amp; recommended reading (selected)
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;H3: Hexagonal hierarchical geospatial indexing system (Uber blog + GitHub). (&lt;a href="https://www.uber.com/en-IN/blog/h3/?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;Uber&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;RAMEN: Uber’s Real-Time Push Platform (and gRPC migration article). (&lt;a href="https://www.uber.com/en-IN/blog/real-time-push-platform/?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;Uber&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Building Scalable Streaming Pipelines (Flink at Uber). (&lt;a href="https://www.uber.com/en-IN/blog/building-scalable-streaming-pipelines/?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;Uber&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;CatchME and map matching at Uber. (&lt;a href="https://www.uber.com/en-IN/blog/mapping-accuracy-with-catchme/?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;Uber&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Kappa / Kafka architecture at Uber (streaming as canonical log). (&lt;a href="https://www.uber.com/en-IN/blog/kappa-architecture-data-stream-processing/?utm_source=chatgpt.com" rel="noopener noreferrer"&gt;Uber&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;




</description>
      <category>systemdesign</category>
      <category>distributedsystems</category>
      <category>webdev</category>
    </item>
    <item>
      <title>How to Design a Notification System: A Complete Guide</title>
      <dc:creator>Madhur Banger</dc:creator>
      <pubDate>Sat, 06 Dec 2025 16:36:25 +0000</pubDate>
      <link>https://dev.to/madhur_banger/how-to-design-a-notification-system-a-complete-guide-4509</link>
      <guid>https://dev.to/madhur_banger/how-to-design-a-notification-system-a-complete-guide-4509</guid>
      <description>&lt;p&gt;This guide outlines how to build a scalable notification service supporting email, SMS, push and in-app channels. It covers user preferences, rate-limiting, synchronous &amp;amp; batch delivery, queueing with retries, high availability, and trade-offs between latency, cost and reliability.&lt;br&gt;
design a notification system&lt;/p&gt;

&lt;p&gt;Think about the apps you use every day. A banking app alerts you about suspicious activity. A shopping app lets you know when your order ships. A chat app pings you when a friend sends a message. All of these rely on a notification system working seamlessly behind the scenes.&lt;/p&gt;

&lt;p&gt;On the surface, notifications feel simple—you receive a message or alert, and that’s it. But under the hood, they’re surprisingly complex. Delivering millions of notifications across email, SMS, push, and in-app channels requires careful planning, robust infrastructure, and a design that can scale.&lt;/p&gt;

&lt;p&gt;That’s why learning how to design a notification system is so important. It’s not just a valuable System Design interview question—it’s a real-world problem faced by companies building apps at scale. Understanding the design decisions involved will make you a stronger engineer and prepare you to tackle one of the most common challenges in distributed systems.&lt;/p&gt;

&lt;p&gt;In this guide, you’ll walk through the full journey: defining requirements, exploring challenges, outlining the architecture, and thinking about scaling, reliability, and security. By the end, you’ll know not just how to design a notification system, but how to explain the trade-offs behind your decisions in both interviews and real projects.&lt;/p&gt;


&lt;h2&gt;
  
  
  Defining the Problem: What Does a Notification System Do?
&lt;/h2&gt;

&lt;p&gt;Before diving into architecture for a System Design interview, it’s important to step back and define what we’re trying to build. At its core, a notification system is responsible for delivering timely information to users through multiple channels.&lt;/p&gt;
&lt;h3&gt;
  
  
  Channels a Notification System Supports
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Push notifications:&lt;/strong&gt; Mobile and desktop alerts via services like FCM (Firebase Cloud Messaging) or APNs (Apple Push Notification Service). (Firebase)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Email notifications:&lt;/strong&gt; Transactional emails like password resets, receipts, or promotions. (SendGrid)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SMS notifications:&lt;/strong&gt; Time-sensitive alerts like OTPs or delivery updates. (Twilio)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;In-app notifications:&lt;/strong&gt; Alerts that appear inside the app itself, often using real-time connections like WebSockets.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  The Role of Notifications
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;User engagement: Encouraging users to return to your app.&lt;/li&gt;
&lt;li&gt;Transaction updates: Confirming actions like payments, orders, or deliveries.&lt;/li&gt;
&lt;li&gt;Security alerts: Warning users about logins, password changes, or suspicious activity.&lt;/li&gt;
&lt;li&gt;System communication: Keeping users informed about downtime, maintenance, or feature changes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When you’re asked to design a notification system, it’s not just about sending messages—it’s about building a service that handles scale, personalization, and reliability across all these channels.&lt;/p&gt;


&lt;h2&gt;
  
  
  Requirements for Designing a Notification System
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Functional Requirements
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Multi-channel support: Push, SMS, email, and in-app alerts.&lt;/li&gt;
&lt;li&gt;Guaranteed delivery: Ensure messages are sent reliably.&lt;/li&gt;
&lt;li&gt;User preferences: Respect quiet hours, preferred channels, and opt-outs.&lt;/li&gt;
&lt;li&gt;Personalization: Customize notifications to user context (e.g., “Hi John, your package is on the way”).&lt;/li&gt;
&lt;li&gt;Retry mechanism: Resend messages if a delivery attempt fails.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Non-Functional Requirements
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Scalability: Handle millions of notifications per minute during peak times.&lt;/li&gt;
&lt;li&gt;Low latency: Deliver time-sensitive notifications (like OTPs) in seconds.&lt;/li&gt;
&lt;li&gt;High availability: Keep the system running even during failures.&lt;/li&gt;
&lt;li&gt;Fault tolerance: Recover from service crashes or network issues without data loss.&lt;/li&gt;
&lt;li&gt;Observability: Track notification delivery, failures, and retries with monitoring and logs.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When designing a notification system in an interview, start by clarifying these requirements. This demonstrates structured thinking, sets the stage for your architectural decisions, and is good System Design interview practice.&lt;/p&gt;


&lt;h2&gt;
  
  
  Core Challenges in Notification Systems
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Key Challenges
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;High Concurrency:&lt;/strong&gt; Millions of notifications may need to be delivered in a very short time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-Channel Complexity:&lt;/strong&gt; Each channel has its own quirks and failure modes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Delivery Guarantees:&lt;/strong&gt; Deciding between at-most-once, at-least-once, or exactly-once semantics.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;User Preferences:&lt;/strong&gt; Enforcing opt-in/out, quiet hours, per-channel preferences at scale.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Failure Handling:&lt;/strong&gt; External dependencies fail — need retries, backoffs, dead-lettering, and fallbacks.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These challenges shape the architecture. A successful design of a notification system solution isn’t just about sending messages—it’s about building resilience, respecting preferences, and scaling gracefully.&lt;/p&gt;


&lt;h2&gt;
  
  
  High-Level Architecture of a Notification System
&lt;/h2&gt;

&lt;p&gt;At a high level, a notification system looks like a pipeline: an event is generated, processed, and delivered through the right channel.&lt;/p&gt;
&lt;h3&gt;
  
  
  Core Components
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Producer (Event Source)&lt;/strong&gt; — Generates notification events.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Queue or Message Broker&lt;/strong&gt; — Acts as a buffer between producers and notification workers (Kafka, RabbitMQ, SQS are common choices). Kafka is especially favored for high-throughput streaming scenarios. (Apache Kafka)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Notification Service&lt;/strong&gt; — Reads events, applies business logic, checks user preferences, selects channel, formats payload.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Channel Integrations&lt;/strong&gt; — Interfaces to APNs/FCM for push, SMTP or SendGrid for email, Twilio or telecom gateways for SMS. (Twilio)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Databases&lt;/strong&gt; — Store user preferences, delivery logs, rate-limits, and notification history.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitoring &amp;amp; Logging&lt;/strong&gt; — Metrics, dashboards, tracing.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Flow Overview
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Event created (purchase, message, system alert).&lt;/li&gt;
&lt;li&gt;Event queued to a broker for reliability.&lt;/li&gt;
&lt;li&gt;Notification workers process it, check preferences, choose channel.&lt;/li&gt;
&lt;li&gt;Message delivered via external providers.&lt;/li&gt;
&lt;li&gt;Delivery status logged; failed events routed for retry or DLQ.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Diagram:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxx90cfd2he341szusur9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxx90cfd2he341szusur9.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[Producers] --&amp;gt; [Ingress API] --&amp;gt; [Message Broker (Kafka/SQS)] --&amp;gt; [Worker Pool]
      |                                                          |
      v                                                          v
[User Pref DB / Cache]                                       [Channel adapters]
                                                              /   |   \
                                                           APNs  SMS  Email
                                                              \   |   /
                                                           [Delivery logs + DLQ]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Event Sources and Producers
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Types of Event Sources
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;User Actions:&lt;/strong&gt; Message sent, order placed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;System Events:&lt;/strong&gt; Payment completed, account locked.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scheduled Jobs:&lt;/strong&gt; Reminders, digests.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;External Integrations:&lt;/strong&gt; Carrier status updates, shipment feeds.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Event Prioritization
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;High Priority:&lt;/strong&gt; OTPs, security alerts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Medium Priority:&lt;/strong&gt; Transaction updates.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Low Priority:&lt;/strong&gt; Marketing, recommendations.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Event Payload Design (recommended JSON schema)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"event_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"uuid-v4"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"event_type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ORDER_SHIPPED"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"priority"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"MEDIUM"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"user_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"user-123"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"tenant_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"org-456"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"timestamp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2025-12-06T12:34:56Z"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"payload"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"order_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"order-789"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"tracking_url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://carrier/track/..."&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"channels"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"PUSH"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"EMAIL"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;optional&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;override&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"idempotency_key"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"user-123-order-789"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;channels&lt;/code&gt; field is an optional per-event override of the user’s stored channel preferences (JSON itself does not support inline comments, so document such fields in the schema).&lt;/p&gt;






&lt;h2&gt;
  
  
  Message Queues and Brokers
&lt;/h2&gt;

&lt;p&gt;Queues decouple producers and consumers, provide buffering, and enable backpressure.&lt;/p&gt;

&lt;h3&gt;
  
  
  Broker Choices and When to Use Them
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Kafka:&lt;/strong&gt; High-throughput, retention, replayability, partitioning — ideal for streaming and extremely high-volume notification pipelines. (Apache Kafka)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RabbitMQ:&lt;/strong&gt; Flexible routing patterns and acknowledgement semantics; good for complex routing and smaller scale.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS SQS / Google Pub/Sub:&lt;/strong&gt; Fully managed, simpler operational model — use when you want less ops overhead. Comparison references show Kafka is chosen for heavy throughput and replay; SQS for simpler durable queues. (DataCamp)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Design Patterns
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Topic per logical stream:&lt;/strong&gt; &lt;code&gt;notifications.events&lt;/code&gt;, &lt;code&gt;notifications.audit&lt;/code&gt;, &lt;code&gt;notifications.deadletter&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Partitioning key:&lt;/strong&gt; Use &lt;code&gt;hash(user_id) % partitions&lt;/code&gt; to distribute load and keep per-user ordering when needed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DLQ (Dead Letter Queue):&lt;/strong&gt; For events that repeatedly fail after retries.&lt;/li&gt;
&lt;/ul&gt;
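&lt;p&gt;A minimal sketch of that partitioning rule (a toy 31-based string hash for illustration; production Kafka clients typically use murmur2 on the key):&lt;/p&gt;

```javascript
// Toy partitioner for the rule above (production clients typically hash the
// key with murmur2; this simple 31-based string hash is for illustration).
function partitionFor(userId, partitions) {
  let h = 0;
  for (const ch of userId) {
    h = (h * 31 + ch.charCodeAt(0)) >>> 0; // keep it an unsigned 32-bit int
  }
  return h % partitions; // same user_id always lands on the same partition
}
```

&lt;p&gt;Because the mapping is deterministic, all of one user’s events land on the same partition, which is what preserves per-user ordering within a consumer group.&lt;/p&gt;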




&lt;h2&gt;
  
  
  Notification Delivery Mechanisms
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Push Notifications
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Use FCM for Android and cross-platform convenience; APNs for direct iOS delivery (FCM often proxies to APNs for iOS). See the FCM docs (Firebase).&lt;/li&gt;
&lt;li&gt;Device token handling: store tokens, handle invalidation, rotate stale tokens.&lt;/li&gt;
&lt;li&gt;Payload size limits exist; keep messages small.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Email
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Use a transactional provider (SendGrid, SES) for deliverability and ISP reputation management. Authenticate domains with SPF/DKIM/DMARC. (SendGrid)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  SMS
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Use gateway providers (Twilio, Nexmo) and follow local regulations; consider long-code vs short-code for deliverability and throughput. (Twilio)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  In-app
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Use WebSockets or SSE for real-time in-app messaging; persist notifications to enable history and unread counts.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Channel Selection Logic
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Respect user preferences, priority, channel availability, and cost constraints. E.g., prefer push for engagement, SMS for critical security messages if push unavailable.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  User Preferences and Personalization
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Storage &amp;amp; Access
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Database:&lt;/strong&gt; store canonical preferences (Postgres / DynamoDB).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cache:&lt;/strong&gt; Keep current preferences in Redis for low-latency reads.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Schema (SQL-like):&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;user_notification_preferences&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="n"&gt;UUID&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;email_enabled&lt;/span&gt; &lt;span class="nb"&gt;BOOLEAN&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;TRUE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;sms_enabled&lt;/span&gt; &lt;span class="nb"&gt;BOOLEAN&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;FALSE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;push_enabled&lt;/span&gt; &lt;span class="nb"&gt;BOOLEAN&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;TRUE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;quiet_hours&lt;/span&gt; &lt;span class="n"&gt;JSONB&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;-- example: {"start":"22:00","end":"07:00","tz":"Asia/Kolkata"}&lt;/span&gt;
  &lt;span class="n"&gt;updated_at&lt;/span&gt; &lt;span class="nb"&gt;TIMESTAMP&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
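&lt;p&gt;Evaluating the &lt;code&gt;quiet_hours&lt;/code&gt; value above needs care around windows that wrap past midnight. A minimal sketch, assuming the current time has already been converted to the user’s timezone:&lt;/p&gt;

```javascript
// Sketch for the quiet_hours JSONB above; assumes localHHMM has already been
// converted to the user's timezone (e.g. via Intl.DateTimeFormat).
function inQuietHours(quiet, localHHMM) {
  const toMinutes = (s) => {
    const [h, m] = s.split(":").map(Number);
    return h * 60 + m;
  };
  const now = toMinutes(localHHMM);
  const start = toMinutes(quiet.start);
  const end = toMinutes(quiet.end);
  return start <= end
    ? now >= start && now < end  // same-day window, e.g. 13:00-14:00
    : now >= start || now < end; // wraps past midnight, e.g. 22:00-07:00
}
```

&lt;p&gt;A worker would check this just before a low-priority send and defer the notification (or drop it) when the user is inside the window.&lt;/p&gt;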



&lt;h3&gt;
  
  
  Personalization Techniques
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Templates with variables, user locale/timezone, AB testing for copy/CTA.&lt;/li&gt;
&lt;li&gt;Use server-side rendering for transactional content (receipts) and lightweight templates for push/SMS.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Regulatory Compliance
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Enforce opt-out and consent records (audit trail), support right-to-be-forgotten requests, and ensure CAN-SPAM/TCPA/GDPR compliance where applicable. (SendGrid Support)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Scaling the Notification System
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Horizontal scaling
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Make workers stateless. Use autoscaling groups or k8s HPA for workers.&lt;/li&gt;
&lt;li&gt;Use consumer groups for Kafka to parallelize consumption.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Partitioning and Sharding
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;User-based sharding&lt;/strong&gt; ensures per-user ordering (if required).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Channel-based separation&lt;/strong&gt; isolates channel-specific bottlenecks (e.g., push pool separate from SMS pool).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Region-based deployment&lt;/strong&gt;: run clusters near user populations to reduce latency.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Caching
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Cache user preferences and device tokens in Redis to reduce DB load.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Elasticity
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Pre-warm connections to third-party providers when expecting spikes (e.g., holiday campaigns).&lt;/li&gt;
&lt;li&gt;Implement circuit breakers and graceful degradation: drop low-priority notifications under high load.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Ensuring Reliability and Delivery Guarantees
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Delivery Semantics
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;At-Least-Once&lt;/strong&gt; with idempotency keys is the practical choice: retry until the provider acknowledges, and dedupe at the delivery step so retries never reach the user twice.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Idempotency keys&lt;/strong&gt;: combine &lt;code&gt;event_id&lt;/code&gt; or &lt;code&gt;idempotency_key&lt;/code&gt; with &lt;code&gt;user_id&lt;/code&gt; to detect duplicates.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Retry Strategy
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Exponential backoff with jitter. Example algorithm:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;backoff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;attempt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;base&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// 1s&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;max&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// 1 minute&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;jitter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;random&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mf"&gt;0.75&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;base&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;pow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;attempt&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="nx"&gt;max&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;jitter&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Dead-Letter Queue (DLQ)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;After N retries, move to DLQ for manual inspection or offline reprocessing.&lt;/li&gt;
&lt;/ul&gt;
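&lt;p&gt;A sketch of the retry-then-DLQ policy, with the send and DLQ publish functions injected so the policy itself stays testable (&lt;code&gt;send&lt;/code&gt; and &lt;code&gt;publishToDlq&lt;/code&gt; are hypothetical adapters):&lt;/p&gt;

```javascript
// After maxAttempts failed sends, park the message in a dead-letter topic
// instead of retrying forever. Dependencies are injected for testability.
async function deliverOrDeadLetter(message, send, publishToDlq, maxAttempts = 3) {
  let lastErr;
  for (let attempt = 0; attempt !== maxAttempts; attempt++) {
    try {
      return await send(message); // success: stop retrying
    } catch (err) {
      lastErr = err;              // remember why it failed
    }
  }
  // exhausted: hand off for manual inspection or offline reprocessing
  await publishToDlq({ message, error: String(lastErr), attempts: maxAttempts });
  return null;
}
```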

&lt;h3&gt;
  
  
  Fallback Channels
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;If push delivery fails for a critical alert, escalate to SMS/email as fallback (subject to user preferences and cost policy).&lt;/li&gt;
&lt;/ul&gt;
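&lt;p&gt;The escalation order can be computed up front; this sketch assumes the preference flags used elsewhere in this post and treats "never escalate non-critical alerts to paid channels" as the cost policy:&lt;/p&gt;

```javascript
// Channel escalation sketch: critical alerts fall back to SMS/email when push
// is unavailable; non-critical ones stay on push only.
function channelPlan(prefs, isCritical) {
  const plan = [];
  if (prefs.push_enabled) plan.push('push');
  if (isCritical) {
    if (prefs.sms_enabled) plan.push('sms');
    if (prefs.email_enabled) plan.push('email');
  }
  return plan; // try in order until one succeeds
}
```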

&lt;h3&gt;
  
  
  Idempotency Implementation (sample)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Store &lt;code&gt;delivered_events(user_id, idempotency_key)&lt;/code&gt; with TTL (e.g., 7 days).&lt;/li&gt;
&lt;li&gt;When a worker processes an event:&lt;/li&gt;
&lt;/ul&gt;

&lt;ol&gt;
&lt;li&gt;Check delivered_events. If exists, mark as duplicate and ack.&lt;/li&gt;
&lt;li&gt;Otherwise, attempt send; on success insert delivered_events and ack.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Example Redis flow (pseudo):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SETNX delivered:{user_id}:{idempotency_key} 1
EXPIRE delivered:{user_id}:{idempotency_key} 604800
&lt;p&gt;A single &lt;code&gt;SET ... EX ... NX&lt;/code&gt; sets the marker and its 7-day TTL atomically; a separate &lt;code&gt;SETNX&lt;/code&gt; + &lt;code&gt;EXPIRE&lt;/code&gt; pair leaves a crash window where the key never expires.&lt;/p&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Code Examples
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Example: Node.js Kafka Consumer that Processes Notifications (simplified)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Requires: kafkajs&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Kafka&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;kafkajs&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;axios&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;axios&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// to call channel providers&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;kafka&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Kafka&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;clientId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;notif-service&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;brokers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;kafka:9092&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;consumer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;kafka&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;consumer&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;groupId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;notif-workers&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;sendPush&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;deviceToken&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// call FCM / APNs adapter; adapter handles auth, token refresh&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;axios&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://push-adapter/push&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;deviceToken&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;payload&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;processMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toString&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;dedupeKey&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`delivered:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;idempotency_key&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;wasSet&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setnx&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;dedupeKey&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;1&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;wasSet&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// already processed&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;expire&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;dedupeKey&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;24&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// 7 days&lt;/span&gt;

  &lt;span class="c1"&gt;// check user preferences from cache or DB&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;prefs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;getUserPrefs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nf"&gt;shouldSend&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;prefs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="c1"&gt;// choose channel&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;prefs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;push_enabled&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;sendPush&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;prefs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;device_token&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;buildPushPayload&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;prefs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;sms_enabled&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;sendSMS&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;prefs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;phone&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;buildSmsText&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;prefs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;email_enabled&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;sendEmail&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;prefs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;buildEmailHtml&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;consumer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;consumer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;subscribe&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;notifications.events&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;fromBeginning&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;consumer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;eachMessage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;partition&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;message&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;processMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// push to DLQ after logging&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;pushToDLQ&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;})();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Retry &amp;amp; Backoff Example (Node.js)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;withRetries&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;maxAttempts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;attempt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;attempt&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;maxAttempts&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;attempt&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;attempt&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="nx"&gt;maxAttempts&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;waitMs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;backoff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;attempt&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;waitMs&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Observability: Monitoring and Logging
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Key Metrics
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Throughput (notifications/sec) per channel.&lt;/li&gt;
&lt;li&gt;End-to-end latency (event creation → delivery).&lt;/li&gt;
&lt;li&gt;Failure rate and retries.&lt;/li&gt;
&lt;li&gt;Queue lag and DLQ size.&lt;/li&gt;
&lt;li&gt;Provider-specific metrics (e.g., Twilio delivery status).&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Logging &amp;amp; Tracing
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Structured logs (JSON) with &lt;code&gt;event_id&lt;/code&gt;, &lt;code&gt;user_id&lt;/code&gt;, &lt;code&gt;channel&lt;/code&gt;, &lt;code&gt;status&lt;/code&gt;, and &lt;code&gt;latency&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Distributed tracing to link producer → queue → worker → provider.&lt;/li&gt;
&lt;li&gt;Real-time dashboards and alerts (PagerDuty) for spikes in failure rate or queue depth.&lt;/li&gt;
&lt;/ul&gt;
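&lt;p&gt;A sketch of one structured delivery log line carrying the fields listed above (the &lt;code&gt;latency_ms&lt;/code&gt; naming is an assumption):&lt;/p&gt;

```javascript
// One JSON object per delivery attempt, so log pipelines can filter and
// join on event_id / user_id instead of grepping free text.
function deliveryLog(fields) {
  return JSON.stringify({
    ts: new Date().toISOString(), // when the attempt completed
    event_id: fields.event_id,
    user_id: fields.user_id,
    channel: fields.channel,      // push / sms / email
    status: fields.status,        // delivered / failed / throttled
    latency_ms: fields.latency_ms,
  });
}
```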

&lt;h3&gt;
  
  
  Example Alerts
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Queue backlog exceeds threshold.&lt;/li&gt;
&lt;li&gt;SMS provider failure rate &amp;gt; 5% (example threshold). ([Twilio][3])&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Testing, Reliability Engineering &amp;amp; Chaos
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Testing Strategies
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Unit tests&lt;/strong&gt; for formatting, preference checks, rate-limits.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integration tests&lt;/strong&gt; with sandboxed providers or mocks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Load testing&lt;/strong&gt; to validate throughput and auto-scaling.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Canary releases&lt;/strong&gt; for worker changes.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Chaos Engineering
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Simulate provider outages (e.g., blocking APNs or Twilio) and ensure fallbacks work.&lt;/li&gt;
&lt;li&gt;Test DLQ replay behavior and idempotency guarantees.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Security, Privacy &amp;amp; Compliance
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Secure credentials (use vaults/secret manager).&lt;/li&gt;
&lt;li&gt;Encrypt PII in transit and at rest.&lt;/li&gt;
&lt;li&gt;Log consent changes and retention windows for GDPR.&lt;/li&gt;
&lt;li&gt;Rate-limit SMS/email to prevent abuse and limit costs.&lt;/li&gt;
&lt;li&gt;Implement role-based access control for operations dashboards.&lt;/li&gt;
&lt;/ul&gt;
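&lt;p&gt;For the SMS/email rate-limiting point, a small token-bucket sketch; the clock is injected so the policy is deterministic to test, and the capacity/refill numbers are illustrative:&lt;/p&gt;

```javascript
// Token bucket for per-user throttling: allow short bursts up to `capacity`,
// then refill at `refillPerSec`. `now` returns seconds (injected clock).
function makeBucket(capacity, refillPerSec, now) {
  let tokens = capacity;
  let last = now();
  return function tryConsume() {
    const t = now();
    tokens = Math.min(capacity, tokens + (t - last) * refillPerSec); // refill
    last = t;
    if (tokens >= 1) {
      tokens -= 1;
      return true;  // allowed
    }
    return false;   // throttled
  };
}
```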




&lt;h2&gt;
  
  
  Cost vs Latency vs Reliability: Trade-offs
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Concern&lt;/th&gt;
&lt;th&gt;Low-Latency Priority&lt;/th&gt;
&lt;th&gt;Low-Cost Priority&lt;/th&gt;
&lt;th&gt;High-Reliability Priority&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Queue choice&lt;/td&gt;
&lt;td&gt;Kafka (low-latency partitioning)&lt;/td&gt;
&lt;td&gt;SQS (managed cost)&lt;/td&gt;
&lt;td&gt;Kafka or SQS with replication&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Delivery&lt;/td&gt;
&lt;td&gt;Push &amp;amp; SMS&lt;/td&gt;
&lt;td&gt;Email/push (cheaper)&lt;/td&gt;
&lt;td&gt;Multi-channel fallback&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Provisioning&lt;/td&gt;
&lt;td&gt;Pre-warmed connections&lt;/td&gt;
&lt;td&gt;On-demand scaling&lt;/td&gt;
&lt;td&gt;Over-provision / high-availability multi-region&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Retries&lt;/td&gt;
&lt;td&gt;Short backoff, aggressive&lt;/td&gt;
&lt;td&gt;Fewer retries to reduce cost&lt;/td&gt;
&lt;td&gt;More retries, DLQ for manual handling&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;In interviews, justify the choice: Kafka buys high throughput, partitioned ordering, and replay at the cost of operational complexity; SQS is fully managed and a good fit when its throughput and ordering model cover your needs. ([Apache Kafka][4])&lt;/p&gt;




&lt;h2&gt;
  
  
  Operational Runbook (short checklist)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Monitor queue lag and scale consumers when backlog &amp;gt; X minutes.&lt;/li&gt;
&lt;li&gt;If provider error rate spikes, switch traffic to fallback or degrade promotional notifications.&lt;/li&gt;
&lt;li&gt;Run a daily token-hygiene job: prune device tokens that providers report as invalid or expired.&lt;/li&gt;
&lt;li&gt;Monitor cost-per-message for SMS; apply campaign throttles.&lt;/li&gt;
&lt;li&gt;Security incident: revoke provider keys and fail closed for critical notifications.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Appendix: Implementation Patterns &amp;amp; Advanced Topics
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Ordering Guarantees
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Per-user ordering: use &lt;code&gt;user_id&lt;/code&gt; as the partition key so a user's events are consumed in the order they were produced.&lt;/li&gt;
&lt;/ul&gt;
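&lt;p&gt;To illustrate the "same user, same partition" property (kafkajs' default partitioner actually uses murmur2 on the message key; the toy hash below is only for demonstration):&lt;/p&gt;

```javascript
// Illustrative partitioner: a stable hash of user_id mapped onto partitions.
// Stability is the point: one user's events always land on one partition,
// which is what gives per-user ordering within that partition.
function partitionFor(userId, numPartitions) {
  let h = 0;
  for (const ch of userId) {
    h = (h * 31 + ch.charCodeAt(0)) >>> 0; // unsigned 32-bit rolling hash
  }
  return h % numPartitions;
}
```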

&lt;h3&gt;
  
  
  Multi-tenancy
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Tenant-aware routing: include &lt;code&gt;tenant_id&lt;/code&gt; in event metadata and configure per-tenant provider settings (e.g., specific email domain or SMS sender).&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Bulk vs Real-time
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Real-time&lt;/strong&gt;: OTPs, fraud alerts — push immediately.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Batch&lt;/strong&gt;: Daily digests or marketing — aggregate and send as batch jobs to reduce cost and rate-limit provider usage.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Provider Pooling &amp;amp; Connection Management
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Maintain pools of HTTP/HTTP2 connections to push providers to reduce cold-start latency. Pre-warm connections before a campaign.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  References &amp;amp; Further Reading (authoritative sources)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Kafka Use Cases and Architecture. ([Apache Kafka][4])&lt;/li&gt;
&lt;li&gt;FCM &amp;amp; APNs push docs. ([Firebase][1])&lt;/li&gt;
&lt;li&gt;Kafka vs SQS comparison &amp;amp; guidance. ([DataCamp][5])&lt;/li&gt;
&lt;li&gt;Twilio SMS deliverability and best practices. ([Twilio][3])&lt;/li&gt;
&lt;li&gt;SendGrid / Email deliverability practices. ([SendGrid][2])&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Closing: How to Use This in Interviews
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Start by clarifying requirements and constraints (SLA, scale, budget).&lt;/li&gt;
&lt;li&gt;Present a high-level pipeline and justify each major choice (Kafka vs SQS, provider selection).&lt;/li&gt;
&lt;li&gt;Discuss edge-cases: delivery guarantees, idempotency, rate-limits.&lt;/li&gt;
&lt;li&gt;Show code snippets and data models for critical parts (user prefs, idempotency) — include cost/latency trade-offs.&lt;/li&gt;
&lt;li&gt;Finish by describing operations: monitoring, alerts, and incident playbooks.&lt;/li&gt;
&lt;/ul&gt;




</description>
      <category>webdev</category>
      <category>aws</category>
      <category>systemdesign</category>
      <category>distributedsystems</category>
    </item>
  </channel>
</rss>
