DEV Community

Cover image for Kafka - Uber-Style Trip Event Pipeline Example
ZeeshanAli-0704
ZeeshanAli-0704

Posted on

Kafka - Uber-Style Trip Event Pipeline Example

πŸ“š Table of Contents

  1. Purpose
  2. Event Lifecycle
  3. Event Streams (Topics)
    • events.raw (ingestion layer)
    • events.cleaned (standardized stream)
    • trip.events (business-critical stream)
    • driver.location (high-frequency updates)
    • dispatch.commands
    • billing.events
    • analytics.events (optional fan-out)
  4. Processing Patterns
  5. Fault Tolerance & Reliability
  6. Trade Offs
  7. Summary Flow Example

Uber Style Trip Event Pipeline (Detailed)

🎯 Purpose

  • Handle millions of trip lifecycle events per second.
  • Support real-time matching, billing, surge pricing, ETA calculations, fraud detection, and analytics.
  • Guarantee low latency, reliability, and correct billing even with failures.

Event Lifecycle

Imagine a rider books a trip β†’ driver accepts β†’ trip starts β†’ trip ends β†’ payment β†’ receipt.
Each step generates events, and Kafka is the backbone that transports and processes them.


Event Streams (Topics)

1. events.raw (ingestion layer)

  • What it is:

    • All raw events (user requests, driver pings, trip updates, payment events, logs) land here first.
    • Similar to a staging area.
  • Why:

    • Raw events may be incomplete, malformed, duplicated, or carry noise (e.g., multiple GPS pings per second).
    • By first writing to events.raw, you don’t lose anything.
    • Acts as a data lake inside Kafka β€” good for replay, debugging, reprocessing.
  • Partitioning:

    • Usually partition by eventType (trip, location, billing).
    • Keeps ingestion scalable across multiple consumers.

2. events.cleaned (standardized stream)

  • What it is:

    • Events from events.raw get validated, deduplicated, schema-validated, enriched β†’ written here.
    • Example:
    • GPS events consolidated to 1 ping per X seconds.
    • Trip request events checked for valid rider/driver IDs.
    • Payment events tagged with transaction IDs.
  • Why:

    • Ensures downstream systems get consistent, structured data.
    • Clean separation of dirty data handling vs business logic.
  • Partitioning:

    • By entity ID (e.g., tripId, driverId) β†’ preserves ordering for that entity.

3. trip.events (business-critical stream)

  • What it is:

    • Cleaned trip lifecycle events only (request β†’ accept β†’ start β†’ end β†’ cancel).
    • Every trip’s state transitions must be ordered correctly.
  • Why:

    • Dispatch engine uses this for matching riders ↔ drivers.
    • Billing service uses this to compute fare.
    • Analytics uses this to understand demand/supply.
  • Partitioning:

    • By tripId β†’ all trip updates land in same partition, preserving ordering.

4. driver.location (high-frequency updates)

  • What it is:

    • Continuous driver GPS updates (every few seconds).
  • Why:

    • Needed for:
    • Nearest driver search.
    • ETA calculation.
    • Surge pricing (supply-demand ratio).
  • Partitioning:

    • By driverId (to keep updates for one driver ordered).
    • Or by geoHash region (group drivers by location cells β†’ helps load balance).
  • Additional Handling:

    • Raw driver pings β†’ cleaned β†’ then aggregated (e.g., one location every 5 sec instead of every 1 sec).

5. dispatch.commands

  • What it is:

    • Commands from dispatch/matching system β†’ drivers.
    • Example: β€œDriver 123, go pick up Rider 456”.
  • Why:

    • Ensures low-latency, reliable delivery of matching decisions.
    • Acts as the control plane of the system.
  • Partitioning:

    • By driverId β†’ ensures commands to a driver are in correct order.

6. billing.events

  • What it is:

    • Financially critical events β†’ trip started, trip ended, payment initiated, payment confirmed.
  • Why:

    • Exactly-once processing required here (no double charges).
    • Must be atomic: offset commit + DB update + event publishing.
  • Partitioning:

    • By tripId or paymentId.
    • Keeps billing updates for one trip ordered.

7. analytics.events (optional fan-out)

  • What it is:

    • Non-critical stream of enriched events β†’ BI dashboards, ML pipelines.
    • Example: demand trends, heat maps, revenue reports.
  • Why:

    • Offloads heavy analytical processing from core dispatch/billing topics.
    • Can tolerate slight delays.

Processing Patterns

  1. Validation & Cleaning
  • events.raw β†’ events.cleaned.
  • Deduplication, schema checks, enrichment.
  1. Matching & Dispatch
  • events.cleaned + driver.location β†’ dispatch system.
  • Kafka Streams KTable to store active drivers.
  • Rider requests matched to nearest available driver.
  1. Trip Lifecycle Management
  • trip.events consumed by trip state machine.
  • Ensures transitions are valid (no β€œtrip.end” without β€œtrip.start”).
  1. Billing & Payments
  • trip.events + billing.events.
  • Kafka transactions ensure exactly-once writes to DB + Kafka.
  1. Analytics & ML
  • analytics.events used for demand prediction, surge pricing, fraud detection.

Fault Tolerance & Reliability

  • Replication Factor = 3 for critical topics (trip.events, billing.events).
  • Idempotent Producers = avoid duplicates.
  • Kafka Transactions = atomic billing writes.
  • Multi-region replication = MirrorMaker 2 for cross-region failover.

Trade Offs

  • Low latency vs Cost β†’ more partitions & brokers give speed but cost infra.
  • Exactly-once for billing only β†’ rest can tolerate at-least-once.
  • Driver location noise β†’ must downsample to avoid flooding Kafka.

βœ… Summary Flow Example:

  • Rider requests β†’ events.raw β†’ cleaned β†’ trip.events.
  • Driver sends GPS β†’ events.raw β†’ cleaned β†’ driver.location.
  • Matching system consumes both β†’ produces dispatch.commands.
  • When trip ends β†’ billing.events β†’ payment service.
  • All events mirrored into analytics.events for BI/ML.

Please comment Netflix’s if you want:

Design Netflix’s recommendation system with Kafka?

More Details:

Get all articles related to system design
Hashtag: SystemDesignWithZeeshanAli

systemdesignwithzeeshanali
GitHub: https://github.com/ZeeshanAli-0704/SystemDesignWithZeeshanAli

Top comments (0)