DEV Community

Mehmet TURAÇ


Scale Wars #2 — Uber: How They Processed 100 Billion Events Per Day

Year: 2015–2020 · Crisis: Event traffic too massive for any monolith


The Problem: A "Trip Started" Signal Reaching 47 Services

When a ride starts on Uber, it's not just the "Driver" and "Rider" services that are affected. Here's what actually happens:

  1. Pricing service: Locks the fare
  2. Billing service: Prepares to generate an invoice
  3. Maps service: Starts route optimization
  4. ETA service: Calculates estimated arrival
  5. Payment service: Pre-authorizes the card
  6. Insurance service: Activates the policy
  7. Customer support service: Makes ticket creation available
  8. Analytics service: Starts data streaming
  9. Fraud service: Monitors for suspicious activity
  10. ...and 37 more services

If each one called the next via synchronous HTTP, a single "start trip" button would require 47 sequential HTTP calls. If one service slows down, every call behind it waits. If one crashes, the whole chain fails.
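To put rough numbers on the synchronous chain, here is a back-of-the-envelope sketch. The 99.9% per-service availability and the 20 ms per-call latency are illustrative assumptions, not Uber figures.

```python
# Illustrative assumptions, not Uber measurements.
SERVICES = 47          # services that react to "trip started"
AVAILABILITY = 0.999   # assumed availability of each individual service
LATENCY_MS = 20        # assumed latency of each HTTP call

# In a synchronous chain every call must succeed, so availabilities multiply.
chain_availability = AVAILABILITY ** SERVICES

# The calls run one after another, so their latencies add up.
chain_latency_ms = LATENCY_MS * SERVICES

print(f"chain availability: {chain_availability:.3f}")  # 0.954
print(f"chain latency: {chain_latency_ms} ms")          # 940 ms
```

Under these assumptions, services that are each "three nines" reliable combine into a chain that fails about 4.6% of the time and takes close to a second to complete.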

Architectural Decision: Event-Driven Architecture with Kafka

Uber built a massive event-driven architecture on top of Apache Kafka. Every state change is published as an event. Interested services subscribe to it.

┌─────────────────────────────────────────────────────────┐
│  DRIVER APP: "I started the trip" (trip_id: 12345)      │
└─────────────┬───────────────────────────────────────────┘
              │
              ▼
┌─────────────────────────────────────────────────────────┐
│  KAFKA BROKER — topic: "trip-events"                    │
│  {                                                       │
│    "event_type": "TRIP_STARTED",                        │
│    "trip_id": "12345",                                  │
│    "driver_id": "drv_789",                              │
│    "rider_id": "rdr_456",                               │
│    "timestamp": 1705244400000,                          │
│    "pickup_location": { "lat": 40.7128, "lng": -74.0060 }│
│  }                                                       │
└─────────────┬───────────────────────────────────────────┘
              │
   ┌──────────┼──────────┬──────────┬──────────┐
   ▼          ▼          ▼          ▼          ▼
[Billing]  [Pricing]  [Analytics]  [Insurance]  [Fraud]
 (async)    (async)     (async)      (async)     (async)
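The fan-out in the diagram can be sketched with a toy in-memory bus. This is not Kafka and not Uber's code: `InMemoryBus` and the handler names are hypothetical, and delivery happens inline, whereas real Kafka consumers pull from the log at their own pace. The point is only that the producer publishes once and a failing subscriber does not stop the others.

```python
from collections import defaultdict
from typing import Callable

Event = dict
Handler = Callable[[Event], None]


class InMemoryBus:
    """Toy stand-in for a Kafka topic: an append-only log plus subscribers."""

    def __init__(self) -> None:
        self.log: dict[str, list[Event]] = defaultdict(list)
        self.handlers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self.handlers[topic].append(handler)

    def publish(self, topic: str, event: Event) -> list[str]:
        """Append the event, deliver it, and return the names of failed handlers."""
        self.log[topic].append(event)
        failed = []
        for handler in self.handlers[topic]:
            try:
                handler(event)
            except Exception:
                # One failing consumer must not block the others.
                failed.append(handler.__name__)
        return failed


received = []

def billing(event: Event) -> None:
    received.append(("billing", event["trip_id"]))

def fraud(event: Event) -> None:
    raise RuntimeError("fraud service is down")

def analytics(event: Event) -> None:
    received.append(("analytics", event["trip_id"]))

bus = InMemoryBus()
for handler in (billing, fraud, analytics):
    bus.subscribe("trip-events", handler)

failed = bus.publish("trip-events", {"event_type": "TRIP_STARTED", "trip_id": "12345"})
print(received)  # billing and analytics both got the event
print(failed)    # only the fraud handler failed
```

Because the event stays in the log, a consumer that was down can catch up later instead of forcing the producer to retry.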

Schema Registry: The Key to Preventing Chaos

With thousands of event types and thousands of producers and consumers on Kafka, type mismatches are only a matter of time: if Billing expects amount as a number while Pricing sends it as a string, Billing silently breaks.

Uber's solution: Schema Registry (developed by Confluent).

// Avro schema for trip-events
{
  "type": "record",
  "name": "TripStartedEvent",
  "namespace": "com.uber.events",
  "fields": [
    { "name": "event_id", "type": "string" },
    { "name": "trip_id", "type": "string" },
    { "name": "driver_id", "type": "string" },
    { "name": "rider_id", "type": "string" },
    { "name": "timestamp", "type": "long" },
    { "name": "pickup_location", "type": {
        "type": "record",
        "name": "GeoPoint",
        "fields": [
          { "name": "lat", "type": "double" },
          { "name": "lng", "type": "double" }
        ]
    }}
  ]
}
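To show what a schema buys you, here is a minimal validator for a small subset of Avro (records plus the `string`, `long`, and `double` primitives). It is a sketch using only the standard library, not the Avro library or Uber's tooling; the `validate` function and the abridged schema are assumptions made for illustration.

```python
# Maps the Avro primitives used in this sketch to Python types.
PRIMITIVES = {"string": str, "long": int, "double": float}

GEO_POINT = {
    "type": "record",
    "name": "GeoPoint",
    "fields": [
        {"name": "lat", "type": "double"},
        {"name": "lng", "type": "double"},
    ],
}

# Abridged version of the TripStartedEvent schema.
TRIP_STARTED = {
    "type": "record",
    "name": "TripStartedEvent",
    "fields": [
        {"name": "trip_id", "type": "string"},
        {"name": "timestamp", "type": "long"},
        {"name": "pickup_location", "type": GEO_POINT},
    ],
}


def validate(schema, value, path: str = "") -> list[str]:
    """Return a list of problems; an empty list means the value matches."""
    if isinstance(schema, str):
        expected = PRIMITIVES[schema]
        # bool is a subclass of int in Python, so reject it explicitly.
        ok = isinstance(value, expected) and not isinstance(value, bool)
        return [] if ok else [f"{path}: expected {schema}, got {type(value).__name__}"]
    if not isinstance(value, dict):
        return [f"{path or 'event'}: expected record {schema['name']}"]
    errors = []
    for field in schema["fields"]:
        name = field["name"]
        child = f"{path}.{name}" if path else name
        if name not in value:
            errors.append(f"{child}: missing")
        else:
            errors.extend(validate(field["type"], value[name], child))
    return errors


good = {
    "trip_id": "12345",
    "timestamp": 1705244400000,
    "pickup_location": {"lat": 40.7128, "lng": -74.0060},
}
bad = dict(good, timestamp="1705244400000")  # number sent as a string

print(validate(TRIP_STARTED, good))  # []
print(validate(TRIP_STARTED, bad))   # ['timestamp: expected long, got str']
```

The number-versus-string mismatch is caught where the event is produced instead of surfacing later inside a consumer.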

Rules:

  • Every event type has a versioned schema
  • Producers register their schema with the Schema Registry before publishing events
  • Consumers deserialize events according to the schema
  • Non-backward-compatible changes are REJECTED

This way, when one service changes its schema, it doesn't break the other 47 services.
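The rejection rule can be sketched as a simplified backward-compatibility check: can a consumer on the new schema still read events written with the old one? This is an assumption-laden approximation, not the registry's actual algorithm. Real Avro schema resolution also allows type promotions such as `int` to `long`, which this sketch treats as a breaking change. The function name and the sample schemas are hypothetical.

```python
def is_backward_compatible(old: dict, new: dict) -> tuple[bool, list[str]]:
    """Check whether a reader using `new` can read data written with `old`."""
    old_fields = {f["name"]: f for f in old["fields"]}
    problems = []
    for field in new["fields"]:
        name = field["name"]
        if name not in old_fields:
            # Old events lack this field, so the reader needs a default value.
            if "default" not in field:
                problems.append(f"new field '{name}' has no default")
        elif old_fields[name]["type"] != field["type"]:
            problems.append(f"field '{name}' changed type")
    # Fields removed in `new` are fine: the reader simply ignores them.
    return (not problems, problems)


v1 = {"fields": [
    {"name": "trip_id", "type": "string"},
    {"name": "timestamp", "type": "long"},
]}

# Adds an optional field with a default: accepted.
v2 = {"fields": v1["fields"] + [{"name": "city", "type": "string", "default": ""}]}

# Changes the type of an existing field: rejected.
v3 = {"fields": [
    {"name": "trip_id", "type": "string"},
    {"name": "timestamp", "type": "string"},
]}

print(is_backward_compatible(v1, v2))  # (True, [])
print(is_backward_compatible(v1, v3))  # (False, ["field 'timestamp' changed type"])
```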

Uber's "Domain Gateway" Architecture

Uber grouped its microservices around domains:

  • Rider Domain: All rider-related services
  • Driver Domain: All driver-related services
  • Trip Domain: All trip-related services
  • Payment Domain: All payment-related services

Each domain has a Domain Gateway — the single entry point to the outside world.

┌──────────────────────────────────────────┐
│           TRIP DOMAIN GATEWAY            │
│  (trip.uber.com — single entry point)    │
└──────┬───────────────────────────────────┘
       │
       ├── /trip-service       (trip CRUD)
       ├── /eta-service        (estimated arrival)
       ├── /route-service      (route optimization)
       └── /dispatch-service   (driver matching)

Advantages:

  • External services don't know about internal domain details
  • Services within a domain can be freely refactored
  • Rate limiting, auth, and caching are managed centrally at the gateway
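A gateway of this kind can be sketched in a few lines. Everything here is hypothetical: the `DomainGateway` class, the token, and the handlers are made up for illustration, and the rate limit is a plain per-token counter with no time window, which a real gateway would replace with something like a token bucket.

```python
from collections import defaultdict
from typing import Callable


class DomainGateway:
    """Single entry point that handles auth and rate limiting before routing."""

    def __init__(self, valid_tokens: set[str], max_requests: int) -> None:
        self.routes: dict[str, Callable[[str], str]] = {}
        self.valid_tokens = valid_tokens
        self.max_requests = max_requests
        self.request_counts: dict[str, int] = defaultdict(int)

    def register(self, prefix: str, handler: Callable[[str], str]) -> None:
        self.routes[prefix] = handler

    def handle(self, token: str, path: str) -> tuple[int, str]:
        # Cross-cutting concerns live here, not in each internal service.
        if token not in self.valid_tokens:
            return 401, "unauthorized"
        self.request_counts[token] += 1
        if self.request_counts[token] > self.max_requests:
            return 429, "rate limit exceeded"
        for prefix, handler in self.routes.items():
            if path == prefix or path.startswith(prefix + "/"):
                return 200, handler(path[len(prefix):])
        return 404, "unknown route"


def eta_service(rest: str) -> str:
    return f"eta for {rest}"

def trip_service(rest: str) -> str:
    return f"trip {rest}"

gateway = DomainGateway(valid_tokens={"rider-app-token"}, max_requests=2)
gateway.register("/eta-service", eta_service)
gateway.register("/trip-service", trip_service)

r1 = gateway.handle("wrong-token", "/eta-service/12345")       # (401, 'unauthorized')
r2 = gateway.handle("rider-app-token", "/eta-service/12345")   # (200, 'eta for /12345')
r3 = gateway.handle("rider-app-token", "/trip-service/12345")  # (200, 'trip /12345')
r4 = gateway.handle("rider-app-token", "/trip-service/12345")  # (429, 'rate limit exceeded')
```

Callers only ever see the gateway's paths, so the services behind a prefix can be split, merged, or rewritten without changing the external contract.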

Schemaless DB: Uber's Database Revolution

Uber initially used PostgreSQL. But as they grew, vertical scaling wasn't enough. PostgreSQL's sharding capabilities were insufficient for horizontal scaling.

Uber developed Schemaless, their own storage layer. It was built on MySQL but used MySQL as a "key-value store":

-- Schemaless's simple but powerful schema
CREATE TABLE entity (
    uuid        BINARY(16) PRIMARY KEY,
    body        MEDIUMBLOB,        -- All data here, as JSON
    entity_type VARCHAR(64),
    created_at  TIMESTAMP,
    updated_at  TIMESTAMP,
    KEY (entity_type, created_at)
);

Why?

  • MySQL has strong transaction and replication capabilities
  • But schema changes (ALTER TABLE) are extremely slow on large tables
  • Schemaless moved the schema to the application layer, using MySQL purely as a storage engine

This architecture allowed Uber to store trillions of entities.
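The "schema in the application layer" idea can be tried out with a small sketch. It swaps in SQLite (from Python's standard library) for MySQL so it runs anywhere, and it uses a reduced version of the table above; the `put` and `get` helpers are hypothetical, not Schemaless's API.

```python
import json
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE entity (
        uuid        BLOB PRIMARY KEY,
        body        BLOB,
        entity_type TEXT,
        created_at  TEXT DEFAULT CURRENT_TIMESTAMP
    )
    """
)


def put(entity_type: str, body: dict) -> bytes:
    """Store a JSON document under a fresh 16-byte key and return the key."""
    key = uuid.uuid4().bytes
    conn.execute(
        "INSERT INTO entity (uuid, body, entity_type) VALUES (?, ?, ?)",
        (key, json.dumps(body).encode("utf-8"), entity_type),
    )
    return key


def get(key: bytes) -> dict:
    """Fetch a document by key; the database never inspects the body."""
    row = conn.execute("SELECT body FROM entity WHERE uuid = ?", (key,)).fetchone()
    if row is None:
        raise KeyError(key)
    return json.loads(row[0])


k1 = put("trip", {"trip_id": "12345", "status": "started"})
# A new field needs no ALTER TABLE: the database only sees an opaque blob.
k2 = put("trip", {"trip_id": "12346", "status": "started", "surge_multiplier": 1.4})

print(get(k1))
print(get(k2))
```

The cost of this design is that the database can no longer validate or query inside `body`, so validation and secondary lookups become the application's job.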

Trade-offs

Gains:

  • Loose coupling: One service crashing doesn't affect the rest
  • Scalability: Each service scales independently
  • Development speed: Teams can ship without waiting for each other

Costs:

  • Eventual consistency: "How many active trips are in the system right now?" doesn't always have a clear answer
  • Debugging difficulty: Finding why an event wasn't processed means digging through Kafka, Schema Registry, and consumer logs
  • Data duplication: Each service maintains its own data → duplication and synchronization challenges
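The eventual-consistency cost can be made concrete with a toy event log. The events and offsets below are invented for illustration: two consumers reading the same log answer the "active trips" question differently simply because one has processed more of it.

```python
# A shared, ordered event log (hypothetical data).
events = [
    ("TRIP_STARTED", "t1"),
    ("TRIP_STARTED", "t2"),
    ("TRIP_STARTED", "t3"),
    ("TRIP_ENDED", "t1"),
]


def active_trips(log: list[tuple[str, str]], offset: int) -> set[str]:
    """Active trips as seen by a consumer that has processed log[:offset]."""
    active = set()
    for event_type, trip_id in log[:offset]:
        if event_type == "TRIP_STARTED":
            active.add(trip_id)
        elif event_type == "TRIP_ENDED":
            active.discard(trip_id)
    return active


lagging_view = active_trips(events, offset=3)    # has not seen the TRIP_ENDED yet
caught_up_view = active_trips(events, offset=4)  # has processed the whole log

print(len(lagging_view))    # 3
print(len(caught_up_view))  # 2
```

Both answers are correct for the offset each consumer has reached; they converge once the lagging consumer catches up.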

🛠️ Takeaways

  • Adding Kafka to a small project is like using a sledgehammer to drive a nail, but at scale it becomes essential.
  • Without schema management, chaos in event-driven systems is inevitable; Schema Registry or similar tools are a must.
  • Uber's Domain Gateway approach is a textbook application of Conway's Law (org structure = system architecture).
  • Uber's Schemaless is living proof that "one database can't do everything."


Next up — Chapter 3: Amazon's API Mandate and the memo that changed everything. 📦
