
shubham pandey (Connoisseur)

Designing Uber / Ride Sharing at Scale: deep dive

Introduction

Uber seems simple on the surface — request a ride, a driver shows up. But underneath it is one of the most complex real time distributed systems ever built: real time location tracking, intelligent driver matching, dynamic surge pricing, and accurate fare calculation, all happening simultaneously at massive scale. This post walks through every challenge question by question, including wrong turns and how to navigate out of them.


Challenge 1: Real Time Driver Location Tracking

Interview Question: Drivers are constantly moving and their location changes every few seconds. How do you design a system that collects and stores driver locations in real time — and how frequently should a driver app send its location to your servers?

Navigation: The tradeoff is between location accuracy and server load. Sending every 100ms is too frequent and wastes battery and bandwidth. Sending every 30 seconds is too infrequent for a real time map. The middle ground is adaptive frequency based on trip phase.

Solution: Kafka stream for location events with adaptive GPS frequency.

  • Driver online no trip — GPS every 5 seconds
  • Driver approaching rider — GPS every 2 to 3 seconds
  • Trip in progress — GPS every 1 to 2 seconds
  • Driver stationary at red light — reduce to every 5 seconds automatically
  • Driver moving fast on highway — increase to every 1 second automatically

Driver app emits location events to Kafka. Location Service consumes from Kafka and updates driver position in storage layer.

Key Insight: Adaptive GPS frequency based on trip phase and speed balances accuracy with battery life and server load. One size does not fit all phases of a trip.
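The adaptive frequency policy above can be sketched in a few lines. The phase names and exact thresholds here are illustrative assumptions, not Uber's real values:

```python
# Toy sketch of an adaptive GPS interval policy based on trip phase and speed.
# Phase names and thresholds are illustrative assumptions, not Uber's values.

def gps_interval_seconds(phase: str, speed_kmh: float) -> float:
    """Pick a GPS reporting interval from trip phase and current speed."""
    base = {
        "online_idle": 5.0,   # online, no trip assigned
        "approaching": 2.5,   # driving to pick up the rider
        "on_trip": 1.5,       # trip in progress
    }[phase]
    if speed_kmh < 1:         # stationary, e.g. at a red light: back off
        return max(base, 5.0)
    if speed_kmh > 80:        # highway speed: tighten to 1 second
        return 1.0
    return base
```

The driver app would re-evaluate this interval after every GPS fix, so the reporting rate follows the trip in real time.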


Challenge 2: Storing and Querying Driver Locations

Interview Question: 5 million active drivers send location updates every 2 seconds. That is 2.5 million location writes per second. You only ever need the most recent location — old locations are immediately irrelevant. Does a traditional database make sense here?

Wrong Approach: Store driver locations in a traditional database like PostgreSQL or DynamoDB.

Why It Fails: Traditional databases are optimized for persistent storage with complex queries. Driver locations are ephemeral — they change every 2 seconds and old values have zero value. 2.5 million writes per second on a traditional database is extremely heavy and unnecessary for data that expires immediately.

Navigation: You need extremely fast reads and writes for data that does not need to persist permanently. Redis is the perfect fit — in memory, sub millisecond reads and writes.

Solution: Redis GEO commands for location storage and proximity search.

Driver location update:
GEOADD drivers longitude latitude driverID

Find all drivers within 2km of rider:
GEORADIUS drivers rider_longitude rider_latitude 2 km

Under the hood Redis GEO uses a Sorted Set — it converts latitude and longitude into a special score called a Geohash. This enables O(log n) proximity queries across millions of driver locations.
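To make the GEOADD / GEORADIUS semantics concrete, here is a toy in-memory stand-in. It is not how Redis works internally — Redis stores a Geohash score in a Sorted Set and answers in O(log n), while this sketch does a brute-force haversine scan — but it shows exactly what the two commands compute:

```python
import math

# In-memory stand-in for GEOADD / GEORADIUS to illustrate their semantics.
# Real Redis stores a geohash score in a sorted set; this toy brute-forces
# haversine distances, which is fine for a demo but not for 5M drivers.

_drivers: dict[str, tuple[float, float]] = {}  # driver_id -> (lon, lat)

def geoadd(driver_id: str, lon: float, lat: float) -> None:
    _drivers[driver_id] = (lon, lat)  # latest position overwrites the old one

def _haversine_km(lon1, lat1, lon2, lat2) -> float:
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 6371.0 * 2 * math.asin(math.sqrt(a))  # mean Earth radius ~6371 km

def georadius(lon: float, lat: float, radius_km: float) -> list[str]:
    """Driver ids within radius_km of (lon, lat), closest first."""
    hits = [(d, _haversine_km(lon, lat, *pos)) for d, pos in _drivers.items()]
    return [d for d, dist in sorted(hits, key=lambda h: h[1]) if dist <= radius_km]
```

The overwrite-on-update behavior in `geoadd` mirrors why no cleanup job is needed: stale positions simply vanish.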

Full location pipeline:

  • Driver app sends location every 2 seconds
  • Kafka receives location event
  • Location Service consumes from Kafka
  • Updates Redis GEO with latest driver position
  • Old position automatically overwritten — no cleanup needed
  • Rider opens app — GEORADIUS query returns nearby drivers instantly

Key Insight: Redis GEO is purpose built for real time location storage and proximity queries. It replaces an entire geospatial database with two commands — GEOADD and GEORADIUS.


Challenge 3: Pushing Driver Locations to Rider App

Interview Question: A rider has the app open and sees drivers moving on the map in real time. Those locations update every 2 seconds. How does the rider app get continuous location updates — polling or push?

Wrong Approach: Polling, on the grounds that Redis reads are fast so there is no downside.

Why It Fails: Even with fast Redis reads, 100 million riders polling every 2 seconds generates 50 million HTTP requests per second. Each request has HTTP overhead — headers, connection setup, authentication. Most responses are nearly identical since drivers barely move in 2 seconds. The network cost is enormous even if Redis responds instantly.

Navigation: The smarter approach is only pushing updates when a driver actually moves significantly — delta updates. This requires a persistent connection rather than repeated polling.

Solution: Hybrid approach based on trip phase.

Phase 1 browsing before booking:

  • Simple polling every 5 seconds
  • Rider just needs approximate driver positions
  • Slight staleness is acceptable
  • No persistent connection needed for 100 million casual browsers

Phase 2 driver matched and approaching:

  • WebSocket connection opened at moment of booking confirmation
  • Server pushes driver location updates in real time
  • Rider tracks their specific driver with precision

Phase 3 trip in progress:

  • WebSocket connection maintained
  • Highly accurate real time location updates
  • Both rider and driver tracked simultaneously

Key Insight: Do not over-engineer connections until they are actually needed. Polling is acceptable for casual browsing. WebSocket is reserved for the moment precision and real time tracking genuinely matter.
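The delta-update idea from the navigation step, only push when the driver actually moved, comes down to a distance filter the server runs before each WebSocket send. The 25 meter threshold here is an assumption for illustration:

```python
import math

# Delta-update filter: the server pushes a driver's position over the
# WebSocket only when it has moved more than a threshold since the last
# push. The 25-meter default is an illustrative assumption.

def moved_enough(prev: tuple[float, float], curr: tuple[float, float],
                 threshold_m: float = 25.0) -> bool:
    """Equirectangular approximation, plenty accurate at city scale."""
    lon1, lat1 = prev
    lon2, lat2 = curr
    x = math.radians(lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    return math.hypot(x, y) * 6_371_000 > threshold_m
```

In practice the server would keep the last-pushed position per rider-driver pair and skip the push whenever this returns False, which is most of the time at red lights and in slow traffic.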


Challenge 4: Intelligent Driver Matching

Interview Question: Rider confirms booking. There are 500 available drivers within 2km. Uber needs to pick the optimal driver considering distance, rating, availability, and acceptance rate. The naive approach queries 500 drivers from Redis then does an individual database lookup for each driver's metadata. At 1 million ride requests per day that is 500 million database queries daily. What is wrong with this?

Real World Observation: Uber actually sends an alert to the closest driver first and, if they do not accept, moves to the next nearest driver.

Why Pure Sequential Is Too Slow: Driver 1 has 15 seconds to accept. Driver 1 ignores it — 15 seconds wasted. Driver 2 also ignores — another 15 seconds wasted. Rider has been waiting 30 seconds with no driver assigned. In a city with low driver availability this cascading sequential approach means riders wait minutes just for assignment.

Why Notifying All 500 Is Also Wrong: Multiple drivers accept simultaneously causing race conditions. 497 drivers receive notifications for nothing — wasted alerts and poor driver experience.

Solution: Batch notifications with distributed Redis lock.

  • Query Redis GEORADIUS — get 500 nearby drivers sorted by distance
  • Send notification to first batch of 10 closest drivers simultaneously
  • First driver to accept acquires distributed lock on that ride request using Redis SETNX
  • SETNX is atomic — if two drivers accept simultaneously only one gets the lock
  • Lock acquired — ride assigned — all other notifications cancelled
  • Nobody accepts in 15 seconds — send to next batch of 20 drivers expanding radius
  • Repeat until driver found

Redis SETNX for atomic locking:
SETNX rideID driverID — returns 1 if lock acquired, 0 if already taken

In modern Redis the same effect is a single command, SET rideID driverID NX EX 15, which acquires the lock and attaches the 15 second TTL atomically, leaving no crash window between SETNX and a separate EXPIRE.

TTL on the lock — 15 seconds matching the driver acceptance window:

  • Server crashes after lock acquired — lock automatically expires after 15 seconds
  • Next batch gets notified — fresh lock available
  • No stuck locks, no riders waiting forever
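The acquire-with-TTL behavior can be shown with a single-process toy. In production this is one atomic Redis command; the dict-plus-expiry sketch below just demonstrates why only one of two simultaneous accepts can win and how an expired lock frees itself:

```python
import time
from typing import Optional

# Single-process stand-in for the Redis ride lock (SETNX semantics plus TTL).
# In production this is one atomic Redis command; this toy just shows the
# winner-takes-the-ride behavior and automatic expiry.

_locks: dict[str, tuple[str, float]] = {}  # ride_id -> (driver_id, expires_at)

def try_acquire(ride_id: str, driver_id: str, ttl_s: float = 15.0,
                now: Optional[float] = None) -> bool:
    """Return True iff this driver won the ride, mirroring SETNX returning 1."""
    t = time.monotonic() if now is None else now
    held = _locks.get(ride_id)
    if held is not None and held[1] > t:       # live lock: someone else won
        return False
    _locks[ride_id] = (driver_id, t + ttl_s)   # free or expired: take it
    return True
```

The `now` parameter exists only so expiry is easy to demonstrate; a real implementation leans on Redis to enforce the TTL server-side.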

Key Insight: Batching notifications with Redis atomic locking balances speed with fairness. TTL prevents deadlocks from server crashes — the same pattern that saved us in the WhatsApp and Twitter designs.


Challenge 5: Surge Pricing

Interview Question: Uber's surge pricing multiplies fares during peak demand. It is calculated based on supply and demand ratio per geographic zone in real time. How do you compute this ratio across thousands of zones simultaneously as riders request and drivers come online?

Solution: Kafka event streaming with Redis Sorted Set sliding window per zone.

Every zone has two Redis Sorted Sets:

  • zone5:requests — ride requests with Unix timestamp as score
  • zone5:drivers — available drivers with Unix timestamp as score

When rider requests ride in Zone 5:
ZADD zone5:requests current_timestamp unique_request_id

When driver comes online in Zone 5:
ZADD zone5:drivers current_timestamp driver_id

Every minute Surge Pricing Service:

  • ZREMRANGEBYSCORE zone5:requests 0 ten_minutes_ago_timestamp — removes stale requests
  • ZREMRANGEBYSCORE zone5:drivers 0 ten_minutes_ago_timestamp — removes stale drivers
  • ZCARD zone5:requests — count of active requests in last 10 minutes
  • ZCARD zone5:drivers — count of available drivers in last 10 minutes
  • Surge ratio = requests divided by drivers
  • Updates Redis with surge multiplier for Zone 5

Rider requests ride — reads surge multiplier from Redis instantly. All computation happens asynchronously via Kafka — never blocks the main ride request flow.
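The per-zone recipe above maps cleanly onto a small in-memory model: ZADD becomes an ordered insert of a timestamp, ZREMRANGEBYSCORE becomes dropping entries older than the window, and ZCARD becomes a length check. The zone structure and 10 minute window follow the text; everything else is a sketch:

```python
import bisect

# Toy sliding-window surge calculator mirroring the Redis sorted-set recipe:
# ZADD -> ordered insert of a timestamp, ZREMRANGEBYSCORE -> drop entries
# older than the window, ZCARD -> len(). One instance per zone.

WINDOW_S = 600  # 10 minute window, as in the text

class ZoneWindow:
    def __init__(self):
        self.requests: list[float] = []  # sorted timestamps of ride requests
        self.drivers: list[float] = []   # sorted timestamps of drivers online

    def add_request(self, ts: float) -> None:
        bisect.insort(self.requests, ts)

    def add_driver(self, ts: float) -> None:
        bisect.insort(self.drivers, ts)

    def surge_ratio(self, now: float) -> float:
        cutoff = now - WINDOW_S
        # drop everything older than cutoff, like ZREMRANGEBYSCORE 0 cutoff
        self.requests = self.requests[bisect.bisect_left(self.requests, cutoff):]
        self.drivers = self.drivers[bisect.bisect_left(self.drivers, cutoff):]
        return len(self.requests) / max(len(self.drivers), 1)
```

The `max(..., 1)` guard avoids division by zero when a zone momentarily has no drivers; a real system would cap the multiplier rather than let the ratio run away.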

Key Insight: The sliding time window pattern using timestamp as score and ZREMRANGEBYSCORE for expiry — identical to Twitter trending topics — applies perfectly to surge pricing. Events older than 10 minutes automatically stop contributing to the surge calculation.


Challenge 6: GPS Coordinate Collection Strategy

Interview Question: A 30 minute trip with GPS updates every 2 seconds generates 900 coordinates. At 10 million trips per day that is 9 billion GPS coordinates daily. How do you store trip route data efficiently?

Wrong Approach 1: Store only start and end coordinates.
Why It Fails: Straight line between start and end ignores the actual route taken. Non optimal routes and detours are completely missed. Fare calculation is inaccurate.

Wrong Approach 2: Store every single GPS coordinate.
Why It Fails: 9 billion coordinates per day at 16 bytes each is 144GB of raw GPS data daily. After one year that is 52TB of mostly redundant data.

Solution: Store coordinates at smart intervals with intelligent compression.

  • During trip store coordinates every 2 seconds in Redis temporarily
  • Map Matching Service processes coordinates in near real time
  • Reconstructs clean route using 20 to 30 key waypoints instead of 900 raw points
  • Clean reconstructed route stored permanently in database
  • Raw GPS coordinates discarded after processing
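One standard way to boil ~900 raw points down to a handful of waypoints is Ramer-Douglas-Peucker simplification: keep only the points that deviate meaningfully from the straight line between their kept neighbours. Whether Uber's Map Matching Service uses exactly this algorithm is an assumption; it illustrates the "keep only meaningful waypoints" idea:

```python
import math

# Ramer-Douglas-Peucker route simplification: drop every point that lies
# within epsilon of the line between its kept neighbours. Illustrative of
# "20 to 30 key waypoints instead of 900 raw points"; not Uber's exact method.

Point = tuple[float, float]

def _perp_dist(p: Point, a: Point, b: Point) -> float:
    """Perpendicular distance from p to the line through a and b."""
    if a == b:
        return math.dist(p, a)
    (px, py), (ax, ay), (bx, by) = p, a, b
    return abs((bx - ax) * (ay - py) - (ax - px) * (by - ay)) / math.dist(a, b)

def simplify(points: list[Point], epsilon: float) -> list[Point]:
    if len(points) < 3:
        return points[:]
    idx, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _perp_dist(points[i], points[0], points[-1])
        if d > dmax:
            idx, dmax = i, d
    if dmax <= epsilon:                 # everything is near the chord: drop it all
        return [points[0], points[-1]]
    left = simplify(points[: idx + 1], epsilon)   # recurse on both halves,
    return left[:-1] + simplify(points[idx:], epsilon)  # splitting at the outlier
```

Tuning epsilon trades route fidelity against storage: a few meters of tolerance typically collapses straight stretches to their endpoints while preserving every turn.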

Smart optimizations:

  • Dead reckoning between GPS updates using phone accelerometer and gyroscope for smooth map animation
  • Encoded Polyline Algorithm compresses coordinate sequences by up to 75 percent before sending to server
  • Geofencing triggers immediate GPS update when driver enters airport or landmark zones regardless of interval
  • Stationary detection reduces frequency automatically when driver is not moving
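The Encoded Polyline Algorithm mentioned above is a published Google format: each coordinate is stored as a delta from the previous one, scaled to 1e-5 degrees, zigzag-shifted so small negatives stay small, and packed into printable ASCII five bits at a time. A minimal encoder:

```python
# Google's Encoded Polyline Algorithm Format: delta-encode coordinates,
# scale to 1e-5 degrees, zigzag-shift, then pack 5 bits per ASCII char.

def _encode_value(v: int) -> str:
    v = v << 1 if v >= 0 else ~(v << 1)  # zigzag: small negatives stay small
    out = []
    while v >= 0x20:
        out.append(chr(((v & 0x1F) | 0x20) + 63))  # continuation bit set
        v >>= 5
    out.append(chr(v + 63))
    return "".join(out)

def encode_polyline(coords: list[tuple[float, float]]) -> str:
    """coords is a list of (lat, lon) pairs."""
    result, prev_lat, prev_lon = [], 0, 0
    for lat, lon in coords:
        ilat, ilon = round(lat * 1e5), round(lon * 1e5)
        result.append(_encode_value(ilat - prev_lat))
        result.append(_encode_value(ilon - prev_lon))
        prev_lat, prev_lon = ilat, ilon
    return "".join(result)
```

Because consecutive GPS fixes are close together, the deltas are tiny and most points encode in a few characters, which is where the large compression over raw floats comes from.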

Storage lifecycle:

  • Raw GPS coordinates kept temporarily in Redis during trip
  • Clean reconstructed route kept 90 days for dispute resolution
  • Raw data deleted immediately after fare calculation
  • Driver location history anonymized and aggregated for traffic data

Key Insight: Never store raw GPS permanently. Process immediately into clean reconstructed routes, discard raw data, and keep only the meaningful waypoints. Reduces storage by over 90 percent.


Challenge 7: GPS Gap Filling and Map Matching

Interview Question: Driver enters a tunnel and GPS drops for 90 seconds. You have a coordinate at minute 3 and the next at minute 4:30. The straight line between them cuts through buildings. How do you reconstruct the actual route taken?

Wrong Approach: Always assume the longest route for worst case scenario.

Why It Fails: Driver takes the optimal shortest route through the tunnel but gets charged for the longest possible route. Rider is overcharged for a route the driver never took. Unfair to both parties and destroys user trust.

Navigation: Always assume the most probable route — not the longest or shortest. Roads are not random. Drivers can only travel on known roads. Even with GPS gaps there are only a finite number of possible routes between two points.

Solution: Map matching with Viterbi algorithm and Google Maps fallback.

Step 1 — Road Network Graph:
Uber maintains an internal graph of every city's road network. Every intersection is a node. Every road segment is an edge with metadata — speed limit, road type, historical average speed, traffic patterns.

Step 2 — GPS Coordinate Snapping:
Raw GPS has 5 to 15 meter error even without signal drops. Every coordinate gets snapped to the nearest road segment eliminating small inaccuracies.

Step 3 — Viterbi Algorithm for Gap Filling:
When GPS drops the Viterbi algorithm finds the most probable path through the road network graph between the last known point and next known point.

It considers:

  • All possible routes between the two known points
  • Historical speed data on each road segment
  • Time elapsed during the gap — 90 seconds at 50kmh means roughly 1.25km travelled
  • Which roads are physically reachable in that time window
  • Historical probability of drivers choosing each route from millions of past trips

Example:

  • Possible routes A to B — Via Highway 65 percent probability, Via Main Street 25 percent, Via Side Streets 10 percent
  • Viterbi selects Via Highway as most probable route
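The route selection above is a Viterbi decode over a hidden Markov model. A generic decoder with a toy two-route setup makes the mechanics concrete; every probability below is invented for illustration and is not Uber's actual model or data:

```python
# Generic Viterbi decoding over a toy "road HMM". States are candidate road
# segments and observations are coarse readings (speed implied by timing).
# All probabilities below are invented for illustration.

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence explaining the observations."""
    # best[s] = (probability of the best path ending in s, that path)
    best = {s: (start_p[s] * emit_p[s][obs[0]], [s]) for s in states}
    for o in obs[1:]:
        nxt = {}
        for s in states:
            # pick the predecessor maximizing the path probability into s
            prev = max(states, key=lambda ps: best[ps][0] * trans_p[ps][s])
            prob = best[prev][0] * trans_p[prev][s] * emit_p[s][o]
            nxt[s] = (prob, best[prev][1] + [s])
        best = nxt
    return max(best.values(), key=lambda pv: pv[0])[1]

# Toy setup: two candidate routes through the GPS gap
STATES = ["highway", "main_street"]
START = {"highway": 0.65, "main_street": 0.35}           # historical route choice
TRANS = {"highway": {"highway": 0.8, "main_street": 0.2},
         "main_street": {"highway": 0.2, "main_street": 0.8}}
EMIT = {"highway": {"fast": 0.9, "slow": 0.1},
        "main_street": {"fast": 0.3, "slow": 0.7}}
```

Feeding in consistently "fast" observations decodes to the highway path; consistently "slow" ones decode to main street, which is exactly the "most probable, not longest or shortest" behavior the text describes.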

Step 4 — Google Maps Fallback:
When internal map matching fails due to sparse road data, new roads, or construction — fall back to Google Maps Distance Matrix API or Mapbox for route reconstruction.

Why not always use Google Maps:

  • Google Maps API costs 5 dollars per 1000 requests
  • Uber does 10 million trips per day
  • That is 50,000 dollars per day just for fare calculation
  • Internal map matching costs a fraction of that
  • Google Maps is only the fallback for edge cases

Step 5 — Final Fare Calculation:
Total distance equals sum of all road segments travelled. Fare equals base fare plus distance rate multiplied by total kilometers plus time rate multiplied by total minutes.
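The Step 5 formula as code, with hypothetical rates (not Uber's real pricing):

```python
# Step 5's fare formula: base + distance rate * km + time rate * minutes.
# The rates here are hypothetical placeholders, not Uber's real pricing.

def fare(segment_kms: list[float], minutes: float,
         base: float = 2.0, per_km: float = 1.2, per_min: float = 0.3) -> float:
    total_km = sum(segment_kms)  # sum of all road segments travelled
    return round(base + per_km * total_km + per_min * minutes, 2)
```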

Key Insight: Raw GPS is never fully trusted. Map matching snaps coordinates to known roads, fills gaps using historical probability via the Viterbi algorithm, and falls back to Google Maps only when internal matching fails. Most probable route — not longest, not shortest.


Full Architecture Summary

Driver location tracking — Kafka stream with adaptive GPS frequency
Location storage — Redis GEO with GEOADD and GEORADIUS
Rider map updates Phase 1 — Polling every 5 seconds
Rider map updates Phase 2 and 3 — WebSocket on booking confirmation
Driver matching — Batch notifications with Redis SETNX lock and TTL
Surge pricing — Kafka events with Redis Sorted Set sliding window per zone
GPS coordinate storage — Temporary Redis then clean reconstructed route in database
GPS gap filling — Map matching with Viterbi algorithm and Google Maps fallback
Fare calculation — Reconstructed route distance plus time based pricing


Final Thoughts

Uber is a masterclass in combining real time systems with intelligent data processing. Every feature that feels instant and accurate to the rider — live driver locations, fast matching, fair fares — is backed by a carefully orchestrated pipeline of Kafka streams, Redis data structures, and probabilistic algorithms working together seamlessly.

The recurring theme across every challenge is that naive approaches collapse at scale and the right solution always involves pushing work to the right layer — Kafka for async processing, Redis for real time state, databases for durable history, and smart algorithms for filling in what sensors miss.

Happy building. 🚀
