Amanda Fawcett for Educative

Posted on Dec 18, 2020 • Originally published at educative.io

Design Uber Backend: system design walkthrough

#datascience #systems #uber #design

In our last article on system design, we looked at the top 10 questions, including how to design a ride-sharing service like Uber or Lyft. Today, we take a deeper dive into system design questions and discuss how to design Uber’s backend.

This is a common question you can encounter in any system design interview, especially if you are interviewing for Uber. This question asks you to create a ride-sharing service to match users with drivers. Users input a destination and send their current location. Nearby drivers should be notified of new users within seconds.

Today, we will break down this question step-by-step. Let's design a ride-sharing service like Uber!

Today we will go over:

Uber Backend: Problem Overview
Uber Backend: System Design
Advanced Issues and Considerations
What to learn next

Uber Backend Problem Overview

Uber’s mission is to make transportation reliable and easy. They work with complex data, high traffic, and complex systems, all bundled up in their popular smartphone app. Uber enables customers to book drivers for cheap taxi rides. Uber drivers use personal cars, and both customers and drivers communicate using the Uber app.

There are two kinds of users that our system should account for: Drivers and Customers.

Drivers must be able to frequently notify the service regarding their current location and availability
Passengers should be able to see all nearby drivers in real-time
Customers can request a ride using a destination and pickup time.
Nearby drivers are then notified when a customer needs pick up.
Once a ride is accepted, both the driver and customer must constantly see the other’s current location the duration of the trip.
Once the drive is complete, the driver completes the ride and is now available for another customer.

Similar Services: Lyft, Didi, Via, Sidecar, etc.

Difficulty level: Hard

Helpful prerequisite: Designing Yelp

Constraints

To effectively design Uber’s backend, we need to understand the constraints, i.e. capacity estimations and system limitations. Below, we discuss some of the important considerations to make before designing the system.

These constraints below will generally differ depending on time of day and location. For the sake of this article, we will be working with these constraints and estimations:

We have 300M customers and 1M drivers in the system
We have 1M active customers per day and 500K daily active drivers
Let’s build for an assumed 1M daily rides
Let’s assume that all active drivers will notify their current location every three seconds. Once a customer puts in a ride request, the system will contact drivers in real-time.

Uber Backend: System Design

Below, we discuss the system design for this product. If you are familiar with the SDI question, Designing Yelp, this may look familiar. We will modify that solution for the above use-cases. For Uber, our QuadTree must be adapted for frequent updates.

If we use a Dynamic Grid solution from our Yelp problem, a few issues arise:

We need to update our data structures to reflect that active drivers are reporting their locations every three seconds. To update a driver to a new location, we must find the right grid based on the driver’s previous location.
If the new position does not belong to the current grid, we remove the driver from the current grid and reinsert them to the right grid. Then, if the new grid reaches a maximum limit, we have to repartition it.
We need a quick mechanism to propagate the current location of all the nearby drivers to customers in the area. Our system needs to notify both the driver and passenger on the location of the car throughout the duration of the ride.

For these cases, a QuadTree is not ideal, as a quick update in the tree cannot be guaranteed at the speeds that Uber requires. If we don’t update our QuadTree according to every driver update, it will use old data that does not reflect the current location.

We could keep the most recent driver position in a hash table and update our QuadTree less frequently. We want to guarantee that a driver’s current location is reflected in the QuadTree within 15 seconds. We maintain a hash table that will store the current driver location. We can call it DriverLocationHT.

`DriverLocationHT`

We need to store DriveIDin the hash table, which reflects a driver’s current and previous location. This means that we will need 35 bytes to store each record:

DriverID (3 bytes - 1 million drivers)
Old latitude (8 bytes)
Old longitude (8 bytes)
New latitude (8 bytes)
New longitude (8 bytes) Total = 35 bytes

As we discussed previously, we assume 1 million drivers, which will require the following memory:

*1 million * 35 bytes => 35 MB*

Now let’s discuss bandwidth. If we get the DriverID and location, it will require (3+16 => 19 bytes). This information is received every 3 seconds from 500K daily active drivers, so we receive 9.5MB per three seconds.

To help with scalability, performance, and fault tolerance, we could distribute DriverLocationHT on multiple servers based on the DriverID to randomize distribution. We will refer to the machines holding this info the Driver Location server.

These servers will also do the following:

Once the server receives an update on a driver’s location, it will broadcast that information to relevant customers.
The server will notify the respective QuadTree server to refresh the driver’s location.

Broadcasting driver locations

So, next we need to broadcast the driver’s location to our customers. We can use a Push Model where the server pushes positions to relevant users. We can use a Notification Service and build it on the publisher/subscriber model.

That way, when a customer opens the Uber app, they query the server to find nearby drivers.
On the server side, we subscribe the customer to all updates from those drivers. Whenever we have an update in DriverLocationHT for a specific driver, we broadcast the current location to all subscribed customers.

This ensures that we show the driver’s current position.
As we assumed above, we have 1M daily active customers and 500K active drivers. Let’s assume that five customers subscribe to one driver, and we store this information in a hash table for quick updates.

This means that we need to store both the driver and customer IDs. We need 3 bytes for DriverID and 8 bytes for CustomerID, so we will need 21MB of memory.

*(500K * 3) + (500K * 5 * 8 ) ~= 21 MB*

Now for bandwidth. For every active driver, we have five subscribers. In total, this reaches:

*5 * 500K => 2.5M*

We need to send DriverID (3 bytes) and their location (16 bytes) every second, which requires:

*2.5M * 19 bytes => 47.5 MB/s*

Notification service

To efficiently implement the Notification service, we can either use HTTP long polling or push notifications. Customers are subscribed to nearby drivers when they open the Uber app the first time.

So, when new drivers enter their areas, we need to add a new customer/driver subscription dynamically. To do so, we track the area that a customer is watching, but this is extra complicated.

Instead of pushing this information, we can design the system so clients pull it from the server. Clients will send their current location, so the server can find the nearby drivers from our QuadTree. The client can then update their screen to reflect the current positions of all drivers.

Clients should be able to query every five seconds. This will limit the number of round trips to the server.

When it comes to repartition, we can create a cushion so that each grid grows beyond the limit before we decide to partition it. Assume our grids can grow or shrink by an extra 10% before we partition them. This will decrease the load for a grid partition.

Request Ride use case

Let’s summarize how this use case will work below.

The customer enters a ride request
One of the Aggregator servers accepts the request and asks the QuadTree servers to return nearby drivers.
The Aggregator server collects the results and sorts them by ratings.
The Aggregator server sends a notification to the top drivers simultaneously.
The first driver to accept will be assigned that ride. The other drivers will receive a cancellation.
If the drivers do not respond, the Aggregator will request a ride from the next drivers on our list.
Once a driver accepts a request, the customer is notified.

Advanced issues and considerations

Fault Tolerance and Replication

What if a Driver Location server or Notification server dies? We will need server replicas so that a secondary server can take control when needed. We can also store data in a persistent storage like SSDs to provide fast IOs.

This way, if both our primary and secondary servers die, we can quickly recover data from the persistent storage.

Ranking

Uber also provides a ranking system for drivers, where a customer can rate a driver according to wait times, courtesy, and safety. Say we want to rank search results by popularity or relevance a well as proximity.

We need to return top rated drivers within a given radius. Assume we track the ratings of drivers in a database and QuadTree. An aggregated number will represent popularity in the system based on the star ratings.

So, while the system searches for the top 10 drivers within a given radius, we also ask each partition of the QuadTree to return the top drivers with a specified rating. The aggregator server will determine the top 10 drivers among all drivers returned by different partitions.

Other issues to discuss

How will we handle clients using slow or disconnecting networks?
What if a client gets disconnected when they are a part of a ride?
How will we handle billing if a ride if disconnected?
How can new machine learning components be implemented to improve this system?

What to learn next

Congrats! You should now have a good idea how to design Uber’s backend. We hope this was a helpful guide for your interview preparation. If you are preparing for a system design interview, there is still more to learn.

Next, you should learn how to design the following systems:

Design Ticketmaster
Microservices
Design TinyUrl
Design Twitter search
SDI basics: load balancing, caching, etc.

To help streamline your learning, Educative has curated a special learning path Scalability & System Design for Developers. As you progress through the modules, you'll cover everything you need to know to design scalable systems for enterprise-level software.

This path includes lessons on implementing microservices, using AWS architecture, and designing common systems for interviews.

Happy learning!

Continue reading about Uber and SDIs

The Next Generation Developer Platform

Coherence is the first Platform-as-a-Service you can control. Unlike "black-box" platforms that are opinionated about the infra you can deploy, Coherence is powered by CNC, the open-source IaC framework, which offers limitless customization.

Learn more

DEV Community