Posted on Feb 16, 2023 • Edited on Jul 28, 2023 • Originally published at systemdesign.one

Maps System Design

#webdev #beginners #career #tutorial

You can subscribe to the system design newsletter to excel in system design interviews and software architecture. The original article was published on systemdesign.one website. The popular implementations of the navigation service are the following:

Google Maps
OpenStreetMap
Bing Maps

Disclaimer: The system design questions are subjective. This article is written based on the research I have done on the topic. Feel free to share your feedback and ask questions in the comments. Some of the linked resources are affiliates.

Requirements

The user (client) can find the nearby points of interest within a given radius
The client can find the real-time estimated time of arrival (ETA) between two locations
The client receive suggestions on the possible routes between the two locations

Data storage

Database schema

The primary entities are the locations, the places, and the media tables
The relationship between the locations and the places table is 1-to-many
The relationship between the places and the media table is 1-to-many
The spatial index using Geohash or R-tree is created on the locations table

Type of data store

SQL database such as Postgres persists the location data (latitude, longitude) and the vector tiles
The PostGIS extension of Postgres is used to generate the spatial index
The wide-column data store (LSM tree-based) such as Apache Cassandra is used to persist the trip data for high write throughput
A cache server such as Redis is used to store the popular routes and the road segments
A message queue such as Apache Kafka is used to handle heavy traffic
The message queue is also configured as the pub-sub server
NoSQL document store such as MongoDB is used to store the metadata of the road segments
The media files and raster tiles are stored in a managed object storage such as AWS S3
The CDN caches the popular tiles and the media files
Lucene-based inverted-index data store such as Apache Solr is used to store the index data of locations

Subscribe to my newsletter and never miss a new blog post again, as you'll get notified via email every time I publish something. You will also receive the ultimate guide to approaching system design interviews on newsletter sign-up.

Further system design learning resources

Are you preparing for a system design interview and feeling overwhelmed by the complexity of the process? Do you want to gain the knowledge and confidence to ace the interview and advance your career in software engineering? Then you need to check out the Grokking the System Design Interview course by DesignGurus!

This comprehensive course is designed to help you master the system design interview process by breaking down complex concepts into easy-to-understand explanations. With a focus on practical, real-world examples, you’ll learn how to design scalable, fault-tolerant, and high-performance systems that can handle the most challenging problems.

Don’t miss this opportunity to take your system design skills to the next level and achieve your career goals. Sign up for the Grokking the System Design Interview course by DesignGurus today and take the first step towards success!

High-level design

Calculation of ETA is the shortest path in a directed weighted graph problem
The graph is partitioned and the routes are precomputed for scalability

The map matching is performed to improve the accuracy of ETA
The Kalman filter is used for map matching
The cache and data stores are replicated and sharded geographically through consistent hashing to handle hotspots
Monitoring and fallback can be configured for high availability

How to find the Points of Interest?

The client executes a DNS query to resolve the domain name
The client queries the CDN to check if the nearby points of interest are cached on the CDN
The client creates a WebSocket connection on the load balancer for real-time updates
The load balancer delegates the client's connection to a server with free capacity
The server queries the trie server to provide autocompletion functionality
The server delegates the search query of the client to the search service
The search service queries the inverted index data store to find the relevant locations
The server delegates the points of interest query from the client to the location service
The location service calculates the Geohash from the request and queries the location cache to fetch the list of nearby points of interest IDs
The location database (replica) is queried on a cache miss
The location service queries the places cache to fetch the metadata of a place
The places data store (replica) is queried on a cache miss
The location service queries the object store to fetch the relevant media files and the raster tiles
The tile server on the location service provides SQL functionality on vector tiles
LRU policy is used to evict the cache
GPS signals are not accurate for the identification of the static location due to the multipath effect (reflection from a building)
The WiFi access point is used in addition to GPS signals to find the static location of a client accurately
The map is displayed on a web browser by placing multiple smaller vector tiles and raster tiles next to each other to make up a single bigger picture
The raster tiles (png, jpg) are useful for off-road trajectories to speed up the loading of detected map errors
Vector tiles allow the reduction of bandwidth by restricting data transfer to the current viewport and zoom level
Vector tiles allow to apply styling on the fly on the client and thereby reduce the server load
Vector tiles transfer the vector information (path and polygon) instead of transferring the image tiles
Vector information is grouped into tiles for caching and partitioning
The tiles can be pre-generated and cached to improve latency
Each zoom level shows a unique set of tiles with a different granularity of information
The number of tiles doubles when the user zooms out
Geohash is used to identify the tile on the map
The viewport of the client encloses the container that assigns absolute positions to the tiles
The container is moved when the user pans the map
Street view and satellite imagery are used to identify places in the world

How to calculate the Estimated Time of Arrival?

The client executes a DNS query to resolve the domain name
The client queries the CDN to check if the requested route is cached on the CDN
The client creates a WebSocket connection on the load balancer for real-time bi-directional communication on ETA, trip, and traffic data
The load balancer delegates the client's request to a server with free capacity
The server queries the route cache to fetch the list of possible routes to the destination
The route database (replica) is queried on a cache miss
The server queries the road segment cache to fetch the metadata of the road segments on the suggested route
The road segment data store (replica) is queried on a cache miss
The server queries the trip data store to recalculate the real-time ETA
The server queries the map-matching service to calculate the ETA accurately
The map-matching service queries the trip data store to fetch the list of recent GPS signals
The map matching service queries the road segment data store to identify the nearest road segments
The weather service publishes the latest weather data on the pub-sub server
The traffic service (HyperLogLog based) publishes the latest traffic data on the pub-sub server
The server fetches the latest weather and traffic data using the publish-subscribe pattern
The server queries the route ranking service to filter and sort the result
The server publishes the GPS signals on the message queue for asynchronous processing
Apache Spark executes jobs on the trip data to detect changes in roads, hot spots, and traffic patterns
Apache Spark job asynchronously updates the route database with any detected changes
The physical map is represented as a graph with road segments as the directed edges and road intersections as the nodes
The route between the origin and destination on the road network should be calculated in the least cost (function of time and distance)
Map matching is the problem of matching the raw GPS signals to the relevant road segments
The embedded sensors on the smartphones are used to geo-localize the users
The graph can be partitioned to pre-compute the possible routes within the partitions with time complexity of sqrt(n), where n = number of road intersections
The partitions can be created on different granularities (country, city) to improve the latency of long-distance queries
The interactions are applied on the boundaries of the partitions to improve the accuracy and scalability
Dijkstra's algorithm (sub-optimal due to scalability) provides a time complexity of O(nlogn) for route calculation, where n = number of road intersections
The shortest path algorithm on the weighted graph such as the Bellman-Ford provides a time complexity of O(ve) for route calculation, where v = number of vertices and e = number of edges
The edge weight of the graph is populated with weather and traffic data
The popular partitions and routes can be cached to improve the latency
Alternate routes can be suggested if applying the weather and traffic data on the shortest distanced route returns a delayed ETA
The time series data (seasonality), map data, routing, and machine learning can be used to predict the ETA
Viterbi algorithm (dynamic programming) can be used to identify the most probable sequence of road segments using a sequence of GPS signals
Alternatively, the KNN algorithm with a minimum radius (k = 2) can be executed on the Geohash to identify the closest road segments
The marginalized particle filter running on an unscented Kalman filter is used for real-time map matching
Kalman filter is an optimal algorithm that uses multiple GPS signals to estimate the accurate road segment of the client
Kalman filter is also applied to the seasonality of the time series data
Additional data types such as traffic and weather can be added to marginalized particle filter
Machine learning on the time series historical data helps to predict traffic conditions

Support

If you enjoy the blog and would like to support my work, you can make a one-time donation on Ko-fi or Buy Me a Coffee or become a Patron on Patreon. Your support will help me continue to create quality content and bring new ideas to the blog. Thank you for your generosity!

License

CC BY-NC-ND: This license allows reusers to copy and distribute the images in this article in any medium or format in unadapted form only, for noncommercial purposes only, and only so long as attribution is given to the creator. The original article must be backlinked.

References

Antin Harasymiv, Prototyping a Smoother Map (2018), medium.com
wiki.openstreetmap.org
Sreeta Gorripaty, What's My ETA? (2018), Uber Engineering
Karan Parikh, Map Matching (2018), Uber Engineering

Forem

Maps System Design

Requirements

Data storage

Newsletter

Further system design learning resources

High-level design

How to find the Points of Interest?

How to calculate the Estimated Time of Arrival?

Newsletter

Support

License

References

Top comments (0)

Read next

Finally got some time to play with the new JSONata and Variables support for Step Functions, and I have to say, it is massive improvement. Check out my latest blog post, where I walk through a simple example of how easy it is to handle pagination now

What I'd Like to Learn in 2025

8 Modern Developer Tools that Will 10X Your Productivity 🔥🚀

Building Production-Grade Web Applications with Supabase – Part 2