This article was written with the assistance of AI tooling for structure and syntax. The concepts, tradeoffs, and production context are based on my own engineering experience and research. #ABotWroteThis
You're building the "find nearby drivers" feature for a ride-hailing app.
At peak, you have 500,000 active drivers updating their GPS location every 5 seconds. Riders query for drivers within 2km. At scale, you're doing ~100,000 proximity queries per second.
Your naive implementation does this:
SELECT * FROM drivers
WHERE lat BETWEEN ? AND ?
AND lng BETWEEN ? AND ?
It works fine at 1,000 drivers. At 500,000, it's a full table scan on every query. Latency hits 800ms. Riders see a spinner. Drivers miss trips.
Your team proposes 4 approaches to fix this:
A) Geohash partitioning — Encode each driver's location into a geohash string. Index by geohash prefix. Proximity queries become a string lookup on the index.
B) PostGIS with spatial indexes — Add a PostGIS extension to Postgres. Use a proper R-tree/GiST spatial index for bounding-box and radius queries.
C) Quadtree in memory — Keep all active driver positions in a quadtree data structure in a Redis-backed in-memory service. Decompose space recursively until each cell has ≤ N drivers.
D) H3 hexagonal grid (Uber's system) — Divide the earth into hexagonal cells at multiple resolutions. Assign each driver to a cell. Queries check the target cell + 6 neighbors at the right resolution.
You need sub-50ms p99 latency, real-time updates, and it has to stay accurate at cell boundaries.
Pick one — A, B, C, or D — and tell me why. Full breakdown in the comments.
If your team has argued about spatial indexing before, share this. Worth the debate.
Drop your answer 👇
Top comments (4)
Answer: D — H3 Hexagonal Grid ✅
Why D wins:
Uber open-sourced H3 for exactly this problem. Hexagons have 6 neighbors, all equidistant from the center. Squares (geohash) have corner neighbors that are ~40% farther — this distorts "within 2km" queries. H3 also has 16 resolution levels (no reindexing on zoom changes). The killer feature: the k-ring query (target cell + 6 neighbors) guarantees no driver gets missed near a boundary. Uber does sub-10ms lookups at millions of drivers: get cells in ring → fetch IDs from Redis set keyed by cell. Lyft, Airbnb, DoorDash all use variants.
Why A is the trap (Geohash):
Production-grade, widely used — and wrong here. Geohash uses a Z-order curve (interleaved bits) that doesn't guarantee geographic adjacency for adjacent prefix strings. Two drivers 1 meter apart can have entirely different prefix strings if they straddle a cell boundary. This becomes a systematic miss rate at 100k queries/sec. Geohash is right for coarse partitioning (sharding by region), wrong for real-time sub-km proximity.
Why B fails at scale (PostGIS):
PostGIS is correct — it's the right answer for 5,000 trucks on a fleet dashboard. Not for 500k drivers updating every 5s. The math: 500k × 12 updates/min = 100k writes/sec against a GiST index. GiST indexes are write-expensive. Your DB burns half its capacity on index maintenance. And you can't easily scale Postgres writes horizontally. PostGIS is the starting point — H3 + Redis is where you land when it breaks.
Why C almost works (Quadtree):
O(log n) insert/query, great in theory. Breaks at ride-hailing density: driver crossing a cell boundary = delete + reinsert + potential rebalancing, 100k times per second. Same rectangular boundary problem as geohash. Dense cities create deep uneven trees. A quadtree is a building block — not a complete solution at this scale.