Ayush Kumar Yadav

Posted on Jun 3

The Problem with Traditional Indexes and Spatial Queries

#database #systemdesign

THE PROBLEM

Let's say you're building an app called "Tomato" to help people find killer Indian Restaurants within 10km (or 6.2 miles) of them.

Normally, your first instinct is to throw a standard B-Tree index on your latitude and longitude coordinates.

CREATE TABLE restaurants (
    id        SERIAL PRIMARY KEY,
    name      VARCHAR(255),
    latitude  DECIMAL(10, 8),
    longitude DECIMAL(11, 8)
);

CREATE INDEX idx_lat ON restaurants(latitude);
CREATE INDEX idx_lng ON restaurants(longitude);

Looks totally fine, right?

But here's where it all falls apart. When you try to run a query to get restaurants within 10km of a specific lat and long, you're going to hit a wall.

-- 10 km radius. Conversions:
--   1° latitude  ≈ 111 km                 → 10 / 111             ≈ 0.0901° lat
--   1° longitude ≈ 111 km × cos(37.77°)   → 10 / (111 × 0.7906)  ≈ 0.1140° lng

SELECT id, name, latitude, longitude
FROM restaurants
WHERE latitude  BETWEEN 37.7749 - 0.0901 AND 37.7749 + 0.0901   -- 37.6848 .. 37.8650
  AND longitude BETWEEN -122.4194 - 0.1140 AND -122.4194 + 0.1140; -- -122.5334 .. -122.3054

You'll probably attempt a range scan on one of the indexes. If it grabs the latitude first, the index does its thing and spits out restaurants between the two latitudes.

But here's the catch: that latitude query returns a massive, wrap-around-the-globe strip of results, bringing in restaurants from Mexico, Dubai, and India all at once.

And as for longitude? It doesn't even do a range query. It drops to a painfully slow per-candidate scan and filter.

Even if you try to get clever and force an index intersection, the database still has to sweat through merging two giant sets of results.

And the best part.... Even if you pull off that intersection perfectly, you get a rectangle, not a true 10km radius circle. You still have to write extra math logic to shave off the corners and filter out the noise.

The core issue is that slapping B-Tree indexes directly onto geo-spatial data just doesn't work. They process the coordinates completely independently and have zero understanding of the actual 2D relationship between them.

THE SOLUTION

Every solid fix for this problem boils down to one simple idea: keep points that are physically close together close together in the index. Here are the three most popular ways to pull that off:

1. Geohash

Think of a Geohash as a zip code for literally anywhere on Earth. It flattens a 2D coordinate into a short string of text. It's designed so that places near each other start with the exact same letters. Since it's just a regular string, any basic database can index and range-scan it—no fancy plugins required.

Example: A café and a bookshop on the same block in Bengaluru might both hash to tdr1vfp. Finding neighbors is as easy as asking the database, "give me everything starting with tdr1v."
The Catch: Two spots could be a few meters apart but fall on opposite sides of a grid line. When that happens, they won't share much of a prefix, so your code has to know to peek into neighboring grid cells just in case.

Links:

2. Quadtree

Imagine taking a map and cutting it into four equal squares. You only cut a square into four smaller squares if it gets too crowded with points. An empty desert stays as one giant square, while a packed city center gets chopped up into tiny ones. Detail scales with density. To find what's nearby, you traverse down to the right square and check its immediate neighbors.

Example: Indexing pins on a map where downtown subdivides like crazy, but the middle of the ocean is just a single cell.
The Catch: This isn't something you can easily hack onto a standard database index. It requires a purpose-built tree structure, so nowadays, you'll see it more in tutorials and map-tiling systems than as a default production database index.

Links:

3. R-tree

Instead of forcing a rigid grid over the world, R-trees draw loose bounding boxes snugly around natural clusters of things. Then, it groups those boxes into bigger boxes, like a set of nesting dolls. When you run a search, it only opens the boxes that actually overlap your search area and completely ignores the rest. Empty space literally costs nothing. This is the heavy hitter that PostGIS and MySQL use under the hood for spatial indexing.

Example: Grouping all the restaurants in Indiranagar into one rectangle, and all the ones in Koramangala into another.
The Catch: Writes are heavy. When you insert new points, the database sometimes has to split and rebalance the rectangles. Also, because boxes can overlap, searches occasionally have to check more branches than you'd like.

Links:

TL;DR

The Problem: Standard database indexes (B-Trees) are terrible at handling maps. They process latitude and longitude separately, so asking for a 10km radius usually hands you a useless, globe-spanning strip of data instead of a neat local circle.

The Fix: You need a spatial index. They actually understand 2D space and keep physically close locations stored close together in the database. Here are the big three:

Geohash: Flattens coordinates into a short text string, acting like a zip code. If two spots share the same starting letters, they're close. It's incredibly easy to drop into any standard database.
Quadtree: Chops the map into four squares, and keeps subdividing squares only where things get crowded. It's awesome for scaling detail based on density, but it requires a custom setup rather than a standard index.
R-tree: Draws snug, nesting boxes around natural clusters of points and completely ignores empty space. It's the heavy-hitting industry standard (used in PostGIS and MySQL), though inserting new data can be a bit heavy on database writes.

DEV Community

The Problem with Traditional Indexes and Spatial Queries

THE PROBLEM

THE SOLUTION

1. Geohash

2. Quadtree

3. R-tree

TL;DR

Top comments (0)