PostGIS Spatial Indexing: Why Your Queries Are Doing Sequential Scans

#database #sql #postgres #performance

The One PostGIS Function Swap That Makes Spatial Queries 1000x Faster

I have seen this exact pattern break production at three different companies. A location-based feature ships, it works great in staging, and then it hits production with millions of rows and every "find nearby" query takes seconds instead of milliseconds.

The cause is always the same: ST_Distance in a WHERE clause.

The Function That Looks Right But Isn't

This query is logically correct. It finds all points of interest within roughly 1 km of a location:

SELECT name, ST_Distance(geom, ST_SetSRID(ST_MakePoint(-73.9857, 40.7484), 4326)) AS dist
FROM points_of_interest
WHERE ST_Distance(geom, ST_SetSRID(ST_MakePoint(-73.9857, 40.7484), 4326)) < 0.01;

It returns the right results. It also forces PostgreSQL to compute the exact distance between the query point and every single geometry in the table. On 5 million rows, that is 5 million distance calculations before any filtering happens.

The problem is that ST_Distance is not index-aware. GiST spatial indexes work on bounding-box overlap -- they answer "could these two geometries possibly be close?" very quickly by comparing rectangles. ST_Distance bypasses this entirely because it needs the exact distance, not an approximation.

The Fix: One Function Swap

SELECT name, ST_Distance(geom, ST_SetSRID(ST_MakePoint(-73.9857, 40.7484), 4326)) AS dist
FROM points_of_interest
WHERE ST_DWithin(geom, ST_SetSRID(ST_MakePoint(-73.9857, 40.7484), 4326), 0.01);

ST_DWithin returns the same rows but uses a completely different execution path. It translates to a bounding-box overlap check internally, which the GiST index can answer. The index eliminates 99.9% of candidates, then PostGIS computes exact distances only for the handful of geometries whose bounding boxes overlap the search region.

On that same 5-million-row table: sequential scan with ST_Distance takes ~3 seconds. Index scan with ST_DWithin takes ~2 milliseconds. Same result set.

But First, You Need the Index

None of this helps if the geometry column does not have a GiST index. Check whether your columns are indexed:

SELECT
    gc.f_table_name,
    gc.f_geometry_column,
    gc.type,
    gc.srid
FROM geometry_columns gc
WHERE NOT EXISTS (
    SELECT 1
    FROM pg_indexes pi
    WHERE pi.tablename = gc.f_table_name
      AND pi.indexdef LIKE '%gist%'
      AND pi.indexdef LIKE '%' || gc.f_geometry_column || '%'
);

Every row in this result is a geometry column with no spatial index. Every spatial query against it is a guaranteed sequential scan, regardless of which function you use.

Create the index:

CREATE INDEX CONCURRENTLY idx_pois_geom
    ON points_of_interest USING gist (geom);

Use CONCURRENTLY -- a normal CREATE INDEX on a large spatial table locks the table for minutes. CONCURRENTLY builds in the background.

Other Index-Aware Functions

ST_DWithin is the most common fix, but it is not the only index-aware function. These all use the GiST bounding-box operator internally:

ST_Intersects(a, b) -- do these geometries share any space?
ST_Contains(a, b) -- does geometry A fully contain B?
ST_Within(a, b) -- is geometry A fully inside B?
ST_Covers(a, b) / ST_CoveredBy(a, b) -- similar to contains/within but handles edge cases on boundaries

All of these use the && bounding-box operator as a first pass. If your query uses one of these predicates and you have a GiST index, the planner will use it.

Functions that compute scalar values -- ST_Distance, ST_Area, ST_Length -- are not index-aware when used in WHERE clauses. They need to evaluate every row.

The Pattern for Distance Queries

When you need both filtering and the actual distance value (for sorting or display), combine ST_DWithin for filtering with ST_Distance for computation:

SELECT
    name,
    ST_Distance(geom, query_point) AS distance
FROM points_of_interest
WHERE ST_DWithin(geom, query_point, 0.01)
ORDER BY distance;

The WHERE clause uses the index to narrow candidates. The SELECT list computes exact distance only for the small result set. The ORDER BY sorts the few matching rows. This is the standard pattern for "find nearest" queries in PostGIS.

How to Catch This Before Production

Add this to your CI or periodic monitoring:

SELECT query, calls, mean_exec_time
FROM pg_stat_statements
WHERE query ~* 'st_distance'
  AND query ~* 'where'
ORDER BY mean_exec_time DESC;

Any ST_Distance appearing in a WHERE clause with a high mean execution time is doing a sequential scan. The fix is always the same: replace it with ST_DWithin.

Make it a code review rule: ST_Distance in a WHERE clause is a spatial code smell. It might be correct for tiny tables, but it does not scale. ST_DWithin does.

Originally published at mydba.dev/blog/postgis-spatial-indexing