When working with location data, one problem shows up almost immediately:
“How do I calculate the distance between two coordinates stored in my database?”
At first, it seems like something you’d have to handle outside the database.
But if you're using ClickHouse, there’s a built-in function for this.
The Right Tool: greatCircleDistance
greatCircleDistance(lat1, lon1, lat2, lon2)
It calculates the shortest distance between two points on Earth.
Example
SELECT greatCircleDistance(13.0827, 80.2707, 12.9716, 77.5946) AS distance_meters;
This gives you the distance between Chennai and Bangalore - in meters.
Looks Simple… But There’s a Catch
Now let’s say you write a query like this:
SELECT city
FROM locations
WHERE greatCircleDistance(lat, lon, 13.0827, 80.2707) < 5000;
At first glance, this looks perfectly fine.
But this can quietly turn into a full table scan - especially on large datasets.
Why This Happens
In ClickHouse, indexes don’t work like traditional B-tree indexes.
They are:
- Sparse
- Designed for range pruning
They work well for queries like:
WHERE lat BETWEEN x AND y
But not for:
WHERE greatCircleDistance(lat, lon, x, y) < 5000
Because:
The function is applied on the columns, so ClickHouse cannot use the index to skip data efficiently.
The Better Approach (What You Should Actually Do)
Instead of directly applying the function, reduce the dataset first.
Bounding Box Filter
SELECT city
FROM locations
WHERE lat BETWEEN (13.0827 - 0.05) AND (13.0827 + 0.05)
AND lon BETWEEN (80.2707 - 0.05) AND (80.2707 + 0.05)
AND greatCircleDistance(lat, lon, 13.0827, 80.2707) < 5000;
(The bounding box is an approximation to reduce the search space before exact filtering.)
Why This Works
-
lat BETWEEN→ uses index -
lon BETWEEN→ reduces rows further -
greatCircleDistance→ applied only on filtered data
So instead of scanning the entire table:
=> You narrow it down first, then compute accurately
Real-World Use Cases
This pattern is useful in:
- Delivery radius filtering
- Finding nearby users
- Geo-based analytics
- Ride-sharing systems
One Important Gotcha
Make sure:
- Coordinates are in degrees (not radians)
- Order is always
(lat, lon)
Swapping them will give incorrect results.
Final Thoughts
greatCircleDistance is powerful - but if used blindly, it can hurt performance.
In ClickHouse, performance often depends more on how you query than what you query.
Sometimes, the right approach isn’t just using a function - but knowing when and how to use it efficiently.
Top comments (0)