ClickHouse® Geospatial Functions for Location Data
Location data powers many of today's most widely used applications, from ride-sharing and food delivery platforms to logistics, fleet management, mapping services, and location-based analytics. As businesses collect massive amounts of geographical data, efficiently analyzing latitude and longitude coordinates becomes increasingly important.
ClickHouse® includes built-in geospatial functions that allow you to calculate distances, generate geohashes, perform spatial analysis, and work with geographical coordinates directly in SQL. This eliminates the need to export data to external GIS tools while taking advantage of ClickHouse®'s high-performance analytical engine.
In this guide, we'll explore the most commonly used ClickHouse® geospatial functions with practical SQL examples and real-world use cases.
What Are Geospatial Functions?
Geospatial functions perform calculations using geographical coordinates such as latitude and longitude.
These functions help answer questions like:
- How far apart are two locations?
- Which customers are within a specific radius?
- Which delivery driver is nearest?
- How can nearby locations be grouped?
- How can location data be indexed efficiently?
These capabilities are widely used across mapping applications, transportation, logistics, business intelligence, IoT platforms, and location-aware services.
Sample Table
We'll use the following table throughout this guide.
CREATE TABLE delivery_locations
(
DriverID UInt32,
DriverName String,
Latitude Float64,
Longitude Float64
)
ENGINE = MergeTree
ORDER BY DriverID;
Sample Data
| DriverID | DriverName | Latitude | Longitude |
|---|---|---|---|
| 1 | Alice | 13.0827 | 80.2707 |
| 2 | Bob | 12.9716 | 77.5946 |
| 3 | Charlie | 17.3850 | 78.4867 |
Why Use Geospatial Functions?
Instead of exporting location data into specialized GIS software, ClickHouse® enables geographical analysis directly within SQL.
This approach provides several benefits:
- Faster analytical queries
- Real-time location processing
- Reduced data movement
- Simpler analytical workflows
- Better scalability for large datasets
Common ClickHouse® Geospatial Functions
1. greatCircleDistance()
The greatCircleDistance() function calculates the shortest distance between two locations on the Earth's surface using a spherical Earth model.
Suppose a customer is located in Bengaluru. The following query calculates the distance from each driver to the customer's location.
SELECT
DriverName,
greatCircleDistance(
Longitude,
Latitude,
77.5946,
12.9716
) AS DistanceInMeters
FROM delivery_locations;
Output
| Driver | Distance (m) |
|---|---|
| Bob | 0 |
| Alice | 290172 |
| Charlie | 499786 |
Common Use Cases
- Finding the nearest driver
- Delivery distance calculations
- Fleet management
- Nearby store searches
2. geoDistance()
The geoDistance() function calculates distance using the Earth's ellipsoid model, providing greater accuracy than greatCircleDistance().
SELECT
DriverName,
geoDistance(
Longitude,
Latitude,
77.5946,
12.9716
) AS Distance
FROM delivery_locations;
Output
| Driver | Distance (m) |
|---|---|
| Bob | 0 |
| Alice | 290105 |
| Charlie | 499701 |
Common Use Cases
- GPS applications
- Navigation systems
- Logistics
- Route optimization
greatCircleDistance() vs geoDistance()
| Function | Earth Model | Accuracy | Best For |
|---|---|---|---|
greatCircleDistance() |
Sphere | High | General analytics |
geoDistance() |
Ellipsoid | Very High | GPS and navigation |
If your application requires maximum geographical accuracy, such as navigation or mapping, geoDistance() is generally the preferred option.
3. geohashEncode()
The geohashEncode() function converts latitude and longitude coordinates into a Geohash string.
SELECT
DriverName,
geohashEncode(
Longitude,
Latitude,
8
) AS Geohash
FROM delivery_locations;
Output
| Driver | Geohash |
|---|---|
| Alice | tdn4q6nt |
| Bob | tdr1v9qh |
| Charlie | tephh0ws |
Nearby locations often generate similar Geohash values, making Geohashes extremely useful for grouping nearby locations and building efficient spatial indexes.
Common Use Cases
- Spatial indexing
- Nearby location searches
- Location clustering
- Efficient geographic partitioning
4. geohashDecode()
The geohashDecode() function converts a Geohash string back into its corresponding latitude and longitude coordinates.
SELECT
geohashDecode('tdr1v9qh');
Output
| Latitude | Longitude |
|---|---|
| 12.9716 | 77.5946 |
5. pointInPolygon()
The pointInPolygon() function determines whether a geographical point lies inside a specified polygon.
It returns:
-
1if the point is inside the polygon -
0if the point is outside the polygon
Suppose we want to verify whether each driver falls inside a predefined delivery zone.
SELECT
DriverName,
pointInPolygon(
(Longitude, Latitude),
[
(80.20, 13.00),
(80.35, 13.00),
(80.35, 13.15),
(80.20, 13.15)
]
) AS IsInsideZone
FROM delivery_locations;
Output
| Driver | IsInsideZone |
|---|---|
| Alice | 1 |
| Bob | 0 |
| Charlie | 0 |
Explanation
-
(Longitude, Latitude)represents the driver's location. - The list of coordinate pairs defines the delivery zone.
-
1indicates the driver is inside the polygon. -
0indicates the driver is outside the polygon.
Common Use Cases
- Geofencing
- Delivery zone validation
- Store coverage analysis
- Service area verification
- Asset tracking
6. Finding Nearby Drivers
The following example finds drivers located within a 300-kilometer radius of Bengaluru.
SELECT *
FROM
(
SELECT
DriverName,
greatCircleDistance(
Longitude,
Latitude,
77.5946,
12.9716
) AS Distance
FROM delivery_locations
)
WHERE Distance <= 300000;
Output
| Driver | Distance (m) |
|---|---|
| Bob | 0 |
| Alice | 290172 |
7. Sorting Locations by Distance
You can easily rank locations from nearest to farthest.
SELECT
DriverName,
greatCircleDistance(
Longitude,
Latitude,
77.5946,
12.9716
) AS Distance
FROM delivery_locations
ORDER BY Distance;
Output
| Driver | Distance (m) |
|---|---|
| Bob | 0 |
| Alice | 290172 |
| Charlie | 499786 |
Real-World Applications
ClickHouse® geospatial functions support a wide variety of location-based applications.
| Industry | Example |
|---|---|
| Ride Sharing | Find the nearest driver |
| Logistics | Delivery route optimization |
| Food Delivery | Locate nearby restaurants |
| Retail | Store locator |
| Healthcare | Nearest hospital search |
| IoT | Device location tracking |
| Fleet Management | Vehicle monitoring |
Performance Tips
To maximize performance when working with geographical data in ClickHouse®:
- Store coordinates using the
Float64data type. - Apply filters before calculating distances whenever possible.
- Choose efficient
ORDER BYkeys for MergeTree tables. - Use Geohashes for grouping nearby locations.
- Avoid unnecessary distance calculations across very large datasets.
- Pre-filter records using bounding boxes before running expensive distance calculations.
Conclusion
ClickHouse® provides a powerful set of built-in geospatial functions that make analyzing location data simple, efficient, and highly scalable. Whether you're calculating distances, finding nearby drivers, generating Geohashes, implementing geofencing, or building location-aware analytics, these SQL functions eliminate the need for external GIS processing.
Combined with ClickHouse®'s columnar storage engine and high-speed analytical capabilities, geospatial functions enable developers and data engineers to build scalable solutions for logistics, ride-sharing, fleet management, retail, IoT, healthcare, and many other industries that rely on geographical data.
As your datasets grow from millions to billions of records, ClickHouse® allows you to perform location-based analytics with exceptional speed while keeping your workflows entirely within SQL.
Top comments (0)