DEV Community

Cover image for Understanding Geohashes
Patrick Stival
Patrick Stival

Posted on

Understanding Geohashes

Hello there.

Are you curious about how geographic locations can be represented as a short string of letters and numbers? Or how this technique helps in fast retrieval of location data?

This post will take you on a deep dive into geohashes, explaining what they are, how they work, and their practical uses. Whether you're a data scientist, a software engineer, or just someone with a knack for geography and data, this post will help you grasp the concept of geohashes and their significance.

The purpose of this post is to explain in depth how a geohash works and its subtleties step-by-step.

What is a geohash?

Chat GPT says:

A geohash is a location encoding system that represents a geographic location using a short string of letters and numbers. It works by dividing the world into a grid of squares, assigning each square a unique hash. The length of the hash controls the precision, with longer hashes corresponding to smaller, more precise squares.

That gives us a general idea. You use a geohash to identify areas on a map. The easiest way to visualize them is with an interactive map
Map with precision 1 geohashes

Why geohashes

Databases are better optimized for text than geolocation queries. If you need fast retrieval over a large collection of data, R-Tree indexes might not be enough.

Quoting Chris Hewett's website:

Geohash avoids/minimises the use of (usually) slow geospacial functions. Who you going to call when your ST_CONTAINS() is slow and an EXPLAIN shows it's already using an index...

Also, geohashes can be used as an alternative to tile coordinates in order to load chunks of a map and possibly cache them on the client side.

Different precisions

As chat gpt said, a geohash length describes its precision. For instance, dr7 is a geohash with precision 3. That means that its parent, dr (precision 2), encloses a larger area that also contains its neighbors.

dr7 geohash on the map

2 types of geohashes

There are 2 types of geohash grids (width x height): 8x4 and 4x8. That means that for any given zoom level, there are 32 possible squares inside.

We start with an 8x4 grid with the whole world inside. Each time you go one precision deeper, the type of grid you get alternates. This is made this way, because if we kept grids in the same format for all zoom levels, we would eventually get super thin areas.

precision levels alternating on the map

Since there are 8x4 and 4x8 grids only, we use these 32 characters to encode each precision: 0123456789bcdefghjkmnpqrstuvwxyz.
They range from 0 to z, but beware, there are a few letters missing in this specific encoding, such as a;i;l and o.

Keep dividing by 2

If you want to find out the geohash in which a point lies into, you have to bisect it 5 times. Why 5? Because 2^5 is 32, which is the total amount of geohashes for each precision level.

Let's do a dry run of the algorithm to figure out in which geohash1 a location L in argentina lies into.

  1. purple: divide the world in 2. L is in the first half
  2. red: L is beneath
  3. orange: L is to the right
  4. yellow: L is above
  5. green: L is to the left
  6. We have run 5 tests already, so the geohash is 6 dry run illustration

Notice that we always have two possible moves for each step: to choose between the first and the second half. The moves alternate between latitude and longitude, so we are able to move in 2D.

Also, we have to somehow translate the accumulated result of those moves into an index for the string 0123456789bcdefghjkmnpqrstuvwxyz.

The rules for that are:

  • We start with an index valued 0
  • If the location lies in the first half (lower than mid), multiply the index by 2
  • If it's the second half (higher than mid), multiply the index by 2 and add 1
  • Repeat 4 more times

That means we are using even numbers to identify first-half moves and odd numbers for second-half moves.

The highest value we can get is 31:

0 * 2 + 1 = 1
1 * 2 + 1 = 3
3 * 2 + 1 = 7
7 * 2 + 1 = 15
15 * 2 + 1 = 31
Enter fullscreen mode Exit fullscreen mode

Since our index starts at 0, that would be 32 possibilities.

This is exactly what the code is doing:

// (c) Chris Veness 2014-2019
let latMin = -90, latMax = 90
let lonMin = -180, lonMax = 180

while (geohash.length < precision) {
  if (evenBit) {
    // bisect E-W longitude
    const lonMid = (lonMin + lonMax) / 2
    if (lon >= lonMid) {
      idx = idx * 2 + 1
      lonMin = lonMid
    } else {
      idx = idx * 2
      lonMax = lonMid
    }
  } else {
    // bisect N-S latitude
    const latMid = (latMin + latMax) / 2
    if (lat >= latMid) {
      idx = idx * 2 + 1
      latMin = latMid
    } else {
      idx = idx * 2
      latMax = latMid
    }
  }

  evenBit = !evenBit

  if (++bit === 5) {
    // 5 bits gives us a character: append it and start over
    geohash += base32.charAt(idx)
    bit = 0
    idx = 0
  }
}

console.log(geohash)
Enter fullscreen mode Exit fullscreen mode

Then we stop once the precision is met.

Alternatives to geohash

If you check the h3geo comparison page, you should see plenty of alternatives to geohash, such as s2 or even h3 itself.

Glossary

ST_CONTAINS: postgis function that finds geometries inside areas;
EXPLAIN: postgres statement that calculates query cost

Top comments (0)