Pablo Rios

A search engine for places that look alike

I open-sourced Similar Earth, a tool that lets you drop a pin anywhere on the planet and find every other place that looks like it. Point it at a vineyard in Mendoza, a solar farm in the Mojave, or a mangrove forest in Bangladesh, and the engine returns a global heatmap of every place on Earth that shares the same satellite signature, in about two seconds.

Three moves make it work: a coarse grid in memory, on-demand refinement at high resolution, and aggressive caching.

AlphaEarth

The asset that makes the whole thing possible is AlphaEarth Foundations, a geospatial foundation model that Google DeepMind released in 2025. AlphaEarth produces a 64-dimensional embedding for every 10-meter patch of land on Earth, compressing years of satellite imagery, climate data, terrain and seasonality into a single dense representation. Two locations with similar vectors share a similar environmental signature, and finding matches across the planet is just a dot product over those 64 numbers.
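
Stripped of all the serving machinery, that similarity measure really is that small. A minimal sketch, using float32 vectors as a stand-in for the embeddings before any quantization:

// dot scores how alike two AlphaEarth embeddings are. Everything else
// in the system exists to evaluate this comparison at planetary scale.
func dot(a, b [64]float32) float32 {
    var sum float32
    for d := 0; d < 64; d++ {
        sum += a[d] * b[d]
    }
    return sum
}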

What is Similar Earth

Similar Earth is an engine that builds heatmaps. The user drops one or more reference pins, and the engine returns a heatmap of the entire planet colored by how similar each location is to those pins. A pin on a coffee farm in Colombia lights up the highlands of Ethiopia, Vietnam and Costa Rica, all the places where coffee grows for the same underlying reasons. The heatmap is the product, and everything else in the system exists to make it fast and affordable to serve at any zoom level.

The challenge

Building a heatmap at 10-meter resolution for the entire planet is impossible to do per request. Earth's land surface is around 150 million square kilometers, which at 10-meter resolution is roughly 1.5 trillion pixels, and computing a dot product against every one of those pixels on every query would take hours and cost a meaningful fraction of a dollar in compute. Precomputing every possible heatmap ahead of time is also out of the question, because the number of possible reference pin combinations is effectively infinite.
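
Written out as constants, the back-of-envelope that rules this out (the bytes figure assumes one byte per embedding dimension, matching the int8 quantization described below):

// Back-of-envelope for a per-request global scan at 10 m.
const (
    landAreaKm2   = 150_000_000                 // Earth's land surface
    pixelsPerKm2  = 100 * 100                   // a 10 m pixel is 1/10,000 of a km²
    totalPixels   = landAreaKm2 * pixelsPerKm2  // 1.5 trillion
    bytesPerPixel = 64                          // one int8 per embedding dimension
    bytesToScan   = totalPixels * bytesPerPixel // ~96 TB touched per query
)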

The in-memory grid

The first move is to drop the global resolution from 10 meters to 2 kilometers and keep the entire grid resident in RAM. At 2 km the planet's land surface comes out to roughly 40 million pixels, and after quantizing the 64-dimensional vectors to int8 the whole grid weighs around 2.6 GB, which fits comfortably on a normal cloud instance.

// Loaded once at startup. Stays resident.
var grid [40_000_000][64]int8
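
The quantization that gets the grid down to 2.6 GB is worth a sketch. This version assumes the embedding components arrive as floats in roughly [-1, 1]; the scale factor is an assumption, not the project's actual preprocessing.

// quantize maps a float embedding to int8, a 4x saving over float32.
// Dot products over the quantized vectors preserve the similarity
// ranking closely enough for a heatmap.
func quantize(v [64]float32) [64]int8 {
    var q [64]int8
    for d, x := range v {
        s := x * 127 // assumes components roughly in [-1, 1]
        if s > 127 {
            s = 127
        } else if s < -127 {
            s = -127
        }
        q[d] = int8(s)
    }
    return q
}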

When a user drops a pin, the server computes a max-pooled dot product against every land pixel in parallel:

// Simplified; real code uses a flat contiguous buffer and worker shards.
func similarity(refs [][64]int8) []int32 {
    scores := make([]int32, len(grid))
    for i, pixel := range grid {
        var best int32
        for _, ref := range refs {
            var sum int32
            for d := 0; d < 64; d++ {
                sum += int32(pixel[d]) * int32(ref[d])
            }
            if sum > best {
                best = sum
            }
        }
        scores[i] = best
    }
    return scores
}

The whole scan takes about 2 seconds, which is acceptable for a click-and-wait interaction where the user expects a heatmap to render. The 2 km grid looks fine globally, but as soon as the user zooms in below city level each pixel covers an area larger than what is on screen, and the heatmap turns into a coarse staircase of squares.
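
The "worker shards" the comment in the code above alludes to are the standard Go move of splitting the scan across cores. A sketch of one plausible layout, not the project's actual code; it assumes Go 1.21+ for the built-in min, plus the runtime and sync imports.

// scanSharded gives each worker a contiguous slab of the grid, so the
// scan scales roughly linearly with core count and shares no state
// beyond disjoint slices of the output.
func scanSharded(refs [][64]int8) []int32 {
    scores := make([]int32, len(grid))
    workers := runtime.NumCPU()
    chunk := (len(grid) + workers - 1) / workers
    var wg sync.WaitGroup
    for w := 0; w < workers; w++ {
        lo := w * chunk
        hi := min(lo+chunk, len(grid))
        if lo >= hi {
            break
        }
        wg.Add(1)
        go func(lo, hi int) {
            defer wg.Done()
            for i := lo; i < hi; i++ {
                var best int32
                for _, ref := range refs {
                    var sum int32
                    for d := 0; d < 64; d++ {
                        sum += int32(grid[i][d]) * int32(ref[d])
                    }
                    if sum > best {
                        best = sum
                    }
                }
                scores[i] = best
            }
        }(lo, hi)
    }
    wg.Wait()
    return scores
}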

Refining to 10 meters on demand

The fix was to keep the global grid at 2 km but compute 10-meter detail on demand for whatever region the user is actually looking at. This is the same idea web maps have used since Google Maps introduced the tile pyramid in 2005, and that video game engines have used for even longer to render large worlds. The pattern is called level of detail, or LOD, and the principle is that you should never compute or load anything at higher resolution than the user can actually perceive.
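
For orientation, the tile addressing underneath all of this is standard. Assuming the usual Web Mercator slippy-map scheme (an assumption about Similar Earth's tiling, not something documented here), a lat/lng at a given zoom maps to a tile like this; it needs the math import.

// tileAt returns the x/y coordinates of the 256-pixel tile containing a
// point at a zoom level. The world is a 2^z by 2^z grid of tiles, so each
// zoom step quadruples the tile count: that geometry is what makes
// computing only the visible tiles pay off.
func tileAt(lat, lng float64, zoom int) (x, y int) {
    n := float64(int(1) << zoom)
    x = int((lng + 180.0) / 360.0 * n)
    latRad := lat * math.Pi / 180.0
    y = int((1.0 - math.Log(math.Tan(latRad)+1.0/math.Cos(latRad))/math.Pi) / 2.0 * n)
    return x, y
}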

As soon as the user zooms past zoom level 10, the server starts fetching 10-meter embedding data for the visible region and computing the dot product locally for that region only. The result is a 256-by-256-pixel image tile that gets sent back to the browser and stitched into the heatmap alongside the coarser global data. The user sees the global view immediately, and the high-resolution version fills in over the next few seconds as they zoom in.

The 10-meter computation runs in a Python sidecar, called from the Go server over HTTP, because the Earth Engine SDK exists only for Python and JavaScript.
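
The boundary between the two processes is plain HTTP. A sketch of the Go side of that call; the endpoint, port and parameter names are hypothetical, not the project's actual contract.

// fetchTile asks the Python sidecar to compute one 10-meter tile and
// returns the rendered PNG bytes.
func fetchTile(mapID string, z, x, y int) ([]byte, error) {
    url := fmt.Sprintf("http://localhost:8001/tile?map=%s&z=%d&x=%d&y=%d", mapID, z, x, y)
    resp, err := http.Get(url)
    if err != nil {
        return nil, err
    }
    defer resp.Body.Close()
    if resp.StatusCode != http.StatusOK {
        return nil, fmt.Errorf("sidecar: %s", resp.Status)
    }
    return io.ReadAll(resp.Body)
}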

The data format the sidecar pulls from Earth Engine is a Cloud Optimized GeoTIFF, or COG, a raster image file laid out so that a client can read just a small region over HTTP using range requests, without downloading the whole file. The sidecar asks for the bytes corresponding to the user's viewport, runs the dot product, renders the result to a PNG, and returns it. Each on-demand tile takes roughly 5 to 10 seconds end to end, with most of that time spent waiting on Earth Engine rather than on the computation itself.
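
The range-request mechanism itself is ordinary HTTP and worth seeing once. The sidecar does this in Python; the Go sketch below shows only the mechanism, not the sidecar's code.

// readRange fetches one byte range of a remote file. A client that knows
// a COG's internal layout can pull just the window it needs this way,
// instead of downloading the whole raster.
func readRange(url string, from, to int64) ([]byte, error) {
    req, err := http.NewRequest("GET", url, nil)
    if err != nil {
        return nil, err
    }
    req.Header.Set("Range", fmt.Sprintf("bytes=%d-%d", from, to))
    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        return nil, err
    }
    defer resp.Body.Close()
    return io.ReadAll(resp.Body) // server replies 206 Partial Content
}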

The disk cache

Five to ten seconds per tile is fine for the first user who looks at a region, but unacceptable for every user after that. So every 10-meter tile, once computed, gets written to disk and served from there for every subsequent request. The next person to look at the same region with the same reference pins gets the cached tile back in milliseconds.
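
That read-through logic is small enough to sketch in full, reusing the hypothetical fetchTile from above; the directory layout here is illustrative.

// serveTile returns the cached tile if one exists, otherwise computes it
// once via the sidecar and writes it down for every request after.
func serveTile(mapID string, z, x, y int) ([]byte, error) {
    p := fmt.Sprintf("cache/%s/%d/%d/%d.png", mapID, z, x, y)
    if png, err := os.ReadFile(p); err == nil {
        return png, nil // millisecond path
    }
    png, err := fetchTile(mapID, z, x, y) // 5-10 second path, first request only
    if err != nil {
        return nil, err
    }
    if err := os.MkdirAll(filepath.Dir(p), 0o755); err != nil {
        return nil, err
    }
    return png, os.WriteFile(p, png, 0o644)
}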

What makes this aggressive caching safe is that the data is immutable. The featured maps in Similar Earth, such as the Hass avocado map with its 24 reference pins, are defined by fixed sets of reference embeddings, and the AlphaEarth grid underneath them does not change. The same pins always produce the same score for the same tile, forever, so no invalidation logic is needed anywhere in the system and the cache becomes a deterministic artifact of the upstream inputs.

The cache is also additive. Every region anybody zooms in on stays cached, and the more people use Similar Earth, the more of the planet has already been precomputed for everybody else.

The same precompute-everything pattern shapes the rest of the build. The pipeline that prepares each featured map is a chain of steps, each one writing an artifact that the next step reads.

By the time the server boots, every featured map already has its top results computed, its tiles rendered, and its place names geocoded.
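
One way to picture that chain in code is a runner that skips any step whose artifact is already on disk; the step type and fields here are invented for illustration, not the project's actual build code.

// step is one stage of a featured-map build: it reads the previous
// stage's artifact and writes its own.
type step struct {
    name string
    out  string // artifact this step writes
    run  func(mapID string) error
}

func buildFeaturedMap(mapID string, steps []step) error {
    for _, s := range steps {
        if _, err := os.Stat(s.out); err == nil {
            continue // already built on a previous run
        }
        if err := s.run(mapID); err != nil {
            return fmt.Errorf("%s: %w", s.name, err)
        }
    }
    return nil
}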

Putting it together

The whole system ends up working because each layer handles what the layer behind it cannot. The in-memory grid handles any pin combination globally at 2 km, the on-demand sidecar handles arbitrary regions at 10 meters, the disk cache turns those 5-to-10-second tiles into millisecond reads after the first request, and the precomputed featured maps mean that the most common entry points to the app never need any of the runtime machinery at all. The challenge that 10-meter heatmaps cannot be computed per request is still there, but very little of what users actually do hits it.

Open source

Similar Earth is live at similar.earth and the code is on GitHub. If you want to read more about how AlphaEarth itself works, I have written longer pieces for Geo Week News, freeCodeCamp and Geoawesome. Adding a new featured map is mostly a build-time concern, so if there is a category of place you would like to search by, open an issue on the repo.
