umesh kushwaha

Posted on Jan 23

How Uber Eats & Zomato Find “Restaurants Near You”

#blog #systemdesign #distributedsystems

Search & Discovery Architecture for Location-Based Systems (Deep Dive)

Finding nearby restaurants looks simple. At scale, it’s a hard distributed systems problem involving spatial indexing, ranking, freshness, and tradeoffs between accuracy and performance.

This blog deep-dives into Search & Discovery for location-based systems like:

Restaurant discovery (Zomato, Swiggy)
Store search (Blinkit, Instamart)
Nearby places (Google Maps-lite use cases)

1️⃣ Problem Definition

Inputs

User location (latitude, longitude)
Optional text query: “pizza”, “burger”, “south indian”
Filters: open now, rating, price, cuisine
Radius: 1–10 km

Output

Ranked list of nearby restaurants
Sorted by relevance, distance, and business signals

Constraints

Millions of restaurants
Low latency (< 200ms P95)
High read QPS
Data changes, but not every second

2️⃣ Nature of the Data

Understanding data behavior drives architectural decisions.

Attribute	Behavior
Location	Mostly static
Name / Cuisine	Rarely changes
Ratings	Periodic updates
Open / Close status	Time-based
Query pattern	Read-heavy

➡️ This is not a real-time problem
➡️ This is a search and indexing problem

3️⃣ High-Level Architecture


User Request
     ↓
Search API
     ↓
Elasticsearch (Geo + Text)
     ↓
Source DB (Postgres / MySQL)

An optional Redis cache can sit in front of the Search API to serve hot queries.

4️⃣ Why Elasticsearch?

Elasticsearch combines three critical capabilities:

Geo-spatial indexing
Full-text search
Relevance scoring and sorting

Doing all three efficiently in a relational database becomes painful at scale.

5️⃣ Location Modeling

Geo Point Mapping

{
  "location": {
    "type": "geo_point"
  }
}

This allows Elasticsearch to index latitude and longitude efficiently.
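To make the mapping concrete, here is a minimal sketch of a fuller index definition and a matching document, written as plain Python dictionaries. Only the location field comes from the mapping above; the name, cuisine, and rating fields, their types, and the helper make_restaurant_doc are assumptions for illustration.

```python
# Hypothetical index mapping. Only "location" is taken from the article;
# the other field names and types are illustrative assumptions.
RESTAURANT_MAPPING = {
    "mappings": {
        "properties": {
            "name": {"type": "text"},
            "cuisine": {"type": "text"},
            "rating": {"type": "float"},
            "location": {"type": "geo_point"},
        }
    }
}


def make_restaurant_doc(name, cuisine, rating, lat, lon):
    """Build a document whose location uses the object form of geo_point."""
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("latitude/longitude out of range")
    return {
        "name": name,
        "cuisine": cuisine,
        "rating": rating,
        "location": {"lat": lat, "lon": lon},
    }
```

Validating coordinates before indexing is cheap and catches swapped latitude/longitude values early.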

6️⃣ Geo Distance Query

{
  "query": {
    "bool": {
      "filter": {
        "geo_distance": {
          "distance": "5km",
          "location": {
            "lat": 12.9716,
            "lon": 77.5946
          }
        }
      }
    }
  }
}

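Because the filter is answered from the spatial index, Elasticsearch avoids a full scan and evaluates only documents in nearby spatial segments. Filter clauses also do not contribute to the relevance score.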
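In application code this query is usually built from the request parameters. A minimal sketch, assuming a hypothetical helper named geo_distance_query and the location field from the mapping:

```python
def geo_distance_query(lat, lon, radius_km):
    """Return a query body that keeps only documents within radius_km.

    The geo_distance clause sits in filter context, so it narrows the
    candidate set without affecting scoring.
    """
    return {
        "query": {
            "bool": {
                "filter": {
                    "geo_distance": {
                        "distance": f"{radius_km}km",
                        "location": {"lat": lat, "lon": lon},
                    }
                }
            }
        }
    }
```

Calling geo_distance_query(12.9716, 77.5946, 5) produces the same body as the JSON shown above.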

7️⃣ How Elasticsearch Computes “Nearby”

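Elasticsearch relies on spatial structures that are built at index time:

BKD trees (the index structure behind geo_point fields in current versions)
Geohash-based partitioning (used for grid aggregations such as geohash_grid)

Restaurants are grouped into spatial cells when they are indexed, so a query scans only the cells that overlap the search radius.

The exact distance is calculated in flight (at query time), and only for candidates that survive the spatial filter.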
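The two-phase idea (cheap spatial pruning first, exact distance only for survivors) can be sketched in plain Python. This is not how Elasticsearch is implemented internally: a bounding-box check stands in for the index lookup, and the function names are hypothetical. The sketch assumes the search area does not cross the antimeridian.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def bounding_box(lat, lon, radius_km):
    """Return (min_lat, max_lat, min_lon, max_lon) enclosing the circle."""
    delta = radius_km / EARTH_RADIUS_KM  # angular radius in radians
    dlat = math.degrees(delta)
    # Longitude span widens with latitude; clamp to keep asin in its domain.
    ratio = min(1.0, math.sin(delta) / math.cos(math.radians(lat)))
    dlon = math.degrees(math.asin(ratio))
    return lat - dlat, lat + dlat, lon - dlon, lon + dlon


def nearby(points, lat, lon, radius_km):
    """points is an iterable of (id, lat, lon); returns [(id, km)] by distance."""
    min_lat, max_lat, min_lon, max_lon = bounding_box(lat, lon, radius_km)
    hits = []
    for pid, plat, plon in points:
        # Phase 1: cheap pruning, standing in for the spatial index.
        if not (min_lat <= plat <= max_lat and min_lon <= plon <= max_lon):
            continue
        # Phase 2: exact distance, computed only for surviving candidates.
        d = haversine_km(lat, lon, plat, plon)
        if d <= radius_km:
            hits.append((pid, d))
    return sorted(hits, key=lambda h: h[1])
```

The expensive trigonometry runs only for points inside the box, which is the same reason the real index keeps latency low even with millions of restaurants.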

8️⃣ Preprocessing vs Inflight Computation

Aspect	Preprocessing	Inflight
Spatial segmentation	✅	❌
Distance calculation	❌	✅
Text relevance	❌	✅

9️⃣ Full-Text + Geo Search

Example: “pizza near me”

{
  "query": {
    "bool": {
      "must": {
        "match": {
          "cuisine": "pizza"
        }
      },
      "filter": {
        "geo_distance": {
          "distance": "3km",
          "location": {
            "lat": 12.97,
            "lon": 77.59
          }
        }
      }
    }
  }
}
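The inputs from the problem definition (optional text, filters, radius) map naturally onto one bool query. A minimal sketch, assuming a hypothetical build_search_query helper and a numeric rating field in the index:

```python
def build_search_query(lat, lon, radius_km, text=None, min_rating=None):
    """Combine an optional text match with geo and rating filters."""
    filters = [
        {
            "geo_distance": {
                "distance": f"{radius_km}km",
                "location": {"lat": lat, "lon": lon},
            }
        }
    ]
    if min_rating is not None:
        # Assumes a numeric "rating" field exists in the index.
        filters.append({"range": {"rating": {"gte": min_rating}}})

    bool_query = {"filter": filters}
    if text:
        # Scored clause: contributes to text relevance.
        bool_query["must"] = {"match": {"cuisine": text}}
    return {"query": {"bool": bool_query}}
```

Keeping the geo and rating conditions in filter context means only the text match influences the relevance score.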

🔟 Ranking Strategy

Distance alone is insufficient.

Typical conceptual scoring:


Final Score =
  Text Relevance
+ Rating Weight
+ Popularity Score
- Distance Penalty
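The formula above can be sketched as a weighted sum. The weights, the Candidate type, and the field names below are assumptions chosen for illustration, not values any production system is known to use.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    text_relevance: float  # e.g. the text score from the search engine
    rating: float          # average user rating
    popularity: float      # any normalized business signal
    distance_km: float


def final_score(c, w_text=1.0, w_rating=2.0, w_pop=0.5, w_dist=1.0):
    """Weighted sum of positive signals minus a linear distance penalty."""
    return (
        w_text * c.text_relevance
        + w_rating * c.rating
        + w_pop * c.popularity
        - w_dist * c.distance_km
    )


def rank(candidates, **weights):
    """Return candidates ordered from best to worst score."""
    return sorted(candidates, key=lambda c: final_score(c, **weights), reverse=True)
```

With these weights, a slightly better-rated restaurant 6 km away loses to a comparable one 500 m away, which is the behaviour the distance penalty is meant to produce.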

Scripted Sorting Example

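{
  "_script": {
    "type": "number",
    "script": {
      "lang": "painless",
      "source": "doc['rating'].value * 2 - doc['location'].arcDistance(params.lat, params.lon) / 1000.0",
      "params": {
        "lat": 12.97,
        "lon": 77.59
      }
    },
    "order": "desc"
  }
}

The index has no stored distance field, so the script derives the distance from the location field at query time. arcDistance returns meters, so dividing by 1000 expresses the penalty in kilometres, and the user's coordinates are passed in as params.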

1️⃣1️⃣ Radius Search Tradeoffs

Small radius → faster, fewer results
Large radius → slower, noisier results
Dynamic radius → best UX

Common approach: start small, expand if results are insufficient.
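The start-small-and-expand approach can be sketched as a loop over candidate radii. The radius steps, the min_results threshold, and the search callable are assumptions; in practice search would wrap the actual Elasticsearch call.

```python
def search_with_expanding_radius(search, radii_km=(1, 3, 5, 10), min_results=10):
    """Try each radius in order until enough results come back.

    search is any callable that takes a radius in km and returns a list.
    Returns (radius_used, results); falls back to the largest radius.
    """
    if not radii_km:
        raise ValueError("radii_km must not be empty")
    used, results = radii_km[0], []
    for radius in radii_km:
        used, results = radius, search(radius)
        if len(results) >= min_results:
            break
    return used, results
```

Dense city centres stop at the first step, while sparse areas pay for extra queries, so the step list should stay short.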

1️⃣2️⃣ Caching with Redis

Caching is optional but useful for hot locations.

Example cache key:


city:blr:lat:12.97:lon:77.59:radius:3km

TTL should be short (5–15 minutes).
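A key in this format only helps if nearby users map to the same entry, which is why the coordinates are rounded. Below is a minimal sketch of the key builder plus a small in-memory TTL cache that stands in for Redis. The class, the function names, and the two-decimal precision are assumptions for illustration.

```python
import time


def cache_key(city, lat, lon, radius_km, precision=2):
    """Round coordinates so nearby users share one cache entry.

    Two decimal places correspond to a cell of roughly 1 km.
    """
    return (
        f"city:{city}:lat:{lat:.{precision}f}:lon:{lon:.{precision}f}"
        f":radius:{radius_km}km"
    )


class TTLCache:
    """In-memory stand-in for Redis with per-entry expiry."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}

    def set(self, key, value):
        self._store[key] = (self._clock() + self._ttl, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired: drop the entry and report a miss
            return None
        return value
```

Coarser rounding raises the hit rate but returns results computed for a point further from the user, which is the caching-versus-freshness tradeoff in miniature.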

1️⃣3️⃣ Why Not Redis GEO for Discovery?

Redis GEO	Elasticsearch
Fast	Slightly slower
No full-text	Full-text search
No ranking	Advanced ranking
Memory-heavy	Disk-backed

Redis GEO is better suited for real-time driver matching.

1️⃣4️⃣ Postgres + PostGIS?

It works, but with limitations.

Good for early-stage systems
Hard to scale ranking logic
Search complexity grows quickly

At scale, Elasticsearch is usually the better fit for queries that combine geo filtering, text search, and ranking.

1️⃣5️⃣ Data Freshness

Ratings → async index updates
Menu changes → delayed propagation
Location changes → extremely rare

Near real-time consistency is acceptable for discovery.
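Asynchronous rating updates are often batched rather than sent one by one. A minimal sketch that builds a request body in the newline-delimited format accepted by the Elasticsearch bulk API, assuming a hypothetical index name and a rating field; how the batch is collected (queue, cron job, change stream) is left open.

```python
import json


def bulk_rating_updates(index, ratings):
    """Build an NDJSON bulk body of partial updates.

    ratings maps document id -> new rating. Each update is two lines:
    an action line, then the partial document.
    """
    lines = []
    for doc_id, rating in ratings.items():
        lines.append(json.dumps({"update": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps({"doc": {"rating": rating}}))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n" if lines else ""
```

Because only the rating field is sent, the rest of each document (name, cuisine, location) is left untouched.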

1️⃣6️⃣ Key Tradeoffs

Elasticsearch complexity vs scalability
Preprocessing vs storage cost
Composite ranking vs explainability
Caching vs freshness

Closing Thoughts

Search & Discovery is fundamentally:

An indexing problem
A ranking problem
A read-scalability problem
