Search & Discovery Architecture for Location-Based Systems (Deep Dive)
Finding nearby restaurants looks simple. At scale, it’s a hard distributed systems problem involving spatial indexing, ranking, freshness, and tradeoffs between accuracy and performance.
This post takes a deep dive into Search & Discovery for location-based systems such as:
- Restaurant discovery (Zomato, Swiggy)
- Store search (Blinkit, Instamart)
- Nearby places (Google Maps-lite use cases)
1️⃣ Problem Definition
Inputs
- User location (latitude, longitude)
- Optional text query: “pizza”, “burger”, “south indian”
- Filters: open now, rating, price, cuisine
- Radius: 1–10 km
Output
- Ranked list of nearby restaurants
- Sorted by relevance, distance, and business signals
Constraints
- Millions of restaurants
- Low latency (< 200ms P95)
- High read QPS
- Data changes, but not every second
2️⃣ Nature of the Data
Understanding data behavior drives architectural decisions.
| Attribute | Behavior |
|---|---|
| Location | Mostly static |
| Name / Cuisine | Rarely changes |
| Ratings | Periodic updates |
| Open / Close status | Time-based |
| Query pattern | Read-heavy |
➡️ This is not a real-time problem
➡️ This is a search and indexing problem
3️⃣ High-Level Architecture
User Request
↓
Search API
↓
Elasticsearch (Geo + Text)
↓
Source DB (Postgres / MySQL)
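The source DB remains the system of record, and the Elasticsearch index is built from it. An optional Redis cache can be added in front of Elasticsearch for hot queries.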
4️⃣ Why Elasticsearch?
Elasticsearch combines three critical capabilities:
- Geo-spatial indexing
- Full-text search
- Relevance scoring and sorting
Doing all three efficiently in a relational database becomes painful at scale.
5️⃣ Location Modeling
Geo Point Mapping
{
  "mappings": {
    "properties": {
      "location": {
        "type": "geo_point"
      }
    }
  }
}
This allows Elasticsearch to index latitude and longitude efficiently.
6️⃣ Geo Distance Query
{
  "query": {
    "bool": {
      "filter": {
        "geo_distance": {
          "distance": "5km",
          "location": {
            "lat": 12.9716,
            "lon": 77.5946
          }
        }
      }
    }
  }
}
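Because the clause sits in filter context, it does not contribute to the relevance score. The spatial index lets Elasticsearch skip regions far from the query point instead of scanning every document.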
7️⃣ How Elasticsearch Computes “Nearby”
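Elasticsearch does the spatial preprocessing at index time:
- BKD trees: geo_point values are stored as Lucene points in a block KD-tree, which groups documents by coordinate range.
- Geohash-based partitioning: used mainly for grid aggregations (for example geohash_grid) such as map clustering.
Restaurants are effectively grouped into spatial blocks. A geo_distance query skips blocks that lie entirely outside the search circle and only inspects blocks near it.
Exact distance is calculated in flight (at query time), and only for the candidates that survive this pruning.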
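The split between index-time bucketing and query-time distance checks can be illustrated with a small sketch. This is not how Elasticsearch's BKD tree works internally; it swaps in a uniform latitude/longitude grid to show the same idea. The names `CELL_DEG`, `build_index`, and `nearby` are hypothetical, and the sketch ignores the poles and the antimeridian.

```python
import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0088
CELL_DEG = 0.05  # hypothetical cell size, roughly 5.5 km of latitude


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def cell_of(lat, lon):
    """Map a coordinate to its (row, column) grid cell."""
    return (math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG))


def build_index(places):
    """Preprocessing: bucket (id, lat, lon) tuples by grid cell."""
    index = defaultdict(list)
    for place in places:
        index[cell_of(place[1], place[2])].append(place)
    return index


def nearby(index, lat, lon, radius_km):
    """Query time: scan only cells overlapping the radius, then measure exactly."""
    lat_span = radius_km / 111.32  # approximate km per degree of latitude
    lon_span = lat_span / max(math.cos(math.radians(lat)), 1e-6)
    lo = cell_of(lat - lat_span, lon - lon_span)
    hi = cell_of(lat + lat_span, lon + lon_span)
    hits = []
    for row in range(lo[0], hi[0] + 1):
        for col in range(lo[1], hi[1] + 1):
            for pid, plat, plon in index.get((row, col), []):
                dist = haversine_km(lat, lon, plat, plon)
                if dist <= radius_km:
                    hits.append((pid, dist))
    return sorted(hits, key=lambda hit: hit[1])
```

Only the cells inside the bounding box are touched, so the cost of a query depends on local density rather than on the total number of restaurants.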
8️⃣ Preprocessing vs Inflight Computation
| Aspect | Preprocessing | Inflight |
|---|---|---|
| Spatial segmentation | ✅ | ❌ |
| Distance calculation | ❌ | ✅ |
| Text relevance | ❌ | ✅ |
9️⃣ Full-Text + Geo Search
Example: “pizza near me”
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "cuisine": "pizza"
        }
      },
      "filter": {
        "geo_distance": {
          "distance": "3km",
          "location": {
            "lat": 12.97,
            "lon": 77.59
          }
        }
      }
    }
  }
}
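In a Search API, a request body like this is usually assembled from the user's inputs. A minimal sketch of such a builder follows, producing a plain dict and making no client calls. The function name, the `min_rating` and `size` parameters, and the `rating` field are assumptions for illustration; the `cuisine` and `location` fields come from the examples above.

```python
def build_search_query(lat, lon, radius_km, text=None, min_rating=None, size=20):
    """Return an Elasticsearch-style request body as a plain dict."""
    # Geo restriction always goes in filter context: it narrows, but does not score.
    filters = [
        {
            "geo_distance": {
                "distance": f"{radius_km}km",
                "location": {"lat": lat, "lon": lon},
            }
        }
    ]
    if min_rating is not None:
        # Assumes a numeric "rating" field exists in the index.
        filters.append({"range": {"rating": {"gte": min_rating}}})

    bool_query = {"filter": filters}
    if text:
        # Text match goes in "must" so it contributes to the relevance score.
        bool_query["must"] = [{"match": {"cuisine": text}}]

    return {
        "size": size,
        "query": {"bool": bool_query},
        # Relevance first, nearest first as the tie-breaker.
        "sort": [
            "_score",
            {
                "_geo_distance": {
                    "location": {"lat": lat, "lon": lon},
                    "order": "asc",
                    "unit": "km",
                }
            },
        ],
    }
```

Calling `build_search_query(12.97, 77.59, 3, text="pizza")` yields the same `bool` structure as the "pizza near me" example, with an explicit sort added.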
🔟 Ranking Strategy
Distance alone is insufficient.
Typical conceptual scoring:
Final Score =
Text Relevance
+ Rating Weight
+ Popularity Score
- Distance Penalty
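To make the conceptual formula concrete, here is a rough sketch of one way to combine the signals. The weights, the use of `order_count` as the popularity signal, and the log damping are all assumptions chosen for illustration, not values from any production system.

```python
import math


def final_score(text_relevance, rating, order_count, distance_km,
                w_rating=2.0, w_popularity=1.0, w_distance=1.0):
    """Combine relevance and business signals into a single ranking score."""
    rating_term = w_rating * rating
    # log1p dampens popularity so very busy places do not dominate everything else.
    popularity_term = w_popularity * math.log1p(order_count)
    distance_penalty = w_distance * distance_km
    return text_relevance + rating_term + popularity_term - distance_penalty
```

With equal relevance, rating, and popularity, a restaurant 1 km away outranks one 3 km away, and tuning the weights shifts how much distance matters relative to quality.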
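Scripted Sorting Example
{
  "sort": [
    {
      "_script": {
        "type": "number",
        "script": {
          "lang": "painless",
          "source": "doc['rating'].value * 2 - doc['location'].arcDistance(params.lat, params.lon) / 1000",
          "params": { "lat": 12.97, "lon": 77.59 }
        },
        "order": "desc"
      }
    }
  ]
}
Distance is not a stored field, so the script computes it from the geo_point with arcDistance, which returns meters; dividing by 1000 gives kilometers.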
1️⃣1️⃣ Radius Search Tradeoffs
- Small radius → faster, fewer results
- Large radius → slower, noisier results
- Dynamic radius → best UX
Common approach: start small, expand if results are insufficient.
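The start-small-then-expand approach can be sketched as a simple loop. `search_fn` is a hypothetical callable that takes `(lat, lon, radius_km)` and returns a list of results; the radius steps and the `min_results` threshold are illustrative assumptions within the 1–10 km range mentioned earlier.

```python
def search_with_expanding_radius(search_fn, lat, lon,
                                 radii_km=(1, 3, 5, 10), min_results=10):
    """Try each radius in turn and stop at the first that yields enough results.

    Returns (results, radius_used). If no radius yields enough results,
    the results from the largest radius are returned.
    """
    results, used = [], None
    for radius in radii_km:
        results = search_fn(lat, lon, radius)
        used = radius
        if len(results) >= min_results:
            break
    return results, used
```

Dense urban areas usually stop at the first step, while sparse areas pay for a few extra queries, so the added latency lands mostly where results would otherwise be empty.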
1️⃣2️⃣ Caching with Redis
Caching is optional but useful for hot locations.
Example cache key:
city:blr:lat:12.97:lon:77.59:radius:3km
TTL should be short (5–15 minutes).
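For the cache to get hits, nearby users need to map to the same key, which is why the example key uses coordinates rounded to two decimals (about 1 km). A minimal sketch of building such a key is shown below; the function name, the optional `query` part, and the 10-minute TTL constant are assumptions.

```python
CACHE_TTL_SECONDS = 10 * 60  # assumed value within the 5-15 minute range


def cache_key(city, lat, lon, radius_km, query=""):
    """Build a cache key that is shared by users in the same ~1 km cell."""
    parts = [
        f"city:{city}",
        f"lat:{lat:.2f}",  # two decimals is roughly 1.1 km of latitude
        f"lon:{lon:.2f}",
        f"radius:{radius_km}km",
    ]
    if query:
        # Normalize text so "Pizza" and " pizza " share one entry.
        parts.append(f"q:{query.strip().lower()}")
    return ":".join(parts)
```

The tradeoff is precision: users near a cell edge see results computed for a point up to several hundred meters away, which is usually acceptable for discovery.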
1️⃣3️⃣ Why Not Redis GEO for Discovery?
| Redis GEO | Elasticsearch |
|---|---|
| Fast | Slightly slower |
| No full-text | Full-text search |
| Sorting by distance only | Advanced ranking |
| Memory-heavy | Disk-backed |
Redis GEO is better suited for real-time driver matching, where locations change every few seconds and no text search is needed.
1️⃣4️⃣ Postgres + PostGIS?
It works, but with limitations.
- Good for early-stage systems
- Hard to scale ranking logic
- Search complexity grows quickly
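At scale, Elasticsearch is usually the better fit when text search, geo filtering, and custom ranking must be combined in one query.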
1️⃣5️⃣ Data Freshness
- Ratings → async index updates
- Menu changes → delayed propagation
- Location changes → extremely rare
Eventual consistency is acceptable for discovery: a rating that is a few minutes stale does not hurt the user experience.
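One way to take advantage of this tolerance is to batch rating changes and send them to the index periodically instead of one write per event. The sketch below coalesces events and formats them in the newline-delimited layout used by the Elasticsearch bulk API; the index name `restaurants`, the event shape, and the function names are assumptions, and actually sending the body is left out.

```python
import json


def coalesce_rating_updates(events):
    """Keep only the latest rating per restaurant id (events are in arrival order)."""
    latest = {}
    for restaurant_id, rating in events:
        latest[restaurant_id] = rating
    return latest


def to_bulk_body(index_name, latest):
    """Format partial-document updates as a bulk request body (NDJSON)."""
    lines = []
    for restaurant_id, rating in latest.items():
        # Action line, followed by the partial document to merge.
        lines.append(json.dumps({"update": {"_index": index_name, "_id": restaurant_id}}))
        lines.append(json.dumps({"doc": {"rating": rating}}))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n" if lines else ""
```

Coalescing means a restaurant that receives many ratings in one window costs a single index update, which keeps write load on the cluster low.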
1️⃣6️⃣ Key Tradeoffs
- Elasticsearch complexity vs scalability
- Preprocessing vs storage cost
- Composite ranking vs explainability
- Caching vs freshness
Closing Thoughts
Search & Discovery is fundamentally:
- An indexing problem
- A ranking problem
- A read-scalability problem