In many on-demand logistics systems, the core challenge is: “How do you turn a raw address string into the correct driver-zone code—in under 100 ms?” In this case study, we’ll walk through the high-level patterns and engineering decisions behind a real-world dispatch pipeline that handles thousands of bookings per minute. We’ll cover:
- Elasticsearch indexing and full-text search
- Redis caching for ultra-fast lookups
- AI-driven tokenization and custom re-scoring
- Aggregations for monitoring and analytics
- Data-pipelining best practices
Why 100 ms Lookups Matter
- SLA Compliance: Every millisecond adds up when you’re handling thousands of bookings per minute.
- Driver & Rider Experience: Instant assignment prevents wait-time spikes and driver frustration.
- Cost Efficiency: Faster lookups reduce compute costs and let you scale horizontally with fewer nodes.
Imagine a queue of 10,000 booking requests. A 150 ms lookup means roughly 25 nodes processing in parallel to handle peak load; a 65 ms lookup cuts that to about 11. Those savings compound in the cloud.
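The back-of-envelope behind those node counts is Little's law: requests in flight equal arrival rate times latency. A minimal sketch, where the per-node concurrency parameter is an assumption for illustration:

```javascript
// Little's law sizing: in-flight requests = arrival rate × latency.
// lookupsPerNode is how many lookups a node handles concurrently.
function nodesNeeded(requestsPerSecond, latencySeconds, lookupsPerNode) {
  const inFlight = requestsPerSecond * latencySeconds;
  return Math.ceil(inFlight / lookupsPerNode);
}

console.log(nodesNeeded(166, 0.150, 1)); // 150 ms lookups → 25 nodes
console.log(nodesNeeded(166, 0.065, 1)); // 65 ms lookups → 11 nodes
```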
High-Level Architecture
[ Booking API ]
↓
[ Redis Cache ]
↓
[ Elasticsearch ]
↓
[ AI Segmentation ]
↓
[ Formula Re-scoring ]
↓
[ Zone-Code Mapper ]
↓
[ Response + Persistence ]
- Booking API receives a booking with a raw address string.
- Redis Cache checks for recent results—if hit, return immediately.
- Elasticsearch performs fuzzy/full-text + geo queries against our regions index.
- AI Segmentation tokenizes the address into street, landmark, and unit components.
- Formula Re-scoring blends text match score and geographic distance into a final ranking.
- Zone-Code Mapper looks up the driver-zone code from the top region candidate.
- Response + Persistence returns the code to the caller and logs the lookup for analytics.
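The steps above can be sketched as a single function with each stage injected as a dependency, so the flow is testable in isolation. Everything here (names, shapes, stubs) is an assumption for illustration, not the production code:

```javascript
// End-to-end lookup flow mirroring the diagram. Each stage is passed in
// via `deps`, so any of them can be stubbed in tests.
async function lookupZoneCode(address, deps) {
  const { cache, tokenize, searchRegions, rescore, zoneFor, persist } = deps;

  const cached = await cache.get(address);
  if (cached) return cached; // Redis hit: return immediately

  const tokens = tokenize(address);                 // AI segmentation
  const candidates = await searchRegions(tokens);   // Elasticsearch
  const best = rescore(candidates);                 // formula re-scoring
  const zoneCode = zoneFor(best.region_id);         // zone-code mapper

  await cache.set(address, zoneCode);               // warm the cache
  await persist({ address, zoneCode });             // log for analytics
  return zoneCode;
}
```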
Address → Region with Elasticsearch
Index Design
We maintain a regions index with three core fields:
- address_text (type: text)
- coordinates (type: geo_point)
- region_id (type: keyword)
To support prefix matching and fuzzy queries, we use an edge-ngram analyzer at index time, with the standard analyzer at search time so the query string itself is not n-grammed:

```json
PUT /regions
{
  "settings": {
    "analysis": {
      "analyzer": {
        "edge_ngram_analyzer": {
          "tokenizer": "edge_ngram_tokenizer",
          "filter": ["lowercase"]
        }
      },
      "tokenizer": {
        "edge_ngram_tokenizer": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 20,
          "token_chars": ["letter", "digit"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "address_text": {
        "type": "text",
        "analyzer": "edge_ngram_analyzer",
        "search_analyzer": "standard"
      },
      "coordinates": { "type": "geo_point" },
      "region_id": { "type": "keyword" }
    }
  }
}
```
Sample query:

```json
POST /regions/_search
{
  "size": 5,
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "address_text": {
              "query": "1600 Amphitheatre Pkwy",
              "fuzziness": "AUTO"
            }
          }
        }
      ],
      "filter": [
        {
          "geo_distance": {
            "distance": "5km",
            "coordinates": { "lat": 37.42, "lon": -122.08 }
          }
        }
      ]
    }
  }
}
```
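In application code it helps to build that body in one place. A small sketch; `buildRegionQuery` is our own helper name, not an Elasticsearch API:

```javascript
// Builds the search body shown above for an arbitrary address and
// pickup point. Pass the result as the body of POST /regions/_search.
function buildRegionQuery(address, lat, lon, radius = "5km", size = 5) {
  return {
    size,
    query: {
      bool: {
        must: [
          { match: { address_text: { query: address, fuzziness: "AUTO" } } }
        ],
        filter: [
          { geo_distance: { distance: radius, coordinates: { lat, lon } } }
        ]
      }
    }
  };
}
```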
Ultra-Fast Redis Caching
Why Cache?
- Reduces ES load on repeat lookups.
- Delivers sub-millisecond responses for popular addresses.
Key Design
- Key: `cache:region:<normalized-address>`
- Value: JSON { "region_id": "...", "timestamp": 123456789 }
- TTL: 12 hours (adjust for region-definition update frequency)
```javascript
// Node.js: check Redis first, fall back to Elasticsearch, then cache the result
async function lookupRegion(address) {
  const key = `cache:region:${normalize(address)}`;
  const cached = await redis.get(key);
  if (cached) return JSON.parse(cached);

  const result = await findRegionInES(address);
  await redis.setex(key, 43200, JSON.stringify(result)); // TTL: 12 hours
  return result;
}
```
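The snippet relies on a `normalize()` helper that we have not defined. One plausible sketch lowercases, strips punctuation, collapses whitespace, and expands a few common abbreviations so "St." and "Street" hit the same cache key; the abbreviation table here is illustrative, not exhaustive:

```javascript
// Illustrative abbreviation table; a real one would be much larger.
const ABBREVIATIONS = { st: "street", ave: "avenue", blvd: "boulevard", pkwy: "parkway" };

function normalize(address) {
  return address
    .toLowerCase()
    .replace(/[.,#]/g, " ")  // drop punctuation
    .split(/\s+/)            // collapse whitespace
    .filter(Boolean)
    .map((w) => (Object.hasOwn(ABBREVIATIONS, w) ? ABBREVIATIONS[w] : w))
    .join(" ");
}
```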
AI-Driven Tokenization
Addresses come in countless formats. We found that a lightweight NLP model helps extract consistent tokens:
- Street names
- Landmarks or points of interest
- Unit numbers or building suffixes
Note: the model itself is proprietary, so we won't share its internals here.
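Since the real model is off the table, here is a crude rule-based stand-in that shows the output shape the rest of the pipeline expects. The patterns and field names are assumptions; a production system would use a trained NER model instead:

```javascript
// Rule-based stand-in for the segmentation model. Extracts a unit
// ("Apt 4B"), a landmark (anything after "near"), and treats the
// remainder as the street component.
function segmentAddress(raw) {
  const result = { unit: null, street: null, landmark: null };
  let text = raw.trim();

  // Unit/apartment markers like "Apt 4B" or "Unit 12"
  const unit = text.match(/\b(?:apt|unit|suite)\s*\S+/i);
  if (unit) {
    result.unit = unit[0];
    text = text.replace(unit[0], "").trim();
  }

  // Anything after "near" is treated as a landmark
  const landmark = text.match(/\bnear\s+(.+)$/i);
  if (landmark) {
    result.landmark = landmark[1];
    text = text.slice(0, landmark.index).trim();
  }

  result.street = text.replace(/,\s*$/, "") || null;
  return result;
}
```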
Formula-Based Re-Scoring
After retrieving the top N candidates from Elasticsearch, we re-score them to balance text-match quality and proximity:
```javascript
function scoreCandidate(esScore, distanceMeters) {
  const α = 0.7; // text weight
  const β = 0.3; // proximity weight
  // Note: esScore should be normalized (e.g. to [0, 1]) so both terms
  // are on comparable scales before the weights are applied.
  return α * esScore + β * (1 / (distanceMeters + 1));
}
// Choose the candidate with the highest final score
```

- α = 0.7 emphasizes fuzzy/full-text match.
- β = 0.3 rewards geographic closeness.
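Applying this across the top-N candidates is a simple argmax. A sketch, where the candidate shape (`esScore`, `distanceMeters`, `region_id`) is our assumption for illustration:

```javascript
// Repeated from above so this snippet runs standalone (ASCII weights).
function scoreCandidate(esScore, distanceMeters) {
  const alpha = 0.7; // text weight
  const beta = 0.3;  // proximity weight
  return alpha * esScore + beta * (1 / (distanceMeters + 1));
}

// Returns the candidate with the highest blended score, or null.
function pickBestCandidate(candidates) {
  let best = null;
  for (const c of candidates) {
    const score = scoreCandidate(c.esScore, c.distanceMeters);
    if (!best || score > best.score) best = { ...c, score };
  }
  return best;
}
```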
Mapping to Driver Zone Codes
With your best region_id in hand:
- Lookup in a simple table (region_id → zone_code).
- Apply specificity rules: when regions overlap, use “most specific” first.
- Fallbacks: ambiguous or no-match addresses get routed to a broad city-wide zone or manual review queue.
```sql
SELECT zone_code
FROM region_zone_map
WHERE region_id = :bestRegionId
ORDER BY specificity DESC
LIMIT 1;
```
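The same lookup plus the fallback rule can be expressed in memory. A sketch; the table rows and zone codes are invented for illustration:

```javascript
// In-memory version of region_zone_map with a specificity column.
const REGION_ZONE_MAP = [
  { region_id: "downtown-core", zone_code: "DT-01", specificity: 2 },
  { region_id: "downtown-core", zone_code: "CITY-00", specificity: 0 },
];
const FALLBACK_ZONE = "CITY-00"; // broad city-wide zone

function zoneCodeFor(regionId) {
  const rows = REGION_ZONE_MAP
    .filter((r) => r.region_id === regionId)
    .sort((a, b) => b.specificity - a.specificity); // most specific first
  return rows.length ? rows[0].zone_code : FALLBACK_ZONE;
}
```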
Aggregation & Monitoring
Tracking how lookups distribute across zones helps detect demand spikes and system regressions.
```json
GET /bookings/_search
{
  "size": 0,
  "aggs": {
    "by_zone": {
      "terms": { "field": "zone_code", "size": 20 }
    }
  }
}
```
- Dashboard: Plot “bookings per zone” in Kibana or Grafana.
- Alerts: Notify if a zone’s booking rate doubles in 5 minutes.
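The alert rule boils down to comparing consecutive 5-minute windows per zone. A sketch; the minimum-volume guard and its threshold are assumptions to keep low-traffic zones from paging anyone:

```javascript
// Fires when the current 5-minute booking count is at least double the
// previous window's, ignoring zones below a minimum volume.
function shouldAlert(previousWindowCount, currentWindowCount, minVolume = 50) {
  if (currentWindowCount < minVolume) return false; // too noisy to trust
  return currentWindowCount >= 2 * previousWindowCount;
}
```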
Data Pipelining at Scale
Real-Time Stream
- Kafka or SQS streams each booking event.
- A stateless worker (AWS Lambda or K8s pod) runs the lookup pipeline end-to-end.
Batch Jobs
- Nightly Re-index: Ingest new or updated region definitions into Elasticsearch.
- Model Retraining: Periodically retrain your NER model on fresh, labeled address data.
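For the nightly re-index, Elasticsearch's `_bulk` API takes newline-delimited action/document pairs. A sketch of building that payload from region records; the record shape is assumed to match the mapping earlier:

```javascript
// Builds an NDJSON _bulk body from region records, one
// { index: ... } action line plus one document line per record.
function toBulkBody(regions, index = "regions") {
  const lines = [];
  for (const r of regions) {
    lines.push(JSON.stringify({ index: { _index: index, _id: r.region_id } }));
    lines.push(JSON.stringify({
      address_text: r.address_text,
      coordinates: r.coordinates,
      region_id: r.region_id,
    }));
  }
  return lines.join("\n") + "\n"; // _bulk requires a trailing newline
}
```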
Performance Results
| Metric | Before | After |
|---|---|---|
| p99 Lookup Latency | 150 ms | 65 ms |
| Elasticsearch QPS | 2,000 | 800 |
| Redis Cache Hit Rate | — | 85% |
| Manual Review Fallback Rate | 5% | 1.2% |
- Bulk indexing and query caching further reduced ES load.
- Horizontal scaling of stateless workers allowed seamless throughput growth during peak hours.
Lessons Learned & Next Steps
- Prototype Quickly: Start with ES + caching before adding NLP complexity.
- Measure Early: Instrument each component to pinpoint bottlenecks.
- Iterate Weights: α/β may need retuning as address distributions shift.
Future Improvements:
- Dynamic Zone Editing UI for ops teams.
- Real-Time ML Feedback Loop using mis-assignment data.
- Geo-Fencing Enhancements for irregularly shaped zones.
Conclusion & Call to Action
We’ve shown how to convert raw address strings into sub-100 ms driver-zone assignments using a layered approach of Elasticsearch, Redis, NLP, and custom scoring.
TL;DR:
- Edge-ngram ES + fuzzy queries for flexible text matching.
- Redis caching for repeat lookups.
- Lightweight NLP to normalize address tokens.
- Weighted formulas to balance match quality and proximity.
- Streaming + batch pipelines for real-time scale.