In many on-demand logistics systems, the core challenge is: “How do you turn a raw address string into the correct driver-zone code—in under 100 ms?” In this case study, we’ll walk through the high-level patterns and engineering decisions behind a real-world dispatch pipeline that handles thousands of bookings per minute. We’ll cover:
- Elasticsearch indexing and full-text search
- Redis caching for ultra-fast lookups
- AI-driven tokenization and custom re-scoring
- Aggregations for monitoring and analytics
- Data-pipelining best practices
Why 100 ms Lookups Matter
- SLA Compliance: Every millisecond adds up when you’re handling thousands of bookings per minute.
- Driver & Rider Experience: Instant assignment prevents wait-time spikes and driver frustration.
- Cost Efficiency: Faster lookups reduce compute costs and let you scale horizontally with fewer nodes.
Imagine a queue of 10,000 booking requests arriving every minute. At 150 ms per lookup, that’s 1,500 seconds of work to clear every 60-second window, or roughly 25 nodes running in parallel; at 65 ms it drops to about 11. Those savings compound in the cloud.
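A quick back-of-the-envelope sketch of that math (assuming fully serial workers and a queue that must clear every 60 seconds):
// Rough capacity math: nodes needed ≈ (requests per minute × latency) / 60 s
const nodesNeeded = (reqPerMin, latencyMs) => Math.ceil((reqPerMin * latencyMs) / 60000);

nodesNeeded(10000, 150); // => 25
nodesNeeded(10000, 65);  // => 11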
High-Level Architecture
[ Booking API ]
       ↓
[ Redis Cache ]
       ↓
[ Elasticsearch ]
       ↓
[ AI Segmentation ]
       ↓
[ Formula Re-scoring ]
       ↓
[ Zone-Code Mapper ]
       ↓
[ Response + Persistence ]
- Booking API receives a booking with a raw address string.
- Redis Cache checks for recent results—if hit, return immediately.
- Elasticsearch performs fuzzy/full-text + geo queries against our regions index.
- AI Segmentation tokenizes the address into street, landmark, and unit components.
- Formula Re-scoring blends text match score and geographic distance into a final ranking.
- Zone-Code Mapper looks up the driver-zone code from the top region candidate.
- Response + Persistence returns the code to the caller and logs the lookup for analytics.
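To make the flow concrete, here is a minimal sketch of the whole pipeline as one async function. Every stage function name (checkRedisCache, searchRegions, and so on) is an illustrative stand-in for the components described above, not the production code.
// End-to-end lookup sketch; stage functions are placeholders for the sections below
async function resolveZoneCode(rawAddress, pickupCoords) {
  const cached = await checkRedisCache(rawAddress);                  // Redis cache
  if (cached) return cached.zone_code;

  const candidates = await searchRegions(rawAddress, pickupCoords);  // Elasticsearch
  const tokens = segmentAddress(rawAddress);                         // AI segmentation
  const best = rescoreCandidates(candidates, tokens, pickupCoords);  // formula re-scoring
  const zoneCode = await mapRegionToZone(best.region_id);            // zone-code mapper

  await persistLookup(rawAddress, zoneCode);                         // respond + log
  return zoneCode;
}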
Address → Region with Elasticsearch
Index Design
We maintain a regions index with three core fields:
- address_text (type: text)
- coordinates (type: geo_point)
- region_id (type: keyword)
To support prefix matching and typo-tolerant (fuzzy) queries, we index address_text with an edge-ngram analyzer and use the standard analyzer at search time, so query terms aren’t themselves split into n-grams:
PUT /regions
{
  "settings": {
    "analysis": {
      "analyzer": {
        "edge_ngram_analyzer": {
          "tokenizer": "edge_ngram_tokenizer",
          "filter": ["lowercase"]
        }
      },
      "tokenizer": {
        "edge_ngram_tokenizer": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 20,
          "token_chars": ["letter", "digit"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "address_text": {
        "type": "text",
        "analyzer": "edge_ngram_analyzer"
      },
      "coordinates": { "type": "geo_point" },
      "region_id":    { "type": "keyword" }
    }
  }
}
Sample Query:
POST /regions/_search
{
  "size": 5,
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "address_text": {
              "query": "1600 Amphitheatre Pkwy",
              "fuzziness": "AUTO"
            }
          }
        }
      ],
      "filter": [
        {
          "geo_distance": {
            "distance": "5km",
            "coordinates": { "lat": 37.42, "lon": -122.08 }
          }
        }
      ]
    }
  }
}
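The caching snippet in the next section calls a findRegionInES helper. Here’s a minimal sketch of what that helper could look like with the official @elastic/elasticsearch client (v8-style API); the node URL and default pickup point are placeholders:
const { Client } = require("@elastic/elasticsearch");
const es = new Client({ node: "http://localhost:9200" }); // placeholder URL

// Wraps the query above: fuzzy text match + 5 km geo filter, top 5 candidates
async function findRegionInES(address, pickup = { lat: 37.42, lon: -122.08 }) {
  const result = await es.search({
    index: "regions",
    size: 5,
    query: {
      bool: {
        must: [{ match: { address_text: { query: address, fuzziness: "AUTO" } } }],
        filter: [{ geo_distance: { distance: "5km", coordinates: pickup } }]
      }
    }
  });
  return result.hits.hits; // top candidates, re-scored downstream
}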
Ultra-Fast Redis Caching
Why Cache?
- Reduces ES load on repeat lookups.
- Delivers sub-millisecond responses for popular addresses.
Key Design
- Key: cache:region:&lt;normalized address&gt;
- Value: JSON { "region_id": "...", "timestamp": 123456789 }
- TTL: 12 hours (adjust for region-definition update frequency)
// Node.js cache-aside lookup (ioredis); normalize() lowercases and strips the
// raw address, and findRegionInES() is sketched in the previous section
const Redis = require("ioredis");
const redis = new Redis();

async function lookupRegion(address) {
  const key = `cache:region:${normalize(address)}`;
  const cached = await redis.get(key);
  if (cached) return JSON.parse(cached);                 // cache hit
  const result = await findRegionInES(address);          // cache miss → Elasticsearch
  await redis.setex(key, 43200, JSON.stringify(result)); // TTL = 12 h
  return result;
}
AI-Driven Tokenization
Addresses come in countless formats. We found that a lightweight NLP model helps extract consistent tokens:
- Street names
- Landmarks or points of interest
- Unit numbers or building suffixes
Note: the model itself is proprietary, so we won’t dig into its internals here.
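Since the model is off-limits, here’s a hypothetical sketch of the output shape the rest of the pipeline consumes, with a naive rule-based fallback standing in for the real segmenter:
// Illustrative only: the real segmenter is an NLP model; this regex fallback
// just shows the { street, landmark, unit } shape downstream code expects.
function segmentAddress(raw) {
  const unitMatch = raw.match(/\b(?:apt|unit|suite|flat)\s*[\w-]+/i);
  const landmarkMatch = raw.match(/\bnear\s+(.+)$/i);
  const unit = unitMatch ? unitMatch[0] : null;
  const landmark = landmarkMatch ? landmarkMatch[1] : null;
  const street = raw
    .replace(unit || "", "")
    .replace(/\bnear\s+.+$/i, "")
    .replace(/^[,\s]+|[,\s]+$/g, "");
  return { street, landmark, unit };
}

segmentAddress("Apt 4B, 221B Baker Street, near Regent's Park");
// => { street: "221B Baker Street", landmark: "Regent's Park", unit: "Apt 4B" }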
Formula-Based Re-Scoring
After retrieving the top N candidates from Elasticsearch, we re-score them to balance text-match quality and proximity:
function scoreCandidate(esScore, distanceMeters) {
  const α = 0.7;  // text weight
  const β = 0.3;  // proximity weight
  return α * esScore + β * (1 / (distanceMeters + 1));
}
// Choose candidate with highest final score
- α = 0.7 emphasizes fuzzy/full-text match quality.
- β = 0.3 rewards geographic closeness.
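Here’s a sketch of how the re-score is applied over the ES hits, assuming each candidate carries its { lat, lon } in _source.coordinates. The text score is normalized to 0..1 first so both terms are on comparable scales, and haversineMeters is a standard great-circle helper included only to keep the example self-contained:
// Great-circle distance in meters between two { lat, lon } points
function haversineMeters(a, b) {
  const R = 6371000, toRad = d => (d * Math.PI) / 180;
  const dLat = toRad(b.lat - a.lat), dLon = toRad(b.lon - a.lon);
  const h = Math.sin(dLat / 2) ** 2 +
            Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) * Math.sin(dLon / 2) ** 2;
  return 2 * R * Math.asin(Math.sqrt(h));
}

// Normalize ES scores, blend with proximity via scoreCandidate, pick the winner
function pickBestRegion(hits, pickup) {
  if (!hits.length) return null;
  const maxScore = Math.max(...hits.map(h => h._score));
  return hits
    .map(h => ({
      region_id: h._source.region_id,
      finalScore: scoreCandidate(h._score / maxScore,
                                 haversineMeters(pickup, h._source.coordinates))
    }))
    .sort((a, b) => b.finalScore - a.finalScore)[0];
}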
Mapping to Driver Zone Codes
With your best region_id in hand:
- Lookup in a simple table (region_id → zone_code).
- Apply specificity rules: when regions overlap, use “most specific” first.
- Fallbacks: ambiguous or no-match addresses get routed to a broad city-wide zone or manual review queue.
SELECT zone_code
FROM region_zone_map
WHERE region_id = :bestRegionId
ORDER BY specificity DESC
LIMIT 1;
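A minimal sketch of the mapper with the fallback rules above, assuming a Postgres table and node-postgres; CITY_WIDE_ZONE and the review flag are illustrative names, not the production schema:
const { Pool } = require("pg");
const db = new Pool(); // connection settings come from the environment

const CITY_WIDE_ZONE = "ZONE_CITY_WIDE"; // assumed broad fallback zone

async function mapRegionToZone(regionId) {
  if (!regionId) return { zone_code: CITY_WIDE_ZONE, needsReview: true };
  const { rows } = await db.query(
    `SELECT zone_code FROM region_zone_map
       WHERE region_id = $1
       ORDER BY specificity DESC
       LIMIT 1`,
    [regionId]
  );
  return rows.length
    ? { zone_code: rows[0].zone_code, needsReview: false }
    : { zone_code: CITY_WIDE_ZONE, needsReview: true }; // route to manual review
}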
Aggregation & Monitoring
Tracking how lookups distribute across zones helps detect demand spikes and system regressions.
GET /bookings/_search
{
  "size": 0,
  "aggs": {
    "by_zone": {
      "terms": { "field": "zone_code", "size": 20 }
    }
  }
}
- Dashboard: Plot “bookings per zone” in Kibana or Grafana.
- Alerts: Notify if a zone’s booking rate doubles in 5 minutes.
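One minimal way to implement that alert is to compare the last five minutes with the five minutes before it (a sketch; the bookings index name, created_at field, and notify helper are assumptions):
const { Client } = require("@elastic/elasticsearch");
const es = new Client({ node: "http://localhost:9200" });
const notifyOps = async msg => console.log("[ALERT]", msg); // stand-in for PagerDuty/Slack

async function checkZoneSpike(zoneCode) {
  // Count bookings for this zone in a given time window
  const countIn = (gte, lt) =>
    es.count({
      index: "bookings",
      query: {
        bool: {
          filter: [
            { term: { zone_code: zoneCode } },
            { range: { created_at: { gte, lt } } }
          ]
        }
      }
    });

  const [current, previous] = await Promise.all([
    countIn("now-5m", "now"),
    countIn("now-10m", "now-5m")
  ]);

  if (previous.count > 0 && current.count >= 2 * previous.count) {
    await notifyOps(`Booking rate doubled in zone ${zoneCode}`);
  }
}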
Data Pipelining at Scale
Real-Time Stream
- Kafka or SQS streams each booking event.
- A stateless worker (AWS Lambda or K8s pod) runs the lookup pipeline end-to-end.
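A sketch of such a worker with kafkajs; the topic, broker address, and message shape are assumptions, and resolveZoneCode is the pipeline function sketched earlier:
const { Kafka } = require("kafkajs");

const kafka = new Kafka({ clientId: "zone-lookup-worker", brokers: ["kafka:9092"] });
const consumer = kafka.consumer({ groupId: "zone-lookup" });

async function run() {
  await consumer.connect();
  await consumer.subscribe({ topic: "booking-events" });
  await consumer.run({
    eachMessage: async ({ message }) => {
      const booking = JSON.parse(message.value.toString());
      const zoneCode = await resolveZoneCode(booking.address, booking.pickup);
      console.log(booking.id, "→", zoneCode); // persistence/ack handling omitted
    }
  });
}

run().catch(console.error);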
Batch Jobs
- Nightly Re-index: Ingest new or updated region definitions into Elasticsearch.
- Model Retraining: Periodically retrain your NER model on fresh, labeled address data.
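For the nightly re-index, the Elasticsearch client’s bulk helper keeps things simple (a sketch; loadRegionDefinitions is an assumed data-access function returning fresh region documents):
const { Client } = require("@elastic/elasticsearch");
const es = new Client({ node: "http://localhost:9200" });

async function reindexRegions() {
  const regions = await loadRegionDefinitions(); // hypothetical: pulls updated definitions
  await es.helpers.bulk({
    datasource: regions,
    onDocument: doc => ({ index: { _index: "regions", _id: doc.region_id } })
  });
}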
Performance Results
| Metric | Before | After | 
|---|---|---|
| p99 Lookup Latency | 150 ms | 65 ms | 
| Elasticsearch QPS | 2 000 | 800 | 
| Redis Cache Hit Rate | — | 85 % | 
| Manual Review Fallback Rate | 5 % | 1.2 % | 
- Bulk indexing and query caching further reduced ES load.
- Horizontal scaling of stateless workers allowed seamless throughput growth during peak hours.
Lessons Learned & Next Steps
- Prototype Quickly: Start with ES + caching before adding NLP complexity.
- Measure Early: Instrument each component to pinpoint bottlenecks.
- Iterate Weights: α/β may need retuning as address distributions shift.
- Future Improvements:
  - Dynamic Zone Editing UI for ops teams.
  - Real-Time ML Feedback Loop using mis-assignment data.
  - Geo-Fencing Enhancements for irregularly shaped zones.
Conclusion & Call to Action
We’ve shown how to convert raw address strings into sub-100 ms driver-zone assignments using a layered approach of Elasticsearch, Redis, NLP, and custom scoring.
TL;DR:
- Edge-ngram ES + fuzzy queries for flexible text matching.
- Redis caching for repeat lookups.
- Lightweight NLP to normalize address tokens.
- Weighted formulas to balance match quality and proximity.
- Streaming + batch pipelines for real-time scale.