DEV Community

Harisalghifary


How to Build a Low-Latency Driver-Assignment Service

In many on-demand logistics systems, the core challenge is: “How do you turn a raw address string into the correct driver-zone code—in under 100 ms?” In this case study, we’ll walk through the high-level patterns and engineering decisions behind a real-world dispatch pipeline that handles thousands of bookings per minute. We’ll cover:

  • Elasticsearch indexing and full-text search
  • Redis caching for ultra-fast lookups
  • AI-driven tokenization and custom re-scoring
  • Aggregations for monitoring and analytics
  • Data-pipelining best practices

Why 100 ms Lookups Matter

  • SLA Compliance: Every millisecond adds up when you’re handling thousands of bookings per minute.
  • Driver & Rider Experience: Instant assignment prevents wait-time spikes and driver frustration.
  • Cost Efficiency: Faster lookups reduce compute costs and let you scale horizontally with fewer nodes.

Imagine a queue of 10,000 booking requests that has to drain in about a minute. At 150 ms per lookup you need 25 nodes working in parallel to handle peak load; at 65 ms, roughly 11. Those savings compound in the cloud.
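That back-of-the-envelope math can be checked directly. A quick sketch (the one-minute drain window is an assumption for illustration):

```javascript
// Estimate how many parallel nodes are needed to drain a booking queue
// within a target window, assuming each node handles lookups sequentially.
function nodesNeeded(queueSize, lookupMs, windowSeconds) {
  const totalWorkMs = queueSize * lookupMs;
  return Math.ceil(totalWorkMs / (windowSeconds * 1000));
}

// 10,000 bookings, drained within 60 seconds:
nodesNeeded(10000, 150, 60); // 25 nodes at 150 ms per lookup
nodesNeeded(10000, 65, 60);  // 11 nodes at 65 ms per lookup
```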

High-Level Architecture

[ Booking API ]
       ↓
[ Redis Cache ]
       ↓
[ Elasticsearch ]
       ↓
[ AI Segmentation ]
       ↓
[ Formula Re-scoring ]
       ↓
[ Zone-Code Mapper ]
       ↓
[ Response + Persistence ]
  1. Booking API receives a booking with a raw address string.
  2. Redis Cache checks for recent results—if hit, return immediately.
  3. Elasticsearch performs fuzzy/full-text + geo queries against our regions index.
  4. AI Segmentation tokenizes the address into street, landmark, and unit components.
  5. Formula Re-scoring blends text match score and geographic distance into a final ranking.
  6. Zone-Code Mapper looks up the driver-zone code from the top region candidate.
  7. Response + Persistence returns the code to the caller and logs the lookup for analytics.
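Wired together, the seven steps above can be sketched roughly like this. The helper names (`cacheGet`, `searchRegions`, `segmentAddress`, `rescore`, `mapToZone`, `cacheSet`) are placeholders, not the production API; dependencies are injected so each stage can be swapped or stubbed:

```javascript
// High-level pipeline sketch; every helper on `deps` is a placeholder.
async function assignZone(address, pickup, deps) {
  const cached = await deps.cacheGet(address);
  if (cached) return cached;                             // 2. cache hit: return fast

  const candidates = await deps.searchRegions(address, pickup); // 3. ES query
  const tokens = deps.segmentAddress(address);           // 4. AI segmentation
  const best = deps.rescore(candidates, tokens, pickup); // 5. formula re-scoring
  const zone = await deps.mapToZone(best.region_id);     // 6. zone-code mapper

  await deps.cacheSet(address, zone);                    // 7. persist for analytics
  return zone;
}
```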

Address → Region with Elasticsearch

Index Design

We maintain a regions index with three core fields:

  • address_text (type: text)
  • coordinates (type: geo_point)
  • region_id (type: keyword)

To support prefix matching and fuzzy queries, we use an edge-ngram analyzer:

PUT /regions
{
  "settings": {
    "analysis": {
      "analyzer": {
        "edge_ngram_analyzer": {
          "tokenizer": "edge_ngram_tokenizer",
          "filter": ["lowercase"]
        }
      },
      "tokenizer": {
        "edge_ngram_tokenizer": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 20,
          "token_chars": ["letter", "digit"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "address_text": {
        "type": "text",
        "analyzer": "edge_ngram_analyzer"
      },
      "coordinates": { "type": "geo_point" },
      "region_id":    { "type": "keyword" }
    }
  }
}

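To get a feel for what the edge-ngram analyzer produces, here is a small simulation of its behavior (a sketch of the tokenization rules, not Elasticsearch’s actual implementation):

```javascript
// Simulates edge-ngram tokenization: for each word, emit every prefix
// between minGram and maxGram characters, lowercased.
function edgeNgrams(text, minGram = 2, maxGram = 20) {
  const tokens = [];
  for (const word of text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean)) {
    for (let len = minGram; len <= Math.min(maxGram, word.length); len++) {
      tokens.push(word.slice(0, len));
    }
  }
  return tokens;
}

edgeNgrams("Pkwy"); // ["pk", "pkw", "pkwy"]
```

Because every prefix is indexed, a partially typed or abbreviated street name can still match the stored document.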

Sample Query:

POST /regions/_search
{
  "size": 5,
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "address_text": {
              "query": "1600 Amphitheatre Pkwy",
              "fuzziness": "AUTO"
            }
          }
        }
      ],
      "filter": [
        {
          "geo_distance": {
            "distance": "5km",
            "coordinates": { "lat": 37.42, "lon": -122.08 }
          }
        }
      ]
    }
  }
}

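From the application side, that search body can be built programmatically. A sketch (the `buildRegionQuery` helper is ours for illustration, not part of any client library):

```javascript
// Builds the fuzzy-match + geo-filter search body for the regions index.
function buildRegionQuery(address, lat, lon, radiusKm = 5, size = 5) {
  return {
    size,
    query: {
      bool: {
        must: [
          { match: { address_text: { query: address, fuzziness: "AUTO" } } },
        ],
        filter: [
          { geo_distance: { distance: `${radiusKm}km`, coordinates: { lat, lon } } },
        ],
      },
    },
  };
}
```

Keeping the geo constraint in `filter` (rather than `must`) means it narrows candidates without contributing to the relevance score, which we handle ourselves in the re-scoring step.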

Ultra-Fast Redis Caching

Why Cache?

  • Reduces ES load on repeat lookups.
  • Delivers sub-millisecond responses for popular addresses.

Key Design

  • Key: cache:region:&lt;normalized address&gt;
  • Value: JSON { "region_id": "...", "timestamp": 123456789 }
  • TTL: 12 hours (adjust for region-definition update frequency)

// Node.js pseudocode
const key = `cache:region:${normalize(address)}`;
const cached = await redis.get(key);
if (cached) return JSON.parse(cached);

const result = await findRegionInES(address);
await redis.setex(key, 43200, JSON.stringify(result));
return result;

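The snippet above calls a `normalize` helper without defining it. One plausible sketch (lowercase, strip punctuation, collapse whitespace) so trivially different spellings of the same address share a cache key:

```javascript
// Normalizes an address so trivially different spellings share a cache key.
function normalize(address) {
  return address
    .toLowerCase()
    .replace(/[^\p{L}\p{N}\s]/gu, "") // drop punctuation, keep letters/digits
    .replace(/\s+/g, " ")             // collapse runs of whitespace
    .trim();
}

normalize("  1600 Amphitheatre Pkwy. "); // "1600 amphitheatre pkwy"
```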

AI-Driven Tokenization

Addresses come in countless formats. We found that a lightweight NLP model helps extract consistent tokens:

  • Street names
  • Landmarks or points of interest
  • Unit numbers or building suffixes

Note: the model itself is confidential, so we keep this discussion at the level of its inputs and outputs.
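Since the production model can’t be shown, here is a much simpler rule-based stand-in that illustrates the shape of its output (the patterns are illustrative only, and landmark extraction is omitted):

```javascript
// A toy rule-based segmenter: pulls out a unit/apartment component and
// treats the remainder as the street portion. The real system uses an
// NLP model; this only illustrates the output shape.
function segmentAddress(address) {
  const unitMatch = address.match(/(?:\b(?:apt|unit|suite)|#)\s*[\w-]+/i);
  const unit = unitMatch ? unitMatch[0] : null;
  const street = address
    .replace(unit ?? "", "")
    .replace(/\s+/g, " ")
    .trim();
  return { street, unit };
}

segmentAddress("1600 Amphitheatre Pkwy Apt 4B");
// { street: "1600 Amphitheatre Pkwy", unit: "Apt 4B" }
```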

Formula-Based Re-Scoring

After retrieving the top N candidates from Elasticsearch, we re-score them to balance text-match quality and proximity:

function scoreCandidate(esScore, distanceMeters) {
  const α = 0.7;  // text weight
  const β = 0.3;  // proximity weight
  return α * esScore + β * (1 / (distanceMeters + 1));
}

// Choose candidate with highest final score


  • α = 0.7 emphasizes fuzzy/full-text match.
  • β = 0.3 rewards geographic closeness.
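Applied to a candidate list, picking the winner looks like this (`scoreCandidate` is repeated so the snippet stands alone; the sample candidates are made up):

```javascript
// Same scoring formula as above, repeated so this snippet is self-contained.
function scoreCandidate(esScore, distanceMeters) {
  return 0.7 * esScore + 0.3 * (1 / (distanceMeters + 1));
}

// Scores each candidate and returns the one with the highest final score.
function pickBest(candidates) {
  return candidates.reduce((best, c) =>
    scoreCandidate(c.esScore, c.distanceMeters) >
    scoreCandidate(best.esScore, best.distanceMeters) ? c : best
  );
}

pickBest([
  { region_id: "A", esScore: 0.9, distanceMeters: 4000 },
  { region_id: "B", esScore: 0.8, distanceMeters: 50 },
]); // region "A" wins: its text score outweighs B's proximity
```

Note that blending only works if `esScore` is on a bounded scale; raw Elasticsearch scores may need min-max normalization across the candidate set first.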

Mapping to Driver Zone Codes

With your best region_id in hand:

  1. Lookup in a simple table (region_id → zone_code).
  2. Apply specificity rules: when regions overlap, use “most specific” first.
  3. Fallbacks: ambiguous or no-match addresses get routed to a broad city-wide zone or manual review queue.

SELECT zone_code
FROM region_zone_map
WHERE region_id = :bestRegionId
ORDER BY specificity DESC
LIMIT 1;
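In application code, the fallback behavior (step 3 above) can be wrapped around that query. A sketch where `queryZone` stands in for the SQL lookup and `CITY_WIDE` is a hypothetical broad-zone code:

```javascript
// Wraps the zone lookup with the fallback rules: a missing region or an
// unmatched region_id routes to a city-wide zone and flags the booking
// for manual review.
async function mapToZone(regionId, queryZone, cityWideZone = "CITY_WIDE") {
  if (!regionId) {
    return { zone: cityWideZone, needsReview: true };
  }
  const zone = await queryZone(regionId); // SELECT ... ORDER BY specificity DESC
  return zone
    ? { zone, needsReview: false }
    : { zone: cityWideZone, needsReview: true };
}
```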


Aggregation & Monitoring

Tracking how lookups distribute across zones helps detect demand spikes and system regressions.

GET /bookings/_search
{
  "size": 0,
  "aggs": {
    "by_zone": {
      "terms": { "field": "zone_code", "size": 20 }
    }
  }
}


Dashboard: Plot “bookings per zone” in Kibana or Grafana.
Alerts: Notify if a zone’s booking rate doubles in 5 minutes.
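That alert rule boils down to comparing two consecutive aggregation snapshots. A sketch (in practice this would live in your alerting system, not application code):

```javascript
// Flags zones whose booking count at least doubled between two
// consecutive 5-minute aggregation snapshots.
function spikedZones(previousCounts, currentCounts) {
  return Object.keys(currentCounts).filter((zone) => {
    const before = previousCounts[zone] || 0;
    return before > 0 && currentCounts[zone] >= 2 * before;
  });
}

spikedZones({ "Z-1": 100, "Z-2": 40 }, { "Z-1": 210, "Z-2": 55 }); // ["Z-1"]
```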

Data Pipelining at Scale

Real-Time Stream

  • Kafka or SQS streams each booking event.
  • A stateless worker (AWS Lambda or K8s pod) runs the lookup pipeline end-to-end.
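The worker’s message handler can stay thin: parse the event, run the pipeline, emit the result. A sketch with an assumed event shape of `{ bookingId, address, pickup }` (the handler and its injected `lookupPipeline` are illustrative):

```javascript
// Parses one booking event from the stream and runs the lookup pipeline.
// `lookupPipeline` is the end-to-end assignment function; injecting it
// keeps the handler stateless and easy to test.
async function handleBookingEvent(rawMessage, lookupPipeline) {
  const event = JSON.parse(rawMessage); // { bookingId, address, pickup }
  const zone = await lookupPipeline(event.address, event.pickup);
  return { bookingId: event.bookingId, zone };
}
```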

Batch Jobs

  • Nightly Re-index: Ingest new or updated region definitions into Elasticsearch.
  • Model Retraining: Periodically retrain your NER model on fresh, labeled address data.

Performance Results

Metric                        Before    After
p99 Lookup Latency            150 ms    65 ms
Elasticsearch QPS             2,000     800
Redis Cache Hit Rate          —         85 %
Manual Review Fallback Rate   5 %       1.2 %

  • Bulk indexing and query caching further reduced ES load.
  • Horizontal scaling of stateless workers allowed seamless throughput growth during peak hours.

Lessons Learned & Next Steps

  • Prototype Quickly: Start with ES + caching before adding NLP complexity.
  • Measure Early: Instrument each component to pinpoint bottlenecks.
  • Iterate Weights: α/β may need retuning as address distributions shift.
  • Future Improvements:

    1. Dynamic Zone Editing UI for ops teams.
    2. Real-Time ML Feedback Loop using mis-assignment data.
    3. Geo-Fencing Enhancements for irregularly shaped zones.

Conclusion & Call to Action

We’ve shown how to convert raw address strings into sub-100 ms driver-zone assignments using a layered approach of Elasticsearch, Redis, NLP, and custom scoring.
TL;DR:

  • Edge-ngram ES + fuzzy queries for flexible text matching.
  • Redis caching for repeat lookups.
  • Lightweight NLP to normalize address tokens.
  • Weighted formulas to balance match quality and proximity.
  • Streaming + batch pipelines for real-time scale.
