DEV Community

Harisalghifary


How to Build a Low-Latency Driver-Assignment Service

In many on-demand logistics systems, the core challenge is: “How do you turn a raw address string into the correct driver-zone code—in under 100 ms?” In this case study, we’ll walk through the high-level patterns and engineering decisions behind a real-world dispatch pipeline that handles thousands of bookings per minute. We’ll cover:

  • Elasticsearch indexing and full-text search
  • Redis caching for ultra-fast lookups
  • AI-driven tokenization and custom re-scoring
  • Aggregations for monitoring and analytics
  • Data-pipelining best practices

Why 100 ms Lookups Matter

  • SLA Compliance: Every millisecond adds up when you’re handling thousands of bookings per minute.
  • Driver & Rider Experience: Instant assignment prevents wait-time spikes and driver frustration.
  • Cost Efficiency: Faster lookups reduce compute costs and let you scale horizontally with fewer nodes.

Imagine a queue of 10,000 booking requests that has to drain in about a minute. At 150 ms per lookup you need 25 nodes working in parallel to handle peak load; at 65 ms, roughly 11. Those savings compound in the cloud.
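That back-of-the-envelope math can be checked directly. A quick sketch (the one-minute drain window is an assumption for illustration):

```javascript
// Estimate how many parallel nodes are needed to drain a booking queue
// within a target window, assuming each node handles lookups sequentially.
function nodesNeeded(queueSize, lookupMs, windowSeconds) {
  const totalWorkMs = queueSize * lookupMs;
  return Math.ceil(totalWorkMs / (windowSeconds * 1000));
}

// 10,000 bookings, drained within 60 seconds:
nodesNeeded(10000, 150, 60); // 25 nodes at 150 ms per lookup
nodesNeeded(10000, 65, 60);  // 11 nodes at 65 ms per lookup
```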

High-Level Architecture

[ Booking API ]
       ↓
[ Redis Cache ]
       ↓
[ Elasticsearch ]
       ↓
[ AI Segmentation ]
       ↓
[ Formula Re-scoring ]
       ↓
[ Zone-Code Mapper ]
       ↓
[ Response + Persistence ]
  1. Booking API receives a booking with a raw address string.
  2. Redis Cache checks for recent results—if hit, return immediately.
  3. Elasticsearch performs fuzzy/full-text + geo queries against our regions index.
  4. AI Segmentation tokenizes the address into street, landmark, and unit components.
  5. Formula Re-scoring blends text match score and geographic distance into a final ranking.
  6. Zone-Code Mapper looks up the driver-zone code from the top region candidate.
  7. Response + Persistence returns the code to the caller and logs the lookup for analytics.
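Wired together, the seven steps above can be sketched roughly like this. The helper names (`cacheGet`, `searchRegions`, `segmentAddress`, `rescore`, `mapToZone`, `cacheSet`) are placeholders, not the production API; dependencies are injected so each stage can be swapped or stubbed:

```javascript
// High-level pipeline sketch; every helper on `deps` is a placeholder.
async function assignZone(address, pickup, deps) {
  const cached = await deps.cacheGet(address);
  if (cached) return cached;                             // 2. cache hit: return fast

  const candidates = await deps.searchRegions(address, pickup); // 3. ES query
  const tokens = deps.segmentAddress(address);           // 4. AI segmentation
  const best = deps.rescore(candidates, tokens, pickup); // 5. formula re-scoring
  const zone = await deps.mapToZone(best.region_id);     // 6. zone-code mapper

  await deps.cacheSet(address, zone);                    // 7. persist for analytics
  return zone;
}
```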

Address → Region with Elasticsearch

Index Design

We maintain a regions index with three core fields:

  • address_text (type: text)
  • coordinates (type: geo_point)
  • region_id (type: keyword)

To support prefix matching and fuzzy queries, we use an edge-ngram analyzer:

PUT /regions
{
  "settings": {
    "analysis": {
      "analyzer": {
        "edge_ngram_analyzer": {
          "tokenizer": "edge_ngram_tokenizer",
          "filter": ["lowercase"]
        }
      },
      "tokenizer": {
        "edge_ngram_tokenizer": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 20,
          "token_chars": ["letter", "digit"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "address_text": {
        "type": "text",
        "analyzer": "edge_ngram_analyzer"
      },
      "coordinates": { "type": "geo_point" },
      "region_id":    { "type": "keyword" }
    }
  }
}

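To get a feel for what the edge-ngram analyzer produces, here is a small simulation of its behavior (a sketch of the tokenization rules, not Elasticsearch’s actual implementation):

```javascript
// Simulates edge-ngram tokenization: for each word, emit every prefix
// between minGram and maxGram characters, lowercased.
function edgeNgrams(text, minGram = 2, maxGram = 20) {
  const tokens = [];
  for (const word of text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean)) {
    for (let len = minGram; len <= Math.min(maxGram, word.length); len++) {
      tokens.push(word.slice(0, len));
    }
  }
  return tokens;
}

edgeNgrams("Pkwy"); // ["pk", "pkw", "pkwy"]
```

Because every prefix is indexed, a partially typed or abbreviated street name can still match the stored document.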

Sample Query:

POST /regions/_search
{
  "size": 5,
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "address_text": {
              "query": "1600 Amphitheatre Pkwy",
              "fuzziness": "AUTO"
            }
          }
        }
      ],
      "filter": [
        {
          "geo_distance": {
            "distance": "5km",
            "coordinates": { "lat": 37.42, "lon": -122.08 }
          }
        }
      ]
    }
  }
}

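From the application side, that search body can be built programmatically. A sketch (the `buildRegionQuery` helper is ours for illustration, not part of any client library):

```javascript
// Builds the fuzzy-match + geo-filter search body for the regions index.
function buildRegionQuery(address, lat, lon, radiusKm = 5, size = 5) {
  return {
    size,
    query: {
      bool: {
        must: [
          { match: { address_text: { query: address, fuzziness: "AUTO" } } },
        ],
        filter: [
          { geo_distance: { distance: `${radiusKm}km`, coordinates: { lat, lon } } },
        ],
      },
    },
  };
}
```

Keeping the geo constraint in `filter` (rather than `must`) means it narrows candidates without contributing to the relevance score, which we handle ourselves in the re-scoring step.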

Ultra-Fast Redis Caching

Why Cache?

  • Reduces ES load on repeat lookups.
  • Delivers sub-millisecond responses for popular addresses.

Key Design

  • Key: cache:region:&lt;normalized address&gt;
  • Value: JSON { "region_id": "...", "timestamp": 123456789 }
  • TTL: 12 hours (adjust for region-definition update frequency)

// Node.js pseudocode
const key = `cache:region:${normalize(address)}`;
const cached = await redis.get(key);
if (cached) return JSON.parse(cached);

const result = await findRegionInES(address);
await redis.setex(key, 43200, JSON.stringify(result));
return result;

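The snippet above calls a `normalize` helper without defining it. One plausible sketch (lowercase, strip punctuation, collapse whitespace) so trivially different spellings of the same address share a cache key:

```javascript
// Normalizes an address so trivially different spellings share a cache key.
function normalize(address) {
  return address
    .toLowerCase()
    .replace(/[^\p{L}\p{N}\s]/gu, "") // drop punctuation, keep letters/digits
    .replace(/\s+/g, " ")             // collapse runs of whitespace
    .trim();
}

normalize("  1600 Amphitheatre Pkwy. "); // "1600 amphitheatre pkwy"
```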

AI-Driven Tokenization

Addresses come in countless formats. We found that a lightweight NLP model helps extract consistent tokens:

  • Street names
  • Landmarks or points of interest
  • Unit numbers or building suffixes

Note: the model itself is confidential, so we keep this discussion at the level of its inputs and outputs.
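Since the production model can’t be shown, here is a much simpler rule-based stand-in that illustrates the shape of its output (the patterns are illustrative only, and landmark extraction is omitted):

```javascript
// A toy rule-based segmenter: pulls out a unit/apartment component and
// treats the remainder as the street portion. The real system uses an
// NLP model; this only illustrates the output shape.
function segmentAddress(address) {
  const unitMatch = address.match(/(?:\b(?:apt|unit|suite)|#)\s*[\w-]+/i);
  const unit = unitMatch ? unitMatch[0] : null;
  const street = address
    .replace(unit ?? "", "")
    .replace(/\s+/g, " ")
    .trim();
  return { street, unit };
}

segmentAddress("1600 Amphitheatre Pkwy Apt 4B");
// { street: "1600 Amphitheatre Pkwy", unit: "Apt 4B" }
```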

Formula-Based Re-Scoring

After retrieving the top N candidates from Elasticsearch, we re-score them to balance text-match quality and proximity:

function scoreCandidate(esScore, distanceMeters) {
  const α = 0.7;  // text weight
  const β = 0.3;  // proximity weight
  return α * esScore + β * (1 / (distanceMeters + 1));
}

// Choose candidate with highest final score


  • α = 0.7 emphasizes fuzzy/full-text match.
  • β = 0.3 rewards geographic closeness.
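Applied to a candidate list, picking the winner looks like this (`scoreCandidate` is repeated so the snippet stands alone; the sample candidates are made up):

```javascript
// Same scoring formula as above, repeated so this snippet is self-contained.
function scoreCandidate(esScore, distanceMeters) {
  return 0.7 * esScore + 0.3 * (1 / (distanceMeters + 1));
}

// Scores each candidate and returns the one with the highest final score.
function pickBest(candidates) {
  return candidates.reduce((best, c) =>
    scoreCandidate(c.esScore, c.distanceMeters) >
    scoreCandidate(best.esScore, best.distanceMeters) ? c : best
  );
}

pickBest([
  { region_id: "A", esScore: 0.9, distanceMeters: 4000 },
  { region_id: "B", esScore: 0.8, distanceMeters: 50 },
]); // region "A" wins: its text score outweighs B's proximity
```

Note that blending only works if `esScore` is on a bounded scale; raw Elasticsearch scores may need min-max normalization across the candidate set first.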

Mapping to Driver Zone Codes

With your best region_id in hand:

  1. Lookup in a simple table (region_id → zone_code).
  2. Apply specificity rules: when regions overlap, use “most specific” first.
  3. Fallbacks: ambiguous or no-match addresses get routed to a broad city-wide zone or manual review queue.

SELECT zone_code
FROM region_zone_map
WHERE region_id = :bestRegionId
ORDER BY specificity DESC
LIMIT 1;
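In application code, the fallback behavior (step 3 above) can be wrapped around that query. A sketch where `queryZone` stands in for the SQL lookup and `CITY_WIDE` is a hypothetical broad-zone code:

```javascript
// Wraps the zone lookup with the fallback rules: a missing region or an
// unmatched region_id routes to a city-wide zone and flags the booking
// for manual review.
async function mapToZone(regionId, queryZone, cityWideZone = "CITY_WIDE") {
  if (!regionId) {
    return { zone: cityWideZone, needsReview: true };
  }
  const zone = await queryZone(regionId); // SELECT ... ORDER BY specificity DESC
  return zone
    ? { zone, needsReview: false }
    : { zone: cityWideZone, needsReview: true };
}
```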


Aggregation & Monitoring

Tracking how lookups distribute across zones helps detect demand spikes and system regressions.

GET /bookings/_search
{
  "size": 0,
  "aggs": {
    "by_zone": {
      "terms": { "field": "zone_code", "size": 20 }
    }
  }
}


Dashboard: Plot “bookings per zone” in Kibana or Grafana.
Alerts: Notify if a zone’s booking rate doubles in 5 minutes.
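That alert rule boils down to comparing two consecutive aggregation snapshots. A sketch (in practice this would live in your alerting system, not application code):

```javascript
// Flags zones whose booking count at least doubled between two
// consecutive 5-minute aggregation snapshots.
function spikedZones(previousCounts, currentCounts) {
  return Object.keys(currentCounts).filter((zone) => {
    const before = previousCounts[zone] || 0;
    return before > 0 && currentCounts[zone] >= 2 * before;
  });
}

spikedZones({ "Z-1": 100, "Z-2": 40 }, { "Z-1": 210, "Z-2": 55 }); // ["Z-1"]
```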

Data Pipelining at Scale

Real-Time Stream

  • Kafka or SQS streams each booking event.
  • A stateless worker (AWS Lambda or K8s pod) runs the lookup pipeline end-to-end.
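The worker’s message handler can stay thin: parse the event, run the pipeline, emit the result. A sketch with an assumed event shape of `{ bookingId, address, pickup }` (the handler and its injected `lookupPipeline` are illustrative):

```javascript
// Parses one booking event from the stream and runs the lookup pipeline.
// `lookupPipeline` is the end-to-end assignment function; injecting it
// keeps the handler stateless and easy to test.
async function handleBookingEvent(rawMessage, lookupPipeline) {
  const event = JSON.parse(rawMessage); // { bookingId, address, pickup }
  const zone = await lookupPipeline(event.address, event.pickup);
  return { bookingId: event.bookingId, zone };
}
```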

Batch Jobs

  • Nightly Re-index: Ingest new or updated region definitions into Elasticsearch.
  • Model Retraining: Periodically retrain your NER model on fresh, labeled address data.

Performance Results

Metric                        Before    After
p99 Lookup Latency            150 ms    65 ms
Elasticsearch QPS             2,000     800
Redis Cache Hit Rate          —         85 %
Manual Review Fallback Rate   5 %       1.2 %

  • Bulk indexing and query caching further reduced ES load.
  • Horizontal scaling of stateless workers allowed seamless throughput growth during peak hours.

Lessons Learned & Next Steps

  • Prototype Quickly: Start with ES + caching before adding NLP complexity.
  • Measure Early: Instrument each component to pinpoint bottlenecks.
  • Iterate Weights: α/β may need retuning as address distributions shift.
  • Future Improvements:

    1. Dynamic Zone Editing UI for ops teams.
    2. Real-Time ML Feedback Loop using mis-assignment data.
    3. Geo-Fencing Enhancements for irregularly shaped zones.

Conclusion & Call to Action

We’ve shown how to convert raw address strings into sub-100 ms driver-zone assignments using a layered approach of Elasticsearch, Redis, NLP, and custom scoring.
TL;DR:

  • Edge-ngram ES + fuzzy queries for flexible text matching.
  • Redis caching for repeat lookups.
  • Lightweight NLP to normalize address tokens.
  • Weighted formulas to balance match quality and proximity.
  • Streaming + batch pipelines for real-time scale.
