Daniel Romitelli

Posted on • Originally published at craftedbydaniel.com

Cache-First Geocoding with Azure Maps: Key Topology, TTL Heuristics, and Quota Smoothing

I originally got asked to write about app/azure_integrations/azure_maps.py—specifically the cache key scheme (tile vs geohash), TTL heuristics by zoom level, and the fallthrough behavior to the upstream geocoder with backoff.

That’s exactly the post I wanted to write, because that file is real and non-trivial in this codebase: it defines an Azure Maps client (async, httpx-backed), exposes geocoding methods (including geocode_address referenced from our LangGraph manager), and it’s wired into the enrichment graph as the “last mile” location normalizer.

The part that matters isn’t “we call Azure Maps.” The part that matters is: we learned—painfully—that the fastest, cheapest API call is the one you never make. So the integration is cache-first by design, with keys that dedupe semantically equivalent queries and TTLs that match how “stable” a location answer is.

Below is the design, the incident that forced it, and the concrete implementation patterns that keep Azure Maps reliable under burst load.

What went wrong first (the real trigger)

The first failure mode wasn’t an exception trace. It was a behavioral bug that quietly multiplied upstream calls.

In app/langgraph_manager.py we literally left ourselves a trail of corrective surgery in comments and log messages:

  • “Fix 2: Address Detection Heuristic - route to correct endpoint”
  • “Fix 7: Short-circuit when both fields set (skip Azure Maps if Firecrawl already set both)”

Those aren’t stylistic refactors. They’re scar tissue.

Here’s the scenario that caused the bleed:

  1. We’d run Firecrawl research for an entity.
  2. Firecrawl would often return city/state already (good enough for our product surface).
  3. The graph would still call Azure Maps anyway—sometimes multiple times—because downstream steps treated geocoding as a mandatory enrichment, not a conditional completion step.
  4. Worse: we were sometimes routing an address-shaped query ("123 Main St, Seattle") into a POI search endpoint (which tends to be broader and less deterministic), getting no results, then falling back to address geocoding—two calls where one would do.

That’s why the LangGraph manager now has the explicit short-circuit:

  • If both city and state are already present, we skip Azure Maps.

And it’s why we added the address detection heuristic:

  • If the query looks like an address, go straight to geocode_address.

Those two fixes were the moment the geocoding integration stopped being “a convenience” and became a budgeted service with strong opinions about when it is allowed to run.

Once you accept that premise, cache-first isn’t an optimization; it’s table stakes.

The integration boundary: how Azure Maps actually fits the pipeline

Azure Maps isn’t the source of truth for location in this platform.

The platform is an enrichment system. It pulls signals from messy sources—XLSX-imported advisor records, [REDACTED] CRM fields, Firecrawl research—and then normalizes into something we can query, sort, and reason about.

So the location pipeline is intentionally layered:

  1. Prefer already-known structured fields (DB/CRM).
  2. Prefer research-derived location (Firecrawl returns city/state/address frequently).
  3. Use Azure Maps only to fill gaps or normalize ambiguous strings.

That is consistent with the logging we left in langgraph_manager.py (short-circuit if Firecrawl already completed the fields) and consistent with the broader system discipline you can see in the advisor enrichment worker:

  • cost is returned (credits_used, search_duration_seconds)
  • the workflow executor carries total_credits as a loop invariant
  • feature flags exist as kill switches

Geocoding follows the same philosophy: it’s a bounded dependency, not a magical oracle.

Cache-first geocoding: the three decisions that matter

When people say “cache geocoding,” they usually mean “store the JSON response keyed by the query string.” That’s not enough.

In practice, these are the decisions that determine whether your cache saves you money or just stores junk:

  1. Key topology: what is the canonical representation of the request?
  2. TTL heuristics: how long is the answer valid for this type of query?
  3. Quota smoothing / backoff: what happens under burst, partial outage, or 429?

I’ll go through each, then show a working reference implementation that mirrors what we ship: async client, cache wrapper, TTL policy, and backoff.

1) Cache key topology: query strings are lies

A location lookup’s “meaning” is not the raw string. Users (and upstream systems) produce semantically identical queries with wildly different spelling and formatting:

  • "New York, NY" vs "New York NY" vs "new york, new york"
  • "St. Louis" vs "Saint Louis"
  • "Seattle WA" vs "Seattle, Washington"

If your cache key is the raw query, you miss most hits.

My rule: keys must be stable under harmless variation

For our integration, I treat a geocode request as a tuple of:

  • operation: geocode_address, geocode_place, reverse_geocode, poi_search
  • normalized query: case-folded, whitespace collapsed, punctuation simplified
  • country filter (and any other parameters that materially change results)
  • resolution: a “zoom class” or precision tier that drives TTL

Tile vs geohash (and why I don’t pick only one)

This is the part people argue about.

  • Tile-based keys (Web Mercator tiles like z/x/y) are great for map rendering workloads and reverse geocoding around a viewport. They align with “what the user sees.”
  • Geohash keys are great for deduping point-like lookups and clustering nearby requests across different zoom levels.

In our enrichment workload, we do both kinds of lookups:

  • Address / place queries: string → coordinates + components
  • Reverse lookups: lat/lon → locality + admin regions

So the key topology is mixed:

  • For string-based geocoding, the key is derived from the normalized string (plus country filter, plus operation).
  • For reverse geocoding, the key is derived from a geohash (or equivalently a rounded lat/lon bucket) so nearby points reuse results.

The reason to bucket reverse geocoding is simple: downstream tasks don’t need “the city boundary accurate to 1 meter.” They need a stable city/state answer.
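To make the contrast concrete, here is a minimal sketch of both key styles. `lonlat_to_tile` and `coord_bucket` are hypothetical helpers for illustration, and the rounded-coordinate bucket is a stand-in for a true geohash, the same simplification the reference implementation below uses:

```python
import math


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> tuple:
    """Web Mercator tile (z, x, y) containing a point: viewport-aligned keys."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return (zoom, x, y)


def coord_bucket(lat: float, lon: float, digits: int = 2) -> str:
    """Rounded-coordinate bucket (a geohash stand-in): nearby points share a key."""
    return f"{round(lat, digits)}:{round(lon, digits)}"


# Two nearby downtown-Seattle points land in the same tile and the same bucket,
# so a reverse geocode for either reuses one cached answer.
a = lonlat_to_tile(-122.3321, 47.6062, zoom=10)
b = lonlat_to_tile(-122.3300, 47.6070, zoom=10)
```

Either key style collapses "points that should share an answer" into one cache entry; the tile form additionally aligns with zoom, which matters if your workload is viewport-driven.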

2) TTL heuristics: store stable answers longer than volatile ones

TTL is not “one number.” TTL is a policy.

Location answers change at different rates:

  • A city/state for a point is stable for months.
  • A POI search query can change quickly (businesses open/close, rankings shift).
  • An address geocode can change slowly (new construction, renumbering) but is usually stable.

So we use TTL buckets:

  • Reverse geocode locality: long TTL
  • Address geocode: medium TTL
  • POI search: shorter TTL

We also condition TTL on “resolution”: the coarser the query, the longer we can cache it, because the answer is inherently less sensitive.

The key idea is: TTL encodes how expensive it is to be wrong.

If a reverse geocode returns the wrong city once a year, no one cares. If POI search returns a stale ranking for “pizza near me,” people notice.
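A sketch of that policy as code — the numbers are illustrative and `ttl_for` is a hypothetical helper, not the shipped API:

```python
def ttl_for(operation: str, precision: str) -> int:
    """TTL in seconds, keyed by operation type and a coarse/medium/fine precision tier."""
    base = {
        "reverse_geocode": 30 * 24 * 3600,  # locality answers are stable for months
        "geocode_address": 7 * 24 * 3600,   # addresses drift slowly
        "poi_search": 12 * 3600,            # businesses and rankings churn
    }[operation]
    # Coarser queries are less sensitive to small changes, so cache them longer.
    scale = {"coarse": 2.0, "medium": 1.0, "fine": 0.5}[precision]
    return int(base * scale)
```

The point isn't the exact multipliers; it's that TTL is a function of (operation, precision), not a constant.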

3) Quota smoothing + backoff: the upstream will say no

Even with caching, you can spike:

  • a batch enrichment run
  • a user action that fans out to many lookups
  • a retry storm during transient network failure

If you treat Azure Maps as “just another HTTP call,” you’ll eventually create your own 429 incident.

So we shape traffic:

  • Client-side concurrency limits (don’t fire 500 calls at once)
  • Backoff on 429/503 (respect Retry-After when present)
  • Negative caching for empty results (short TTL so you don’t re-query nonsense every time)

This is the same mindset as the Firecrawl side:

  • we pass max_credits into the Firecrawl agent
  • we accumulate credits_used

For Azure Maps, the equivalent is “max requests per second” plus cache.
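A minimal, stdlib-only sketch of that "max requests per second" limiter — hypothetical, not the production code, which sits in front of the HTTP client:

```python
import asyncio
import time


class RateLimiter:
    """Space out outbound calls so at most `rate` of them start per second."""

    def __init__(self, rate: float) -> None:
        self._interval = 1.0 / rate
        self._lock = asyncio.Lock()
        self._next_slot = 0.0  # monotonic time when the next call may start

    async def acquire(self) -> None:
        async with self._lock:
            now = time.monotonic()
            wait = max(0.0, self._next_slot - now)
            self._next_slot = max(now, self._next_slot) + self._interval
        if wait:
            await asyncio.sleep(wait)


async def demo() -> float:
    limiter = RateLimiter(rate=100)  # cap at 100 req/s
    start = time.monotonic()
    for _ in range(5):
        await limiter.acquire()  # in real code: acquire, then call Azure Maps
    return time.monotonic() - start
```

Five acquisitions at 100 req/s take at least ~40 ms, which is exactly the point: bursts get smeared into a rate the upstream can tolerate.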

A complete, runnable reference implementation

The real app/azure_integrations/azure_maps.py in this project is integrated into our settings, auth, and cache stack. For the blog, I’m providing a minimal but fully runnable version that demonstrates the exact patterns we use:

  • async httpx client
  • cache-first wrapper with an interface you can back by Redis/memory
  • stable key generation
  • TTL heuristics
  • backoff with Retry-After

You can paste this into a file and run it. (It uses a fake upstream call so it doesn’t require Azure credentials.)

import asyncio
import json
import re
import time
from dataclasses import dataclass
from typing import Any, Optional, Dict, Tuple

import httpx


class AsyncCache:
    """Minimal async cache interface."""

    async def get(self, key: str) -> Optional[str]:
        raise NotImplementedError

    async def set(self, key: str, value: str, ttl_seconds: int) -> None:
        raise NotImplementedError


class InMemoryTTLCache(AsyncCache):
    def __init__(self) -> None:
        self._store: Dict[str, Tuple[float, str]] = {}

    async def get(self, key: str) -> Optional[str]:
        now = time.time()
        item = self._store.get(key)
        if not item:
            return None
        expires_at, value = item
        if expires_at < now:
            self._store.pop(key, None)
            return None
        return value

    async def set(self, key: str, value: str, ttl_seconds: int) -> None:
        self._store[key] = (time.time() + ttl_seconds, value)


def _normalize_query(q: str) -> str:
    q = q.strip().lower()
    # Collapse whitespace
    q = re.sub(r"\s+", " ", q)
    # Normalize common punctuation
    q = q.replace(",", " ")
    q = re.sub(r"\s+", " ", q).strip()
    return q


def _is_likely_address(q: str) -> bool:
    """Cheap heuristic: number + street-ish token."""
    qn = _normalize_query(q)
    has_number = bool(re.search(r"\b\d{1,6}\b", qn))
    has_street = bool(re.search(r"\b(st|street|ave|avenue|rd|road|blvd|lane|ln|dr|drive)\b", qn))
    return has_number and has_street


def _geohash_bucket(lat: float, lon: float, precision_digits: int = 2) -> str:
    """Not a real geohash; a bucketed coordinate key (good enough for cache keys)."""
    return f"{round(lat, precision_digits)}:{round(lon, precision_digits)}"


@dataclass
class TTLPolicy:
    reverse_geocode_seconds: int = 60 * 60 * 24 * 30       # 30 days
    address_geocode_seconds: int = 60 * 60 * 24 * 7        # 7 days
    poi_search_seconds: int = 60 * 60 * 12                 # 12 hours
    negative_cache_seconds: int = 60 * 10                  # 10 minutes


class AzureMapsClient:
    def __init__(
        self,
        *,
        subscription_key: str,
        cache: AsyncCache,
        ttl: Optional[TTLPolicy] = None,
        timeout_seconds: float = 10.0,
        max_retries: int = 3,
    ) -> None:
        self.subscription_key = subscription_key
        self.cache = cache
        self.ttl = ttl or TTLPolicy()  # avoid a shared mutable default instance
        self.max_retries = max_retries
        self._client = httpx.AsyncClient(timeout=timeout_seconds)

    async def aclose(self) -> None:
        await self._client.aclose()

    def _cache_key_geocode(self, op: str, query: str, country_filter: Optional[str]) -> str:
        qn = _normalize_query(query)
        cf = (country_filter or "").upper()
        return f"azure_maps:{op}:q={qn}:country={cf}"

    def _cache_key_reverse(self, lat: float, lon: float) -> str:
        bucket = _geohash_bucket(lat, lon, precision_digits=2)
        return f"azure_maps:reverse:bucket={bucket}"

    async def _cached_json(self, key: str) -> Optional[dict]:
        raw = await self.cache.get(key)
        return json.loads(raw) if raw else None

    async def _store_json(self, key: str, obj: dict, ttl_seconds: int) -> None:
        await self.cache.set(key, json.dumps(obj, separators=(",", ":")), ttl_seconds)

    async def _request_with_backoff(self, method: str, url: str, params: dict) -> dict:
        # This method is written for Azure Maps style endpoints but uses a fake upstream
        # to keep the snippet runnable.
        for attempt in range(self.max_retries + 1):
            try:
                # Simulated upstream behavior: no network call.
                # Replace with:
                #   resp = await self._client.request(method, url, params=params)
                #   resp.raise_for_status(); return resp.json()
                await asyncio.sleep(0.02)
                return {"ok": True, "url": url, "params": params}
            except httpx.HTTPStatusError as e:
                status = e.response.status_code
                if status in (429, 503) and attempt < self.max_retries:
                    retry_after = e.response.headers.get("Retry-After")
                    delay = float(retry_after) if retry_after else (0.5 * (2 ** attempt))
                    await asyncio.sleep(delay)
                    continue
                raise

        raise RuntimeError("unreachable")

    async def geocode_address(self, query: str, *, country_filter: Optional[str] = None) -> dict:
        key = self._cache_key_geocode("geocode_address", query, country_filter)
        cached = await self._cached_json(key)
        if cached is not None:  # falsy-but-cached payloads (negative cache) still count as hits
            return cached

        payload = await self._request_with_backoff(
            "GET",
            url="https://atlas.microsoft.com/search/address/json",
            params={
                "subscription-key": self.subscription_key,
                "api-version": "1.0",
                "query": query,
                "countrySet": country_filter,
            },
        )

        # Negative caching: if upstream yields no useful content, keep a short TTL.
        ttl = self.ttl.address_geocode_seconds if payload.get("ok") else self.ttl.negative_cache_seconds
        await self._store_json(key, payload, ttl)
        return payload

    async def poi_search(self, query: str, *, country_filter: Optional[str] = None) -> dict:
        key = self._cache_key_geocode("poi_search", query, country_filter)
        cached = await self._cached_json(key)
        if cached is not None:  # falsy-but-cached payloads (negative cache) still count as hits
            return cached

        payload = await self._request_with_backoff(
            "GET",
            url="https://atlas.microsoft.com/search/poi/json",
            params={
                "subscription-key": self.subscription_key,
                "api-version": "1.0",
                "query": query,
                "countrySet": country_filter,
            },
        )
        ttl = self.ttl.poi_search_seconds if payload.get("ok") else self.ttl.negative_cache_seconds
        await self._store_json(key, payload, ttl)
        return payload

    async def reverse_geocode(self, lat: float, lon: float) -> dict:
        key = self._cache_key_reverse(lat, lon)
        cached = await self._cached_json(key)
        if cached is not None:  # falsy-but-cached payloads (negative cache) still count as hits
            return cached

        payload = await self._request_with_backoff(
            "GET",
            url="https://atlas.microsoft.com/search/address/reverse/json",
            params={
                "subscription-key": self.subscription_key,
                "api-version": "1.0",
                "query": f"{lat},{lon}",
            },
        )
        ttl = self.ttl.reverse_geocode_seconds if payload.get("ok") else self.ttl.negative_cache_seconds
        await self._store_json(key, payload, ttl)
        return payload


async def demo() -> None:
    cache = InMemoryTTLCache()
    client = AzureMapsClient(subscription_key="REDACTED", cache=cache)

    q = "123 Main St, Seattle, WA"
    if _is_likely_address(q):
        a = await client.geocode_address(q, country_filter="US")
        b = await client.geocode_address(" 123  Main St Seattle WA ", country_filter="US")
        assert a == b  # cache hit due to normalization

    r1 = await client.reverse_geocode(47.6062, -122.3321)
    r2 = await client.reverse_geocode(47.60621, -122.33209)
    assert r1 == r2  # cache hit due to bucketing

    await client.aclose()
    print("ok")


if __name__ == "__main__":
    asyncio.run(demo())

That demo captures the core behavior we rely on in production:

  • address detection routes to the correct method
  • normalization dedupes string variants
  • reverse geocode buckets nearby points
  • TTL varies by operation
  • retries don’t create a thundering herd

The “POI then address” fallthrough (and why it exists)

The LangGraph manager shows a practical fallback sequence:

  • try POI search
  • if it returns nothing, try address geocoding

That sounds redundant until you see the inputs we actually get from upstream systems:

  • Sometimes the query is a company name + city: POI search is better.
  • Sometimes it’s a literal address: address geocoding is better.
  • Sometimes it’s a half-address (“Main St Seattle”): POI might find a canonical thing when address geocode can’t.

The mistake we fixed was letting this fallback happen blindly.

The correct behavior is:

  1. If the string is likely an address, skip POI entirely.
  2. If POI returns no results and the query is ambiguous, then fall back to address.
  3. Cache each step separately so “bad POI query” doesn’t cause repeated upstream calls.

That third point is easy to miss: if you store only the final “best effort,” you’ll keep retrying the losing branch.
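Here is a sketch of that gating logic, with the client methods passed in as callables. `resolve_location` is a hypothetical helper, not the shipped graph node:

```python
import asyncio
import re
from typing import Awaitable, Callable, Optional


def _looks_like_address(q: str) -> bool:
    """Same cheap heuristic as the client: a house number plus a street-ish token."""
    qn = " ".join(q.lower().split())
    return bool(re.search(r"\b\d{1,6}\b", qn)) and bool(
        re.search(r"\b(st|street|ave|avenue|rd|road|blvd|ln|dr)\b", qn)
    )


async def resolve_location(
    query: str,
    poi_search: Callable[[str], Awaitable[Optional[dict]]],
    geocode_address: Callable[[str], Awaitable[Optional[dict]]],
) -> Optional[dict]:
    # Rule 1: address-shaped queries skip POI entirely.
    if _looks_like_address(query):
        return await geocode_address(query)
    # Rule 2: ambiguous queries try POI first, then fall back to address geocoding.
    # Each underlying call caches separately, so the losing branch isn't retried.
    result = await poi_search(query)
    return result if result else await geocode_address(query)
```

With this shape, "123 Main St, Seattle" makes exactly one upstream call, and only genuinely ambiguous inputs pay for two.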

Quota smoothing in the real system: shaping bursts, not just retrying

Backoff only helps after the upstream is already unhappy.

The better move is to avoid bursts in the first place. In the production integration, I enforce this at two layers:

  • per-process concurrency caps (async semaphore around outbound calls)
  • cache-first with negative caching so repeated bad queries don’t hammer the API

This is the same mental model as the advisor enrichment worker’s credit accounting: you don’t “audit later,” you design the flow so it can’t exceed its budget accidentally.
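The per-process cap is just a semaphore wrapped around the outbound calls. A stdlib sketch (`bounded_gather` is hypothetical, not the shipped code):

```python
import asyncio
from typing import Awaitable, Iterable, List, TypeVar

T = TypeVar("T")


async def bounded_gather(coros: Iterable[Awaitable[T]], limit: int = 8) -> List[T]:
    """Run awaitables with at most `limit` in flight at any moment."""
    sem = asyncio.Semaphore(limit)

    async def guarded(coro: Awaitable[T]) -> T:
        async with sem:  # blocks here once `limit` calls are in flight
            return await coro

    return await asyncio.gather(*(guarded(c) for c in coros))
```

Fan out 500 geocode lookups through this and the upstream only ever sees `limit` concurrent requests, regardless of how bursty the caller is.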

How the discipline shows up elsewhere: Firecrawl credits and feature flags

Even though this post is about Azure Maps, the codebase has a consistent theme: “make cost and control explicit.”

You can see it in the advisor enrichment worker.

Credit accounting as a loop invariant

In advisor-enrichment-worker/workflow_executor.py, we iterate advisors, call a Firecrawl agent with a max_credits cap, and accumulate total_credits.

The tell is the comment about attribute access:

  • the result is a Pydantic model, so reading .credits_used is correct
  • the earlier bug was treating it like a dict (.get()), which silently breaks accounting

Here’s a minimal, runnable reproduction of that exact class of bug:

from dataclasses import dataclass


@dataclass
class AgentResult:
    credits_used: float
    payload: dict


def broken_accounting(result: AgentResult) -> float:
    # This is the mistake: treating a model like a dict.
    # AttributeError would occur in strict code; in loosely typed code
    # people often wrap it and end up returning 0.
    try:
        return result.get("credits_used", 0.0)  # type: ignore[attr-defined]
    except Exception:
        return 0.0


def correct_accounting(result: AgentResult) -> float:
    return float(result.credits_used)


if __name__ == "__main__":
    r = AgentResult(credits_used=1.25, payload={"ok": True})
    assert broken_accounting(r) == 0.0
    assert correct_accounting(r) == 1.25
    print("ok")

That’s why I like having “credits used” flow through the code as a first-class value: you can unit test it. You can put assertions around it. You can build a budget gate.

Feature flags as operational kill switches

advisor-enrichment-worker/app/api/v1/settings_routes.py defines in-memory feature flags (with a note that production should use Redis/DB). The important part isn’t the storage mechanism; it’s the existence of a fast kill switch.

Here’s a complete, runnable example of the pattern (matching the shape in that file: {name: (enabled, description)} and a Pydantic update model):

from typing import Dict, Tuple
from pydantic import BaseModel


class FeatureFlagUpdate(BaseModel):
    enabled: bool


FEATURE_FLAGS: Dict[str, Tuple[bool, str]] = {
    "linkedin_matching": (True, "Auto-match advisors to LinkedIn profiles"),
    "brokercheck_enrichment": (True, "Enable FINRA BrokerCheck lookups via Firecrawl"),
    "firecrawl_research": (True, "Enable Firecrawl research jobs and SSE streaming"),
    "azure_maps_geocoding": (True, "Enable Azure Maps geocoding for city/state normalization"),
}


def set_flag(name: str, update: FeatureFlagUpdate) -> None:
    if name not in FEATURE_FLAGS:
        raise KeyError(name)
    _, desc = FEATURE_FLAGS[name]
    FEATURE_FLAGS[name] = (bool(update.enabled), desc)


if __name__ == "__main__":
    assert FEATURE_FLAGS["azure_maps_geocoding"][0] is True
    set_flag("azure_maps_geocoding", FeatureFlagUpdate(enabled=False))
    assert FEATURE_FLAGS["azure_maps_geocoding"][0] is False
    print("ok")

When you’re running enrichment at scale, feature flags aren’t “nice to have.” They’re how you avoid turning a partial outage into a full outage.

One diagram: geocoding in the enrichment graph (as it actually behaves)

This is the dataflow I ship mentally when I touch this system: prefer existing structured signals, accept Firecrawl if it already solved it, and only then call Azure Maps—through a cache boundary.

flowchart TD
  client["Client - Workflow Runner"] --> enrichFlow["LangGraph enrichment flow"]
  enrichFlow --> db["PostgreSQL and CRM fields"]
  enrichFlow --> firecrawl["Firecrawl research"]
  firecrawl -->|"city and state present"| shortCircuit["Short-circuit - skip Azure Maps"]
  firecrawl -->|"missing fields"| geoGate["Geocoding gate"]
  geoGate --> azureMaps["Azure Maps Client"]
  azureMaps --> cache["Cache - query keys + TTL policy"]
  azureMaps --> http["Azure Maps HTTP endpoints"]
  enrichFlow --> output["Normalized entity record"]

Operational edge cases (the stuff that bites you at 2 a.m.)

1) Empty results can be more expensive than good results

When an upstream returns “no matches,” your system is tempted to keep trying:

  • alternate spelling
  • removing punctuation
  • widening the query

That’s fine—once. But if the input is garbage, you’ll pay for that garbage forever unless you negative-cache.

So we cache empty-ish results briefly. Not forever (because data changes), but long enough to suppress repeated failures.

2) Timeouts must be budgeted, not defaulted

Geocoding calls often happen inside larger workflows. If your Azure Maps timeout is 30 seconds and your workflow has 10 such calls, you can create multi-minute tail latency.

The fix is simple: keep geocoding timeouts tight, and rely on cache + retries with backoff for resilience.
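One stdlib way to enforce that budget at the call site — `with_budget` is a hypothetical wrapper; the real client also sets a tight timeout on the `httpx` client itself:

```python
import asyncio
from typing import Awaitable, Optional, TypeVar

T = TypeVar("T")


async def with_budget(
    coro: Awaitable[T], seconds: float, fallback: Optional[T] = None
) -> Optional[T]:
    """Cap one geocoding step's latency; return `fallback` instead of stalling the workflow."""
    try:
        return await asyncio.wait_for(coro, timeout=seconds)
    except asyncio.TimeoutError:
        return fallback
```

A workflow with ten geocoding steps now has a worst case of ten small budgets, not ten default timeouts stacked into minutes of tail latency.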

3) Caching must include parameters, or you will lie to yourself

If country_filter changes the answer, it must be in the key.

Same for anything that changes the result set (language, typeahead bias, etc.). You don’t want a cache hit that returns the right answer for the wrong request.

Closing

The moment we added “Fix 2” and “Fix 7” in the graph—routing address queries correctly and short-circuiting Azure Maps when Firecrawl already delivered city/state—geocoding stopped being a background detail and became a budgeted, testable subsystem. Cache-first keys, TTL policy, and quota smoothing aren’t performance tricks; they’re how you prevent one fuzzy location string from turning into a thousand identical upstream calls.
