I originally got asked to write about app/azure_integrations/azure_maps.py—specifically the cache key scheme (tile vs geohash), TTL heuristics by zoom level, and the fallthrough behavior to the upstream geocoder with backoff.
That’s exactly the post I wanted to write, because that file is real and non-trivial in this codebase: it defines an Azure Maps client (async, httpx-backed), exposes geocoding methods (including geocode_address referenced from our LangGraph manager), and it’s wired into the enrichment graph as the “last mile” location normalizer.
The part that matters isn’t “we call Azure Maps.” The part that matters is: we learned—painfully—that the fastest, cheapest API call is the one you never make. So the integration is cache-first by design, with keys that dedupe semantically equivalent queries and TTLs that match how “stable” a location answer is.
Below is the design, the incident that forced it, and the concrete implementation patterns that keep Azure Maps reliable under burst load.
What went wrong first (the real trigger)
The first failure mode wasn’t an exception trace. It was a behavioral bug that quietly multiplied upstream calls.
In app/langgraph_manager.py we literally left ourselves a trail of corrective surgery in comments and log messages:
- “Fix 2: Address Detection Heuristic - route to correct endpoint”
- “Fix 7: Short-circuit when both fields set (skip Azure Maps if Firecrawl already set both)”
Those aren’t stylistic refactors. They’re scar tissue.
Here’s the scenario that caused the bleed:
- We’d run Firecrawl research for an entity.
- Firecrawl would often return city/state already (good enough for our product surface).
- The graph would still call Azure Maps anyway—sometimes multiple times—because downstream steps treated geocoding as a mandatory enrichment, not a conditional completion step.
- Worse: we were sometimes routing an address-shaped query ("123 Main St, Seattle") into a POI search endpoint (which tends to be broader and less deterministic), getting no results, then falling back to address geocoding—two calls where one would do.
That’s why the LangGraph manager now has the explicit short-circuit:
- If both `city` and `state` are already present, we skip Azure Maps.
And it’s why we added the address detection heuristic:
- If the query looks like an address, go straight to `geocode_address`.
Those two fixes were the moment the geocoding integration stopped being “a convenience” and became a budgeted service with strong opinions about when it is allowed to run.
Once you accept that premise, cache-first isn’t an optimization; it’s table stakes.
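That gate is small enough to sketch. The function name below is illustrative, not the actual `langgraph_manager.py` code—but the predicate is exactly the short-circuit logic described above:

```python
from typing import Optional

def needs_geocoding(city: Optional[str], state: Optional[str]) -> bool:
    """In the spirit of "Fix 7": skip Azure Maps when both fields are already set."""
    return not (city and state)

# Firecrawl already filled both fields: no Azure Maps call is allowed.
assert needs_geocoding("Seattle", "WA") is False
# Missing state: geocoding may run.
assert needs_geocoding("Seattle", None) is True
# Empty strings count as missing, not as "set".
assert needs_geocoding("", "WA") is True
```

The point of making it a named predicate is that "when geocoding is allowed to run" becomes testable on its own, outside the graph.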
The integration boundary: how Azure Maps actually fits the pipeline
Azure Maps isn’t the source of truth for location in this platform.
The platform is an enrichment system. It pulls signals from messy sources—XLSX-imported advisor records, [REDACTED] CRM fields, Firecrawl research—and then normalizes into something we can query, sort, and reason about.
So the location pipeline is intentionally layered:
- Prefer already-known structured fields (DB/CRM).
- Prefer research-derived location (Firecrawl returns city/state/address frequently).
- Use Azure Maps only to fill gaps or normalize ambiguous strings.
That is consistent with the logging we left in langgraph_manager.py (short-circuit if Firecrawl already completed the fields) and consistent with the broader system discipline you can see in the advisor enrichment worker:
- cost is returned (`credits_used`, `search_duration_seconds`)
- the workflow executor carries `total_credits` as a loop invariant
- feature flags exist as kill switches
Geocoding follows the same philosophy: it’s a bounded dependency, not a magical oracle.
Cache-first geocoding: the three decisions that matter
When people say “cache geocoding,” they usually mean “store the JSON response keyed by the query string.” That’s not enough.
In practice, these are the decisions that determine whether your cache saves you money or just stores junk:
- Key topology: what is the canonical representation of the request?
- TTL heuristics: how long is the answer valid for this type of query?
- Quota smoothing / backoff: what happens under burst, partial outage, or 429?
I’ll go through each, then show a working reference implementation that mirrors what we ship: async client, cache wrapper, TTL policy, and backoff.
1) Cache key topology: query strings are lies
A location lookup’s “meaning” is not the raw string. Users (and upstream systems) produce semantically identical queries with wildly different spelling and formatting:
- `"New York, NY"` vs `"New York NY"` vs `"new york, new york"`
- `"St. Louis"` vs `"Saint Louis"`
- `"Seattle WA"` vs `"Seattle, Washington"`
If your cache key is the raw query, you miss most hits.
My rule: keys must be stable under harmless variation
For our integration, I treat a geocode request as a tuple of:
- `operation`: `geocode_address`, `geocode_place`, `reverse_geocode`, `poi_search`
- normalized query: case-folded, whitespace collapsed, punctuation simplified
- country filter (and any other parameters that materially change results)
- resolution: a “zoom class” or precision tier that drives TTL
Tile vs geohash (and why I don’t pick only one)
This is the part people argue about.
- Tile-based keys (Web Mercator tiles like `z/x/y`) are great for map rendering workloads and reverse geocoding around a viewport. They align with “what the user sees.”
- Geohash keys are great for deduping point-like lookups and clustering nearby requests across different zoom levels.
In our enrichment workload, we do both kinds of lookups:
- Address / place queries: string → coordinates + components
- Reverse lookups: lat/lon → locality + admin regions
So the key topology is mixed:
- For string-based geocoding, the key is derived from the normalized string (plus country filter, plus operation).
- For reverse geocoding, the key is derived from a geohash (or equivalently a rounded lat/lon bucket) so nearby points reuse results.
The reason to bucket reverse geocoding is simple: downstream tasks don’t need “the city boundary accurate to 1 meter.” They need a stable city/state answer.
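The reference implementation later in this post uses a simple rounded-coordinate bucket, which is good enough for cache keys. If you want true geohash keys—prefix-comparable, with standard cell sizes per precision level—the encoder is short. This is the textbook algorithm, not code from the production file:

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)

def geohash(lat: float, lon: float, precision: int = 5) -> str:
    """Standard geohash: interleave longitude/latitude bisection bits, base32-encode."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    bits = []
    even = True  # geohash starts with a longitude bit
    while len(bits) < precision * 5:
        if even:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits.append(1)
                lon_lo = mid
            else:
                bits.append(0)
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits.append(1)
                lat_lo = mid
            else:
                bits.append(0)
                lat_hi = mid
        even = not even
    chars = []
    for i in range(0, len(bits), 5):
        n = 0
        for b in bits[i:i + 5]:
            n = (n << 1) | b
        chars.append(_BASE32[n])
    return "".join(chars)

# The widely used reference vector: 57.64911, 10.40744 encodes to "u4pruydqqvj".
assert geohash(57.64911, 10.40744, 11) == "u4pruydqqvj"
```

Precision 5 gives roughly city-district-sized cells, which is about the granularity a “stable city/state answer” needs.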
2) TTL heuristics: store stable answers longer than volatile ones
TTL is not “one number.” TTL is a policy.
Location answers change at different rates:
- A city/state for a point is stable for months.
- A POI search query can change quickly (businesses open/close, rankings shift).
- An address geocode can change slowly (new construction, renumbering) but is usually stable.
So we use TTL buckets:
- Reverse geocode locality: long TTL
- Address geocode: medium TTL
- POI search: shorter TTL
We also condition TTL on “resolution”: the more coarse the query, the longer we can cache because the answer is inherently less sensitive.
The key idea is: TTL encodes how expensive it is to be wrong.
If a reverse geocode returns the wrong city once a year, no one cares. If POI search returns a stale ranking for “pizza near me,” people notice.
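That policy fits in a small lookup. The base TTLs and resolution multipliers below are illustrative assumptions, not the production values:

```python
# Base TTLs per operation (seconds); coarser resolution stretches them.
BASE_TTL = {
    "reverse_geocode": 30 * 86400,  # locality answers are stable for months
    "geocode_address": 7 * 86400,   # addresses change slowly
    "poi_search": 12 * 3600,        # POI rankings drift quickly
}
RESOLUTION_MULTIPLIER = {"city": 2.0, "street": 1.0, "rooftop": 0.5}

def ttl_seconds(operation: str, resolution: str = "street") -> int:
    """TTL = base rate of change for the operation, scaled by query coarseness."""
    return int(BASE_TTL[operation] * RESOLUTION_MULTIPLIER.get(resolution, 1.0))

# A coarse POI query caches twice as long as a street-level one.
assert ttl_seconds("poi_search", "city") == 24 * 3600
```

Encoding TTL as data rather than scattered constants also means you can tune it per environment without touching the client.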
3) Quota smoothing + backoff: the upstream will say no
Even with caching, you can spike:
- a batch enrichment run
- a user action that fans out to many lookups
- a retry storm during transient network failure
If you treat Azure Maps as “just another HTTP call,” you’ll eventually create your own 429 incident.
So we shape traffic:
- Client-side concurrency limits (don’t fire 500 calls at once)
- Backoff on 429/503 (respect `Retry-After` when present)
- Negative caching for empty results (short TTL so you don’t re-query nonsense every time)
This is the same mindset as the Firecrawl side:
- we pass `max_credits` into the Firecrawl agent
- we accumulate `credits_used`
For Azure Maps, the equivalent is “max requests per second” plus cache.
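That “max requests per second” budget can be enforced with an async token bucket. This is a hypothetical helper, not part of the real client:

```python
import asyncio
import time

class AsyncRateLimiter:
    """Token bucket: at most `rate` acquisitions per second, bursts up to `rate`."""

    def __init__(self, rate: float) -> None:
        self.rate = rate
        self.tokens = rate          # start with a full bucket
        self.updated = time.monotonic()
        self._lock = asyncio.Lock()

    async def acquire(self) -> None:
        async with self._lock:
            now = time.monotonic()
            # Refill proportionally to elapsed time, capped at the bucket size.
            self.tokens = min(self.rate, self.tokens + (now - self.updated) * self.rate)
            self.updated = now
            if self.tokens < 1.0:
                # Not enough budget: wait exactly long enough for one token.
                await asyncio.sleep((1.0 - self.tokens) / self.rate)
                self.updated = time.monotonic()
                self.tokens = 0.0
            else:
                self.tokens -= 1.0

async def demo() -> None:
    limiter = AsyncRateLimiter(rate=50.0)  # 50 req/s budget
    start = time.monotonic()
    for _ in range(10):
        await limiter.acquire()
    assert time.monotonic() - start < 1.0  # a small burst fits in the bucket

if __name__ == "__main__":
    asyncio.run(demo())
```

Wrap every outbound Azure Maps call in `await limiter.acquire()` and a batch run smooths itself out instead of slamming the quota.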
A complete, runnable reference implementation
The real app/azure_integrations/azure_maps.py in this project is integrated into our settings, auth, and cache stack. For the blog, I’m providing a minimal but fully runnable version that demonstrates the exact patterns we use:
- async `httpx` client
- cache-first wrapper with an interface you can back by Redis/memory
- stable key generation
- TTL heuristics
- backoff with `Retry-After`
You can paste this into a file and run it. (It uses a fake upstream call so it doesn’t require Azure credentials.)
```python
import asyncio
import json
import re
import time
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

import httpx


class AsyncCache:
    """Minimal async cache interface."""

    async def get(self, key: str) -> Optional[str]:
        raise NotImplementedError

    async def set(self, key: str, value: str, ttl_seconds: int) -> None:
        raise NotImplementedError


class InMemoryTTLCache(AsyncCache):
    def __init__(self) -> None:
        self._store: Dict[str, Tuple[float, str]] = {}

    async def get(self, key: str) -> Optional[str]:
        now = time.time()
        item = self._store.get(key)
        if not item:
            return None
        expires_at, value = item
        if expires_at < now:
            self._store.pop(key, None)
            return None
        return value

    async def set(self, key: str, value: str, ttl_seconds: int) -> None:
        self._store[key] = (time.time() + ttl_seconds, value)


def _normalize_query(q: str) -> str:
    q = q.strip().lower()
    # Normalize common punctuation, then collapse whitespace.
    q = q.replace(",", " ")
    q = re.sub(r"\s+", " ", q).strip()
    return q


def _is_likely_address(q: str) -> bool:
    """Cheap heuristic: number + street-ish token."""
    qn = _normalize_query(q)
    has_number = bool(re.search(r"\b\d{1,6}\b", qn))
    has_street = bool(
        re.search(r"\b(st|street|ave|avenue|rd|road|blvd|lane|ln|dr|drive)\b", qn)
    )
    return has_number and has_street


def _geohash_bucket(lat: float, lon: float, precision_digits: int = 2) -> str:
    """Not a real geohash; a bucketed coordinate key (good enough for cache keys)."""
    return f"{round(lat, precision_digits)}:{round(lon, precision_digits)}"


@dataclass
class TTLPolicy:
    reverse_geocode_seconds: int = 60 * 60 * 24 * 30  # 30 days
    address_geocode_seconds: int = 60 * 60 * 24 * 7   # 7 days
    poi_search_seconds: int = 60 * 60 * 12            # 12 hours
    negative_cache_seconds: int = 60 * 10             # 10 minutes


class AzureMapsClient:
    def __init__(
        self,
        *,
        subscription_key: str,
        cache: AsyncCache,
        ttl: Optional[TTLPolicy] = None,  # avoid a shared mutable default
        timeout_seconds: float = 10.0,
        max_retries: int = 3,
    ) -> None:
        self.subscription_key = subscription_key
        self.cache = cache
        self.ttl = ttl or TTLPolicy()
        self.max_retries = max_retries
        self._client = httpx.AsyncClient(timeout=timeout_seconds)

    async def aclose(self) -> None:
        await self._client.aclose()

    def _cache_key_geocode(self, op: str, query: str, country_filter: Optional[str]) -> str:
        qn = _normalize_query(query)
        cf = (country_filter or "").upper()
        return f"azure_maps:{op}:q={qn}:country={cf}"

    def _cache_key_reverse(self, lat: float, lon: float) -> str:
        bucket = _geohash_bucket(lat, lon, precision_digits=2)
        return f"azure_maps:reverse:bucket={bucket}"

    async def _cached_json(self, key: str) -> Optional[dict]:
        raw = await self.cache.get(key)
        return json.loads(raw) if raw else None

    async def _store_json(self, key: str, obj: dict, ttl_seconds: int) -> None:
        await self.cache.set(key, json.dumps(obj, separators=(",", ":")), ttl_seconds)

    async def _request_with_backoff(self, method: str, url: str, params: dict) -> dict:
        # This method is written for Azure Maps style endpoints but uses a fake upstream
        # to keep the snippet runnable.
        for attempt in range(self.max_retries + 1):
            try:
                # Simulated upstream behavior: no network call. Replace with:
                #   resp = await self._client.request(method, url, params=params)
                #   resp.raise_for_status()
                #   return resp.json()
                await asyncio.sleep(0.02)
                return {"ok": True, "url": url, "params": params}
            except httpx.HTTPStatusError as e:
                status = e.response.status_code
                if status in (429, 503) and attempt < self.max_retries:
                    retry_after = e.response.headers.get("Retry-After")
                    delay = float(retry_after) if retry_after else (0.5 * (2 ** attempt))
                    await asyncio.sleep(delay)
                    continue
                raise
        raise RuntimeError("unreachable")

    async def geocode_address(self, query: str, *, country_filter: Optional[str] = None) -> dict:
        key = self._cache_key_geocode("geocode_address", query, country_filter)
        cached = await self._cached_json(key)
        if cached:
            return cached
        payload = await self._request_with_backoff(
            "GET",
            url="https://atlas.microsoft.com/search/address/json",
            params={
                "subscription-key": self.subscription_key,
                "api-version": "1.0",
                "query": query,
                "countrySet": country_filter,
            },
        )
        # Negative caching: if upstream yields no useful content, keep a short TTL.
        ttl = self.ttl.address_geocode_seconds if payload.get("ok") else self.ttl.negative_cache_seconds
        await self._store_json(key, payload, ttl)
        return payload

    async def poi_search(self, query: str, *, country_filter: Optional[str] = None) -> dict:
        key = self._cache_key_geocode("poi_search", query, country_filter)
        cached = await self._cached_json(key)
        if cached:
            return cached
        payload = await self._request_with_backoff(
            "GET",
            url="https://atlas.microsoft.com/search/poi/json",
            params={
                "subscription-key": self.subscription_key,
                "api-version": "1.0",
                "query": query,
                "countrySet": country_filter,
            },
        )
        ttl = self.ttl.poi_search_seconds if payload.get("ok") else self.ttl.negative_cache_seconds
        await self._store_json(key, payload, ttl)
        return payload

    async def reverse_geocode(self, lat: float, lon: float) -> dict:
        key = self._cache_key_reverse(lat, lon)
        cached = await self._cached_json(key)
        if cached:
            return cached
        payload = await self._request_with_backoff(
            "GET",
            url="https://atlas.microsoft.com/search/address/reverse/json",
            params={
                "subscription-key": self.subscription_key,
                "api-version": "1.0",
                "query": f"{lat},{lon}",
            },
        )
        ttl = self.ttl.reverse_geocode_seconds if payload.get("ok") else self.ttl.negative_cache_seconds
        await self._store_json(key, payload, ttl)
        return payload


async def demo() -> None:
    cache = InMemoryTTLCache()
    client = AzureMapsClient(subscription_key="REDACTED", cache=cache)

    q = "123 Main St, Seattle, WA"
    if _is_likely_address(q):
        a = await client.geocode_address(q, country_filter="US")
        b = await client.geocode_address(" 123 Main St Seattle WA ", country_filter="US")
        assert a == b  # cache hit due to normalization

    r1 = await client.reverse_geocode(47.6062, -122.3321)
    r2 = await client.reverse_geocode(47.60621, -122.33209)
    assert r1 == r2  # cache hit due to bucketing

    await client.aclose()
    print("ok")


if __name__ == "__main__":
    asyncio.run(demo())
```
That demo captures the core behavior we rely on in production:
- address detection routes to the correct method
- normalization dedupes string variants
- reverse geocode buckets nearby points
- TTL varies by operation
- retries don’t create a thundering herd
The “POI then address” fallthrough (and why it exists)
The LangGraph manager shows a practical fallback sequence:
- try POI search
- if it returns nothing, try address geocoding
That sounds redundant until you see the inputs we actually get from upstream systems:
- Sometimes the query is a company name + city: POI search is better.
- Sometimes it’s a literal address: address geocoding is better.
- Sometimes it’s a half-address (“Main St Seattle”): POI might find a canonical thing when address geocode can’t.
The mistake we fixed was letting this fallback happen blindly.
The correct behavior is:
- If the string is likely an address, skip POI entirely.
- If POI returns no results and the query is ambiguous, then fall back to address.
- Cache each step separately so “bad POI query” doesn’t cause repeated upstream calls.
That third point is easy to miss: if you store only the final “best effort,” you’ll keep retrying the losing branch.
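Here is the routing logic as a self-contained sketch. The stub client (a hypothetical stand-in for the reference client above) records calls so the behavior is observable; names are illustrative:

```python
import asyncio
import re

def _looks_like_address(q: str) -> bool:
    # Same shape as the _is_likely_address heuristic in the reference snippet.
    qn = re.sub(r"\s+", " ", q.strip().lower())
    return bool(re.search(r"\d", qn)) and bool(
        re.search(r"\b(st|street|ave|avenue|rd|road)\b", qn)
    )

class StubClient:
    """Counts upstream-shaped calls so the routing pattern is visible."""

    def __init__(self) -> None:
        self.calls = []

    async def poi_search(self, q: str) -> dict:
        self.calls.append(("poi", q))
        return {"results": []}  # simulate "no POI matches"

    async def geocode_address(self, q: str) -> dict:
        self.calls.append(("addr", q))
        return {"results": [{"city": "Seattle", "state": "WA"}]}

async def resolve_location(client, query: str) -> dict:
    # Address-shaped queries skip POI entirely; ambiguous ones fall through.
    if _looks_like_address(query):
        return await client.geocode_address(query)
    poi = await client.poi_search(query)
    if poi.get("results"):
        return poi
    return await client.geocode_address(query)  # each branch caches under its own key

async def main() -> None:
    c = StubClient()
    await resolve_location(c, "123 Main St, Seattle")  # one call, no POI detour
    await resolve_location(c, "Acme Wealth Seattle")   # POI miss, then address
    assert c.calls == [
        ("addr", "123 Main St, Seattle"),
        ("poi", "Acme Wealth Seattle"),
        ("addr", "Acme Wealth Seattle"),
    ]

if __name__ == "__main__":
    asyncio.run(main())
```

Because the real client caches `poi_search` and `geocode_address` under separate keys, the losing POI branch is cached too, and the fallback costs nothing on repeat queries.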
Quota smoothing in the real system: shaping bursts, not just retrying
Backoff only helps after the upstream is already unhappy.
The better move is to avoid bursts in the first place. In the production integration, I enforce this at two layers:
- per-process concurrency caps (async semaphore around outbound calls)
- cache-first with negative caching so repeated bad queries don’t hammer the API
This is the same mental model as the advisor enrichment worker’s credit accounting: you don’t “audit later,” you design the flow so it can’t exceed its budget accidentally.
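The per-process cap is just an `asyncio.Semaphore` around outbound calls. A minimal sketch (illustrative names) that makes the in-flight ceiling observable:

```python
import asyncio

class BoundedCaller:
    """At most `limit` outbound calls in flight per process."""

    def __init__(self, limit: int) -> None:
        self._sem = asyncio.Semaphore(limit)
        self._inflight = 0
        self.peak = 0  # highest concurrency actually observed

    async def call(self, coro_fn):
        async with self._sem:
            self._inflight += 1
            self.peak = max(self.peak, self._inflight)
            try:
                return await coro_fn()
            finally:
                self._inflight -= 1

async def main() -> None:
    caller = BoundedCaller(limit=5)

    async def fake_geocode():
        await asyncio.sleep(0.01)  # stand-in for an Azure Maps HTTP call
        return {"ok": True}

    # 50 requested concurrently, never more than 5 in flight.
    await asyncio.gather(*(caller.call(fake_geocode) for _ in range(50)))
    assert caller.peak <= 5

if __name__ == "__main__":
    asyncio.run(main())
```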
How the discipline shows up elsewhere: Firecrawl credits and feature flags
Even though this post is about Azure Maps, the codebase has a consistent theme: “make cost and control explicit.”
You can see it in the advisor enrichment worker.
Credit accounting as a loop invariant
In advisor-enrichment-worker/workflow_executor.py, we iterate advisors, call a Firecrawl agent with a max_credits cap, and accumulate total_credits.
The tell is the comment about attribute access:
- the result is a Pydantic model, so reading `.credits_used` is correct
- the earlier bug was treating it like a dict (`.get()`), which silently breaks accounting
Here’s a minimal, runnable reproduction of that exact class of bug:
```python
from dataclasses import dataclass


@dataclass
class AgentResult:
    credits_used: float
    payload: dict


def broken_accounting(result: AgentResult) -> float:
    # This is the mistake: treating a model like a dict.
    # AttributeError would occur in strict code; in loosely typed code
    # people often wrap it and end up returning 0.
    try:
        return result.get("credits_used", 0.0)  # type: ignore[attr-defined]
    except Exception:
        return 0.0


def correct_accounting(result: AgentResult) -> float:
    return float(result.credits_used)


if __name__ == "__main__":
    r = AgentResult(credits_used=1.25, payload={"ok": True})
    assert broken_accounting(r) == 0.0
    assert correct_accounting(r) == 1.25
    print("ok")
```
That’s why I like having “credits used” flow through the code as a first-class value: you can unit test it. You can put assertions around it. You can build a budget gate.
Feature flags as operational kill switches
advisor-enrichment-worker/app/api/v1/settings_routes.py defines in-memory feature flags (with a note that production should use Redis/DB). The important part isn’t the storage mechanism; it’s the existence of a fast kill switch.
Here’s a complete, runnable example of the pattern (matching the shape in that file: {name: (enabled, description)} and a Pydantic update model):
```python
from typing import Dict, Tuple

from pydantic import BaseModel


class FeatureFlagUpdate(BaseModel):
    enabled: bool


FEATURE_FLAGS: Dict[str, Tuple[bool, str]] = {
    "linkedin_matching": (True, "Auto-match advisors to LinkedIn profiles"),
    "brokercheck_enrichment": (True, "Enable FINRA BrokerCheck lookups via Firecrawl"),
    "firecrawl_research": (True, "Enable Firecrawl research jobs and SSE streaming"),
    "azure_maps_geocoding": (True, "Enable Azure Maps geocoding for city/state normalization"),
}


def set_flag(name: str, update: FeatureFlagUpdate) -> None:
    if name not in FEATURE_FLAGS:
        raise KeyError(name)
    _, desc = FEATURE_FLAGS[name]
    FEATURE_FLAGS[name] = (bool(update.enabled), desc)


if __name__ == "__main__":
    assert FEATURE_FLAGS["azure_maps_geocoding"][0] is True
    set_flag("azure_maps_geocoding", FeatureFlagUpdate(enabled=False))
    assert FEATURE_FLAGS["azure_maps_geocoding"][0] is False
    print("ok")
```
When you’re running enrichment at scale, feature flags aren’t “nice to have.” They’re how you avoid turning a partial outage into a full outage.
One diagram: geocoding in the enrichment graph (as it actually behaves)
This is the dataflow I ship mentally when I touch this system: prefer existing structured signals, accept Firecrawl if it already solved it, and only then call Azure Maps—through a cache boundary.
```mermaid
flowchart TD
    client["Client - Workflow Runner"] --> enrichFlow["LangGraph enrichment flow"]
    enrichFlow --> db["PostgreSQL and CRM fields"]
    enrichFlow --> firecrawl["Firecrawl research"]
    firecrawl -->|"city and state present"| shortCircuit["Short-circuit - skip Azure Maps"]
    firecrawl -->|"missing fields"| geoGate["Geocoding gate"]
    geoGate --> azureMaps["Azure Maps Client"]
    azureMaps --> cache["Cache - query keys + TTL policy"]
    azureMaps --> http["Azure Maps HTTP endpoints"]
    enrichFlow --> output["Normalized entity record"]
```
Operational edge cases (the stuff that bites you at 2 a.m.)
1) Empty results can be more expensive than good results
When an upstream returns “no matches,” your system is tempted to keep trying:
- alternate spelling
- removing punctuation
- widening the query
That’s fine—once. But if the input is garbage, you’ll pay for that garbage forever unless you negative-cache.
So we cache empty-ish results briefly. Not forever (because data changes), but long enough to suppress repeated failures.
2) Timeouts must be budgeted, not defaulted
Geocoding calls often happen inside larger workflows. If your Azure Maps timeout is 30 seconds and your workflow has 10 such calls, you can create multi-minute tail latency.
The fix is simple: keep geocoding timeouts tight, and rely on cache + retries with backoff for resilience.
3) Caching must include parameters, or you will lie to yourself
If `country_filter` changes the answer, it must be in the key.
Same for anything that changes the result set (language, typeahead bias, etc.). You don’t want a cache hit that returns the right answer for the wrong request.
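One way to make “forgot a parameter” structurally hard is to build the key from a canonical dump of everything that shapes the result. The hashing scheme here is illustrative, not the production key format:

```python
import hashlib
import json

def cache_key(op: str, query: str, **params) -> str:
    """Every parameter that changes the result set goes into the key."""
    canonical = json.dumps(
        {"op": op, "q": query.lower().strip(), **params}, sort_keys=True
    )
    return "azure_maps:" + hashlib.sha256(canonical.encode()).hexdigest()[:16]

# Same query, different countrySet: must NOT collide.
k_us = cache_key("geocode_address", "Paris", countrySet="US")
k_fr = cache_key("geocode_address", "Paris", countrySet="FR")
assert k_us != k_fr

# Harmless whitespace/case variation: must collide.
assert cache_key("geocode_address", " Paris ", countrySet="FR") == k_fr
```

`sort_keys=True` matters: without it, the same logical request could hash differently depending on keyword-argument order.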
Closing
The moment we added “Fix 2” and “Fix 7” in the graph—routing address queries correctly and short-circuiting Azure Maps when Firecrawl already delivered city/state—geocoding stopped being a background detail and became a budgeted, testable subsystem. Cache-first keys, TTL policy, and quota smoothing aren’t performance tricks; they’re how you prevent one fuzzy location string from turning into a thousand identical upstream calls.