I noticed it the first time a "simple" search request turned into an argument between three databases.
A sales rep asked for "forklifts in Texas with 10,000+ lb capacity under active warranty," and I realized I could answer that question three different ways—each one correct in isolation, and each one wrong in production for a different reason. Azure AI Search could do synonym expansion and semantic reranking over my CRM-shaped records. Postgres + pgvector could do fast similarity over Zoom transcripts without duplicating data into yet another index. And the CRM—the messy, real-time source—was sometimes the only place the newest truth existed.
This is Part 8 of my series "How to Architect an Enterprise AI System (And Why the Engineer Still Matters)". Part 7 was about building a reliable sync engine against a CRM API with undocumented behaviors. This one is what I built on top of that: a search router that decides, per query, which backend deserves the job.
The key insight (and why the naive approach fails)
The naive approach is seductive: pick one search system, shove everything into it, and call it a day.
I had two obvious "one tool" options:
- Put everything into Azure AI Search. It's great for structured indexes and has features I actually want for CRM data: faceted search, semantic reranking, geo-distance scoring, synonym maps.
- Put everything into Postgres + pgvector. My transcripts already live in Postgres, and pgvector gives me semantic retrieval without maintaining a parallel indexing universe.
The failure mode isn't "it doesn't work." The failure mode is operational dishonesty:
- If I duplicate transcripts into Azure AI Search, I'm signing up for index duplication and drift.
- If I force CRM-shaped search into pgvector-only semantics, I lose the structured search affordances that make CRM data usable.
- If I pretend sync is perfectly real-time, I'll return stale answers the moment a CRM record changes and the sync hasn't caught up.
So I made a different bet: I route queries to the right backend based on what the user is asking for, and I keep a real-time CRM fallback when the index can't be trusted.
That routing layer is the whole trick. It's not glamorous, but it's the difference between a demo and a system.
What I actually built: UnifiedSearchService
The core decision in this post is explicit:
- Azure AI Search for structured CRM data
- pgvector for Zoom transcripts (because the data is already in Postgres)
- The CRM API as a real-time fallback
I implemented that as a single service: UnifiedSearchService.
Conceptually, it's a dispatcher:
- It looks at the query (and sometimes the target domain: assets vs contracts vs companies vs transcripts).
- It chooses a backend.
- It shapes the request into the backend's native strengths.
- It returns results in a unified format so the rest of the system doesn't care.
One important piece of pragmatism: during migration, I maintained two Azure AI Search implementations—SDK and HTTP—and a feature flag decides which one is active.
Architecture at a glance
```mermaid
flowchart TD
    userQuery[User query] --> unifiedSearch[UnifiedSearchService]
    unifiedSearch -->|keyword dispatch| azureSearch[Azure AI Search]
    unifiedSearch -->|transcript path| pgvectorStore[Postgres pgvector]
    unifiedSearch -->|real-time fallback| crmApi[CRM API]
    azureSearch --> azureIndexes[Azure indexes]
    pgvectorStore --> zoomTranscripts[Zoom transcripts]
    crmApi --> crmTruth[CRM live data]
    azureSearch --> unifiedResults[Unified results]
    pgvectorStore --> unifiedResults
    crmApi --> unifiedResults
```
The non-obvious part is that I'm not trying to hide the differences between these systems—I'm trying to exploit them without leaking them into the product.
How keyword-based routing works (and why it's intentionally "dumb")
The routing logic is keyword-based dispatch. That's deliberate.
I've built enough AI systems to know that if you make routing "smart" in the wrong way, it becomes flaky. Here, I want something deterministic and easy to debug: when a query smells like transcript search, it goes to pgvector; when it smells like CRM structured retrieval, it goes to Azure AI Search; when freshness matters, I have a CRM escape hatch.
```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class SearchResult:
    id: str
    score: float
    source: str  # "azure", "pgvector", "crm"
    title: str
    snippet: str
    metadata: dict = field(default_factory=dict)


# Keyword triggers for each backend
ASSET_KEYWORDS = {"asset", "equipment", "lease", "fleet", "vehicle", "forklift"}
CONTRACT_KEYWORDS = {"contract", "deal", "proposal", "renewal", "closed", "won"}
TRANSCRIPT_KEYWORDS = {"meeting", "call", "transcript", "discussion", "said", "mentioned"}


class UnifiedSearchService:
    def __init__(self, azure_client, pg_pool, crm_client, embedder):
        self.azure = azure_client
        self.pg = pg_pool
        self.crm = crm_client
        self.embedder = embedder

    async def search(
        self,
        query: str,
        force_live: bool = False,
        top: int = 20,
    ) -> list[SearchResult]:
        """
        Route a query to the right backend based on keyword dispatch.
        Returns unified results regardless of which backend answered.
        """
        if force_live:
            return await self._search_crm_live(query, top)

        query_lower = query.lower()
        tasks = []

        # Dispatch to backends based on keyword triggers
        if any(w in query_lower for w in ASSET_KEYWORDS | CONTRACT_KEYWORDS):
            tasks.append(self._search_azure(query, top))
        if any(w in query_lower for w in TRANSCRIPT_KEYWORDS):
            tasks.append(self._search_transcripts(query, top))

        # Default: if no keywords matched, try Azure (structured is the safe default)
        if not tasks:
            tasks.append(self._search_azure(query, top))

        # Run all matched backends concurrently
        results_lists = await asyncio.gather(*tasks, return_exceptions=True)

        # Merge results, skip backends that errored
        merged: list[SearchResult] = []
        for result in results_lists:
            if isinstance(result, Exception):
                continue  # log it, don't crash
            merged.extend(result)
        return merged
```
What surprised me is how often "dumb" routing is more reliable than clever routing. When something misroutes, I can fix it by adjusting keywords and rules—not by trying to reverse-engineer why a model picked the wrong path.
Azure AI Search: multi-vector scoring with 4 embedding fields
For CRM data, I use Azure AI Search, and the implementation leans on a multi-vector representation:
- `profile_vector`
- `experience_vector`
- `specs_vector`
- `general_vector`
Those four vectors come from my embedding pipeline (the embedding agent generates exactly those four parallel embeddings per record).
The idea is simple: an equipment asset might match a query because its specs are perfect even if its general description text is noisy, or because its service history aligns even if its specs list is sparse. One vector can lie; four vectors give me multiple angles.
The Azure side then does weighted fusion across those vectors using RRF (Reciprocal Rank Fusion).
```python
from typing import Any, Optional

from azure.search.documents import SearchClient
from azure.search.documents.models import (
    QueryType,
    VectorizableTextQuery,
)


async def search_azure_multi_vector(
    client: SearchClient,
    query: str,
    filters: Optional[dict[str, Any]] = None,
    top: int = 20,
) -> list[SearchResult]:
    """
    Multi-vector search across 4 embedding fields with semantic reranking.
    Oversampling retrieves 2x candidates before reranking narrows to top-k.
    """
    # Build vector queries — one per embedding field.
    # VectorizableTextQuery embeds the query text service-side, so no
    # client-side query embedding is needed here.
    vector_queries = [
        VectorizableTextQuery(
            text=query,
            k_nearest_neighbors=top * 2,  # oversample for better reranking
            fields=field_name,
        )
        for field_name in [
            "profile_vector",
            "experience_vector",
            "specs_vector",
            "general_vector",
        ]
    ]

    # Build OData filter from structured constraints
    odata_filter = _build_odata_filter(filters) if filters else None

    results = client.search(
        search_text=query,
        vector_queries=vector_queries,
        filter=odata_filter,
        query_type=QueryType.SEMANTIC,
        semantic_configuration_name="default",
        top=top,
    )
    return [
        SearchResult(
            id=r["id"],
            score=r.get("@search.reranker_score", r.get("@search.score", 0)),
            source="azure",
            title=r.get("title", ""),
            snippet=(r.get("description") or "")[:200],
            metadata={k: v for k, v in r.items() if not k.startswith("@")},
        )
        for r in results
    ]


def _build_odata_filter(filters: dict[str, Any]) -> str:
    """
    Convert a parameter dict into an Azure AI Search OData filter.
    Handles booleans, numerics, strings, and collection fields.
    """
    clauses = []
    for field_name, value in filters.items():
        if isinstance(value, bool):
            # bool must be checked before numerics (bool is an int subclass)
            clauses.append(f"{field_name} eq {str(value).lower()}")
        elif isinstance(value, (int, float)):
            clauses.append(f"{field_name} ge {value}")
        elif isinstance(value, list):
            # OR across multiple values
            sub = " or ".join(f"{field_name} eq '{v}'" for v in value)
            clauses.append(f"({sub})")
        else:
            clauses.append(f"{field_name} eq '{value}'")
    return " and ".join(clauses)
```
I learned the hard way that "just do one embedding similarity" is how you get results that look plausible but fail in edge cases. The multi-vector setup gives me a way to recover when one representation is incomplete.
Semantic reranking (and automatic fallback)
Azure AI Search semantic reranking is a feature I use when it's available, but I designed the system to fall back automatically when it isn't.
That fallback matters because availability is part of correctness: if reranking is intermittently unavailable and your system depends on it, your relevance will flap.
I'm intentionally not romantic about reranking. It's a tool, not a religion.
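The fallback shape is simple enough to sketch. The client interface below is a stand-in, not the real Azure SDK surface: any object exposing a `search(query, semantic=..., top=...)` method works, and the exception type is my own placeholder for "reranking is unavailable right now."

```python
class RerankUnavailableError(Exception):
    """Stand-in for whatever error the search client raises when reranking is down."""


def search_with_fallback(client, query: str, top: int = 20) -> list:
    """Try semantic reranking first; degrade to plain keyword search on failure."""
    try:
        return client.search(query, semantic=True, top=top)
    except RerankUnavailableError:
        # Plain BM25/keyword search keeps results flowing when reranking flaps
        return client.search(query, semantic=False, top=top)
```

The point isn't the three lines of control flow; it's that the degraded path returns *something ranked sensibly* instead of an error, so relevance quality degrades gracefully rather than availability.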
pgvector: transcripts stay in Postgres
For Zoom transcripts, I route to Postgres + pgvector.
The decision here is almost boring, which is exactly why it's right:
- The transcript data already lives in Postgres.
- Duplicating it into Azure AI Search would create a second copy that can drift.
- pgvector gives me semantic retrieval where the data already is.
```python
import asyncpg
from openai import AsyncOpenAI

openai = AsyncOpenAI()


async def search_transcripts(
    pool: asyncpg.Pool,
    query: str,
    top: int = 20,
) -> list[SearchResult]:
    """
    Semantic search over Zoom transcripts stored in Postgres with pgvector.
    Generates a query embedding, then finds nearest neighbors by cosine distance.
    Falls back to full-text search if embedding generation fails.
    """
    try:
        # Generate query embedding
        resp = await openai.embeddings.create(
            input=query,
            model="text-embedding-3-large",
        )
        query_embedding = resp.data[0].embedding

        # Cosine similarity search via pgvector
        async with pool.acquire() as conn:
            rows = await conn.fetch(
                """
                SELECT
                    id,
                    title,
                    transcript_text,
                    1 - (transcript_embedding <=> $1::vector) AS similarity
                FROM zoom_transcripts
                WHERE transcript_embedding IS NOT NULL
                ORDER BY transcript_embedding <=> $1::vector
                LIMIT $2
                """,
                str(query_embedding),
                top,
            )
    except Exception:
        # Fallback: full-text search when embeddings are unavailable
        async with pool.acquire() as conn:
            rows = await conn.fetch(
                """
                SELECT
                    id,
                    title,
                    transcript_text,
                    ts_rank(to_tsvector('english', transcript_text),
                            plainto_tsquery('english', $1)) AS similarity
                FROM zoom_transcripts
                WHERE to_tsvector('english', transcript_text)
                      @@ plainto_tsquery('english', $1)
                ORDER BY similarity DESC
                LIMIT $2
                """,
                query,
                top,
            )

    return [
        SearchResult(
            id=str(row["id"]),
            score=float(row["similarity"]),
            source="pgvector",
            title=row["title"],
            snippet=row["transcript_text"][:200],
        )
        for row in rows
    ]
```
The subtle win is operational: I'm not maintaining a second indexing pipeline for transcripts. The moment you do that, you're not building search anymore—you're building reconciliation.
The CRM API as the real-time fallback
Even with a good sync engine (Part 7), there are moments where the index is not the source of truth.
So the router keeps a CRM API real-time fallback.
This is not a performance play; it's a correctness play. When the system suspects that the indexed view might be stale for a query that needs freshness, it can ask the CRM directly.
That "ask the CRM directly" path is what keeps the system honest. It's also what lets me make stronger caching and indexing decisions elsewhere, because I'm not pretending the index is always perfect.
The migration pragmatism: USE_AZURE_SEARCH_SDK
I maintained both an SDK-based and an HTTP-based Azure AI Search implementation during a migration, and I controlled it with a feature flag: USE_AZURE_SEARCH_SDK.
That choice is one of those unsexy engineering decisions that makes everything else possible:
- It lets me ship improvements without betting the entire system on a big-bang switch.
- It gives me a controlled rollback path.
- It makes it possible to compare behavior between implementations.
I'm calling this out because it's a pattern: when you're changing a foundational backend, you want a kill switch and a way to run both paths.
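The flag itself is trivial, which is part of why the pattern works. The flag name is from the post; the factory shape and default value are my own sketch.

```python
import os


def pick_azure_impl(sdk_impl, http_impl):
    """Select the active Azure AI Search implementation behind a feature flag."""
    # Default to the SDK path; flipping one env var is the rollback
    if os.getenv("USE_AZURE_SEARCH_SDK", "true").lower() in ("1", "true", "yes"):
        return sdk_impl
    return http_impl
```

Because both implementations satisfy the same interface, the rest of `UnifiedSearchService` never knows which one answered, and a rollback is an environment-variable change rather than a deploy.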
A concrete example: one query, three tiers of truth
Here's the exact query I use to explain the system:
"forklifts in Texas with 10,000+ lb capacity under active warranty"
The router treats that as a structured CRM-style query:
- "forklifts" is an equipment type constraint.
- "Texas" is a geo constraint.
- "10,000+ lb capacity" is a numeric spec constraint.
- "active warranty" is a status filter.
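To make those constraints concrete, here's how they could become an OData filter using the same rules as `_build_odata_filter` (repeated here so the example is self-contained). The field names are hypothetical; the post doesn't show the real index schema.

```python
from typing import Any


def build_odata_filter(filters: dict[str, Any]) -> str:
    """Convert a constraint dict into an Azure AI Search OData filter string."""
    clauses = []
    for field, value in filters.items():
        if isinstance(value, bool):
            # bool must be checked before numerics (bool is an int subclass)
            clauses.append(f"{field} eq {str(value).lower()}")
        elif isinstance(value, (int, float)):
            clauses.append(f"{field} ge {value}")  # numeric means "at least"
        else:
            clauses.append(f"{field} eq '{value}'")
    return " and ".join(clauses)


filters = {
    "equipment_type": "forklift",   # hypothetical field names throughout
    "state": "TX",
    "capacity_lbs": 10000,
    "warranty_active": True,
}
print(build_odata_filter(filters))
# → equipment_type eq 'forklift' and state eq 'TX' and capacity_lbs ge 10000 and warranty_active eq true
```

Note the ordering trap in the type checks: `True` is an `int` in Python, so the boolean branch has to come first or warranties would render as `warranty_active ge True`.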
Tier 1: Azure AI Search (structured CRM indexes)
Azure AI Search is the natural home for this query because it's built for:
- structured constraints (equipment type, numeric thresholds like capacity)
- geo filtering/scoring
- semantic reranking
- synonym maps for equipment domain language
This is also where my equipment domain synonym maps matter. Different teams say "forklift" vs "lift truck" vs "material handler" for the same thing, and synonym maps are one of the few scaling tools that doesn't require model calls.
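Azure AI Search synonym maps use the Solr synonym format: comma-separated equivalence classes, one rule per line. The specific equivalences below are illustrative examples, not my full production map.

```python
# Solr-format synonym rules: each line is an equivalence class.
# These specific rules are examples, not a complete equipment-domain map.
EQUIPMENT_SYNONYMS = "\n".join([
    "forklift, lift truck, material handler",
    "excavator, digger, trackhoe",
    "warranty, service agreement",
])
```

This string gets uploaded once as a named synonym map and referenced from the searchable fields in the index definition, so expansion happens at query time inside the service with no model calls and no per-query cost.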
Tier 2: pgvector (what it would look like)
If I forced this query into pgvector-only semantics, I could retrieve assets whose text mentions forklifts and Texas and capacity, but I'd be rebuilding structured filtering logic in the application layer.
That's not impossible—it's just the wrong tool for the data shape.
Tier 3: The CRM fallback (when freshness wins)
If the query is run in a context where the newest CRM edits matter (or the index is suspected stale), the router can choose the CRM API path.
The CRM is not where I want to do heavy semantic retrieval. It's where I go when I need the newest truth.
The tradeoff: you're building a router, not a "unified index"
The limitation of this approach is real: you're accepting that there isn't one perfect search backend.
That means:
- You need a unified result shape.
- You need consistent logging so you can see which backend answered.
- You need to keep routing rules understandable.
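One wrinkle in the unified result shape is scoring: an Azure reranker score and a pgvector cosine similarity live on different scales, so a naive global sort by score quietly favors one backend. A rank-preserving interleave is one way around that; the merge policy here is my assumption, not something the post specifies.

```python
from itertools import zip_longest


def interleave_by_rank(results_by_source: dict[str, list]) -> list:
    """Round-robin merge that preserves each backend's own ranking.

    Scores from different backends aren't comparable, so instead of sorting
    globally we alternate: each backend's best result, then each second-best.
    """
    merged = []
    for rank_group in zip_longest(*results_by_source.values()):
        merged.extend(r for r in rank_group if r is not None)
    return merged
```

Usage: `interleave_by_rank({"azure": azure_hits, "pgvector": pg_hits})` yields azure's top hit, then pgvector's, then the second of each, and so on, with the shorter list simply running out.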
But the benefit is that each backend gets to do what it's good at:
- Azure AI Search handles CRM-shaped retrieval with semantic reranking and synonym maps.
- pgvector handles transcript similarity where the data already lives.
- The CRM provides the escape hatch when the index can't be trusted.
My one analogy for this is a workshop: I don't use a table saw to tighten a bolt. I keep the bolt near the wrench, the wood near the saw, and I build a habit of reaching for the right tool without thinking about it.
Closing
The real win of UnifiedSearchService isn't that it searches three backends. It's that it lets me stop arguing with myself about which backend is "the one true system." I built a router that respects the shape of the data, keeps transcripts where they already live, and still has the humility to ask the CRM when real-time truth matters. No search framework made that decision for me; it came from understanding the data well enough to know that "unified" doesn't mean "identical." That's the kind of judgment that makes the next decision, dual-channel streaming with interruptible UX in Part 9, possible without turning the whole platform into a guessing game.
🎧 Listen to the Enterprise AI Architecture audiobook
📖 Read the full 13-part series with an AI assistant