Every enterprise healthcare payer I work with has the same problem.
They have years of investment in Snowflake — semantic models, claims analytics, carefully curated data products. They have Microsoft Fabric rolling out across their organization — lakehouses, Delta tables, real-time intelligence. They have Azure OpenAI licenses and ambitious AI roadmaps.
And they’re asking the same question: how do we put AI on top of all of this without ripping anything out?
This is the story of how I built HealthIQ — a unified healthcare intelligence platform that answers that question with a working, production-grade architecture.
— -
The Problem With “AI on Your Data”
Most “AI on your data” demos show a chatbot connected to a single database. Ask it a question, it writes SQL, returns an answer. Clean. Simple. And completely inadequate for enterprise healthcare.
Real healthcare analytics doesn’t live in one place. Claims financials live in Snowflake. Bed occupancy and staffing data live in Fabric lakehouses. Clinical policy documents live in document stores. Escalation workflows live in Logic Apps. Getting a complete picture of operational health requires crossing all of these — in a single, coherent answer.
The naive solution is to build one giant agent with every tool attached. I’ve seen this fail. At scale, a single agent juggling ten tools loses coherence. Routing degrades. Context windows fill up. The model gets confused about which tool to call when.
There’s a better pattern.
— -
The Architecture: Five Tiers, Two Specialists, One Orchestrator
HealthIQ is built on a five-tier architecture. A multi-agent A2A (Agent-to-Agent) orchestration layer in Tier 4 coordinates the data tiers below it and drives the action tier above it.
Tier 1 — Structured Analytics: Snowflake Cortex Agent
The foundation is a Snowflake semantic view (CLAIMS_SEMANTIC) that exposes curated claims metrics — total paid amounts, PMPM cost, denial rates by specialty, DRG utilization — as named business concepts rather than raw tables.
On top of that semantic view sits a Snowflake Cortex Agent, exposed as a managed MCP (Model Context Protocol) server. This means any AI orchestrator can call it with natural language and get back structured analytics — without writing a line of SQL.
Q: What were total claims paid in Q1 2026?
A: $370,685,724.65
Tier 2 — Operational Intelligence: Microsoft Fabric Data Agent
Four Delta tables in a Fabric Lakehouse capture hospital operational data — bed occupancy, staffing ratios, patient volume, authorization status. A Fabric Data Agent (HospitalOpsAgentV2) sits on top of these tables and answers operational questions in natural language.
Q: Which facility has the highest occupancy?
A: Riverside Health (Southeast) at 92%
Tier 3 — Document Intelligence: Azure AI Search RAG
A RAG layer built on Azure AI Search indexes a curated set of clinical policy documents, CMS benchmark reports, and medical director summaries — included here to showcase the pattern. In practice, this layer can connect to your existing document stores: SharePoint libraries, Azure Blob Storage, OneLake files, or any indexed enterprise content. The point isn’t the five documents I loaded — it’s that your existing knowledge assets become part of the reasoning chain without any restructuring. When the AI needs to explain why a metric looks the way it does — not just what the number is — it pulls context from this layer.
Tier 4 — Orchestration: Azure AI Foundry
This is where the intelligence lives. Two specialist agents coordinate through an A2A protocol:
ClaimsIntelligenceAgent — owns everything financial. It has access to the Snowflake MCP tool and the Policy Search RAG layer. It knows claims analytics the way a Senior Actuary knows claims analytics.
HospitalOpsAgent — owns everything operational. It has access to the Fabric Data Agent and an escalation workflow. It knows clinical capacity the way a COO knows clinical capacity.
Above them sits HealthcareOrchestratorV3 — a master agent that does no data access directly. Its only job is to understand the question, route to the right specialist(s), and synthesize their responses into a single executive answer. I chose GPT-4.1 for both the specialists and the synthesis step — the routing and cross-domain reasoning needed reliable structured output more than raw speed, and GPT-4.1 held up consistently across multi-turn agent calls without drifting off format.
Tier 5 — Action: Azure API Management + Logic App
This is where AI stops being a reporting layer and becomes an operational system. The action tier is unlimited in scope — constrained only by your business case, not the technology. Trigger a care management workflow. Update a claims record. Open a ServiceNow ticket. Push a notification to a clinical team. Invoke an RPA bot. Any system reachable via API becomes an action the AI can take.
For this showcase, I kept it simple: when occupancy hits 92% and the user asks to escalate, the orchestrator delegates to HospitalOpsAgent, which calls an APIM-proxied Logic App and delivers an escalation email to the care management team. The full loop — question to action — closes in one conversation. The email is illustrative. The pattern is production-grade.
— -
The A2A Pattern: Why It Matters
The key architectural insight is the separation between specialists and orchestrators.
Each specialist agent is small, focused, and owns its own tools. ClaimsIntelligenceAgent doesn’t know anything about bed occupancy. HospitalOpsAgent doesn’t know anything about PMPM cost. They’re experts in one domain.
The master orchestrator doesn’t touch data directly. It reasons about the question and delegates. For a cross-domain question like “which hospitals have the highest occupancy and how does that correlate with our denial rates?”, it calls both specialists in parallel, waits for both responses, and synthesizes them into one answer.
This is how enterprise AI scales. Not one monolithic agent that knows everything, but a network of specialists coordinated by an orchestrator. As your data estate grows, you add specialist agents — one per domain, owned by the team that knows that data best. The orchestrator barely changes.
HealthcareOrchestratorV3
├── ClaimsIntelligenceAgent → Snowflake + Policy RAG
└── HospitalOpsAgent → Fabric Lakehouse + Escalation
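The fan-out described above can be sketched in a few lines of Python. This is a minimal illustration, not the HealthIQ code: `ask_claims`, `ask_ops`, and `synthesize` are hypothetical stand-ins for the A2A endpoint calls and the GPT-4.1 synthesis step, and they return canned strings so the example runs without any network access.

```python
import asyncio

# Hypothetical stand-ins for the two specialists. Real versions would call
# each agent's A2A endpoint instead of returning a canned string.
async def ask_claims(question: str) -> str:
    await asyncio.sleep(0)  # placeholder for network I/O
    return f"[claims] answer to: {question}"

async def ask_ops(question: str) -> str:
    await asyncio.sleep(0)  # placeholder for network I/O
    return f"[ops] answer to: {question}"

SPECIALISTS = {"claims": ask_claims, "ops": ask_ops}

def synthesize(question: str, answers: dict) -> str:
    # Placeholder for the model synthesis call: joins the specialist answers.
    return " | ".join(f"{name}: {text}" for name, text in sorted(answers.items()))

async def orchestrate(question: str, domains: list) -> str:
    # Fan out to every selected specialist concurrently and wait for all of them.
    results = await asyncio.gather(*(SPECIALISTS[d](question) for d in domains))
    # The orchestrator touches no data itself; it only combines what came back.
    return synthesize(question, dict(zip(domains, results)))
```

Calling `asyncio.run(orchestrate("occupancy vs. denial rates?", ["claims", "ops"]))` runs both specialists concurrently and returns one combined string. Adding a third specialist under this sketch means adding one entry to `SPECIALISTS`.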
— -
What This Enables for Healthcare Payers
For a managed care organization, this architecture answers the questions that actually matter:
“What’s driving our Cardiology denial rate, and how does it compare to CMS national benchmarks?” The claims agent pulls the rate; the RAG layer pulls the benchmark and the medical director’s context.
“Which facilities are at critical capacity right now?” The ops agent returns live occupancy across all facilities and flags anything above 90%.
“Send an escalation alert for Riverside Health.” The ops agent calls the escalation workflow, the email goes to the care management team, and confirmation comes back in the chat.
All of this in a single Teams conversation. No SQL. No dashboard hunting. No switching between systems.
The “Aha” Moment for Enterprise AI
The insight that makes this architecture resonate with enterprise clients is simple:
You don’t need to centralize your data to centralize your intelligence.
Snowflake stays in Snowflake. Fabric stays in Fabric. Each system keeps its own governance, its own semantic layer, its own access controls. The AI orchestration layer sits on top and coordinates — it doesn’t absorb.
This is how you sell AI to an organization that has spent years building a data estate and isn’t going to blow it up for a chatbot.
— -
The Production Detail That Matters: APIM as a Proxy
Here’s the real-world detail that demos never show you.
Azure AI Foundry’s OpenAPI tool redacts query parameters before making HTTP calls. If your API uses query string authentication — SAS tokens, API version parameters — they get stripped to ?REDACTED before the call goes out.
The fix: put Azure API Management in front. Foundry sends a clean call with just an Ocp-Apim-Subscription-Key header. An APIM inbound policy uses set-backend-service and rewrite-uri to reconstruct the full URL with all parameters before forwarding. Foundry never sees the sensitive parameters. The API works correctly.
This is the kind of pattern that separates a demo from a production architecture.
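The real fix lives in an APIM policy, but the idea behind it is easy to show in Python. The sketch below mimics what the gateway does conceptually: the caller sends a clean path, and the proxy reattaches the query parameters the caller never saw. The function name, host, path, and parameter values are all hypothetical placeholders, not the actual HealthIQ configuration.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit

def rebuild_backend_url(backend_base: str, path: str, hidden_params: dict) -> str:
    """Conceptual stand-in for the gateway rewrite: attach the query
    parameters that the calling agent never sends or sees."""
    scheme, netloc, base_path, _, _ = urlsplit(backend_base)
    full_path = base_path.rstrip("/") + "/" + path.lstrip("/")
    return urlunsplit((scheme, netloc, full_path, urlencode(hidden_params), ""))

# Hypothetical values for illustration only. In the gateway these would be
# stored securely, never hard-coded in the agent or its tool definition.
url = rebuild_backend_url(
    "https://backend.example.invalid/workflows/abc",
    "/triggers/manual/invoke",
    {"api-version": "2020-01-01", "sig": "PLACEHOLDER"},
)
```

The design point is that the sensitive parameters exist only on the gateway side of the hop, so there is nothing on the agent side to redact.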
There’s a second reason APIM belongs in this architecture, and it matters more than the redaction workaround: identity and access control. No enterprise lets AI agents call backend systems anonymously, and an API gateway is exactly where you enforce that. APIM ties into your identity provider — Entra ID, OAuth, whatever your enterprise standardizes on — so every call from an agent carries a verifiable identity, not just a static key. You get centralized logging of who (or what agent) accessed which system and when, rate limiting per caller, and the ability to revoke access without touching the agent itself. For a healthcare payer handling PHI-adjacent operational and claims data, that audit trail isn’t optional — it’s the control that makes the rest of this architecture deployable in production.
— -
Guardrails and Observability: The Parts That Make This Trustworthy
A working demo and a deployable system are different things. Two Foundry capabilities close that gap.
Guardrails. Azure AI Foundry lets you attach guardrails directly to each agent — content filtering, jailbreak detection, and groundedness checks that run before a response ever reaches the user. In a healthcare context this matters concretely: you don’t want an agent confidently answering a clinical policy question from a hallucinated detail, and you don’t want a claims agent exposed to prompt injection through a malformed query. Guardrails sit at the agent level, so each specialist enforces its own policy independent of how the orchestrator routes to it.
Tracing. Every agent call in Foundry — including the A2A hops between the orchestrator and each specialist — generates a trace. When a cross-domain question comes back wrong, or a tool call fails, the trace shows exactly which agent was invoked, what arguments it passed, what the tool returned, and how long each step took. This is the difference between debugging a black box and debugging a system. I used traces directly to diagnose a token-fetch failure in same-project A2A calls — without that visibility, it would have been a guessing exercise.
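To make concrete what a trace records, here is a minimal plain-Python sketch of the same idea. It is not Foundry's tracing API; `Span` and `Trace` are hypothetical names, and the sketch only captures the four things mentioned above: which agent was called, with what arguments, what came back (or what failed), and how long it took.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Optional

@dataclass
class Span:
    agent: str
    arguments: dict
    result: Any = None
    error: Optional[str] = None
    duration_ms: float = 0.0

@dataclass
class Trace:
    spans: list = field(default_factory=list)

    def call(self, agent: str, fn: Callable, **arguments):
        """Run one agent or tool call and record it, whether it succeeds or fails."""
        span = Span(agent=agent, arguments=arguments)
        start = time.perf_counter()
        try:
            span.result = fn(**arguments)
            return span.result
        except Exception as exc:
            span.error = f"{type(exc).__name__}: {exc}"
            raise  # the failure still propagates; the trace just keeps a record
        finally:
            span.duration_ms = (time.perf_counter() - start) * 1000
            self.spans.append(span)
```

With something like this wrapped around each hop, a failed call leaves a span naming the agent and the error instead of an unexplained bad answer.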
For any architecture handling claims or clinical data, guardrails and tracing aren’t optional extras. They’re what a security or compliance review will ask about first.
— -
What I’d Harden Before Calling This Production-Ready
It’s worth being honest about what a demo glosses over.
Error handling. Specialist agents occasionally hit transient failures — an MCP server timeout, a token refresh delay. Right now the orchestrator surfaces the error in its synthesis rather than retrying silently, which is the right behavior for a demo but needs a proper retry-with-backoff policy in production, plus a fallback message that doesn’t expose internal error text to the end user.
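A retry-with-backoff policy of the kind described here might look like the sketch below. `TransientError`, `SAFE_FALLBACK`, and the default delays are assumptions chosen for illustration; a real implementation would map the actual timeout and token-refresh exceptions onto the retryable case.

```python
import random
import time

class TransientError(Exception):
    """Hypothetical marker for retryable failures (timeouts, token refresh delays)."""

# Shown to the end user instead of internal error text.
SAFE_FALLBACK = "That data source is temporarily unavailable. Please try again shortly."

def call_with_retry(fn, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call fn(), retrying transient failures with exponential backoff.

    Non-transient exceptions propagate unchanged. If every attempt fails
    transiently, return a safe message rather than the raw error.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except TransientError:
            if attempt == attempts - 1:
                break  # out of attempts; fall through to the safe message
            # Delay doubles each attempt (0.5s, 1s, ...) plus up to 10% jitter.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    return SAFE_FALLBACK
```

The `sleep` parameter is injectable so the policy can be tested without waiting on real delays.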
Cost and latency. Every specialist agent call has its own cost and latency profile. A cross-domain question that fans out to two agents in parallel costs roughly double a single-domain question and adds the synthesis call on top. At scale, the routing logic should account for this — not every question needs both specialists, and the keyword-based router I built for this showcase is a placeholder for a more deliberate semantic routing decision in GPT-4.1 itself.
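For reference, a keyword-based router of the kind mentioned above can be as small as the sketch below. The keyword sets are illustrative guesses rather than the lists used in HealthIQ, and falling back to both specialists when nothing matches is an assumption of this sketch, not a documented behavior.

```python
# Illustrative trigger words only; the real lists are not shown in this article.
CLAIMS_KEYWORDS = {"claims", "denial", "pmpm", "paid", "drg"}
OPS_KEYWORDS = {"occupancy", "staffing", "bed", "beds", "capacity", "escalate", "escalation"}

def route(question: str) -> list:
    """Pick specialists by matching whole words against static keyword sets."""
    words = {w.strip(".,?!\"'").lower() for w in question.split()}
    domains = []
    if words & CLAIMS_KEYWORDS:
        domains.append("claims")
    if words & OPS_KEYWORDS:
        domains.append("ops")
    # Assumption: when nothing matches, ask both rather than guess.
    return domains or ["claims", "ops"]

def estimated_model_calls(domains: list) -> int:
    # One call per specialist plus one synthesis call on top.
    return len(domains) + 1
```

Under these assumptions a single-domain question costs two model calls and a cross-domain question costs three, which is why avoiding unnecessary fan-out matters at volume.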
Same-project A2A auth. As of this writing, Foundry’s native A2A wiring for same-project agents has rough edges around managed identity token exchange. I worked around it with a Python orchestration layer calling each agent’s A2A endpoint directly — which works well, but native in-Foundry publishing to Teams is the cleaner long-term path once that matures.
None of these are dealbreakers. They’re the normal gap between “I proved the pattern works” and “this is hardened for production traffic” — and naming that gap honestly is part of doing this work seriously.
— -
Tradeoffs I Considered and Rejected
Every architectural choice here had a simpler alternative I deliberately didn’t take. Naming them — and why — matters more than the choice itself.
A2A specialists vs. one agent with many tools. The simpler path is a single agent with every tool attached: Snowflake MCP, Fabric Data Agent, RAG, escalation, all in one system prompt. I rejected this after watching it degrade in practice. Past roughly five or six tools, a single agent starts misrouting — calling the wrong tool, or calling the right tool with the wrong framing because it’s reasoning about ten things at once instead of one. The single-agent model also has an organizational failure mode that matters more long-term than the technical one: it has no clean ownership boundary. If the claims team and the ops team are both editing the same agent’s system prompt and tool list, you get merge conflicts in intent, not just in code. A2A specialists cost you orchestration overhead and an extra synthesis call. What you get back is a system where each team owns a contained blast radius, and a routing failure in one specialist doesn’t take down the other. For two specialists, the overhead is a fair trade. For ten, it’s close to mandatory.
API gateway vs. native Foundry connectors. Foundry’s native OpenAPI and MCP tool support is the path of least resistance, and I started there. I moved to an APIM-fronted pattern for two reasons that aren’t visible in a demo. First, the practical one: Foundry’s OpenAPI tool redacts query parameters, which silently breaks any backend using query-string auth — there’s no native workaround inside Foundry itself. Second, the more important one: native connectors authenticate with a static key the agent holds directly. That’s acceptable for a prototype and not acceptable for a system touching claims or clinical data in an enterprise with an identity-provider standard. The gateway costs you one more hop and one more resource to operate. It buys you centralized identity enforcement, audit logging, and the ability to revoke a single caller’s access without touching the agent. For a regulated industry, that’s not a nice-to-have — it’s close to a requirement once this leaves the sandbox.
GPT-4.1 vs. a larger frontier model. I deliberately did not reach for the most powerful available model. The reasoning load in this architecture is mostly routing and structured synthesis — deciding which specialist to call, and combining two already-coherent responses into one — not open-ended novel reasoning. GPT-4.1 holds format reliably across multi-turn agent calls, which matters more here than raw capability, and it’s meaningfully cheaper and faster at the volume an orchestration layer like this generates, since every user question can trigger two to three model calls under the hood. The tradeoff I’m explicitly making is ceiling for cost and latency: if a future specialist needs to do something closer to genuine multi-step clinical reasoning rather than structured retrieval-and-synthesis, that specific agent should probably sit on a more capable model while the orchestrator stays lightweight. Model choice in a multi-agent system doesn’t have to be uniform, and treating it as a single decision for the whole architecture is itself a mistake worth avoiding.
What changes at 20 specialists instead of 2. This architecture works cleanly at two specialists with keyword-based routing. It does not survive unchanged at twenty — the router has to become semantic rather than keyword-based, classifying intent against a registry of agent capabilities instead of matching against a static list of trigger words. The deeper shift is that operational discipline which is manageable per-agent at this scale — tracing, guardrails, governance — has to become a platform-level concern rather than something each specialist handles on its own. I cover what that looks like concretely further down. None of this invalidates the pattern; it’s the normal maturity curve from proof-of-concept to platform, and it’s worth being explicit that this piece describes the former.
— -
Everything That Went Into This
It’s worth listing what actually got built, because the architecture diagram understates the amount of plumbing underneath it. None of this is glamorous, and all of it was necessary.
Data and semantic layer
Snowflake semantic view (CLAIMS_SEMANTIC) exposing claims metrics as named business concepts
Snowflake Cortex Agent configured and published as a managed MCP server
Microsoft Fabric Lakehouse with four Delta tables modeling hospital operations
Fabric Data Agent built and tuned against those tables
Azure AI Search index populated with clinical policy documents for the RAG layer
Agents and orchestration
Two specialist agents (ClaimsIntelligenceAgent, HospitalOpsAgent) built in Azure AI Foundry, each with its own system prompt, tool set, and guardrail configuration
A master orchestrator (HealthcareOrchestratorV3) coordinating both specialists via the A2A protocol
A Python-based orchestration layer as a fallback path, handling routing, parallel agent calls, and GPT-4.1 synthesis directly against each agent’s A2A endpoint
Identity and security
System-assigned managed identities enabled on the Foundry resource and, separately, on the Function App used for the API wrapper. Turning these on isn’t a checkbox: it required explicit role assignments at the resource scope rather than relying on subscription-level inheritance
Azure AI Developer role explicitly granted to each managed identity at the project/resource scope it needed to call
Entra ID-based authentication wired into the A2A tool connections in place of static keys
APIM configured with subscription-key and identity-aware policies in front of every externally-callable tool
Integration and action layer
Azure API Management instance fronting the Snowflake MCP connection, with inbound policies (set-backend-service, rewrite-uri) to work around Foundry’s query-parameter redaction on OpenAPI tools
A second APIM-proxied connection in front of a Logic App for the escalation email workflow
Teams channel publishing for the production-facing orchestrator
Operational tooling
Guardrails configured per agent for content filtering and groundedness
Full request tracing enabled across both specialist agents and the orchestrator, used directly to diagnose a same-project A2A token-fetch failure
Python test harnesses validating each A2A endpoint independently before wiring them into the orchestrator
The unglamorous truth about agentic AI architecture is that the agent logic is maybe 30% of the work. The other 70% is identity, networking, observability, and the small production-grade fixes — like the APIM redaction workaround — that never show up in a demo but determine whether the thing actually ships.
— -
The Real Takeaway: This Is a Blueprint, Not a One-Off
Strip away the healthcare specifics and what’s left is a reusable pattern: take a large, ambiguous problem, decompose it into narrow specialist agents that each own one domain and one set of tools, and coordinate them with a master orchestrator that does routing and synthesis but touches no data directly. Claims and hospital operations are this instance of the pattern. The pattern itself applies anywhere a problem is too broad for one agent to reason about coherently but naturally splits along domain lines — financial services, supply chain, IT operations, customer support, regulatory compliance. The five tiers map differently each time; the shape doesn’t change.
What makes a blueprint actually reusable, though, isn’t the happy-path architecture diagram — it’s the checklist of what every specialist agent has to support before it’s allowed into the system, regardless of domain. This is the part teams skip when they’re moving fast, and the part that turns into an incident six months later when they didn’t.
The minimum bar for any specialist agent joining the platform:
Guardrails — content filtering, jailbreak detection, and groundedness checks configured at the agent level, not assumed to be inherited from the orchestrator
Tracing — full request/response tracing enabled by default, not added reactively after the first hard-to-diagnose failure
Identity — authenticates through the gateway with a verifiable identity (Entra ID or equivalent), never a static key held by the agent itself
Tool ownership boundary — a specialist’s tools are its own; no shared mutable state or shared credentials across specialists
Failure mode defined — an explicit answer to “what does this agent return when its tool fails,” not a default that leaks a stack trace to the end user
Cost and latency budget — a known per-call cost and latency profile, so the orchestrator’s routing decisions can be made with that cost visible, not discovered later in a bill
Ownership and versioning — a named team or owner, and a way to version the agent’s prompt and tools without breaking the orchestrator’s contract with it
Onboarding contract with the orchestrator — a clear description of what the agent does and doesn’t handle, so routing logic (keyword-based or semantic) has something accurate to route against
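One way to make that checklist enforceable rather than aspirational is to express it as data and validate it at onboarding time. The sketch below is a hypothetical shape for such a manifest; the field names and the `missing_requirements` helper are assumptions for illustration, not part of Foundry or of HealthIQ.

```python
from dataclasses import dataclass

@dataclass
class AgentManifest:
    """Hypothetical onboarding record; every default represents 'not yet done'."""
    name: str
    owner: str = ""
    version: str = ""
    handles: str = ""                 # what the agent does and doesn't handle
    guardrails_enabled: bool = False
    tracing_enabled: bool = False
    gateway_identity: bool = False    # authenticates through the gateway
    isolated_credentials: bool = False
    failure_message: str = ""         # what the user sees when a tool fails
    latency_budget_ms: int = 0
    cost_per_call_usd: float = 0.0

def missing_requirements(m: AgentManifest) -> list:
    """Return the checklist items this agent has not yet satisfied."""
    checks = {
        "guardrails": m.guardrails_enabled,
        "tracing": m.tracing_enabled,
        "identity": m.gateway_identity,
        "tool ownership boundary": m.isolated_credentials,
        "failure mode": bool(m.failure_message),
        "cost and latency budget": m.latency_budget_ms > 0 and m.cost_per_call_usd > 0,
        "ownership and versioning": bool(m.owner and m.version),
        "onboarding contract": bool(m.handles),
    }
    return [item for item, ok in checks.items() if not ok]
```

An orchestrator registry could refuse to register any agent for which `missing_requirements` returns a non-empty list, which turns the minimum bar into a gate instead of a guideline.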
None of these are exotic. Most of them are the same governance disciplines that already exist for microservices, just translated to agents. The mistake is assuming agentic systems get a pass on that discipline because they’re new — they don’t, and the organizations that figure this out early are the ones who get past two specialist agents to twenty without the whole thing becoming unmanageable.
— -
What’s Next
The agent marketplace model is the natural evolution. Today HealthIQ has two specialists — claims and hospital ops. In six months it could have ten: pharmacy, member engagement, regulatory compliance, prior authorization, quality metrics. Each owned by the team that knows that domain best. Each an independent A2A agent that plugs into the master orchestrator.
The orchestrator’s job stays the same: route and synthesize. Capability grows by adding specialists, not by rewriting the core.
That’s the architecture worth building toward.
— -
Kartik Anand is a Cloud & AI Architect at Microsoft working on enterprise data platform engagements in healthcare. He specializes in agentic AI architectures on Microsoft Fabric, Azure AI Foundry, and Snowflake.
Interested in building something similar? Connect with me on LinkedIn.
