Alex Natskovich

Why 80% of Healthcare AI Pilots Die in Pilot: The Data Architecture Problem

Healthcare data is a mess. You’ve got EHRs, labs, pharmacies, payers, and assorted vendors all speaking slightly different dialects of “almost-FHIR, sort-of-HL7, random CSV, and mystery XML”. On top of that, there are duplicates, missing fields, and business rules that only exist in someone’s head.

Drop a powerful LLM on top of that and you don’t get magic. You get unstable behavior, unsafe recommendations, and a project that never makes it out of “cool internal demo” mode.

If you want AI to do anything useful in healthcare, you need to fix the data layer first.

In this post, I’ll walk through how we at MEV approach AI-ready healthcare architectures: the core layers you need, and six concrete steps to get from “scattered systems” to “LLMs that can safely act on clinical data” >>> https://mev.com/blog/a-practical-guide-on-building-an-ai-ready-healthcare-data-architecture-in-6-steps

TL;DR

Most healthcare AI projects stall because the data layer is not ready:

  • Data is fragmented, inconsistent, and governed by tribal knowledge instead of explicit rules.
  • Modern healthcare platforms tend to converge on four core layers:
    - FHIR operational layer (near real-time, clinical workflows)
    - Warehouse / lakehouse (analytics and ML on de-identified data)
    - MDM / hMDM (identity and golden records)
    - API + access control (how apps and AI touch the data)

To make that stack AI-ready, you can think in six steps:

  1. Use FHIR-first persistence as your canonical model.
  2. Add fine-grained authorization, tighter than normal app RBAC.
  3. Expose tools / function calls for LLMs instead of raw API access.
  4. Add RAG so answers are grounded in patient data instead of model “intuition”.
  5. ETL into a warehouse for cross-patient analytics and ML.
  6. Bake in privacy and compliance controls (tokenization, consent, logging, zero-retention LLMs).

Why AI keeps failing in healthcare

The failure pattern is depressingly consistent:

  • Teams start with the model (“Let’s integrate GPT with our EHR!”).
  • A quick prototype kind of works on a sandbox dataset.
  • As soon as it touches live data and real permissions, everything falls apart.

After almost two decades of building software for regulated industries, we’ve come to see this pattern as less of an AI problem and more of an architecture problem.

The blockers usually live here:

  • Missing or inconsistent fields → misclassified risk, wrong triage, “why is this answer so off?”
  • Duplicate patients and providers → broken histories, unsafe recommendations.
  • Conflicting business rules across systems → AI behavior changes depending on which source you hit.
  • Different source formats for the same concept → fragile ETL, surprise errors.

Healthcare is unforgiving. A tiny data glitch that would be harmless in an ecommerce app can translate to bad clinical guidance. That’s why we start with the data foundation instead of the model.

The four core layers of an AI-ready healthcare data stack

Most modern healthcare platforms we see end up with some version of these four layers:

FHIR-first operational data layer

  • Near real-time clinical data.
  • Resources like Patient, Observation, MedicationRequest, Encounter, Condition share common semantics.
  • Systems can plug into a known structure instead of one-off schemas.

Warehouse / lakehouse analytics layer

  • Snowflake, BigQuery, Databricks, etc.
  • ETL’d, standardized data for population health dashboards, longitudinal patient journeys, predictive models on de-identified data, and cost and quality analytics.

MDM / hMDM (Master Data Management)

  • Reconciles identities across patients, providers, payers, and plans.
  • Produces “golden records” so everything above isn’t built on a shaky identity layer.

API + access control layer

  • REST / GraphQL / FHIR APIs exposed in a predictable way.
  • Central place for permission logic and purpose-of-use checks.
  • Masking, redaction, auditing, and field-level access controls.

This is also where your AI systems should enter the picture.

With that backdrop, let’s walk through how to assemble this into something an LLM can safely work with.

Step 1: Make FHIR your operational source of truth

If you want AI to navigate clinical data, it needs a consistent language. That’s what FHIR gives you.
Using FHIR as your canonical model:

  • Eliminates schema chaos: patients, encounters, observations, medications, conditions all use defined resource structures instead of ad hoc JSON.
  • Cuts a big chunk of one-off mapping work: many vendors already expose FHIR, or can be transformed into it with stable pipelines.
  • Makes interoperability default: hospitals, labs, pharmacies, payers all plug into the same structure.
  • Gives AI tools predictable outputs: a function like get_patient_observations() always returns a list of Observation resources, not “whatever that one integration happened to send”.
  • Keeps you adaptable: new modules or AI tools can connect without re-inventing your data model.
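
As a minimal sketch of the get_patient_observations() helper from that last bullet, here is what a thin wrapper over a FHIR R4 REST endpoint could look like in Python. The base URL and helper name are illustrative, not a specific vendor API:

```python
import requests

FHIR_BASE_URL = "https://fhir.example.org/r4"  # placeholder; point at your FHIR server

def get_patient_observations(patient_id: str, category: str | None = None) -> list[dict]:
    """Fetch Observation resources for one patient from a FHIR R4 server.

    Every caller (apps, ETL jobs, AI tools) gets the same FHIR shape back,
    regardless of which upstream system originally produced the data.
    """
    params = {"patient": patient_id, "_sort": "-date", "_count": 50}
    if category:
        params["category"] = category  # e.g. "laboratory" or "vital-signs"

    resp = requests.get(f"{FHIR_BASE_URL}/Observation", params=params, timeout=10)
    resp.raise_for_status()
    bundle = resp.json()  # a FHIR "searchset" Bundle
    return [entry["resource"] for entry in bundle.get("entry", [])]
```

Whether the data originally came from an EHR, a lab feed, or a pharmacy system, callers always get back plain Observation resources.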

Quick reality check on standards

FHIR isn’t the only standard in healthcare, but it fills a specific niche:

  • HL7 v2 is great for older, message-based hospital workflows.
  • HL7 v3 / CDA is document-centric; good for clinical documents and sharing entire summaries.
  • openEHR focuses on long-term clinical modeling and robust repositories.
  • OMOP is fantastic for research and population analytics on de-identified data.
  • CDISC targets clinical research submission workflows.

We normally see FHIR working alongside these, not replacing them. FHIR deals with modern, API-driven, patient-centric workflows; the others handle archival, research, or regulatory use cases.

Example: FHIR-first patient engagement and compliance platform
One of our clients needed a platform to orchestrate complex treatment programs across patients, providers, pharmacies, and admins.

We could have cobbled together a bunch of custom tables. Instead, we built the whole thing on FHIR R4:

  • A HAPI FHIR server managed read/write operations.
  • External EHR and pharmacy systems synced through FHIR APIs.
  • Permissions were enforced at the resource level (RBAC + relationship-based rules + FHIR security mechanisms).

The impact:

  • No custom schemas for core clinical data → drastically less mapping.
  • Multiple apps (patient, provider, admin) could reuse the same data layer.
  • Access controls lined up naturally with FHIR resources.
  • When the client started adding AI features, the data model already made sense to an LLM.

Once FHIR is in place as your operational backbone, you can start thinking about who is allowed to see what.

Step 2: Layer in fine-grained authorization

Giving an AI assistant access to clinical data is very different from building a normal CRUD app.

You don’t just want “doctor” and “patient” roles. You need a permission model that accounts for:

  • User-specific access (patients only see their own records, physicians see active patients under their care).
  • Purpose of use (treatment vs research vs billing, etc.).
  • Contextual rules (time-bound access, “break-glass” emergency overrides).
  • Full audit trails (who accessed which fields, and why).

Imagine a patient asking: “What were my last blood test results?”

Behind the scenes:

  1. The AI identifies and authenticates the user.
  2. The authorization layer evaluates: is this user the patient? Are they allowed to see Observation resources for themselves?
  3. Only authorized FHIR resources are retrieved.
  4. The AI summarizes those observations in natural language.
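
Here is a deliberately tiny sketch of that authorization gate. The AccessRequest shape and the rules are illustrative only; in production this decision usually lives in a policy engine rather than hand-written ifs:

```python
from dataclasses import dataclass

@dataclass
class AccessRequest:
    user_id: str        # authenticated caller
    role: str           # "patient", "physician", ...
    patient_id: str     # whose data is being requested
    resource_type: str  # FHIR resource type, e.g. "Observation"
    purpose: str        # "treatment", "research", "billing", ...

def is_allowed(req: AccessRequest) -> bool:
    """Toy policy: patients read their own records; physicians read clinical
    resources for treatment only. Real systems delegate this decision to a
    policy engine and record every outcome in the audit log."""
    if req.role == "patient":
        return req.user_id == req.patient_id
    if req.role == "physician" and req.purpose == "treatment":
        # A real check would also verify an active care relationship.
        return req.resource_type in {"Observation", "Condition", "MedicationRequest"}
    return False
```

Every AI tool call (coming up in Step 3) should route through a check like is_allowed() before anything touches the FHIR server.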

Tools we’ve seen work well in this space:

  • Permit.io and Permify for fine-grained access control with developer-friendly APIs.
  • OPA / ABAC-based custom solutions when you need very specific policy logic.

The key point: all AI queries should pass through this layer. The model never “free-browses” your datastore.

Step 3: Add a tools / function-calling layer for AI

Now that you have structured data and permissions, you need a safe way for AI to interact with it.

Modern LLMs (OpenAI, Claude, others) support function calling. Instead of asking the model to generate SQL or call arbitrary URLs, you expose a small toolkit of functions.

On your side, you already have:

  • FHIR server (operational data)
  • Warehouse / lakehouse (analytics)
  • MDM (identity)
  • APIs (access)

On top of that, define a narrow set of tools such as:

  • get_patient_observations(patient_id, category)
  • get_patient_conditions(patient_id)
  • get_patient_medications(patient_id)
  • search_encounters(patient_id, date_range)
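
With OpenAI-style function calling, declaring two of those tools could look roughly like this; the names and parameters mirror the hypothetical list above:

```python
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_patient_observations",
            "description": "Recent Observation resources for one patient, newest first.",
            "parameters": {
                "type": "object",
                "properties": {
                    "patient_id": {"type": "string"},
                    "category": {"type": "string", "description": "e.g. 'laboratory'"},
                },
                "required": ["patient_id"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "search_encounters",
            "description": "Encounters for a patient within a date range.",
            "parameters": {
                "type": "object",
                "properties": {
                    "patient_id": {"type": "string"},
                    "date_from": {"type": "string", "description": "ISO date"},
                    "date_to": {"type": "string", "description": "ISO date"},
                },
                "required": ["patient_id"],
            },
        },
    },
]
```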

The runtime flow looks like this:

  • User asks a question.
  • The LLM picks the appropriate tool from its toolbox.
  • The tool checks permissions using your auth layer, queries FHIR / MDM / warehouse as needed, and returns structured data.
  • The LLM generates a natural-language answer based on that structured result.

The model never talks to your FHIR store or warehouse directly. It always goes through a thin, well-tested layer you control. That’s where you enforce input validation, limits, and permission checks.
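
A dispatcher for that thin layer can be surprisingly small. This sketch assumes OpenAI-style tool_call objects and reuses the hypothetical helpers from the Step 1 and Step 2 sketches:

```python
import json

# Tool name -> the real implementation that sits behind your auth layer.
TOOL_IMPLEMENTATIONS = {
    "get_patient_observations": get_patient_observations,  # Step 1 sketch
}

def run_tool_call(tool_call, caller_id: str, caller_role: str) -> str:
    """Execute one LLM tool call; the model only ever sees the JSON result."""
    name = tool_call.function.name
    args = json.loads(tool_call.function.arguments)

    if name not in TOOL_IMPLEMENTATIONS:
        return json.dumps({"error": "unknown tool"})

    req = AccessRequest(              # Step 2 sketch
        user_id=caller_id,
        role=caller_role,
        patient_id=args.get("patient_id", ""),
        resource_type="Observation",  # a real version derives this per tool
        purpose="treatment",
    )
    if not is_allowed(req):
        return json.dumps({"error": "access denied"})

    return json.dumps(TOOL_IMPLEMENTATIONS[name](**args))
```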

Step 4: Use RAG so the model doesn’t have to guess

Even with function calling, a base LLM will happily improvise if it doesn’t see the data it needs. That’s how you get hallucinated medications and made-up guidelines.

Retrieval-Augmented Generation (RAG) gives you a way to ground answers in the right FHIR resources.

For example, a patient asks: “Why was I prescribed this medication?”

You can design a flow like this:

  • A tool retrieves the relevant MedicationRequest, any linked Condition, and recent Observation resources that influenced the decision.
  • Your RAG layer formats those resources into model-friendly context.
  • The LLM receives the user’s question plus only the necessary pieces of structured data.
  • The model explains the reasoning, using the retrieved resources as the anchor.
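
A rough sketch of that grounding step is below. The helper is hypothetical; the important part is that only the fields needed for the answer get injected into the prompt:

```python
def build_medication_context(med_request: dict, condition: dict | None,
                             observations: list[dict]) -> str:
    """Flatten the retrieved FHIR resources into a small, minimized context
    block. Identifiers (names, MRNs, addresses) are deliberately left out."""
    lines = [
        f"Medication: {med_request.get('medicationCodeableConcept', {}).get('text', 'unknown')}",
        f"Prescribed on: {med_request.get('authoredOn', 'unknown')}",
    ]
    if condition:
        lines.append(f"Linked condition: {condition.get('code', {}).get('text', 'unknown')}")
    for obs in observations[:5]:  # keep the prompt small
        code = obs.get("code", {}).get("text", "observation")
        qty = obs.get("valueQuantity", {})
        lines.append(f"Recent result: {code} = {qty.get('value')} {qty.get('unit', '')}".rstrip())
    return "\n".join(lines)

SYSTEM_PROMPT = (
    "Answer only from the clinical context provided. If the context does not "
    "explain the prescription, say so instead of guessing."
)
```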

This approach has a few important privacy implications:

  • Inject only what you need (fields required to answer the question).
  • Mask or tokenize identifiers (SSNs, exact addresses, etc.).
  • Log every retrieval (which data was passed to the model, for which user, and for what purpose).
  • Use zero-retention modes for LLM providers so PHI isn’t used for training.

**The result:** patients get explanations that trace back to specific data points, and you avoid “the model just made something up” scenarios.

Step 5: ETL into a warehouse for cross-patient analytics

So far we’ve focused on single-patient interactions. But you still need population-level insight:

  • Quality and performance metrics
  • Claims and cost analytics
  • Cohort discovery
  • Predictive models trained on de-identified data

That’s where a warehouse/lakehouse comes in.
Typical pattern:

  • ETL FHIR (and related) data into Snowflake / BigQuery / Databricks.
  • Normalize schemas, map codes, add quality checks.
  • De-identify or tokenize as required.
  • Expose curated datasets for analysts and ML.
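
As an illustration, flattening FHIR Observations into warehouse rows with tokenized patient keys might look like this. The hashing scheme below is a sketch, not a complete de-identification strategy on its own:

```python
import hashlib

SALT = "load-from-a-managed-secret"  # never a literal in real code

def tokenize(patient_id: str) -> str:
    """Stable pseudonymous key so rows join across tables without exposing the real ID."""
    return hashlib.sha256((SALT + patient_id).encode()).hexdigest()[:16]

def observation_to_row(obs: dict) -> dict:
    """Flatten one FHIR Observation into a warehouse-friendly row."""
    ref = obs.get("subject", {}).get("reference", "")  # e.g. "Patient/123"
    patient_id = ref.split("/")[-1] if ref else ""
    coding = (obs.get("code", {}).get("coding") or [{}])[0]
    qty = obs.get("valueQuantity", {})
    return {
        "patient_token": tokenize(patient_id),
        "code": coding.get("code"),
        "code_display": coding.get("display"),
        "value": qty.get("value"),
        "unit": qty.get("unit"),
        "effective_at": obs.get("effectiveDateTime"),
    }
```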

Permissions change here:

  • Only a small, vetted group (data engineers, analysts, admins) can touch cross-patient datasets.
  • AI assistants that operate on a single patient by default should not see these populations unless explicitly allowed (e.g., a separate “analytics assistant” with stricter access).

Example: Snowflake-first claims intelligence platform
One client needed to infer a patient’s drug insurer at the pharmacy counter, even when the patient presented the wrong card.
Inputs:

  • Huge volumes of vendor-supplied pharmacy claims
  • Different schemas per vendor
  • Frequent format changes
  • Sparse documentation

We built a Snowflake-first architecture that:

  • Ingested claims directly via Snowflake Shares.
  • Normalized and validated incoming schemas.
  • Standardized codes and filled gaps through enrichment.
  • Applied tokenization for identity-related fields.
  • Ran a multi-stage MDM flow (deterministic → probabilistic → ML-assisted) to reconcile payer, PBM, and plan into a usable “golden” structure.
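
To make the “deterministic → probabilistic” idea concrete, here is a toy two-pass matcher. The field names (bin, pcn, payer_name) and the 0.85 threshold are illustrative; the real pipeline also layered an ML-assisted pass on top:

```python
from difflib import SequenceMatcher

def match_payer(claim: dict, golden_payers: list[dict]) -> dict | None:
    """Two-pass match of a claim against the golden payer records."""
    # Pass 1: deterministic. An exact BIN + PCN match wins outright.
    for payer in golden_payers:
        if claim.get("bin") and claim["bin"] == payer.get("bin") \
                and claim.get("pcn") == payer.get("pcn"):
            return payer

    # Pass 2: probabilistic. Best payer-name similarity above a threshold.
    best, best_score = None, 0.0
    for payer in golden_payers:
        score = SequenceMatcher(
            None,
            (claim.get("payer_name") or "").lower(),
            (payer.get("name") or "").lower(),
        ).ratio()
        if score > best_score:
            best, best_score = payer, score
    return best if best_score >= 0.85 else None
```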

Outcomes:

  • A unified, reliable claims repository.
  • Low-latency API to infer coverage in real time.
  • Strong privacy posture (tokenization instead of raw PII).
  • A robust foundation for ML models to predict payer/plan.

This same warehouse layer becomes the backbone for dashboards, risk scores, and model training pipelines.

Step 6: Build privacy and compliance into the stack

You can have great architecture and clever LLM flows and still fail if regulators can’t trust the system.
For healthcare, we treat this as a first-class requirement, not an afterthought.

Key safeguards:

  • Data minimization. Tools and RAG inject only what’s needed to answer a question.
  • De-identification for ML. Use the HIPAA de-identification methods (Expert Determination or Safe Harbor) when training models on historical data.
  • Tokenization and encryption. Especially for identities, genetic data, and sensitive observations.
  • Consent enforcement. AI must respect opt-outs and purpose limitations (e.g., treatment vs marketing vs research).
  • Comprehensive audit logging. Capture which user or agent accessed which resources and fields, for which purpose, and when.
  • Zero-retention LLM modes. Configure providers so PHI isn’t stored or used for model training.
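
Two of those safeguards, audit logging and consent enforcement, are cheap to wire in early. A minimal sketch (the log sink and consent shape are assumptions):

```python
import json
import logging
from datetime import datetime, timezone

audit_logger = logging.getLogger("phi_audit")  # route to an append-only sink

def audit(user_id: str, resource_type: str, resource_ids: list[str], purpose: str) -> None:
    """One record per data access: who touched what, and why."""
    audit_logger.info(json.dumps({
        "ts": datetime.now(timezone.utc).isoformat(),
        "user": user_id,
        "resource_type": resource_type,
        "resource_ids": resource_ids,
        "purpose": purpose,
    }))

def consent_allows(patient_consents: set[str], purpose: str) -> bool:
    """Purpose limitation: a patient may allow 'treatment' but opt out of 'research'."""
    return purpose in patient_consents
```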

When this layer is wired into the architecture, you can ship features that make regulators, compliance teams, and clinicians a lot more comfortable with AI-driven workflows.

Putting it together: what this architecture lets you do

With these layers in place, you unlock some useful properties:

  • AI assistants can act on behalf of users, using their exact permissions.
  • HIPAA / GDPR compliance is enforced technically, not just via policy documents.
  • AI queries are grounded in structured clinical data and fully auditable.
  • Behavior is explainable: every answer can be tied to specific FHIR resources and access decisions.
  • You can scale by adding new tools rather than redesigning the whole stack.
  • You often don’t need custom models to start; high-quality LLMs plus the right structure go a long way.

Final thoughts

In healthcare, AI success is dominated by architecture, not model choice.

If your data is fragmented, your permissions are fuzzy, and your access patterns aren’t controlled, no frontier-grade model will save you. If your data is well-structured, your permissions are explicit, and your access is mediated through narrow tools, even a “boring” LLM can safely add value.

At MEV, we’ve spent close to 20 years building systems in regulated environments (HIPAA, GDPR, SOC 2, ISO 27001, shifting AI guidance). From what we’ve seen, regulation is rarely the real blocker. Sloppy architecture is.

If you’re planning an AI initiative in healthcare and want help with the unglamorous parts—data, governance, and plumbing—tell us what you’re trying to build. We’ll walk through what it will take in terms of time, scope, and budget, and whether AI is even the right tool for the problem you have.
