Michael Nikitin

Posted on • Originally published at itirra.com

Building AI-Powered Healthcare Appeals: A Three-Stage Architecture Guide

Most healthcare organizations chase only one of two appeal paths when claims get denied. The other path – member appeals – is a technical problem worth solving.

When a claim is denied, clinical staff typically file a provider appeal. But every patient also has a legal right to file a member appeal, which triggers a separate adjudication track with different review criteria. Most organizations ignore this path entirely, leaving recoverable revenue on the table.

Building member appeals at scale is an integration and automation problem: pull clinical data from the EHR, match it against payer-specific criteria, generate patient-facing documentation, and track outcomes. Here's a three-stage architecture for building that system.

The Integration Layer: FHIR + HL7

Before any AI logic, you need reliable data access. The clinical evidence supporting appeals (lab results, medication history, prior authorizations) lives in Epic, Oracle Health, MEDITECH, and similar systems.

You'll likely need both major interoperability standards:

FHIR for structured, on-demand data. RESTful APIs give you discrete clinical data when you need it. For appeals, the key resources are:

  • ExplanationOfBenefit and ClaimResponse for denial details
  • Patient and DocumentReference for supporting evidence
  • MedicationRequest and Observation for clinical context

HL7 v2 for real-time event triggers. ADT (Admission, Discharge, Transfer) and DFT (Detailed Financial Transaction) messages let your system react the moment a denial posts. If you need to kick off an appeal workflow automatically when a claim status changes, this is likely your event source.
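A minimal sketch of triaging an incoming v2 message, assuming a DFT feed. The sample message and field positions are illustrative; a production system would sit behind an interface engine or use a parsing library such as python-hl7:

```python
# Illustrative HL7 v2 message -- segment content is made up, not a real feed.
SAMPLE_DFT = "\r".join([
    "MSH|^~\\&|HIS|HOSP|APPEALS|RCM|202401151200||DFT^P03|MSG0001|P|2.5",
    "PID|1||12345^^^HOSP^MR||DOE^JANE",
    "FT1|1|||20240115|20240115|CG|99213^Office visit^CPT",
])

def parse_segments(raw: str) -> dict:
    """Split an HL7 v2 message into {segment_id: fields} for quick triage."""
    segments = {}
    for line in raw.split("\r"):
        fields = line.split("|")
        segments[fields[0]] = fields
    return segments

def is_financial_event(raw: str) -> bool:
    """True when the MSH segment carries a DFT (financial transaction) event."""
    msh = parse_segments(raw).get("MSH", [])
    return len(msh) > 8 and msh[8].startswith("DFT")
```

A listener can use a check like this to decide whether a message should wake the appeals workflow at all, before any heavier parsing.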

Map your data needs to specific FHIR resources and HL7 message types before writing integration code. Start with ExplanationOfBenefit and ClaimResponse, expand based on which denial categories you're targeting first.
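As a sketch of what that mapping looks like in code, here is denial-reason extraction from an ExplanationOfBenefit resource. The payload shape follows FHIR R4, but the sample values and the `denialreason` adjudication category code are illustrative assumptions, not payer-standard codes:

```python
# Illustrative FHIR R4 ExplanationOfBenefit fragment -- values are made up.
SAMPLE_EOB = {
    "resourceType": "ExplanationOfBenefit",
    "id": "eob-001",
    "outcome": "error",
    "item": [{
        "sequence": 1,
        "adjudication": [{
            "category": {"coding": [{"code": "denialreason"}]},
            "reason": {"coding": [{"code": "CO-50",
                                   "display": "Not medically necessary"}]},
        }],
    }],
}

def denial_reasons(eob: dict) -> list:
    """Collect denial reason codes from every line item's adjudication."""
    codes = []
    for item in eob.get("item", []):
        for adj in item.get("adjudication", []):
            coding = adj.get("category", {}).get("coding", [{}])
            if coding and coding[0].get("code") == "denialreason":
                for reason in adj.get("reason", {}).get("coding", []):
                    codes.append(reason["code"])
    return codes
```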

Stage 1: LLM Wrapper

The simplest implementation: a general-purpose LLM behind an API, wrapped in a prompt layer.

The flow:

  1. Pull denied claim + clinical notes via your EHR integration
  2. Construct a prompt with denial reason, EOB data, appeal requirements
  3. Send to LLM API
  4. Return draft appeal letter for human review

This ships in weeks. Engineering effort is prompt tuning plus a thin integration layer. Costs are mostly API usage.
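A sketch of that wrapper, with hypothetical field names and the LLM client injected as a plain callable, since provider SDKs vary:

```python
# Stage 1 prompt layer sketch. The template and claim fields are assumptions;
# the LLM call itself is a stub injected by the caller.
APPEAL_TEMPLATE = """You are drafting a member appeal letter.
Denial reason: {denial_reason}
Payer: {payer}
Relevant clinical context:
{clinical_notes}

Draft a patient-facing appeal citing only the evidence above."""

def build_appeal_prompt(claim: dict, notes: list) -> str:
    """Assemble denial details and clinical notes into a single prompt."""
    return APPEAL_TEMPLATE.format(
        denial_reason=claim["denial_reason"],
        payer=claim["payer"],
        clinical_notes="\n".join(f"- {n}" for n in notes),
    )

def draft_appeal(claim: dict, notes: list, llm_call) -> str:
    """Send the prompt to an injected LLM client; a human reviews the result."""
    return llm_call(build_appeal_prompt(claim, notes))
```

Keeping the LLM client behind a callable makes it trivial to swap providers later, and to log every prompt/draft pair for the data collection that Stage 2 depends on.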

What you get: Working prototype, early data on which denial types respond well to AI-assisted appeals, something concrete to iterate on.

What you don't get:

  • No calibration to your specific payer mix or denial patterns
  • Hallucination risk (model may cite nonexistent policies)
  • No evaluation framework to measure output quality
  • No audit trail for compliance
  • Limited transparency into generation logic

Stage 1 is a starting point. Treat it as a data collection instrument – every draft generated, every human correction, every outcome tracked becomes training data for later stages.

Stage 2: Decomposed Architecture with RAG

The architectural shift: stop asking the LLM to do all the reasoning and decompose the problem instead.

LLM handles: Language tasks (summarization, draft generation)
Deterministic logic handles: Classification, routing, compliance checks

Stage 2 Pipeline

  1. Classify – Rules engine categorizes denied claim by denial code + payer
  2. Retrieve – RAG pipeline pulls payer-specific guidelines and historical overturn data from vector store
  3. Generate – LLM drafts appeal grounded in retrieved context (not free-associating from training data)
  4. Validate – Check output against known criteria before human review
  5. Review – Human edits and submits

This decomposition gives you visibility. When an appeal fails, you can trace whether the issue was classification, retrieval, generation, or missing clinical data. You fix the specific component.
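The five-step pipeline can be sketched with stand-ins for each component: a routing table plays the rules engine, a dict plays the vector store, and the validator is a naive hallucination guard. All names, codes, and policies here are hypothetical:

```python
# Decomposed Stage 2 pipeline sketch -- every component is a stand-in.
ROUTING = {("CO-50", "AcmeHealth"): "medical_necessity"}  # rules engine stand-in

GUIDELINES = {  # vector-store retrieval stand-in
    "medical_necessity": ["AcmeHealth policy MN-12: imaging requires documented failure of conservative treatment"],
}

def classify(denial_code: str, payer: str) -> str:
    """Deterministic routing: denial code + payer -> workflow."""
    return ROUTING.get((denial_code, payer), "general")

def retrieve(workflow: str) -> list:
    """Pull payer-specific guidance for the workflow."""
    return GUIDELINES.get(workflow, [])

def validate(draft: str, context: list) -> bool:
    """Naive guard: reject drafts that cite no retrieved guideline."""
    return any(snippet.split(":")[0] in draft for snippet in context)

def run_pipeline(denial_code: str, payer: str, generate):
    """Classify -> retrieve -> generate -> validate; human review comes after."""
    workflow = classify(denial_code, payer)
    context = retrieve(workflow)
    draft = generate(context)
    return draft, validate(draft, context)
```

Because each step is a separate function, a failed appeal can be traced to the exact component that misbehaved, which is the whole point of the decomposition.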

What you're building:

  • Vector database for payer guidelines and historical cases
  • Classification layer (denial code → workflow routing)
  • Prompt management system
  • Orchestration logic
  • Evaluation framework measuring output quality against real outcomes

What you can now measure:

  • Appeal success rates by denial category
  • Time to resolution
  • Dollars recovered through member appeals
  • Audit trails for compliance
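Those metrics fall out of a small aggregation over tracked outcomes. The record shape here is a hypothetical simplification of what a real outcomes table would hold:

```python
from collections import defaultdict

def appeal_metrics(records):
    """records: iterable of (denial_category, overturned: bool, dollars_recovered)."""
    by_cat = defaultdict(lambda: {"won": 0, "total": 0, "recovered": 0.0})
    for category, overturned, dollars in records:
        row = by_cat[category]
        row["total"] += 1
        if overturned:
            row["won"] += 1
            row["recovered"] += dollars
    return {cat: {"success_rate": row["won"] / row["total"],
                  "recovered": row["recovered"]}
            for cat, row in by_cat.items()}
```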

Run Stage 1 through a pilot first. Collect error patterns. Use that data to prioritize which denial categories get Stage 2 treatment – highest revenue impact per engineering hour.

Stage 3: Fine-Tuned Domain Models

Stage 3 means training on your proprietary data: historical denial and appeal outcomes, payer behavior patterns, and documentation quality signals.

At this level, the system anticipates denials rather than reacting:

  • Flags at-risk claims based on historical patterns
  • Recommends preemptive documentation improvements
  • Routes appeals by predicted overturn likelihood
  • Surfaces systemic denial trends pointing to upstream issues

For member appeals specifically, a custom model can:

  • Predict which denied claims are the strongest candidates for patient-initiated appeals
  • Generate documentation calibrated to the language specific payers respond to
  • Learn from every outcome to improve predictions
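As a baseline for the routing-by-overturn-likelihood idea, a naive historical-rate scorer might look like the sketch below. A fine-tuned Stage 3 model would have to demonstrably beat this; the payer and code values are hypothetical:

```python
from collections import defaultdict

def overturn_rates(history):
    """history: iterable of (payer, denial_code, overturned: bool)."""
    counts = defaultdict(lambda: [0, 0])  # key -> [overturns, total]
    for payer, code, overturned in history:
        stats = counts[(payer, code)]
        stats[0] += int(overturned)
        stats[1] += 1
    return {key: wins / total for key, (wins, total) in counts.items()}

def rank_candidates(denied_claims, rates, default=0.1):
    """Sort denied claims by predicted overturn likelihood, best first."""
    return sorted(denied_claims,
                  key=lambda c: rates.get((c["payer"], c["code"]), default),
                  reverse=True)
```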

Prerequisites (non-negotiable):

  • Mature FHIR/HL7 integrations
  • Clean, normalized historical data at scale
  • Robust testing harness from Stage 2 to validate the custom model actually outperforms a well-configured general LLM

Without the measurement infrastructure from Stage 2, you can't prove your expensive custom model beats what you already had. Build measurement first.

Sequencing the Build

Each stage generates the data that the next stage depends on.

Months 1-3: Integration + Stage 1

  • Connect to EHR via FHIR and HL7
  • Target highest-volume denial categories
  • Deploy LLM wrapper for member appeal drafting
  • Goal: generate production data on AI-assisted appeal performance

Months 3-9: Stage 2 on real data

  • Use Stage 1 error patterns to prioritize denial categories
  • Build classification, RAG pipelines, eval framework
  • Prove ROI, build the dataset Stage 3 will train on

Months 9-18: Stage 3 development

  • Fine-tune domain-specific models
  • Start where you have deepest data and clearest performance gap

Principles

  • Start with denial categories where member appeals have the highest dollar recovery potential
  • Treat Stage 1 as data collection, not just a productivity tool
  • Budget for integration as a first-class investment – the FHIR/HL7 plumbing becomes the foundation for everything

The critical dependency at each transition is data quality. Rushing the timeline without underlying data produces expensive models that don't outperform simpler approaches.

The three-stage framework applies beyond healthcare appeals – any domain where you're moving from generic LLM to production-grade, measurable AI follows a similar arc: wrapper → decomposed architecture with retrieval → domain-specific fine-tuning. The lesson is respecting the build order.
