Michael Nikitin

Posted on • Originally published at itirra.com

Building AI-Powered Healthcare Appeals: A Three-Stage Architecture Guide

Most healthcare organizations chase only one of two appeal paths when claims get denied. The other path – member appeals – is a technical problem worth solving.

When a claim is denied, clinical staff typically file a provider appeal. But every patient also has a legal right to file a member appeal, which triggers a separate adjudication track with different review criteria. Most organizations ignore this path entirely, leaving recoverable revenue on the table.

Building member appeals at scale is an integration and automation problem: pull clinical data from the EHR, match it against payer-specific criteria, generate patient-facing documentation, and track outcomes. Here's a three-stage architecture for building that system.

The Integration Layer: FHIR + HL7

Before any AI logic, you need reliable data access. The clinical evidence supporting appeals (lab results, medication history, prior authorizations) lives in Epic, Oracle Health, MEDITECH, and similar systems.

You'll likely need both major interoperability standards:

FHIR for structured, on-demand data. RESTful APIs give you discrete clinical data when you need it. For appeals, the key resources are:

  • ExplanationOfBenefit and ClaimResponse for denial details
  • Patient and DocumentReference for supporting evidence
  • MedicationRequest and Observation for clinical context

HL7 v2 for real-time event triggers. ADT (Admission, Discharge, Transfer) and DFT (Detailed Financial Transaction) messages let your system react the moment a denial posts. If you need to kick off an appeal workflow automatically when a claim status changes, this is likely your event source.
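A minimal sketch of triaging an incoming v2 message, assuming a DFT feed. The sample message and field positions are illustrative; a production system would sit behind an interface engine or use a parsing library such as python-hl7:

```python
# Illustrative HL7 v2 message -- segment content is made up, not a real feed.
SAMPLE_DFT = "\r".join([
    "MSH|^~\\&|HIS|HOSP|APPEALS|RCM|202401151200||DFT^P03|MSG0001|P|2.5",
    "PID|1||12345^^^HOSP^MR||DOE^JANE",
    "FT1|1|||20240115|20240115|CG|99213^Office visit^CPT",
])

def parse_segments(raw: str) -> dict:
    """Split an HL7 v2 message into {segment_id: fields} for quick triage."""
    segments = {}
    for line in raw.split("\r"):
        fields = line.split("|")
        segments[fields[0]] = fields
    return segments

def is_financial_event(raw: str) -> bool:
    """True when the MSH segment carries a DFT (financial transaction) event."""
    msh = parse_segments(raw).get("MSH", [])
    return len(msh) > 8 and msh[8].startswith("DFT")
```

A listener can use a check like this to decide whether a message should wake the appeals workflow at all, before any heavier parsing.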

Map your data needs to specific FHIR resources and HL7 message types before writing integration code. Start with ExplanationOfBenefit and ClaimResponse, expand based on which denial categories you're targeting first.
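As a sketch of what that mapping looks like in code, here is denial-reason extraction from an ExplanationOfBenefit resource. The payload shape follows FHIR R4, but the sample values and the `denialreason` adjudication category code are illustrative assumptions, not payer-standard codes:

```python
# Illustrative FHIR R4 ExplanationOfBenefit fragment -- values are made up.
SAMPLE_EOB = {
    "resourceType": "ExplanationOfBenefit",
    "id": "eob-001",
    "outcome": "error",
    "item": [{
        "sequence": 1,
        "adjudication": [{
            "category": {"coding": [{"code": "denialreason"}]},
            "reason": {"coding": [{"code": "CO-50",
                                   "display": "Not medically necessary"}]},
        }],
    }],
}

def denial_reasons(eob: dict) -> list:
    """Collect denial reason codes from every line item's adjudication."""
    codes = []
    for item in eob.get("item", []):
        for adj in item.get("adjudication", []):
            coding = adj.get("category", {}).get("coding", [{}])
            if coding and coding[0].get("code") == "denialreason":
                for reason in adj.get("reason", {}).get("coding", []):
                    codes.append(reason["code"])
    return codes
```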

Stage 1: LLM Wrapper

The simplest implementation: a general-purpose LLM behind an API, wrapped in a prompt layer.

The flow:

  1. Pull denied claim + clinical notes via your EHR integration
  2. Construct a prompt with denial reason, EOB data, appeal requirements
  3. Send to LLM API
  4. Return draft appeal letter for human review

This ships in weeks. Engineering effort is prompt tuning plus a thin integration layer. Costs are mostly API usage.
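A sketch of that wrapper, with hypothetical field names and the LLM client injected as a plain callable, since provider SDKs vary:

```python
# Stage 1 prompt layer sketch. The template and claim fields are assumptions;
# the LLM call itself is a stub injected by the caller.
APPEAL_TEMPLATE = """You are drafting a member appeal letter.
Denial reason: {denial_reason}
Payer: {payer}
Relevant clinical context:
{clinical_notes}

Draft a patient-facing appeal citing only the evidence above."""

def build_appeal_prompt(claim: dict, notes: list) -> str:
    """Assemble denial details and clinical notes into a single prompt."""
    return APPEAL_TEMPLATE.format(
        denial_reason=claim["denial_reason"],
        payer=claim["payer"],
        clinical_notes="\n".join(f"- {n}" for n in notes),
    )

def draft_appeal(claim: dict, notes: list, llm_call) -> str:
    """Send the prompt to an injected LLM client; a human reviews the result."""
    return llm_call(build_appeal_prompt(claim, notes))
```

Keeping the LLM client behind a callable makes it trivial to swap providers later, and to log every prompt/draft pair for the data collection that Stage 2 depends on.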

What you get: Working prototype, early data on which denial types respond well to AI-assisted appeals, something concrete to iterate on.

What you don't get:

  • No calibration to your specific payer mix or denial patterns
  • Hallucination risk (model may cite nonexistent policies)
  • No evaluation framework to measure output quality
  • No audit trail for compliance
  • Limited transparency into generation logic

Stage 1 is a starting point. Treat it as a data collection instrument – every draft generated, every human correction, every outcome tracked becomes training data for later stages.

Stage 2: Decomposed Architecture with RAG

The architectural shift: stop asking the LLM to do all the reasoning and decompose the problem instead.

LLM handles: Language tasks (summarization, draft generation)
Deterministic logic handles: Classification, routing, compliance checks

Stage 2 Pipeline

  1. Classify – Rules engine categorizes denied claim by denial code + payer
  2. Retrieve – RAG pipeline pulls payer-specific guidelines and historical overturn data from vector store
  3. Generate – LLM drafts appeal grounded in retrieved context (not free-associating from training data)
  4. Validate – Check output against known criteria before human review
  5. Review – Human edits and submits

This decomposition gives you visibility. When an appeal fails, you can trace whether the issue was classification, retrieval, generation, or missing clinical data. You fix the specific component.
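The five-step pipeline can be sketched with stand-ins for each component: a routing table plays the rules engine, a dict plays the vector store, and the validator is a naive hallucination guard. All names, codes, and policies here are hypothetical:

```python
# Decomposed Stage 2 pipeline sketch -- every component is a stand-in.
ROUTING = {("CO-50", "AcmeHealth"): "medical_necessity"}  # rules engine stand-in

GUIDELINES = {  # vector-store retrieval stand-in
    "medical_necessity": ["AcmeHealth policy MN-12: imaging requires documented failure of conservative treatment"],
}

def classify(denial_code: str, payer: str) -> str:
    """Deterministic routing: denial code + payer -> workflow."""
    return ROUTING.get((denial_code, payer), "general")

def retrieve(workflow: str) -> list:
    """Pull payer-specific guidance for the workflow."""
    return GUIDELINES.get(workflow, [])

def validate(draft: str, context: list) -> bool:
    """Naive guard: reject drafts that cite no retrieved guideline."""
    return any(snippet.split(":")[0] in draft for snippet in context)

def run_pipeline(denial_code: str, payer: str, generate):
    """Classify -> retrieve -> generate -> validate; human review comes after."""
    workflow = classify(denial_code, payer)
    context = retrieve(workflow)
    draft = generate(context)
    return draft, validate(draft, context)
```

Because each step is a separate function, a failed appeal can be traced to the exact component that misbehaved, which is the whole point of the decomposition.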

What you're building:

  • Vector database for payer guidelines and historical cases
  • Classification layer (denial code → workflow routing)
  • Prompt management system
  • Orchestration logic
  • Evaluation framework measuring output quality against real outcomes

What you can now measure:

  • Appeal success rates by denial category
  • Time to resolution
  • Dollars recovered through member appeals
  • Audit trails for compliance
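Those metrics fall out of a small aggregation over tracked outcomes. The record shape here is a hypothetical simplification of what a real outcomes table would hold:

```python
from collections import defaultdict

def appeal_metrics(records):
    """records: iterable of (denial_category, overturned: bool, dollars_recovered)."""
    by_cat = defaultdict(lambda: {"won": 0, "total": 0, "recovered": 0.0})
    for category, overturned, dollars in records:
        row = by_cat[category]
        row["total"] += 1
        if overturned:
            row["won"] += 1
            row["recovered"] += dollars
    return {cat: {"success_rate": row["won"] / row["total"],
                  "recovered": row["recovered"]}
            for cat, row in by_cat.items()}
```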

Run Stage 1 through a pilot first. Collect error patterns. Use that data to prioritize which denial categories get Stage 2 treatment – highest revenue impact per engineering hour.

Stage 3: Fine-Tuned Domain Models

Stage 3 means training on your proprietary data: historical denial and appeal outcomes, payer behavior patterns, and documentation quality signals.

At this level, the system anticipates denials rather than reacting:

  • Flags at-risk claims based on historical patterns
  • Recommends preemptive documentation improvements
  • Routes appeals by predicted overturn likelihood
  • Surfaces systemic denial trends pointing to upstream issues

For member appeals specifically, a custom model can:

  • Predict which denied claims are the strongest candidates for patient-initiated appeals
  • Generate documentation calibrated to the language specific payers respond to
  • Learn from every outcome to improve predictions
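As a baseline for the routing-by-overturn-likelihood idea, a naive historical-rate scorer might look like the sketch below. A fine-tuned Stage 3 model would have to demonstrably beat this; the payer and code values are hypothetical:

```python
from collections import defaultdict

def overturn_rates(history):
    """history: iterable of (payer, denial_code, overturned: bool)."""
    counts = defaultdict(lambda: [0, 0])  # key -> [overturns, total]
    for payer, code, overturned in history:
        stats = counts[(payer, code)]
        stats[0] += int(overturned)
        stats[1] += 1
    return {key: wins / total for key, (wins, total) in counts.items()}

def rank_candidates(denied_claims, rates, default=0.1):
    """Sort denied claims by predicted overturn likelihood, best first."""
    return sorted(denied_claims,
                  key=lambda c: rates.get((c["payer"], c["code"]), default),
                  reverse=True)
```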

Prerequisites (non-negotiable):

  • Mature FHIR/HL7 integrations
  • Clean, normalized historical data at scale
  • Robust testing harness from Stage 2 to validate the custom model actually outperforms a well-configured general LLM

Without the measurement infrastructure from Stage 2, you can't prove your expensive custom model beats what you already had. Build measurement first.

Sequencing the Build

Each stage generates the data that the next stage depends on.

Months 1-3: Integration + Stage 1

  • Connect to EHR via FHIR and HL7
  • Target highest-volume denial categories
  • Deploy LLM wrapper for member appeal drafting
  • Goal: generate production data on AI-assisted appeal performance

Months 3-9: Stage 2 on real data

  • Use Stage 1 error patterns to prioritize denial categories
  • Build classification, RAG pipelines, eval framework
  • Prove ROI, build the dataset Stage 3 will train on

Months 9-18: Stage 3 development

  • Fine-tune domain-specific models
  • Start where you have deepest data and clearest performance gap

Principles

  • Start with denial categories where member appeals have the highest dollar recovery potential
  • Treat Stage 1 as data collection, not just a productivity tool
  • Budget for integration as a first-class investment – the FHIR/HL7 plumbing becomes the foundation for everything

The critical dependency at each transition is data quality. Rushing the timeline without underlying data produces expensive models that don't outperform simpler approaches.

The three-stage framework applies beyond healthcare appeals – any domain where you're moving from generic LLM to production-grade, measurable AI follows a similar arc: wrapper → decomposed architecture with retrieval → domain-specific fine-tuning. The lesson is respecting the build order.
