Building an HCC Gap Analysis Pipeline (a developer's view of risk capture)

#vbc #hcc #backend #healthtech

If you write software for a Medicare Advantage plan, "HCC gap analysis" eventually lands on your desk as a data problem disguised as a clinical one. The clinical team says "we're leaving risk on the table." What they need from you is a pipeline that finds, ranks, and tracks the gaps. Here's how I think about building one.

The mental model

Start with definitions, because the acronyms compound fast:

HCC — Hierarchical Condition Category. The risk bucket a diagnosis rolls up into.
RAF — Risk Adjustment Factor. The score built from demographics plus HCC coefficients (CMS-HCC V28 is the current model).
A "gap" — a condition that is clinically supported somewhere in the data but is not captured as a coded, current-year HCC.

Gap analysis is fundamentally a set-difference problem: suspected_hccs - documented_hccs, weighted by the RAF impact of each missing HCC.

Step 1: Build the two sets

Documented HCCs come from this year's confirmed claims/encounters, mapped through the current ICD-10 → HCC crosswalk. If you don't want to maintain your own crosswalk, the /getHCCCrosswalk sibling endpoint resolves ICD-10-CM codes to HCCs under a pinned model.

documented = {
    map_icd_to_hcc(dx, model="CMS-HCC-V28 Continuing Enrollee")
    for dx in current_year_diagnoses
    if dx.is_confirmed
}
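
If you do keep the crosswalk in-process, a minimal sketch of the lookup might look like the following. The CROSSWALK table, the Diagnosis type, and documented_hccs are hypothetical names for illustration, and the two table entries are examples only; load the real mapping from the published CMS files. Most ICD-10-CM codes map to no HCC at all, so the sketch drops unmapped codes instead of letting None leak into the set.

```python
from dataclasses import dataclass

MODEL = "CMS-HCC-V28 Continuing Enrollee"

# Hypothetical in-memory crosswalk keyed by (model, dot-less ICD-10-CM code).
# Illustrative entries only; verify against the published mapping before use.
CROSSWALK = {
    (MODEL, "E119"): "HCC38",
    (MODEL, "N1832"): "HCC328",
}


@dataclass(frozen=True)
class Diagnosis:
    code: str           # ICD-10-CM code, with or without the dot
    is_confirmed: bool  # True when it comes from an accepted claim/encounter


def map_icd_to_hcc(dx, model):
    """Return the HCC for a diagnosis under a pinned model, or None if unmapped."""
    key = (model, dx.code.replace(".", "").upper())
    return CROSSWALK.get(key)


def documented_hccs(diagnoses, model=MODEL):
    """Build the documented set from confirmed diagnoses, skipping unmapped codes."""
    mapped = (map_icd_to_hcc(dx, model) for dx in diagnoses if dx.is_confirmed)
    return {hcc for hcc in mapped if hcc is not None}
```

Keying the table on the model string as well as the code is one way to keep a V28 result from silently mixing with a different model's mapping.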

Suspected HCCs come from weaker signals: prior-year HCCs that didn't recur, relevant labs, medications that imply a condition, and problem-list entries that never made it to a claim.

suspected = set()
suspected |= prior_year_hccs            # chronic conditions rarely resolve
suspected |= hccs_from_medications(rx)  # e.g., insulin -> diabetes family
suspected |= hccs_from_labs(labs)       # e.g., eGFR -> CKD staging
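
As one illustration of a lab-derived signal, here is a rough sketch of the eGFR case. The function names are hypothetical, the thresholds are the commonly used GFR categories (below 60, 45, 30, and 15 mL/min/1.73 m^2), and the 90-day spread is my assumption for approximating chronicity. It returns a stage label rather than an HCC, on the assumption that the pinned crosswalk does the final mapping, and it produces a suspect for review, not a diagnosis.

```python
from datetime import date


def ckd_stage_from_egfr(egfr):
    """Map one eGFR value (mL/min/1.73 m^2) to a CKD stage label, or None."""
    if egfr < 15:
        return "CKD stage 5"
    if egfr < 30:
        return "CKD stage 4"
    if egfr < 45:
        return "CKD stage 3b"
    if egfr < 60:
        return "CKD stage 3a"
    return None  # stages 1-2 need other evidence of kidney damage


def suspected_ckd_stage(readings, min_days_apart=90):
    """readings: list of (date, egfr) tuples. Returns a stage label or None.

    Requires at least two low readings spread over min_days_apart days, then
    stages on the mildest (highest) low value to stay conservative. It does
    not check for normal readings in between.
    """
    low = sorted((d, v) for d, v in readings if v < 60)
    if len(low) < 2:
        return None
    if (low[-1][0] - low[0][0]).days < min_days_apart:
        return None
    return ckd_stage_from_egfr(max(v for _, v in low))
```

A single low reading yields no suspect at all here, which keeps one-off values (dehydration, acute illness) off the worklist.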

To turn a candidate condition set into a RAF impact, score it through the API. The endpoint is itemized, so you get a per-HCC coefficient back rather than a single opaque number:

curl -X POST https://restapi.npidataservices.com/raf/api/v1/getScore \
  -H "ApiKey: $RAF_API_KEY" \
  -H "Content-Type: application/json" \
  -H "accept: application/json" \
  -d '{
    "model": "CMS-HCC-V28 Continuing Enrollee",
    "factor": "Community NonDual Aged",
    "age": 66,
    "gender": "MALE",
    "HCC_Codes": ["E119", "C61", "N1832", "I509", "J449"]
  }'

Note the auth: a custom ApiKey: header, not Authorization: Bearer (a Bearer header returns 401). ICD-10-CM codes go in without dots (E11.9 -> E119).
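
The same request can be assembled from Python with only the standard library. This is a sketch under assumptions: build_score_request and normalize_icd are hypothetical helper names, and the payload fields simply mirror the curl call above. The article doesn't show the response body, so the sketch stops at building the request and doesn't guess at how to parse the result.

```python
import json
import urllib.request

SCORE_URL = "https://restapi.npidataservices.com/raf/api/v1/getScore"


def normalize_icd(code):
    """Strip the dot and normalize case: 'e11.9' -> 'E119'."""
    return code.replace(".", "").strip().upper()


def build_score_request(api_key, icd_codes, age, gender,
                        model="CMS-HCC-V28 Continuing Enrollee",
                        factor="Community NonDual Aged"):
    """Build (but do not send) the POST request for the scoring endpoint."""
    payload = {
        "model": model,
        "factor": factor,
        "age": age,
        "gender": gender,
        "HCC_Codes": [normalize_icd(c) for c in icd_codes],
    }
    # Auth goes in a custom ApiKey header, not Authorization: Bearer.
    # urllib normalizes header-name casing; HTTP header names are case-insensitive.
    headers = {
        "ApiKey": api_key,
        "Content-Type": "application/json",
        "accept": "application/json",
    }
    return urllib.request.Request(
        SCORE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Sending it is then a call to urllib.request.urlopen(req, timeout=10), with the key read from the environment (for example os.environ["RAF_API_KEY"]). Keeping the builder separate from the network call makes it easy to unit test against synthetic members.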

Step 2: Compute the gap and weight it

A raw list of missing HCCs is noise. Engineers add value by ranking. The natural weight is the RAF coefficient — how much each closed gap would actually move the score.

gaps = suspected - documented
ranked = sorted(
    ({"hcc": h, "raf_delta": coefficient(h, "CMS-HCC-V28 Continuing Enrollee")} for h in gaps),
    key=lambda g: g["raf_delta"],
    reverse=True,
)
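
Here is a runnable variant of that ranking, as a sketch. The coefficient table holds placeholder numbers, not CMS values, and rank_gaps is a hypothetical helper. It also makes two choices the snippet above leaves open: gaps with no known coefficient sort last instead of raising, and ties break on the HCC name so reruns produce the same order. It still treats each HCC independently, so hierarchies and interaction terms are not accounted for.

```python
# Placeholder coefficients for illustration only; NOT CMS values.
COEFFICIENTS = {"HCC38": 0.2, "HCC328": 0.1, "HCC226": 0.4}


def rank_gaps(suspected, documented, coefficients):
    """Return open gaps sorted by descending RAF delta, unknown deltas last."""
    gaps = suspected - documented
    rows = [{"hcc": h, "raf_delta": coefficients.get(h)} for h in gaps]
    return sorted(
        rows,
        key=lambda g: (
            g["raf_delta"] is None,      # unknown coefficients sink to the bottom
            -(g["raf_delta"] or 0.0),    # larger delta first
            g["hcc"],                    # deterministic tie-break
        ),
    )
```

In a real pipeline the per-HCC values would come from the itemized scoring response or the published coefficient tables for the pinned model.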

Now your clinical team gets a worklist sorted by impact instead of an undifferentiated dump. For the conceptual grounding on how those coefficients add up into a member's score, this RAF explainer is a solid reference.

Step 3: Close the loop with provenance

A gap you can't explain is a gap nobody will act on. For every suggested HCC, attach the evidence:

{
  "member_id": "SYNTH-00417",
  "suspected_hcc": "HCC38",
  "raf_delta": 0.31,
  "evidence": [
    {"type": "rx", "detail": "metformin (synthetic)"},
    {"type": "prior_hcc", "year": 2025}
  ],
  "status": "open"
}

Provenance is also what protects you later. A gap closed with documentation behind it survives a RADV (Risk Adjustment Data Validation) audit; a gap "closed" by guessing does not. Build the evidence trail from day one.
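
One way to enforce "no evidence, no gap" is to make it impossible to construct the record otherwise. The sketch below assumes a hypothetical GapRecord type whose fields mirror the JSON above; the validation rule is my suggestion, not something the scoring API requires.

```python
from dataclasses import asdict, dataclass


@dataclass
class GapRecord:
    member_id: str
    suspected_hcc: str
    raf_delta: float
    evidence: list          # list of dicts such as {"type": "rx", "detail": "..."}
    status: str = "open"

    def __post_init__(self):
        # Refuse to create a suggestion that nobody could later explain.
        if not self.evidence:
            raise ValueError("a suspected HCC needs at least one evidence item")


record = GapRecord(
    member_id="SYNTH-00417",
    suspected_hcc="HCC38",
    raf_delta=0.31,
    evidence=[
        {"type": "rx", "detail": "metformin (synthetic)"},
        {"type": "prior_hcc", "year": 2025},
    ],
)
payload = asdict(record)  # plain dict, ready for json.dumps
```

Failing at construction time means an evidence-free gap never reaches the worklist or the audit trail in the first place.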

Step 4: Treat it as a recurring job, not a project

Gaps reopen. Members get new labs, conditions resolve, the model changes. Schedule the pipeline (monthly is common), snapshot the open/closed state, and track closure rate over time as your real KPI.
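
A minimal sketch of the state tracking behind that job, assuming an in-memory dict standing in for whatever table you actually use. The function names are hypothetical. Each gap is keyed on (member_id, hcc, year), so re-running the job leaves existing rows alone instead of duplicating them.

```python
def upsert_gaps(store, member_id, year, gap_hccs):
    """Record newly found gaps as open; existing rows keep their status."""
    for hcc in gap_hccs:
        store.setdefault((member_id, hcc, year), "open")
    return store


def close_documented(store, member_id, year, documented):
    """Mark open gaps as closed once the HCC shows up in the documented set."""
    for key in list(store):
        m, hcc, y = key
        if m == member_id and y == year and hcc in documented and store[key] == "open":
            store[key] = "closed"
    return store


def closure_rate(store):
    """Share of tracked gaps that are closed; 0.0 when nothing is tracked."""
    if not store:
        return 0.0
    closed = sum(1 for status in store.values() if status == "closed")
    return closed / len(store)


# Two runs with the same input leave two rows, not four.
store = {}
upsert_gaps(store, "SYNTH-00417", 2026, {"HCC38", "HCC328"})
upsert_gaps(store, "SYNTH-00417", 2026, {"HCC38", "HCC328"})
close_documented(store, "SYNTH-00417", 2026, {"HCC38"})
rate = closure_rate(store)
```

Writing a dated copy of the store after each run gives you the snapshots needed to chart closure rate over time.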

A few engineering gotchas

Use synthetic fixtures. Never test against live member data. Generate illustrative members that exercise each HCC family.
Pin the model version. A gap computed under V28 must be reproducible later; don't let the crosswalk float.
Idempotency. Re-running the pipeline shouldn't duplicate open gaps; key on (member, hcc, year).

Wrapping up

HCC gap analysis isn't glamorous, but it's one of the highest-leverage pipelines you can build on the risk-adjustment side: it directly connects documentation quality to a plan's revenue accuracy and audit posture. If you want the broader, less code-heavy treatment of finding and closing these gaps, the full HCC gap analysis guide covers the program side that complements the pipeline above.

VBC Risk Analytics. Educational only — not coding, billing, or clinical advice; verify against the current CMS Rate Announcement. Synthetic data only.

DEV Community
