Automating RAF Scoring in Real Time: An Architecture Walkthrough

#healthtech #api #python #vbc

If you've ever built a batch RAF job, you know the awkward truth: by the time the score lands, the encounter that produced it is long over. The interesting engineering challenge is moving that calculation from a nightly batch to a real-time, event-driven service. Here's how I think about the architecture.

What we're computing

The Risk Adjustment Factor (RAF) is built from demographic factors plus clinical conditions expressed as Hierarchical Condition Categories (HCCs), each weighted by a coefficient under CMS-HCC V28. The math itself is simple addition. The hard part is doing it as data arrives, deterministically, with an audit trail.
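To make "simple addition" concrete, here is a minimal sketch that sums itemized components into a raw score. The helper name `raw_raf` is hypothetical, and the values are borrowed from the synthetic response later in this post; they are not real CMS-HCC V28 coefficients.

```python
from decimal import Decimal


def raw_raf(demographic, hcc_coefficients, interactions=()):
    """Sum additive RAF components.

    Decimal (built from the string form of each value) avoids binary
    floating-point drift, which matters when a score must be reproduced
    exactly for an audit.
    """
    parts = [Decimal(str(demographic))]
    parts += [Decimal(str(c)) for c in hcc_coefficients.values()]
    parts += [Decimal(str(c)) for c in interactions]
    return sum(parts, Decimal("0"))


# Illustrative values only (taken from the synthetic sample in this post).
example = raw_raf(
    demographic=0.332,
    hcc_coefficients={"HCC 226": 0.36, "HCC 38": 0.166},
    interactions=[0.112],
)
print(example)  # 0.970
```

Normalization and other adjustments applied after the raw sum are deliberately left out of this sketch.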

For the conceptual background on why real-time scoring improves accuracy, this writeup on how RAF score automation works is a good companion read.

Event-driven, not batch

The shift is from "scan everything nightly" to "recompute the affected member when their data changes."

new_diagnosis_event ──> map ICD-10 -> HCC ──> recompute member RAF ──> emit score event

A new confirmed diagnosis, a corrected code, or a model-year change triggers a recompute of just that member, not the whole population.

import os

import httpx

# load_member() and emit() are application-specific helpers defined elsewhere.


MODEL = "CMS-HCC-V28 Continuing Enrollee"   # pinned, never implicit


def on_diagnosis_event(event):
    member = load_member(event.member_id)
    resp = httpx.post(
        "https://restapi.npidataservices.com/raf/api/v1/getScore",
        headers={
            "ApiKey": os.environ["RAF_API_KEY"],   # custom header, NOT Bearer
            "Content-Type": "application/json",
            "accept": "application/json",
        },
        json={
            "model": MODEL,
            "factor": "Community NonDual Aged",
            "age": member.age,
            "gender": member.gender,               # "MALE" | "FEMALE"
            "HCC_Codes": member.icd10_no_dots,      # e.g. ["E119", "I509"]
        },
        timeout=5.0,
    )
    resp.raise_for_status()
    score = resp.json()                            # itemized, additive components
    emit("raf.updated", {
        "member_id": member.id,
        # Coerce defensively: numeric fields can arrive as quoted strings.
        "raf": float(score["score"]["Total"]["Grand Total"]["RAF_Score"]),
        "breakdown": score["score"],               # Demographic + Diagnosis + interactions
        "model": MODEL,
    })
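The handler above sends codes without dots (for example "E119"). A small normalization step, sketched below, could produce that form from whatever the upstream system emits. The function name `normalize_icd10` is hypothetical, and the sketch only reshapes strings; it does not validate that a code exists in ICD-10-CM.

```python
def normalize_icd10(codes):
    """Uppercase, strip whitespace and dots, drop blanks and duplicates.

    Input order is preserved so the same input always yields the same
    request payload.
    """
    seen = set()
    out = []
    for raw in codes:
        code = raw.strip().upper().replace(".", "")
        if code and code not in seen:
            seen.add(code)
            out.append(code)
    return out


print(normalize_icd10(["E11.9", " i50.9", "E119", ""]))  # ['E119', 'I509']
```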

Determinism is the whole ballgame

Real-time scoring is only useful if it's reproducible. Two requirements:

Pin the model version. A score computed today must be reproducible later. Pass model explicitly (e.g. "CMS-HCC-V28 Continuing Enrollee"); never let the crosswalk float implicitly.
Itemize the output. A bare RAF is undebuggable and indefensible. The getScore response is already itemized into additive components — Demographic, per-HCC Diagnosis, Disease Interaction, and a Total block — each carrying both a RAF_Score (coefficient) and an MA_Payment (dollars). That structure is exactly what you need when a RADV (Risk Adjustment Data Validation) audit asks how a number was derived.

{
  "api_usage_log_id": 439229,
  "score": {
    "Demographic": { "Age and Gender": { "MA_Payment": 3453.58, "RAF_Score": 0.332 } },
    "Diagnosis": {
      "HCC 226": { "MA_Payment": 3744.84, "RAF_Score": 0.36 },
      "HCC 38":  { "MA_Payment": 1726.79, "RAF_Score": 0.166 },
      "HCC 328": { "MA_Payment": "1321.10", "RAF_Score": 0.127 },
      "HCC Count": { "Count": 5, "MA_Payment": 520.12, "RAF_Score": 0.05 }
    },
    "Disease Interaction": {
      "DIABETES_HF": { "MA_Payment": 1165.06, "RAF_Score": 0.112 },
      "HF_KIDNEY":   { "MA_Payment": 1830.81, "RAF_Score": 0.176 }
    },
    "Total": {
      "Grand Total": { "MA_Payment": 19826.87, "RAF_Score": 1.906 },
      "MA_Adjusted": { "MA_Payment": 17485.55, "RAF_Score": 1.681 },
      "Normalized":  { "MA_Payment": 18581.88, "RAF_Score": 1.786 }
    }
  },
  "score_cnt": 9,
  "status": "success"
}

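(Synthetic, and apparently abridged: HCC Count reports 5 categories but only three are listed, so the components shown do not add up to the Grand Total.) One parsing gotcha: some fields, like HCC 328's MA_Payment above, come back as a quoted string ("1321.10"), so coerce to a number defensively rather than assuming JSON floats.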
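A minimal sketch of that defensive coercion follows. The helper names `to_decimal` and `diagnosis_payments` are hypothetical; the field names come from the sample response above.

```python
from decimal import Decimal, InvalidOperation


def to_decimal(value):
    """Coerce a JSON number or numeric string to Decimal; raise otherwise."""
    if isinstance(value, bool):  # bool is an int subclass, so reject it explicitly
        raise ValueError(f"not numeric: {value!r}")
    try:
        return Decimal(str(value).strip())
    except InvalidOperation as exc:
        raise ValueError(f"not numeric: {value!r}") from exc


def diagnosis_payments(score):
    """Map each entry under score['Diagnosis'] to its MA_Payment as a Decimal."""
    return {
        name: to_decimal(item["MA_Payment"])
        for name, item in score["Diagnosis"].items()
    }


# Synthetic fragment mirroring the mixed number/string types shown above.
sample = {
    "Diagnosis": {
        "HCC 38": {"MA_Payment": 1726.79, "RAF_Score": 0.166},
        "HCC 328": {"MA_Payment": "1321.10", "RAF_Score": 0.127},
    }
}
payments = diagnosis_payments(sample)
print(payments["HCC 38"] + payments["HCC 328"])  # 3047.89
```

Failing loudly on a non-numeric value is a design choice here: a silent fallback to zero would hide exactly the kind of data problem an audit trail is meant to expose.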

Idempotency and ordering

Events arrive out of order and get redelivered. Two defenses:

Idempotent recompute. Recomputing from the member's current state (not by incrementally mutating a score) means a duplicate event is harmless.
Version your member state. Tag each recompute with the input version so a late-arriving stale event can be safely ignored.
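The two defenses above can be sketched together as a version-guarded write. `ScoreStore` is a hypothetical in-memory stand-in for wherever scores are persisted, and `input_version` is assumed to be a monotonically increasing number attached to the member's state.

```python
class ScoreStore:
    """In-memory stand-in for a score table (hypothetical)."""

    def __init__(self):
        self._rows = {}  # member_id -> (input_version, score)

    def apply(self, member_id, input_version, compute):
        """Recompute only when input_version is newer than the stored one.

        Returns True if a score was written, False if the event was a
        duplicate or stale and was ignored.
        """
        current = self._rows.get(member_id)
        if current is not None and input_version <= current[0]:
            return False
        # compute() rebuilds the score from full member state, so replaying
        # the same version would produce the same result anyway.
        self._rows[member_id] = (input_version, compute())
        return True

    def get(self, member_id):
        return self._rows.get(member_id)


store = ScoreStore()
print(store.apply("M1", 2, lambda: 1.2))  # True
print(store.apply("M1", 2, lambda: 1.2))  # False (duplicate delivery)
print(store.apply("M1", 1, lambda: 0.9))  # False (stale event)
print(store.get("M1"))                    # (2, 1.2)
```

In a real datastore the version check and the write would need to happen atomically, for example as a conditional update; this sketch does not address concurrency.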

Batch still has a place: backfill and roster-wide reruns

Real-time handles the steady-state stream, but you still need a batch path for the initial population load and for re-scoring everyone after a model-year change. That path runs against a separate batch API as a three-step job. The base URL is https://www.vbcriskanalytics.com/raf-batch-api, and every call sends an ApiKey: <key> header plus an empty X-CSRF-TOKEN: header:

# 1) submit a CSV of (member, diagnosis) rows -> returns a job id
curl -X POST https://www.vbcriskanalytics.com/raf-batch-api/getPreProspectScore \
  -H "ApiKey: $RAF_BATCH_API_KEY" \
  -H "X-CSRF-TOKEN: " \
  -F "risk_model=CMS-HCC-V28 Continuing Enrollee" \
  -F "risk_factor=Community NonDual Aged" \
  -F "file=@members.csv"
# -> {"code":201,"raf_batch_id":3400,"status":"Queued","check_status_url":"..."}


# 2) poll status: Queued -> Running -> Completed (Completed returns a download_url)
curl https://www.vbcriskanalytics.com/raf-batch-api/check-status/3400 \
  -H "ApiKey: $RAF_BATCH_API_KEY" -H "X-CSRF-TOKEN: "


# 3) download the result (short-lived ~120s signed S3 .zip wrapping an .xlsx)
curl -L https://www.vbcriskanalytics.com/raf-batch-api/download/3400 \
  -H "ApiKey: $RAF_BATCH_API_KEY" -H "X-CSRF-TOKEN: " -o results.zip
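Step 2 above is a polling loop in practice. The sketch below shows one way to structure it in Python. `wait_for_batch` and `fetch_status` are hypothetical names, and it assumes the check-status payload carries the same "status" key seen in the submit response. Any status other than Queued, Running, or Completed is treated as an error, since no other values are documented here.

```python
import time


def wait_for_batch(fetch_status, interval=5.0, max_wait=600.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll until the job reports Completed, then return that payload.

    fetch_status is any zero-argument callable returning the parsed JSON
    from the check-status endpoint.
    """
    deadline = clock() + max_wait
    while True:
        payload = fetch_status()
        status = payload.get("status")
        if status == "Completed":
            return payload
        if status not in ("Queued", "Running"):
            raise RuntimeError(f"unexpected batch status: {status!r}")
        if clock() >= deadline:
            raise TimeoutError("batch did not complete in time")
        sleep(interval)


# Simulated status sequence so the sketch runs without network access.
responses = iter([
    {"status": "Queued"},
    {"status": "Running"},
    {"status": "Completed", "download_url": "https://example.invalid/results.zip"},
])
result = wait_for_batch(lambda: next(responses), interval=0, sleep=lambda s: None)
print(result["status"])  # Completed
```

In production, `fetch_status` would wrap an `httpx.get` against the check-status URL with the same ApiKey and X-CSRF-TOKEN headers as the curl call, and the download would start immediately after this function returns.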

The CSV is one row per (member, diagnosis) with columns ID,Gender,Age,ICD-10 CM Code,Flag, where Flag (the Pre-Prospective Flag) is Last_Year or Current_Year. Treat the download URL as ephemeral — it expires in about two minutes, so fetch it immediately once status is Completed.
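A minimal sketch of generating that file with the standard library follows. The column names and Flag values come from the description above. The function name `build_batch_csv`, the "SYN-001" member ID, and the choice to reuse the real-time API's gender strings and dotless codes are assumptions; check them against the batch API's own documentation.

```python
import csv
import io

COLUMNS = ["ID", "Gender", "Age", "ICD-10 CM Code", "Flag"]
FLAGS = {"Last_Year", "Current_Year"}


def build_batch_csv(rows):
    """Render (member, diagnosis) rows as CSV text, validating the Flag column."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(COLUMNS)
    for member_id, gender, age, code, flag in rows:
        if flag not in FLAGS:
            raise ValueError(f"bad Flag {flag!r} for member {member_id}")
        writer.writerow([member_id, gender, age, code, flag])
    return buf.getvalue()


# Synthetic member with two diagnoses: one row per (member, diagnosis).
csv_text = build_batch_csv([
    ("SYN-001", "FEMALE", 72, "E119", "Current_Year"),
    ("SYN-001", "FEMALE", 72, "I509", "Last_Year"),
])
print(csv_text)
```

Rejecting an unknown Flag before upload is cheaper than discovering it after the job has queued and run.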

Don't let automation launder bad data

This is the failure mode worth stating plainly: automation amplifies whatever logic you give it. If a mapping is wrong or a diagnosis is unsupported, real-time scoring just produces wrong numbers faster and at scale. Build the accuracy into the rules — unsupported-HCC checks, specificity flags — upstream of the scorer, not as an afterthought.
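One way to picture "upstream of the scorer" is a gate that runs before any score request is built. Everything in this sketch is hypothetical: the `Diagnosis` fields, the two rules, and the choice to hold back unsupported codes while letting low-specificity codes through with a flag. Real rules would come from your coding and compliance teams.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Diagnosis:
    code: str             # ICD-10-CM code, dots removed
    has_support: bool     # hypothetical: supporting documentation was found
    is_unspecified: bool  # hypothetical: set by a specificity rule


def gate(diagnoses):
    """Split diagnoses into (codes to score, review items with reasons)."""
    scorable, review = [], []
    for dx in diagnoses:
        if not dx.has_support:
            # Unsupported codes never reach the scorer in this sketch.
            review.append((dx.code, "unsupported"))
            continue
        if dx.is_unspecified:
            # Still scored, but surfaced for a human to look at.
            review.append((dx.code, "low specificity"))
        scorable.append(dx.code)
    return scorable, review


ok, flagged = gate([
    Diagnosis("E119", has_support=True, is_unspecified=True),
    Diagnosis("I509", has_support=False, is_unspecified=False),
])
print(ok)       # ['E119']
print(flagged)  # [('E119', 'low specificity'), ('I509', 'unsupported')]
```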

Testing

Synthetic fixtures only. Generate illustrative members that exercise each HCC family and interaction term. Never test against live records.
Golden tests. Pin known inputs to known outputs per model version, so a coefficient change can't silently alter historical scores.
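A golden test can be as small as a table of pinned cases plus a comparison loop. In this sketch `golden_failures` and the two stub scorers are hypothetical, and the expected value is a placeholder rather than a real score for those codes. Real fixtures would also pin the demographic inputs.

```python
GOLDEN_CASES = [
    # (model, ICD-10 codes, expected RAF as a string); placeholder values only
    ("CMS-HCC-V28 Continuing Enrollee", ("E119", "I509"), "1.906"),
]


def golden_failures(score_fn, cases=GOLDEN_CASES):
    """Return one record per pinned case whose output changed; empty means pass."""
    failures = []
    for model, codes, expected in cases:
        actual = str(score_fn(model, list(codes)))
        if actual != expected:
            failures.append({
                "model": model,
                "codes": codes,
                "expected": expected,
                "actual": actual,
            })
    return failures


def stable_scorer(model, codes):
    """Stub standing in for the real scorer; returns the pinned value."""
    return "1.906"


def drifted_scorer(model, codes):
    """Stub simulating a silent coefficient change."""
    return "1.950"


print(golden_failures(stable_scorer))        # []
print(len(golden_failures(drifted_scorer)))  # 1
```

Comparing string forms rather than floats keeps the check exact, which is the point of pinning.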

The payoff

An event-driven, deterministic RAF service puts an accurate, explainable score where it can influence care and documentation in the moment — and makes audit defense a query rather than a scramble. The full conceptual treatment of accuracy gains lives in the companion article above; this post is the architecture behind it.

VBC Risk Analytics. Educational only — not coding, billing, or clinical advice; verify against the current CMS Rate Announcement. Synthetic data only.