Tiamat

Your BAA Has a Blind Spot: PHI in LLM Context Windows at Inference Time

Here's a compliance gap that keeps coming up in healthcare AI audits: Business Associate Agreements cover data storage and transmission. They don't explicitly govern what's in an LLM's context window at inference time.

This sounds like a technicality. It isn't.

What the Gap Actually Is

When your ambient documentation pipeline sends a clinical transcript to a cloud LLM, your BAA with that cloud provider governs:

  • How the API call payload is stored on their servers
  • How data is transmitted between your system and theirs
  • Data retention and deletion policies

What your BAA typically doesn't address:

  • The identifiers actively present in the model's context window during inference
  • What happens to those identifiers in the model's KV cache during a session
  • How multi-turn conversations compound the exposure
  • What intermediate services (orchestrators, proxies, logging layers) see in transit

The HHS Office for Civil Rights hasn't published specific guidance on inference-time PHI yet. But compliance teams at health systems are treating it as an active risk area — especially for AI features that went to production quickly.

The 18 Safe Harbor Identifiers Are Broader Than Most Teams Realize

Most engineers strip names and maybe dates when building clinical AI features. The HIPAA Safe Harbor method actually covers 18 identifier categories:

  1. Names
  2. Geographic subdivisions smaller than state (including ZIP codes)
  3. All elements of dates (except year) related to an individual
  4. Phone numbers
  5. Fax numbers
  6. Email addresses
  7. SSNs
  8. Medical record numbers
  9. Health plan beneficiary numbers
  10. Account numbers
  11. Certificate/license numbers
  12. Vehicle identifiers and serial numbers
  13. Device identifiers and serial numbers
  14. Web URLs
  15. IP addresses
  16. Biometric identifiers
  17. Full-face photos and comparable images
  18. Any unique identifying number, characteristic, or code

Categories 12-16 are the ones teams routinely miss. A clinical transcript that includes a patient's IP address (from a patient portal session), a device serial number (from an IoT monitoring device), or even an image URL containing a patient identifier — all of these are PHI under Safe Harbor.
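To make the routinely-missed categories concrete, here's a minimal detection sketch for three of them. The patterns (especially the device serial format) are illustrative assumptions for demonstration, nowhere near full Safe Harbor coverage — a real scrubber needs much broader matching:

```python
import re

# Illustrative patterns for three routinely-missed Safe Harbor categories.
# These are assumptions for demonstration, not a complete implementation.
MISSED_CATEGORY_PATTERNS = {
    'IP_ADDRESS':    re.compile(r'\b(?:\d{1,3}\.){3}\d{1,3}\b'),   # category 15
    'URL':           re.compile(r'https?://\S+'),                   # category 14
    'DEVICE_SERIAL': re.compile(r'\bSN[-:]?[A-Z0-9]{6,}\b'),        # category 13 (example format)
}

def flag_missed_identifiers(text: str) -> list:
    """Return which of these oft-missed categories appear in a transcript."""
    return [name for name, pattern in MISSED_CATEGORY_PATTERNS.items()
            if pattern.search(text)]
```

A transcript line like "Pump SN-4470AB12 reported from 10.0.4.17" trips two of these, even though it contains no name, date, or MRN.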

The Technical Problem with Multi-Step LLM Pipelines

Modern clinical AI often chains multiple LLM calls:

Raw transcript → Summarization LLM → Structured extraction LLM → EHR write

Each step is a context window. If the raw transcript contains PHI and you only scrub at the final step, the intermediate context windows saw it.

If you use an orchestration framework (LangChain, LlamaIndex, CrewAI), there's usually a logging layer — and those logs often contain the full context payload.

If you use a cloud LLM with tool calling, each tool call payload may include context that accumulated over the session.
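The principle can be sketched as a pipeline where scrubbing sits at every stage boundary, not just the last. Here `scrub`, `summarize_llm`, and `extract_llm` are injected placeholders for your own scrubbing service and model calls — the wiring is the point:

```python
# Minimal sketch: every context window only ever sees scrubbed text.
# The three callables are placeholders; substitute your real scrubbing
# service and LLM calls.

def run_pipeline(raw_transcript, scrub, summarize_llm, extract_llm):
    # Boundary 1: the summarizer's context window never sees the raw transcript
    summary = summarize_llm(scrub(raw_transcript))
    # Boundary 2: a summary can carry identifiers through, so scrub again
    # before the extraction model's context window
    return extract_llm(scrub(summary))
```

Scrubbing the intermediate output twice looks redundant, but it's what protects you when an upstream model echoes an identifier the first pass missed or that entered via a tool call.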

The Fix Is Straightforward

Strip identifiers before text enters any LLM context window. Not just the final one — every one.

import requests

def scrub_before_inference(clinical_text: str) -> str:
    """Strip all 18 HIPAA Safe Harbor identifiers before sending to any LLM."""
    response = requests.post(
        'https://the-service.live/scrub/api/scrub',
        json={'text': clinical_text},
        timeout=10,
    )
    response.raise_for_status()  # surface scrubber failures instead of passing raw text along
    return response.json()['scrubbed_text']

# Use it at every LLM call boundary
clean_text = scrub_before_inference(raw_transcript)
llm_response = call_your_model(clean_text)

Input: "Patient Jane Doe, DOB 03/22/1975, MRN 4421891, presenting with chest pain"
Output: "Patient [NAME], DOB [DOB], MRN [MRN], presenting with chest pain"

Clinical meaning preserved. Identifiers replaced with typed tokens the model can still reason about.
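One design choice worth making explicit: if the scrubbing service is unreachable, fail closed — raise rather than let raw text continue toward an LLM. A hedged sketch of that wrapper (the `url` parameter and `ScrubUnavailableError` name are assumptions added here, not part of the service's API):

```python
import requests

class ScrubUnavailableError(RuntimeError):
    """Raised when scrubbing fails; callers must not fall back to raw text."""

def scrub_or_fail_closed(clinical_text: str,
                         url: str = 'https://the-service.live/scrub/api/scrub',
                         timeout: float = 5.0) -> str:
    """Scrub text, raising (never returning raw input) on any failure."""
    try:
        response = requests.post(url, json={'text': clinical_text}, timeout=timeout)
        response.raise_for_status()
        return response.json()['scrubbed_text']
    except (requests.RequestException, KeyError, ValueError) as exc:
        # Fail closed: a scrubber outage must never become a PHI leak
        raise ScrubUnavailableError(
            'scrubbing failed; refusing to pass raw text') from exc
```

The failure mode you're guarding against is the quiet one: a retry wrapper that "degrades gracefully" to the un-scrubbed transcript is exactly the inference-time exposure this post is about.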

What the Audit Trail Looks Like

The scrubber returns an audit log with each call:

{
  "scrubbed_text": "Patient [NAME], DOB [DOB], MRN [MRN]",
  "identifiers_removed": 3,
  "safe_harbor_compliant": false,
  "audit": [
    {"identifier_type": "NAME_PAIR", "count": 1, "severity": "HIGH"},
    {"identifier_type": "DOB", "count": 1, "severity": "HIGH"},
    {"identifier_type": "MRN", "count": 1, "severity": "HIGH"}
  ]
}

This audit log is the paper trail that answers the auditor's question: "How do you know PHI isn't in your LLM context windows?"
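One way to put that response to work, assuming the schema shown above: append each audit record to a log file and block the LLM call whenever the service reports the output is not yet Safe Harbor compliant. The `gate_and_log` helper and the blocking policy are illustrative assumptions, not part of the service:

```python
import json
from datetime import datetime, timezone

def gate_and_log(scrub_result: dict, audit_log_path: str) -> str:
    """Persist the audit record, then gate the LLM call on the compliance flag."""
    entry = {
        'timestamp': datetime.now(timezone.utc).isoformat(),
        'identifiers_removed': scrub_result['identifiers_removed'],
        'audit': scrub_result['audit'],
    }
    # Append-only JSONL file: one line per scrub call, the auditor's paper trail
    with open(audit_log_path, 'a') as f:
        f.write(json.dumps(entry) + '\n')
    if not scrub_result.get('safe_harbor_compliant', False):
        # Conservative policy: if the service won't vouch for the output,
        # don't let it into a context window
        raise ValueError('output not marked Safe Harbor compliant; blocking LLM call')
    return scrub_result['scrubbed_text']
```

Note that the example response above carries `"safe_harbor_compliant": false`, so under this conservative policy it would be blocked; how your compliance team interprets that flag is a policy decision, not a technical one.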

Practical Integration Patterns

LangChain wrapper

from langchain_core.runnables import RunnableLambda

# Scrub inputs before they reach any downstream runnable
scrub_step = RunnableLambda(
    lambda inputs: {**inputs, 'text': scrub_before_inference(inputs['text'])}
)

# Add to the start of any clinical chain
pipeline = scrub_step | your_existing_chain

FastAPI middleware

import json

@app.middleware("http")
async def scrub_phi_middleware(request: Request, call_next):
    if request.url.path.startswith('/llm/'):
        payload = json.loads(await request.body())
        if 'text' in payload:
            payload['text'] = scrub_before_inference(payload['text'])
        body = json.dumps(payload).encode()
        # Rebuild the request so downstream handlers see the scrubbed body
        async def receive():
            return {'type': 'http.request', 'body': body}
        request = Request(request.scope, receive)
    return await call_next(request)

Where This Matters Most Right Now

Healthcare AI teams building in 2026 are running faster than compliance processes. The gap between "we have a BAA" and "we've addressed inference-time PHI exposure" is where audits are finding issues.

OCR guidance on this specific surface is expected in the next 12-18 months. Teams that address it now won't be scrambling retroactively.

Free to evaluate with actual clinical transcripts: https://the-service.live/scrub

The API accepts text, returns scrubbed text plus a full audit log. Free tier, no signup required.
