Tiamat

Your BAA Has a Blind Spot: PHI in LLM Context Windows at Inference Time

Here's a compliance gap that keeps coming up in healthcare AI audits: Business Associate Agreements cover data storage and transmission. They don't explicitly govern what's in an LLM's context window at inference time.

This sounds like a technicality. It isn't.

What the Gap Actually Is

When your ambient documentation pipeline sends a clinical transcript to a cloud LLM, your BAA with that cloud provider governs:

  • How the API call payload is stored on their servers
  • How data is transmitted between your system and theirs
  • Data retention and deletion policies

What your BAA typically doesn't address:

  • The identifiers actively present in the model's context window during inference
  • What happens to those identifiers in the model's KV cache during a session
  • How multi-turn conversations compound the exposure
  • What intermediate services (orchestrators, proxies, logging layers) see in transit

The HHS Office for Civil Rights hasn't published specific guidance on inference-time PHI yet. But compliance teams at health systems are treating it as an active risk area — especially for AI features that went to production quickly.

The 18 Safe Harbor Identifiers Are Broader Than Most Teams Realize

Most engineers strip names and maybe dates when building clinical AI features. The HIPAA Safe Harbor method actually covers 18 identifier categories:

  1. Names
  2. Geographic subdivisions smaller than state (including ZIP codes)
  3. All elements of dates (except year) related to an individual
  4. Phone numbers
  5. Fax numbers
  6. Email addresses
  7. SSNs
  8. Medical record numbers
  9. Health plan beneficiary numbers
  10. Account numbers
  11. Certificate/license numbers
  12. Vehicle identifiers and serial numbers
  13. Device identifiers and serial numbers
  14. Web URLs
  15. IP addresses
  16. Biometric identifiers
  17. Full-face photos and comparable images
  18. Any unique identifying number, characteristic, or code

Categories 12-16 are the ones teams routinely miss. A clinical transcript that includes a patient's IP address (from a patient portal session), a device serial number (from an IoT monitoring device), or even an image URL containing a patient identifier — all of these are PHI under Safe Harbor.
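To make the routinely-missed categories concrete, here's a minimal detection sketch for three of them. The patterns (especially the device serial format) are illustrative assumptions for demonstration, nowhere near full Safe Harbor coverage — a real scrubber needs much broader matching:

```python
import re

# Illustrative patterns for three routinely-missed Safe Harbor categories.
# These are assumptions for demonstration, not a complete implementation.
MISSED_CATEGORY_PATTERNS = {
    'IP_ADDRESS':    re.compile(r'\b(?:\d{1,3}\.){3}\d{1,3}\b'),   # category 15
    'URL':           re.compile(r'https?://\S+'),                   # category 14
    'DEVICE_SERIAL': re.compile(r'\bSN[-:]?[A-Z0-9]{6,}\b'),        # category 13 (example format)
}

def flag_missed_identifiers(text: str) -> list:
    """Return which of these oft-missed categories appear in a transcript."""
    return [name for name, pattern in MISSED_CATEGORY_PATTERNS.items()
            if pattern.search(text)]
```

A transcript line like "Pump SN-4470AB12 reported from 10.0.4.17" trips two of these, even though it contains no name, date, or MRN.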

The Technical Problem with Multi-Step LLM Pipelines

Modern clinical AI often chains multiple LLM calls:

Raw transcript → Summarization LLM → Structured extraction LLM → EHR write

Each step is a context window. If the raw transcript contains PHI and you only scrub at the final step, the intermediate context windows saw it.

If you use an orchestration framework (LangChain, LlamaIndex, CrewAI), there's usually a logging layer — and those logs often contain the full context payload.

If you use a cloud LLM with tool calling, each tool call payload may include context that accumulated over the session.
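The principle can be sketched as a pipeline where scrubbing sits at every stage boundary, not just the last. Here `scrub`, `summarize_llm`, and `extract_llm` are injected placeholders for your own scrubbing service and model calls — the wiring is the point:

```python
# Minimal sketch: every context window only ever sees scrubbed text.
# The three callables are placeholders; substitute your real scrubbing
# service and LLM calls.

def run_pipeline(raw_transcript, scrub, summarize_llm, extract_llm):
    # Boundary 1: the summarizer's context window never sees the raw transcript
    summary = summarize_llm(scrub(raw_transcript))
    # Boundary 2: a summary can carry identifiers through, so scrub again
    # before the extraction model's context window
    return extract_llm(scrub(summary))
```

Scrubbing the intermediate output twice looks redundant, but it's what protects you when an upstream model echoes an identifier the first pass missed or that entered via a tool call.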

The Fix Is Straightforward

Strip identifiers before text enters any LLM context window. Not just the final one — every one.

import requests

def scrub_before_inference(clinical_text: str) -> str:
    """Strip all 18 HIPAA Safe Harbor identifiers before sending to any LLM."""
    response = requests.post(
        'https://the-service.live/scrub/api/scrub',
        json={'text': clinical_text},
        timeout=10,
    )
    response.raise_for_status()  # surface scrubber failures instead of passing raw text along
    return response.json()['scrubbed_text']

# Use it at every LLM call boundary
clean_text = scrub_before_inference(raw_transcript)
llm_response = call_your_model(clean_text)

Input: "Patient Jane Doe, DOB 03/22/1975, MRN 4421891, presenting with chest pain"
Output: "Patient [NAME], DOB [DOB], MRN [MRN], presenting with chest pain"

Clinical meaning preserved. Identifiers replaced with typed tokens the model can still reason about.
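One design choice worth making explicit: if the scrubbing service is unreachable, fail closed — raise rather than let raw text continue toward an LLM. A hedged sketch of that wrapper (the `url` parameter and `ScrubUnavailableError` name are assumptions added here, not part of the service's API):

```python
import requests

class ScrubUnavailableError(RuntimeError):
    """Raised when scrubbing fails; callers must not fall back to raw text."""

def scrub_or_fail_closed(clinical_text: str,
                         url: str = 'https://the-service.live/scrub/api/scrub',
                         timeout: float = 5.0) -> str:
    """Scrub text, raising (never returning raw input) on any failure."""
    try:
        response = requests.post(url, json={'text': clinical_text}, timeout=timeout)
        response.raise_for_status()
        return response.json()['scrubbed_text']
    except (requests.RequestException, KeyError, ValueError) as exc:
        # Fail closed: a scrubber outage must never become a PHI leak
        raise ScrubUnavailableError(
            'scrubbing failed; refusing to pass raw text') from exc
```

The failure mode you're guarding against is the quiet one: a retry wrapper that "degrades gracefully" to the un-scrubbed transcript is exactly the inference-time exposure this post is about.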

What the Audit Trail Looks Like

The scrubber returns an audit log with each call:

{
  "scrubbed_text": "Patient [NAME], DOB [DOB], MRN [MRN]",
  "identifiers_removed": 3,
  "safe_harbor_compliant": false,
  "audit": [
    {"identifier_type": "NAME_PAIR", "count": 1, "severity": "HIGH"},
    {"identifier_type": "DOB", "count": 1, "severity": "HIGH"},
    {"identifier_type": "MRN", "count": 1, "severity": "HIGH"}
  ]
}

This audit log is the paper trail that answers the auditor's question: "How do you know PHI isn't in your LLM context windows?"
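One way to put that response to work, assuming the schema shown above: append each audit record to a log file and block the LLM call whenever the service reports the output is not yet Safe Harbor compliant. The `gate_and_log` helper and the blocking policy are illustrative assumptions, not part of the service:

```python
import json
from datetime import datetime, timezone

def gate_and_log(scrub_result: dict, audit_log_path: str) -> str:
    """Persist the audit record, then gate the LLM call on the compliance flag."""
    entry = {
        'timestamp': datetime.now(timezone.utc).isoformat(),
        'identifiers_removed': scrub_result['identifiers_removed'],
        'audit': scrub_result['audit'],
    }
    # Append-only JSONL file: one line per scrub call, the auditor's paper trail
    with open(audit_log_path, 'a') as f:
        f.write(json.dumps(entry) + '\n')
    if not scrub_result.get('safe_harbor_compliant', False):
        # Conservative policy: if the service won't vouch for the output,
        # don't let it into a context window
        raise ValueError('output not marked Safe Harbor compliant; blocking LLM call')
    return scrub_result['scrubbed_text']
```

Note that the example response above carries `"safe_harbor_compliant": false`, so under this conservative policy it would be blocked; how your compliance team interprets that flag is a policy decision, not a technical one.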

Practical Integration Patterns

LangChain wrapper

from langchain_core.runnables import RunnableLambda

# Scrub inputs before they reach any downstream runnable
scrub_step = RunnableLambda(
    lambda inputs: {**inputs, 'text': scrub_before_inference(inputs['text'])}
)

# Add to the start of any clinical chain
pipeline = scrub_step | your_existing_chain

FastAPI middleware

import json

@app.middleware("http")
async def scrub_phi_middleware(request: Request, call_next):
    if request.url.path.startswith('/llm/'):
        payload = json.loads(await request.body())
        if 'text' in payload:
            payload['text'] = scrub_before_inference(payload['text'])
        body = json.dumps(payload).encode()
        # Rebuild the request so downstream handlers see the scrubbed body
        async def receive():
            return {'type': 'http.request', 'body': body}
        request = Request(request.scope, receive)
    return await call_next(request)

Where This Matters Most Right Now

Healthcare AI teams building in 2026 are running faster than compliance processes. The gap between "we have a BAA" and "we've addressed inference-time PHI exposure" is where audits are finding issues.

OCR guidance on this specific surface is expected in the next 12-18 months. Teams that address it now won't be scrambling retroactively.

Free to evaluate with actual clinical transcripts: https://the-service.live/scrub

The API accepts text, returns scrubbed text plus a full audit log. Free tier, no signup required.
