Here's a compliance gap that keeps coming up in healthcare AI audits: Business Associate Agreements cover data storage and transmission. They don't explicitly govern what's in an LLM's context window at inference time.
This sounds like a technicality. It isn't.
What the Gap Actually Is
When your ambient documentation pipeline sends a clinical transcript to a cloud LLM, your BAA with that cloud provider governs:
- How the API call payload is stored on their servers
- How data is transmitted between your system and theirs
- Data retention and deletion policies
What your BAA typically doesn't address:
- The identifiers actively present in the model's context window during inference
- What happens to those identifiers in the model's KV cache during a session
- How multi-turn conversations compound the exposure
- What intermediate services (orchestrators, proxies, logging layers) see in transit
The HHS Office for Civil Rights hasn't published specific guidance on inference-time PHI yet. But compliance teams at health systems are treating it as an active risk area — especially for AI features that went to production quickly.
The 18 Safe Harbor Identifiers Are Broader Than Most Teams Realize
Most engineers strip names and maybe dates when building clinical AI features. The HIPAA Safe Harbor method actually covers 18 identifier categories:
- Names
- Geographic subdivisions smaller than state (including ZIP codes)
- All elements of dates (except year) related to an individual
- Phone numbers
- Fax numbers
- Email addresses
- SSNs
- Medical record numbers
- Health plan beneficiary numbers
- Account numbers
- Certificate/license numbers
- Vehicle identifiers and serial numbers
- Device identifiers and serial numbers
- Web URLs
- IP addresses
- Biometric identifiers
- Full-face photos and comparable images
- Any unique identifying number, characteristic, or code
Categories 12-16 are the ones teams routinely miss. A clinical transcript that includes a patient's IP address (from a patient portal session), a device serial number (from an IoT monitoring device), or even an image URL containing a patient identifier — all of these are PHI under Safe Harbor.
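Those commonly missed categories are cheap to spot-check mechanically. A minimal sketch, assuming illustrative regex patterns (a real Safe Harbor pass needs much more than regex — `flag_missed_identifiers` and these patterns are hypothetical examples, not a complete detector):

```python
import re

# Illustrative patterns for the categories teams routinely miss.
# These are deliberately simplistic; real de-identification needs
# context-aware NLP, not regex alone.
MISSED_CATEGORY_PATTERNS = {
    "IP_ADDRESS": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "URL": re.compile(r"https?://\S+"),
    "DEVICE_SERIAL": re.compile(r"\bSN[- ]?[A-Z0-9]{6,}\b"),
}

def flag_missed_identifiers(text: str) -> list[str]:
    """Return the commonly-missed Safe Harbor categories present in a transcript."""
    return [name for name, pattern in MISSED_CATEGORY_PATTERNS.items()
            if pattern.search(text)]

note = "Portal session from 192.168.4.22; pump SN-88321407 readings attached"
flag_missed_identifiers(note)  # → ["IP_ADDRESS", "DEVICE_SERIAL"]
```

A check like this makes a useful smoke test in CI: if a supposedly scrubbed transcript still trips any of these patterns, fail the build.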
The Technical Problem with Multi-Step LLM Pipelines
Modern clinical AI often chains multiple LLM calls:
Raw transcript → Summarization LLM → Structured extraction LLM → EHR write
Each step is a context window. If the raw transcript contains PHI and you only scrub at the final step, the intermediate context windows saw it.
If you use an orchestration framework (LangChain, LlamaIndex, CrewAI), there's usually a logging layer — and those logs often contain the full context payload.
If you use a cloud LLM with tool calling, each tool call payload may include context that accumulated over the session.
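The boundary rule can be sketched end to end. Everything here is a stand-in: `scrub`, `summarize_llm`, and `extract_llm` are hypothetical placeholders that only make the data flow visible — the point is that scrubbing happens before every context window, and again on model output, which can restate identifiers from its input:

```python
def scrub(text: str) -> str:
    """Placeholder scrubber; a real one handles all 18 categories."""
    return text.replace("Jane Doe", "[NAME]")

def summarize_llm(text: str) -> str:
    return f"summary({text})"  # fake LLM: tags its input

def extract_llm(text: str) -> str:
    return f"fields({text})"   # fake LLM: tags its input

def run_pipeline(raw_transcript: str) -> str:
    clean = scrub(raw_transcript)      # boundary 1: before summarization
    summary = summarize_llm(clean)
    clean_summary = scrub(summary)     # boundary 2: outputs can restate PHI
    return extract_llm(clean_summary)  # boundary 3: before extraction → EHR

run_pipeline("Patient Jane Doe, chest pain")
# → "fields(summary(Patient [NAME], chest pain))"
```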
The Fix Is Straightforward
Strip identifiers before text enters any LLM context window. Not just the final one — every one.
```python
import requests

def scrub_before_inference(clinical_text: str) -> str:
    """Strip all 18 HIPAA Safe Harbor identifiers before sending to any LLM."""
    response = requests.post(
        'https://the-service.live/scrub/api/scrub',
        json={'text': clinical_text},
        timeout=10,
    )
    response.raise_for_status()  # fail closed: never fall back to raw text
    return response.json()['scrubbed_text']

# Use it at every LLM call boundary
clean_text = scrub_before_inference(raw_transcript)
llm_response = call_your_model(clean_text)
```
Input: "Patient Jane Doe, DOB 03/22/1975, MRN 4421891, presenting with chest pain"
Output: "Patient [NAME], DOB [DOB], MRN [MRN], presenting with chest pain"
Clinical meaning preserved. Identifiers replaced with typed tokens the model can still reason about.
What the Audit Trail Looks Like
The scrubber returns an audit log with each call:
```json
{
  "scrubbed_text": "Patient [NAME], DOB [DOB], MRN [MRN]",
  "identifiers_removed": 3,
  "safe_harbor_compliant": false,
  "audit": [
    {"identifier_type": "NAME_PAIR", "count": 1, "severity": "HIGH"},
    {"identifier_type": "DOB", "count": 1, "severity": "HIGH"},
    {"identifier_type": "MRN", "count": 1, "severity": "HIGH"}
  ]
}
```
This audit log is the paper trail that answers the auditor's question: "How do you know PHI isn't in your LLM context windows?"
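One way to make that trail durable is to append each response's audit fields to a JSONL file. A sketch, assuming the response shape shown above — `record_audit` is a hypothetical helper, and a production system would more likely write to an append-only store with access controls:

```python
import datetime
import json

def record_audit(audit_path: str, scrub_result: dict) -> None:
    """Append one scrub call's audit fields to a JSONL trail."""
    entry = {
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "identifiers_removed": scrub_result["identifiers_removed"],
        "audit": scrub_result["audit"],
        # Deliberately omit scrubbed_text: the counts and types answer
        # the auditor's question without putting clinical text in a log.
    }
    with open(audit_path, "a") as f:
        f.write(json.dumps(entry) + "\n")
```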
Practical Integration Patterns
LangChain wrapper
```python
from langchain_core.runnables import RunnableLambda

# A Runnable that scrubs the 'text' field before it reaches any LLM step
phi_scrub = RunnableLambda(
    lambda inputs: {**inputs, 'text': scrub_before_inference(inputs['text'])}
)

# Compose it at the start of any clinical chain
pipeline = phi_scrub | your_existing_chain
```
FastAPI middleware
```python
import json

@app.middleware("http")
async def scrub_phi_middleware(request: Request, call_next):
    if request.url.path.startswith('/llm/'):
        payload = json.loads(await request.body())
        if 'text' in payload:
            payload['text'] = scrub_before_inference(payload['text'])
            body = json.dumps(payload).encode()
            # Re-inject the scrubbed body: request.body() consumed the stream
            async def receive():
                return {'type': 'http.request', 'body': body}
            request = Request(request.scope, receive)
    return await call_next(request)
```
Where This Matters Most Right Now
Healthcare AI teams building in 2026 are running faster than compliance processes. The gap between "we have a BAA" and "we've addressed inference-time PHI exposure" is where audits are finding issues.
OCR guidance on this specific surface is expected in the next 12-18 months. Teams that address it now won't be scrambling retroactively.
Free to evaluate with actual clinical transcripts: https://the-service.live/scrub
The API accepts text, returns scrubbed text plus a full audit log. Free tier, no signup required.