Every time you send a user query to an LLM API, you're potentially sending personal data to a third-party server. Under GDPR and most data protection laws, that's a data processing operation with legal requirements.
Here's the practical approach to handling PII in LLM pipelines.
## The problem
User sends a message to your chatbot:
"Hi, I'm Ade Okonkwo, my email is ade@company.ng and my order #12345 hasn't arrived. My phone is 08034567890."
Your code sends this to OpenAI/Anthropic. Their servers — probably in the US — now have your customer's name, email, phone, and order number.
That's a cross-border data transfer of personal data to a third-party processor. You need:
- A Data Processing Agreement with the provider
- A lawful basis for the processing
- A privacy notice telling the user about it
- Ideally, audit logging of what was sent
## The fix: detect and redact
Before sending to the API, scan for PII and optionally redact it:
```python
from agent_shield import Shield

shield = Shield(redact_by_default=True)

# This redacts PII before it reaches the API
result = shield.call_openai(
    client=openai_client,
    messages=[{"role": "user", "content": user_message}],
    purpose="customer_support",
    user_id="user_123",
)
```
What the LLM actually receives:
"Hi, I'm Ade Okonkwo, my email is [EMAIL_REDACTED] and my order #12345 hasn't arrived. My phone is [PHONE_REDACTED]."
The email and phone never leave your infrastructure. The LLM can still answer the query. And you have an audit trail of exactly what was sent.
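If the reply needs the real values back (say, to echo the customer's email), a common pattern is reversible redaction: keep a local placeholder-to-value map and restore it after the LLM responds. This is a generic sketch using only the standard library, not an agent-shield API; the pattern and placeholder format here are my own simplification.

```python
import re

# Simplified email pattern -- illustrative, not production-grade
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")

def redact_reversible(text):
    """Swap PII for numbered placeholders, keeping a local map to restore later."""
    mapping = {}
    def repl(match):
        key = f"[EMAIL_{len(mapping)}]"
        mapping[key] = match.group(0)
        return key
    return EMAIL_RE.sub(repl, text), mapping

def restore(text, mapping):
    """Put the real values back into the model's reply before showing the user."""
    for key, value in mapping.items():
        text = text.replace(key, value)
    return text

redacted, pii_map = redact_reversible("Contact me at ade@company.ng")
# pii_map stays on your servers; only `redacted` goes to the LLM
reply = "We'll email [EMAIL_0] when the order ships."
final = restore(reply, pii_map)
```

The provider only ever sees the placeholder, while the user still gets a reply containing their real address.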
## What agent-shield detects
12 PII types out of the box:
| Type | Example |
|---|---|
| Email | john@example.com |
| Nigerian phone | 08034567890 |
| UK phone | 01234 567890 |
| International phone | +234 803 456 7890 |
| Nigerian BVN | BVN: 12345678901 |
| UK NI number | AB123456C |
| Credit card | 4111-1111-1111-1111 |
| Date of birth | DOB: 15/03/1990 |
| IP address | 192.168.1.100 |
| IBAN | GB29NWBK60161331926819 |
| SSN (US) | 123-45-6789 |
All regex-based. Zero ML dependencies. Installs in 2 seconds.
## The audit trail
Every call is logged with: timestamp, provider, model, input (original + redacted), output, PII detected, tokens, user ID, and purpose. The log uses a hash chain — if anyone modifies an entry, the chain breaks.
```python
# Verify your audit trail is intact
valid, count = shield.verify_audit()
print(f"Chain: {'intact' if valid else 'TAMPERED'} ({count} entries)")
```
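The hash-chain idea in miniature (a simplified sketch, not agent-shield's actual log format): each record's hash covers both its own entry and the previous record's hash, so editing any entry invalidates every record after it.

```python
import hashlib
import json

GENESIS = "0" * 64  # hash "before" the first entry

def entry_hash(entry, prev_hash):
    """Hash the entry together with the previous hash, chaining records."""
    payload = json.dumps(entry, sort_keys=True) + prev_hash
    return hashlib.sha256(payload.encode()).hexdigest()

def append(log, entry):
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"entry": entry, "hash": entry_hash(entry, prev)})

def verify(log):
    prev = GENESIS
    for record in log:
        if record["hash"] != entry_hash(record["entry"], prev):
            return False  # chain breaks at the tampered record
        prev = record["hash"]
    return True

log = []
append(log, {"ts": 1, "input": "[EMAIL_REDACTED]"})
append(log, {"ts": 2, "input": "order status"})
assert verify(log)

log[0]["entry"]["input"] = "edited"  # tamper with an early entry
assert not verify(log)
```

Because each hash feeds into the next, an attacker would have to recompute every subsequent hash to hide an edit, which is detectable as long as you anchor or back up the latest hash somewhere they can't reach.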
## Generate compliance docs automatically
After running your agent with shield enabled:
```python
# Auto-generate a DPIA skeleton
dpia = shield.generate_dpia(system_name="Customer Support Bot")

# Map where personal data flows
print(shield.generate_dataflow())
```
The DPIA generator produces a Markdown document covering data types processed, external providers, risk assessment, and recommended mitigations. It's about 60% of a complete DPIA — the rest needs human review.
GitHub: github.com/Thezenmonster/agent-shield
Full compliance guides: