Every time you send a user query to an LLM API, you're potentially sending personal data to a third-party server. Under GDPR and most data protection laws, that's a data processing operation with legal requirements.
Here's the practical approach to handling PII in LLM pipelines.
## The problem
User sends a message to your chatbot:
"Hi, I'm Ade Okonkwo, my email is ade@company.ng and my order #12345 hasn't arrived. My phone is 08034567890."
Your code sends this to OpenAI/Anthropic. Their servers — probably in the US — now have your customer's name, email, phone, and order number.
That's a cross-border data transfer of personal data to a third-party processor. You need:
- A Data Processing Agreement with the provider
- A lawful basis for the processing
- A privacy notice telling the user about it
- Ideally, audit logging of what was sent
## The fix: detect and redact
Before sending to the API, scan for PII and optionally redact it:
```python
from agent_shield import Shield

shield = Shield(redact_by_default=True)

# This redacts PII before it reaches the API
result = shield.call_openai(
    client=openai_client,
    messages=[{"role": "user", "content": user_message}],
    purpose="customer_support",
    user_id="user_123",
)
```
What the LLM actually receives:
"Hi, I'm Ade Okonkwo, my email is [EMAIL_REDACTED] and my order #12345 hasn't arrived. My phone is [PHONE_REDACTED]."
The email and phone never leave your infrastructure. The LLM can still answer the query. And you have an audit trail of exactly what was sent.
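If the reply needs the real values back (say, to echo the customer's email), a common pattern is reversible redaction: keep a local placeholder-to-value map and restore it after the LLM responds. This is a generic sketch using only the standard library, not an agent-shield API; the pattern and placeholder format here are my own simplification.

```python
import re

# Simplified email pattern -- illustrative, not production-grade
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")

def redact_reversible(text):
    """Swap PII for numbered placeholders, keeping a local map to restore later."""
    mapping = {}
    def repl(match):
        key = f"[EMAIL_{len(mapping)}]"
        mapping[key] = match.group(0)
        return key
    return EMAIL_RE.sub(repl, text), mapping

def restore(text, mapping):
    """Put the real values back into the model's reply before showing the user."""
    for key, value in mapping.items():
        text = text.replace(key, value)
    return text

redacted, pii_map = redact_reversible("Contact me at ade@company.ng")
# pii_map stays on your servers; only `redacted` goes to the LLM
reply = "We'll email [EMAIL_0] when the order ships."
final = restore(reply, pii_map)
```

The provider only ever sees the placeholder, while the user still gets a reply containing their real address.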
## What agent-shield detects
12 PII types out of the box:
| Type | Example |
|---|---|
| Email | john@example.com |
| Nigerian phone | 08034567890 |
| UK phone | 01234 567890 |
| International phone | +234 803 456 7890 |
| Nigerian BVN | BVN: 12345678901 |
| UK NI number | AB123456C |
| Credit card | 4111-1111-1111-1111 |
| Date of birth | DOB: 15/03/1990 |
| IP address | 192.168.1.100 |
| IBAN | GB29NWBK60161331926819 |
| SSN (US) | 123-45-6789 |
All regex-based. Zero ML dependencies. Installs in 2 seconds.
## The audit trail
Every call is logged with: timestamp, provider, model, input (original + redacted), output, PII detected, tokens, user ID, and purpose. The log uses a hash chain — if anyone modifies an entry, the chain breaks.
```python
# Verify your audit trail is intact
valid, count = shield.verify_audit()
print(f"Chain: {'intact' if valid else 'TAMPERED'} ({count} entries)")
```
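The hash-chain idea in miniature (a simplified sketch, not agent-shield's actual log format): each record's hash covers both its own entry and the previous record's hash, so editing any entry invalidates every record after it.

```python
import hashlib
import json

GENESIS = "0" * 64  # hash "before" the first entry

def entry_hash(entry, prev_hash):
    """Hash the entry together with the previous hash, chaining records."""
    payload = json.dumps(entry, sort_keys=True) + prev_hash
    return hashlib.sha256(payload.encode()).hexdigest()

def append(log, entry):
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"entry": entry, "hash": entry_hash(entry, prev)})

def verify(log):
    prev = GENESIS
    for record in log:
        if record["hash"] != entry_hash(record["entry"], prev):
            return False  # chain breaks at the tampered record
        prev = record["hash"]
    return True

log = []
append(log, {"ts": 1, "input": "[EMAIL_REDACTED]"})
append(log, {"ts": 2, "input": "order status"})
assert verify(log)

log[0]["entry"]["input"] = "edited"  # tamper with an early entry
assert not verify(log)
```

Because each hash feeds into the next, an attacker would have to recompute every subsequent hash to hide an edit, which is detectable as long as you anchor or back up the latest hash somewhere they can't reach.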
## Generate compliance docs automatically
After running your agent with shield enabled:
```python
# Auto-generate a DPIA skeleton
dpia = shield.generate_dpia(system_name="Customer Support Bot")

# Map where personal data flows
print(shield.generate_dataflow())
```
The DPIA generator produces a Markdown document covering data types processed, external providers, risk assessment, and recommended mitigations. It's about 60% of a complete DPIA — the rest needs human review.
GitHub: github.com/Thezenmonster/agent-shield
Full compliance guides: