How I built a PII anonymization gateway for LLMs (and why every EMEA developer needs one)

#python #rag #privacy #opensource

The problem nobody talks about in EMEA AI development

Most tutorials on building LLM-powered apps assume
the same thing: that you can freely send your users' data
to OpenAI or Anthropic.

In EMEA, that assumption is wrong.

Tunisia is currently examining a 123-article organic law
regulating AI and automated decision-making.
France enforces GDPR strictly, and an LLM prompt containing
personal data counts as processing of that data.
Morocco is formalizing its digital governance framework.

If you're building AI features for EMEA customers and
forwarding their data to US inference providers, you have
a compliance problem you may not know about yet.

What I built

SovereignGuard is an open-source AI privacy gateway.

It sits between your application and any LLM API.
It intercepts outbound prompts, strips PII before the
request leaves your server, and restores original values
locally after the response comes back.

The LLM never sees real customer data.
The only change in your app is its client configuration.

How it works — the full mechanism

Here's the complete request lifecycle:
```
Your App
|
| sends prompt with real customer data
v
SovereignGuard (running on YOUR server)
|
|-- creates session ID
|-- runs PII detection (fast regex + heavy recognizer)
|-- replaces PII with reversible SG tokens
|-- stores token→original mapping locally
v
LLM Provider (OpenAI / Anthropic / Mistral / etc.)
|
| receives only tokenized text
| returns tokenized response
v
SovereignGuard
|
|-- detects SG tokens in response
|-- restores original values from local mapping
|-- destroys or expires session mapping
v
Your App
|
| receives clean, usable response
```
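
The "stores token→original mapping locally" and "destroys or expires session mapping" steps are what let the gateway restore values without the mapping ever leaving your server. Here is a minimal sketch of such a store, assuming an in-memory dict with a time-to-live. The class `SessionStore` and its methods are hypothetical names for illustration, not SovereignGuard's actual code.

```python
import secrets
import time
from typing import Callable, Optional


class SessionStore:
    """Hypothetical in-memory store for per-session token -> original mappings.

    Sketch only: no locking, and the data lives in a single process.
    """

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        # session_id -> (expiry timestamp, {token: original value})
        self._sessions: dict[str, tuple[float, dict[str, str]]] = {}

    def create(self) -> str:
        """Open a session with an unguessable ID and an empty mapping."""
        session_id = secrets.token_urlsafe(16)
        self._sessions[session_id] = (self._clock() + self._ttl, {})
        return session_id

    def put(self, session_id: str, token: str, original: str) -> None:
        """Record one token -> original pair for an open session."""
        self._sessions[session_id][1][token] = original

    def pop(self, session_id: str) -> Optional[dict[str, str]]:
        """Return the mapping and delete it; None if unknown or expired."""
        entry = self._sessions.pop(session_id, None)
        if entry is None:
            return None
        expires_at, mapping = entry
        if self._clock() >= expires_at:
            return None  # expired: already removed by the pop above
        return mapping

    def purge_expired(self) -> int:
        """Drop sessions whose response never came back; return how many."""
        now = self._clock()
        expired = [sid for sid, (exp, _) in self._sessions.items() if now >= exp]
        for sid in expired:
            del self._sessions[sid]
        return len(expired)
```

Deleting the mapping on first read (`pop`) means a response can be restored only once, and the TTL cleans up sessions whose response never arrives. A deployment with several worker processes would need a shared store and locking instead of a process-local dict.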

The token format

Tokens look like this:
```
{{SG_PERSON_NAME_a3f9b2}}
{{SG_TN_PHONE_c4d5e6}}
{{SG_TN_NATIONAL_ID_f7e3b1}}
{{SG_EMAIL_b2c3d4}}
```

Format: {{SG_ENTITY_TYPE_randomhex}}

The hex suffix is random per session rather than derived
from the original value, so even if someone intercepts the
tokenized prompt, they cannot reverse the tokens without
the local mapping.
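
To make the format concrete, here is one way such tokens could be generated and substituted. It is a sketch that assumes detection has already produced non-overlapping `(start, end, entity_type)` spans. The function names and the 6-character suffix (read off the examples above) are assumptions, not SovereignGuard's API. The suffix comes from Python's `secrets` module, so it carries no information about the value it replaces.

```python
import re
import secrets

# Hypothetical pattern for the documented {{SG_ENTITY_TYPE_randomhex}} format;
# the 6 lowercase hex characters are read off the examples above.
TOKEN_RE = re.compile(r"\{\{SG_([A-Z_]+)_([0-9a-f]{6})\}\}")


def make_token(entity_type: str) -> str:
    """Build a token whose suffix comes from a CSPRNG, not from the value."""
    suffix = secrets.token_hex(3)  # 3 random bytes -> 6 hex characters
    return "{{SG_" + entity_type + "_" + suffix + "}}"


def tokenize(text: str,
             spans: list[tuple[int, int, str]]) -> tuple[str, dict[str, str]]:
    """Replace non-overlapping (start, end, entity_type) spans with tokens.

    Returns the tokenized text plus the token -> original mapping,
    which must stay on the gateway host.
    """
    mapping: dict[str, str] = {}
    parts: list[str] = []
    last = 0
    for start, end, entity_type in sorted(spans):
        token = make_token(entity_type)
        while token in mapping:  # regenerate on a (rare) suffix collision
            token = make_token(entity_type)
        mapping[token] = text[start:end]
        parts.append(text[last:start])
        parts.append(token)
        last = end
    parts.append(text[last:])
    return "".join(parts), mapping
```

The mapping returned by `tokenize` is the only way back to the original values, which is why it has to stay on the gateway host.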

Live proof

Input from your app:
"Contact Baha at +216 XX XXX XXX, CIN 12345678"
What the LLM actually receives:
"Contact {{SG_PERSON_a3f9b2}} at {{SG_TN_PHONE_c4d5e6}},
{{SG_TN_NATIONAL_ID_f7e3b1}}"
What your app gets back:
"Contact Baha at +216 XX XXX XXX, CIN 12345678"

restoration_completeness = 1.0
tokens_restored = 3
tokens_not_found = 0
Tested live against the DeepSeek API. It works.
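
Restoration is the inverse operation: find every token in the response and swap the original value back in. The sketch below also computes counters like the three shown above, but their exact definitions in SovereignGuard are an assumption here: `tokens_not_found` counts tokens in the response that have no entry in the mapping, and `restoration_completeness` is restored tokens divided by tokens found. The function name `restore` is hypothetical.

```python
import re

# Hypothetical: same shape as the documented {{SG_ENTITY_TYPE_randomhex}} format.
TOKEN_RE = re.compile(r"\{\{SG_[A-Z_]+_[0-9a-f]{6}\}\}")


def restore(response: str, mapping: dict[str, str]) -> tuple[str, dict[str, float]]:
    """Swap known tokens back to their original values and report counters.

    The counter definitions are assumptions made for this sketch.
    """
    restored = 0
    not_found = 0

    def _swap(match: re.Match[str]) -> str:
        nonlocal restored, not_found
        token = match.group(0)
        if token in mapping:
            restored += 1
            return mapping[token]
        not_found += 1
        return token  # leave unknown tokens visible instead of guessing

    text = TOKEN_RE.sub(_swap, response)
    found = restored + not_found
    stats = {
        "restoration_completeness": restored / found if found else 1.0,
        "tokens_restored": restored,
        "tokens_not_found": not_found,
    }
    return text, stats
```

Leaving an unknown token in place, rather than dropping it, makes a mangled token visible in the output instead of silently losing data.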

The EMEA gap

Most existing PII tools were built for US or Western European data.

Here's what they miss:

Tunisia:

CIN: exactly 8 digits (e.g., 12345678)
Phone: +216 followed by 8 digits
Matricule Fiscal: 7 digits + letter + 3 digits + 3 digits

Morocco:

CIN: 1–2 letters + 5–6 digits (e.g., AB123456)
Phone: +212 followed by 9 digits
ICE: exactly 15 digits

France:

NIR (social security): 13 digits + 2-digit key
SIRET: 14 digits
Phone: +33 followed by 9 digits

As far as I know, SovereignGuard is the only open-source gateway
with native recognizers for these patterns.
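
As a rough illustration of what recognizers for these formats involve, here is a hypothetical regex-only sketch (not SovereignGuard's recognizer code). Regex alone cannot tell an 8-digit CIN from any other 8-digit number, so the sketch claims phone numbers before bare digit runs, and it checks the NIR key (97 minus the first 13 digits modulo 97) to tell a French NIR from a 15-digit Moroccan ICE. The pattern names, separator handling, and ordering are assumptions.

```python
import re
from typing import Callable

# Hypothetical patterns for the formats listed above. Order matters:
# earlier entries claim their spans first, so phone numbers win over
# the bare 8-digit CIN pattern, and the NIR is tried before the ICE.
PATTERNS: dict[str, re.Pattern[str]] = {
    "TN_PHONE": re.compile(r"\+216(?:[\s.-]?\d){8}\b"),
    "MA_PHONE": re.compile(r"\+212(?:[\s.-]?\d){9}\b"),
    "FR_PHONE": re.compile(r"\+33(?:[\s.-]?\d){9}\b"),
    "FR_NIR": re.compile(r"\b\d{13}\s?\d{2}\b"),
    "FR_SIRET": re.compile(r"\b\d{14}\b"),
    "MA_ICE": re.compile(r"\b\d{15}\b"),
    "MA_CIN": re.compile(r"\b[A-Z]{1,2}\d{5,6}\b"),
    "TN_CIN": re.compile(r"\b\d{8}\b"),
}


def nir_key_ok(value: str) -> bool:
    """French NIR key check: key == 97 - (first 13 digits mod 97).

    This sketch ignores Corsican numbers (2A/2B), which need a
    substitution before the check.
    """
    digits = "".join(value.split())
    return 97 - int(digits[:13]) % 97 == int(digits[13:])


VALIDATORS: dict[str, Callable[[str], bool]] = {"FR_NIR": nir_key_ok}


def detect(text: str) -> list[tuple[int, int, str]]:
    """Return non-overlapping (start, end, entity_type) spans, sorted by start."""
    spans: list[tuple[int, int, str]] = []
    for entity_type, pattern in PATTERNS.items():
        check = VALIDATORS.get(entity_type)
        for m in pattern.finditer(text):
            if check is not None and not check(m.group()):
                continue  # shape matches but the checksum fails
            if any(m.start() < end and start < m.end() for start, end, _ in spans):
                continue  # already claimed by an earlier pattern
            spans.append((m.start(), m.end(), entity_type))
    return sorted(spans)
```

A production recognizer would typically also look at context words such as "CIN" or "ICE" near the match; the ordering used here only resolves overlaps between patterns.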

Integration — one line

If you're using the OpenAI Python SDK:


```python
from openai import OpenAI

# Before
client = OpenAI(api_key="sk-...")

# After: that's it
client = OpenAI(
    api_key="sg-your-gateway-key",
    base_url="http://localhost:8000/v1"
)
```


Your entire application stays the same.
SovereignGuard handles everything transparently.

Running it

Docker (recommended):

```bash
git clone https://github.com/bahaeddinmselmi/sovereignguard
cd sovereignguard
cp .env.example .env
# edit .env with your provider API key
docker compose up --build
```

Smoke test:

```bash
curl http://localhost:8000/health
# {"status":"healthy","gateway":"SovereignGuard","version":"0.2.0"}
```
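
If you start the stack from a script (for example with `docker compose up -d`), the gateway may take a few seconds before it answers. Here is a minimal standard-library sketch that polls the health endpoint until it reports healthy. The URL and the `status` field come from the sample response above; the function names and timeouts are assumptions.

```python
import json
import time
import urllib.request

# Port and path taken from the smoke test above.
HEALTH_URL = "http://localhost:8000/health"


def is_healthy(body: bytes) -> bool:
    """True if the JSON body reports {"status": "healthy"}."""
    try:
        payload = json.loads(body)
    except ValueError:  # includes json.JSONDecodeError
        return False
    return isinstance(payload, dict) and payload.get("status") == "healthy"


def wait_for_gateway(url: str = HEALTH_URL, timeout_s: float = 60.0,
                     interval_s: float = 2.0) -> bool:
    """Poll the health endpoint until it reports healthy or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200 and is_healthy(resp.read()):
                    return True
        except OSError:
            pass  # connection refused, HTTP error, or timeout: not ready yet
        time.sleep(interval_s)
    return False
```

A CI job can call `wait_for_gateway()` after starting the containers and fail the build if it returns `False`.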

What's next

Countries I need recognizers for:

🇩🇿 Algeria
🇩🇪 Germany
🇳🇬 Nigeria
🇸🇦 Saudi Arabia
🇦🇪 UAE
🇸🇳 Senegal

Adding a country recognizer takes about 30 minutes.
See docs/adding-recognizers.md in the repository for the guide.

GitHub: https://github.com/bahaeddinmselmi/sovereignguard

If you're building AI features for EMEA customers,
this layer belongs in your stack.
Star it, break it, contribute to it.