<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Baha Eddin Mselmi</title>
    <description>The latest articles on DEV Community by Baha Eddin Mselmi (@bahaeddinmselmi).</description>
    <link>https://dev.to/bahaeddinmselmi</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3617932%2F81e09b8e-4a38-42ef-8a3d-2e5b9085ab7d.jpeg</url>
      <title>DEV Community: Baha Eddin Mselmi</title>
      <link>https://dev.to/bahaeddinmselmi</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/bahaeddinmselmi"/>
    <language>en</language>
    <item>
      <title>How I built a PII anonymization gateway for LLMs (and why every EMEA developer needs one)</title>
      <dc:creator>Baha Eddin Mselmi</dc:creator>
      <pubDate>Fri, 13 Mar 2026 20:50:08 +0000</pubDate>
      <link>https://dev.to/bahaeddinmselmi/how-i-built-a-pii-anonymization-gateway-for-llms-and-why-every-emea-developer-needs-one-1d9h</link>
      <guid>https://dev.to/bahaeddinmselmi/how-i-built-a-pii-anonymization-gateway-for-llms-and-why-every-emea-developer-needs-one-1d9h</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft73o6wivmbpdmgaj7ay5.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft73o6wivmbpdmgaj7ay5.jpg" alt=" " width="800" height="716"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem nobody talks about in EMEA AI development
&lt;/h2&gt;

&lt;p&gt;Most tutorials about building LLM-powered apps assume the same thing: that you can freely send your users' data to OpenAI or Anthropic.&lt;/p&gt;

&lt;p&gt;In EMEA, that assumption is wrong.&lt;/p&gt;

&lt;p&gt;Tunisia is currently examining a 123-article organic law regulating AI and automated decision-making.&lt;br&gt;
France enforces GDPR strictly, and sending an LLM a prompt that contains personal data counts as processing that data.&lt;br&gt;
Morocco is formalizing its digital governance framework.&lt;/p&gt;

&lt;p&gt;If you're building AI features for EMEA customers and forwarding their data to US inference providers, you have a compliance problem you may not know about yet.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I built
&lt;/h2&gt;

&lt;p&gt;SovereignGuard is an open-source AI privacy gateway.&lt;/p&gt;

&lt;p&gt;It sits between your application and any LLM API. It intercepts outbound prompts, strips PII before the request leaves your server, and restores the original values locally after the response comes back.&lt;/p&gt;

&lt;p&gt;The LLM never sees the customer data SovereignGuard detects.&lt;br&gt;
The only change to your app is one line of client configuration.&lt;/p&gt;
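&lt;p&gt;To make "intercepts outbound prompts" concrete, here is a minimal sketch (not SovereignGuard's actual code) of walking an OpenAI-style chat-completions request body and anonymizing every text prompt before it is forwarded. The &lt;code&gt;anonymize_request&lt;/code&gt; function and its &lt;code&gt;anonymize_text&lt;/code&gt; callable are hypothetical; the callable stands in for the detection and tokenization step.&lt;/p&gt;

```python
from typing import Callable


def anonymize_request(body: dict, anonymize_text: Callable[[str], str]) -> dict:
    """Return a copy of an OpenAI-style chat request with every text prompt anonymized.

    `anonymize_text` is a hypothetical hook standing in for the gateway's
    detection and tokenization step. The caller's `body` is not modified.
    """
    messages = []
    for message in body.get("messages", []):
        message = dict(message)  # shallow copy of each message
        content = message.get("content")
        if isinstance(content, str):
            message["content"] = anonymize_text(content)
        elif isinstance(content, list):
            # Multi-part content: only parts of type "text" carry prompt text.
            message["content"] = [
                {**part, "text": anonymize_text(part["text"])}
                if part.get("type") == "text"
                else part
                for part in content
            ]
        messages.append(message)
    return {**body, "messages": messages}
```

&lt;p&gt;Copying instead of mutating keeps the caller's original request intact, so the app can retry it without surprises.&lt;/p&gt;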

&lt;h2&gt;
  
  
  How it works — the full mechanism
&lt;/h2&gt;

&lt;p&gt;Here's the complete request lifecycle:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Your App
|
| sends prompt with real customer data
v
SovereignGuard (running on YOUR server)
|
|-- creates session ID
|-- runs PII detection (fast regex + heavy recognizer)
|-- replaces PII with reversible SG tokens
|-- stores token→original mapping locally
v
LLM Provider (OpenAI / Anthropic / Mistral / etc.)
|
| receives only tokenized text
| returns tokenized response
v
SovereignGuard
|
|-- detects SG tokens in response
|-- restores original values from local mapping
|-- destroys or expires session mapping
v
Your App
|
receives clean, usable response
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
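&lt;p&gt;The lifecycle above can be sketched in a few dozen lines of Python. This is an illustrative sketch, not SovereignGuard's implementation: &lt;code&gt;SessionStore&lt;/code&gt;, &lt;code&gt;mint_token&lt;/code&gt;, &lt;code&gt;anonymize&lt;/code&gt;, and &lt;code&gt;restore&lt;/code&gt; are hypothetical names, the only detector is a simplified Tunisian phone regex, and sessions live in memory with an assumed 5-minute TTL.&lt;/p&gt;

```python
import re
import secrets
import time

# Hypothetical detector for this sketch: Tunisian numbers written as +216 plus 8 digits.
TN_PHONE = re.compile(r"\+216 ?\d{2} ?\d{3} ?\d{3}")
# Any token shaped like {{SG_ENTITY_TYPE_xxxxxx}} (6 lower-case hex digits assumed).
TOKEN = re.compile(r"\{\{SG_[A-Z_]+_[0-9a-f]{6}\}\}")


class SessionStore:
    """In-memory token-to-original mappings, one per session, expiring after `ttl` seconds."""

    def __init__(self, ttl: float = 300.0):
        self.ttl = ttl
        self._sessions = {}  # session_id -> (created_at, {token: original})

    def create(self) -> str:
        session_id = secrets.token_hex(8)
        self._sessions[session_id] = (time.monotonic(), {})
        return session_id

    def mapping(self, session_id: str) -> dict:
        created_at, mapping = self._sessions[session_id]  # KeyError if unknown
        if time.monotonic() - created_at > self.ttl:
            del self._sessions[session_id]
            raise KeyError(f"session {session_id} expired")
        return mapping

    def destroy(self, session_id: str) -> None:
        self._sessions.pop(session_id, None)


def mint_token(entity: str, mapping: dict) -> str:
    # Random suffix, regenerated on the (unlikely) collision within a session.
    while True:
        token = "{{SG_" + entity + "_" + secrets.token_hex(3) + "}}"
        if token not in mapping:
            return token


def anonymize(text: str, mapping: dict) -> str:
    """Replace detected PII with tokens, recording token -> original in `mapping`."""
    def replace(match):
        token = mint_token("TN_PHONE", mapping)
        mapping[token] = match.group(0)
        return token

    return TN_PHONE.sub(replace, text)


def restore(text: str, mapping: dict) -> str:
    """Swap known tokens back to their originals; unknown tokens are left as-is."""
    return TOKEN.sub(lambda m: mapping.get(m.group(0), m.group(0)), text)
```

&lt;p&gt;Typical use: call &lt;code&gt;store.create()&lt;/code&gt;, anonymize the prompt with &lt;code&gt;store.mapping(sid)&lt;/code&gt;, forward it, restore the reply with the same mapping, then call &lt;code&gt;store.destroy(sid)&lt;/code&gt;. Leaving unknown tokens untouched, rather than guessing, keeps a mangled token visible instead of silently wrong.&lt;/p&gt;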

&lt;h2&gt;
  
  
  The token format
&lt;/h2&gt;

&lt;p&gt;Tokens look like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{{SG_PERSON_NAME_a3f9b2}}
{{SG_TN_PHONE_c4d5e6}}
{{SG_TN_NATIONAL_ID_f7e3b1}}
{{SG_EMAIL_b2c3d4}}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Format: &lt;code&gt;{{SG_ENTITY_TYPE_randomhex}}&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;The hex suffix is random per session and is not derived from the original value, so even if someone intercepts the tokenized prompt, they cannot reverse the tokens without the local mapping.&lt;/p&gt;
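&lt;p&gt;A sketch of minting and parsing tokens in this format, assuming a 6-hex-digit suffix as in the examples (the suffix length is inferred, not documented). &lt;code&gt;mint_token&lt;/code&gt; and &lt;code&gt;parse_token&lt;/code&gt; are hypothetical helpers; the point is that the suffix comes from Python's &lt;code&gt;secrets&lt;/code&gt; module, so the token reveals only the entity type.&lt;/p&gt;

```python
import re
import secrets

# {{SG_ENTITY_TYPE_randomhex}}: the entity type is upper-case words joined by "_";
# the suffix is assumed to be 6 lower-case hex digits, as in the examples.
TOKEN_RE = re.compile(r"\{\{SG_([A-Z]+(?:_[A-Z]+)*)_([0-9a-f]{6})\}\}")


def mint_token(entity_type: str) -> str:
    # 3 random bytes from a CSPRNG give 6 hex digits; nothing is derived from
    # the original value, so the token reveals only the entity type.
    return "{{SG_%s_%s}}" % (entity_type, secrets.token_hex(3))


def parse_token(token: str):
    """Return (entity_type, suffix) for a well-formed token, or None."""
    match = TOKEN_RE.fullmatch(token)
    return (match.group(1), match.group(2)) if match else None
```

&lt;p&gt;Three random bytes give about 16.7 million possible suffixes, so collisions inside one session are unlikely but not impossible; a gateway would still check for them before storing a new token.&lt;/p&gt;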

&lt;h2&gt;
  
  
  Live proof
&lt;/h2&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Input from your app:
"Contact Baha at +216 XX XXX XXX, CIN 12345678"

What the LLM actually receives:
"Contact {{SG_PERSON_a3f9b2}} at {{SG_TN_PHONE_c4d5e6}}, {{SG_TN_NATIONAL_ID_f7e3b1}}"

What your app gets back:
"Contact Baha at +216 XX XXX XXX, CIN 12345678"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

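&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;restoration_completeness = 1.0
tokens_restored = 3
tokens_not_found = 0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Tested live against the DeepSeek API. It works.&lt;/p&gt;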
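&lt;p&gt;Numbers like these can be collected during restoration. The definitions in this sketch are assumptions, not SovereignGuard's documented semantics: a token counts as restored if it appears in the response and has a mapping entry, as not found if it appears without one, and completeness is restored divided by the total. &lt;code&gt;restore_with_stats&lt;/code&gt; is a hypothetical name.&lt;/p&gt;

```python
import re

# Token shape assumed from the examples: {{SG_ENTITY_TYPE_xxxxxx}}.
TOKEN = re.compile(r"\{\{SG_[A-Z_]+_[0-9a-f]{6}\}\}")


def restore_with_stats(text: str, mapping: dict):
    """Restore tokens in `text` and report restoration stats.

    Assumed definitions (hypothetical):
    - tokens_restored: tokens in `text` that have an entry in `mapping`
    - tokens_not_found: tokens in `text` with no entry in `mapping`
    - restoration_completeness: restored / (restored + not_found), 1.0 if no tokens
    """
    stats = {"tokens_restored": 0, "tokens_not_found": 0}

    def replace(match):
        token = match.group(0)
        if token in mapping:
            stats["tokens_restored"] += 1
            return mapping[token]
        stats["tokens_not_found"] += 1
        return token  # leave unknown tokens visible

    restored = TOKEN.sub(replace, text)
    total = stats["tokens_restored"] + stats["tokens_not_found"]
    stats["restoration_completeness"] = stats["tokens_restored"] / total if total else 1.0
    return restored, stats
```

&lt;p&gt;With a mapping that holds the three tokens from the live proof above, this sketch reports 3 restored, 0 not found, and a completeness of 1.0.&lt;/p&gt;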

&lt;h2&gt;
  
  
  The EMEA gap
&lt;/h2&gt;

&lt;p&gt;Most existing PII tools were built around US or Western European data formats.&lt;/p&gt;

&lt;p&gt;Here's what they miss:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tunisia:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CIN: exactly 8 digits (e.g., 12345678)&lt;/li&gt;
&lt;li&gt;Phone: +216 followed by 8 digits&lt;/li&gt;
&lt;li&gt;Matricule Fiscale: 7 digits + letter + 3 digits + 3 digits&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Morocco:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CIN: 1–2 letters + 5–6 digits (e.g., AB123456)&lt;/li&gt;
&lt;li&gt;Phone: +212 followed by 9 digits&lt;/li&gt;
&lt;li&gt;ICE: exactly 15 digits&lt;/li&gt;
&lt;/ul&gt;
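&lt;p&gt;As a rough illustration, the Tunisian and Moroccan formats above translate into regular expressions like these. This is a hypothetical sketch, not SovereignGuard's recognizers: bare digit patterns such as the 8-digit CIN will also match unrelated numbers, so a real recognizer needs context (a nearby "CIN" label, for instance) or validation on top.&lt;/p&gt;

```python
import re

# Patterns transcribed from the format descriptions above (a sketch only;
# expect false positives on plain numbers with the same length).
RECOGNIZERS = {
    "TN_NATIONAL_ID": re.compile(r"\b\d{8}\b"),                   # CIN: exactly 8 digits
    "TN_PHONE": re.compile(r"\+216 ?\d{2} ?\d{3} ?\d{3}\b"),       # +216 and 8 digits
    "MA_NATIONAL_ID": re.compile(r"\b[A-Z]{1,2}\d{5,6}\b"),       # CIN: 1-2 letters, 5-6 digits
    "MA_PHONE": re.compile(r"\+212 ?\d(?: ?\d{2}){4}\b"),          # +212 and 9 digits
    "MA_ICE": re.compile(r"\b\d{15}\b"),                           # ICE: exactly 15 digits
}


def detect(text: str):
    """Return (entity_type, matched_text) pairs in order of appearance."""
    hits = []
    for entity, pattern in RECOGNIZERS.items():
        for match in pattern.finditer(text):
            hits.append((match.start(), entity, match.group(0)))
    return [(entity, value) for _, entity, value in sorted(hits)]
```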

&lt;p&gt;&lt;strong&gt;France:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;NIR (social security): 13 digits + 2-digit key&lt;/li&gt;
&lt;li&gt;SIRET: 14 digits&lt;/li&gt;
&lt;li&gt;Phone: +33 followed by 9 digits&lt;/li&gt;
&lt;/ul&gt;
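&lt;p&gt;Because any 13- or 14-digit run could look like a French identifier, checksums are a cheap way to cut false positives. SIRET numbers use the Luhn algorithm, and the NIR key equals 97 minus the first 13 digits modulo 97. The helper names below are hypothetical, and this sketch skips Corsican NIRs (2A/2B) and La Poste SIRETs, which follow different rules.&lt;/p&gt;

```python
def luhn_valid(digits: str) -> bool:
    """Luhn checksum, used for SIRET (and SIREN) numbers."""
    total = 0
    for position, char in enumerate(reversed(digits)):
        value = int(char)
        if position % 2 == 1:  # double every second digit from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def siret_valid(siret: str) -> bool:
    digits = siret.replace(" ", "")
    # La Poste establishments use a different rule and are not handled here.
    return len(digits) == 14 and digits.isdigit() and luhn_valid(digits)


def nir_key_valid(nir: str) -> bool:
    """Check the 2-digit NIR key: key == 97 - (first 13 digits mod 97).

    Corsican department codes (2A/2B) are not handled in this sketch.
    """
    digits = nir.replace(" ", "")
    if len(digits) != 15 or not digits.isdigit():
        return False
    return 97 - int(digits[:13]) % 97 == int(digits[13:])
```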

&lt;p&gt;As far as I know, SovereignGuard is the only open-source gateway with native recognizers for these patterns.&lt;/p&gt;

&lt;h2&gt;
  
  
  Integration — one line
&lt;/h2&gt;

&lt;p&gt;If you're using the OpenAI Python SDK:&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
python
from openai import OpenAI

# Before
client = OpenAI(api_key="sk-...")

# After — that's it
client = OpenAI(
    api_key="sg-your-gateway-key",
    base_url="http://localhost:8000/v1"
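)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Your entire application stays the same. SovereignGuard handles everything transparently.&lt;/p&gt;

&lt;h2&gt;
  
  
  Running it
&lt;/h2&gt;

&lt;p&gt;Docker (recommended):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/bahaeddinmselmi/sovereignguard
cd sovereignguard
cp .env.example .env
# edit .env with your provider API key
docker compose up --build
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Smoke test:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl http://localhost:8000/health
# {"status":"healthy","gateway":"SovereignGuard","version":"0.2.0"}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  What's next
&lt;/h2&gt;

&lt;p&gt;Countries I need recognizers for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🇩🇿 Algeria&lt;/li&gt;
&lt;li&gt;🇩🇪 Germany&lt;/li&gt;
&lt;li&gt;🇳🇬 Nigeria&lt;/li&gt;
&lt;li&gt;🇸🇦 Saudi Arabia&lt;/li&gt;
&lt;li&gt;🇦🇪 UAE&lt;/li&gt;
&lt;li&gt;🇸🇳 Senegal&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Adding a country recognizer takes about 30 minutes. See &lt;code&gt;docs/adding-recognizers.md&lt;/code&gt; for the guide.&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/bahaeddinmselmi/sovereignguard"&gt;https://github.com/bahaeddinmselmi/sovereignguard&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you're building AI features for EMEA customers, this layer belongs in your stack. Star it, break it, contribute to it.&lt;/p&gt;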

</description>
      <category>python</category>
      <category>rag</category>
      <category>privacy</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
