This is a submission for the Gemma 4 Challenge: Build with Gemma 4
Every time you paste sensitive data, legal documents, or personal details into ChatGPT or Claude, that data leaves your device. PromptGuard intercepts it first, and Gemma 4 running locally does the redaction before the prompt ever touches a cloud server.
Table of Contents
- The Problem Nobody Talks About
- What I Built
- Why Gemma 4 Is the Right Model for This
- System Architecture
- How It Works: The Full Pipeline
- The Chrome Extension
- The Local Backend
- Real-World Demo: Legal Document Workflow
- What Gets Redacted
- Limitations and Honest Caveats
- What's Next
- Key Takeaways
The Problem Nobody Talks About
Every week, professionals across healthcare, law, finance, and government are doing something they probably shouldn't: pasting sensitive documents directly into public AI interfaces.
A lawyer drafting a brief pastes a client's NIC number, phone, and case details into ChatGPT to get a summary. A doctor asks Claude to help structure a patient report, with the patient's full name and health data in the prompt. A developer pastes a production database dump to debug a query. A researcher uploads a compliance document containing employee records.
Each of those prompts is transmitted to a cloud server, processed, potentially logged for safety review, and retained under terms of service that most users haven't read carefully.
This isn't a hypothetical risk. Sri Lanka's Personal Data Protection Act No. 9 of 2022 (PDPA) imposes legal obligations on controllers who process personal data. Section 10 requires appropriate technical and organizational measures. Sections 13-18 guarantee data subject rights that can be violated by unauthorized disclosure. Section 38 sets penalties up to Rs. 10 million per non-compliance.
GDPR, UAE PDPL, and equivalent frameworks carry similar, or higher, obligations.
The problem: there is no guard between the user's clipboard and the AI's cloud API.
PromptGuard is that guard.
What I Built
PromptGuard is a two-component, local-first privacy firewall:
1. promptguard/: Local Python backend
A FastAPI server running on localhost:8000 that receives raw prompts, runs a two-stage redaction pipeline using Gemma 4 via Ollama, and returns a sanitized version. No external network calls: everything runs on-device.
2. promptguard-extension/: Chrome Extension (Manifest V3)
Injects a "Sanitize Prompt" button into ChatGPT and Claude.ai. When clicked, it intercepts the current prompt, sends it to the local backend for sanitization, replaces the prompt in the input box with the cleaned version, and only then allows the user to submit.
User types prompt with PII
↓
[Chrome Extension intercepts]
↓
POST to localhost:8000/scan
↓
[Regex pre-redaction: NIC, email, phone]
↓
[Gemma 4:e4b on-device LLM redaction]
↓
Safe prompt returned
↓
Input box updated with sanitized version
↓
User submits to ChatGPT / Claude (clean)
The entire redaction process happens on your machine. The cloud AI never sees the original.
Why Gemma 4 Is the Right Model for This
This is the question the judges will ask, so I want to answer it directly and honestly.
Why not GPT-4o or Claude for redaction?
Sending sensitive data to a cloud API to redact sensitive data before sending to a cloud API is circular and defeats the purpose entirely. The solution has to be local.
Why not a simple regex approach?
Regex handles known patterns: NIC numbers (\d{9}[VvXx]), emails, phone numbers. But PII is contextual:
- "Call me at the usual number": no regex catches this
- "The patient presented at 14:30, John Smith, age 34": name + age in natural language
- "My CNIC is written on the form I mentioned earlier": reference without the number itself
- "Send it to the Gmail I use for work": implied email without the address
You need a model that understands intent and context, not just patterns. Gemma 4 provides that understanding at a scale that runs locally.
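To make the gap concrete, here is a minimal sketch that runs the three Stage 1 patterns against one explicit example and three of the contextual sentences listed above. The function mirrors the pipeline's `regex_redact`; the `explicit` sample string is made up for illustration.

```python
import re

def regex_redact(text: str) -> str:
    """Apply the three Stage 1 patterns: NIC, email, 10-digit phone."""
    text = re.sub(r'\b\d{9}[VvXx]\b', '[REDACTED_NIC]', text)
    text = re.sub(r'[\w\.-]+@[\w\.-]+', '[REDACTED_EMAIL]', text)
    text = re.sub(r'\b\d{10}\b', '[REDACTED_PHONE]', text)
    return text

# Illustrative input containing one match for each pattern.
explicit = "NIC 999995678V, mail john.doe@example.com, phone 0777654321"

# Contextual sentences from the list above: no pattern applies to any of them.
contextual = [
    "Call me at the usual number",
    "The patient presented at 14:30, John Smith, age 34",
    "Send it to the Gmail I use for work",
]

print(regex_redact(explicit))
for sentence in contextual:
    # Each sentence comes back unchanged because nothing matches.
    print(regex_redact(sentence) == sentence)
```

The explicit identifiers are replaced, while the name, age, and implied references pass through untouched. That residue is what the second stage is for.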
Why Gemma 4 specifically, and why the e4b variant?
Gemma 4 model family:
2B / 4B: ultra-mobile, browser, edge (Pixel, Raspberry Pi)
27B: server-grade, high accuracy
e4b (MoE): efficient inference, advanced reasoning, local deployment
gemma4:e4b is the Mixture-of-Experts variant: it activates only the expert subnetworks relevant to the current task. For a redaction task that requires:
- Named entity recognition in natural language
- Context-aware sensitivity detection
- Understanding of legal and medical terminology
- Preservation of semantic meaning after redaction
The MoE architecture gives you reasoning quality close to the 27B model at a fraction of the inference cost. It runs comfortably on a machine with 16GB RAM via Ollama. The 2B/4B models were too aggressive: they redacted useful context along with PII. The 27B model was too slow for real-time prompt interception. e4b was the right balance.
This wasn't a default choice. I tested all three and e4b was the only one that preserved readability while catching contextual PII that regex missed.
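The post does not say how the three variants were scored. One hypothetical way to make such a comparison repeatable is to count, for each model's output, how many known PII strings leaked through and how many phrases that should survive were lost. The function name and the two lists below are assumptions for illustration, not part of PromptGuard.

```python
def score_redaction(output: str, must_remove: list[str], must_keep: list[str]) -> dict[str, int]:
    """Count PII strings that leaked through and useful phrases that were lost."""
    leaked = [s for s in must_remove if s in output]
    lost = [s for s in must_keep if s not in output]
    return {"leaked": len(leaked), "over_redacted": len(lost)}

# Hypothetical model output for a single test prompt.
output = "My client [REDACTED_NAME] needs a letter under Section 23 of the PDPA."
print(score_redaction(
    output,
    must_remove=["John Doe"],
    must_keep=["Section 23 of the PDPA"],
))
```

A model that redacts too aggressively would show a high `over_redacted` count, and one that misses contextual PII would show a high `leaked` count.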
System Architecture
+-----------------------------------------------------------+
|                      USER'S MACHINE                       |
|                                                           |
|  +------------------+        +-------------------------+  |
|  |  Chrome Browser  |        |  PromptGuard Backend    |  |
|  |                  |        |  (FastAPI :8000)        |  |
|  |  +------------+  |        |                         |  |
|  |  | ChatGPT /  |  |        |  +-------------------+  |  |
|  |  | Claude.ai  |  |        |  | Stage 1: Regex    |  |  |
|  |  +-----+------+  |        |  | NIC / Email /     |  |  |
|  |        |         |        |  | Phone redaction   |  |  |
|  |  +-----v------+  |        |  +---------+---------+  |  |
|  |  |PromptGuard |  |  POST  |            |            |  |
|  |  | Extension  +--+------->|  +---------v---------+  |  |
|  |  |(content.js)|  |        |  | Stage 2: Gemma    |  |  |
|  |  +-----+------+  |        |  | 4:e4b via Ollama  |  |  |
|  |        |         |<-------+--+ Contextual PII    |  |  |
|  |  +-----v------+  |  JSON  |  | redaction         |  |  |
|  |  | Input box  |  |        |  +-------------------+  |  |
|  |  | (cleaned)  |  |        |                         |  |
|  |  +------------+  |        +-------------------------+  |
|  +------------------+                                     |
|                                                           |
|  +------------------+----------------------------------+  |
|  | Ollama Runtime   | Gemma 4:e4b model weights        |  |
|  | (local process)  | (on-device, no network)          |  |
|  +------------------+----------------------------------+  |
|                                                           |
+-----------------------------------------------------------+
                              |
                       ONLY sanitized
                        prompt leaves
                              |
                              v
                  +----------------------+
                  |  ChatGPT / Claude    |
                  |  Cloud API           |
                  | (never sees raw PII) |
                  +----------------------+
How It Works: The Full Pipeline
Stage 1: Regex Pre-Redaction (run.py)
Fast, deterministic, near-zero-latency redaction of known PII patterns:
import re

def regex_redact(text):
    # Sri Lanka NIC: 9 digits + V/X suffix
    text = re.sub(r'\b\d{9}[VvXx]\b', '[REDACTED_NIC]', text)
    # Email addresses
    text = re.sub(r'[\w\.-]+@[\w\.-]+', '[REDACTED_EMAIL]', text)
    # Phone numbers (10 digits)
    text = re.sub(r'\b\d{10}\b', '[REDACTED_PHONE]', text)
    return text
This catches the easy cases instantly, before Gemma 4 even sees the text, so the model never receives the raw identifiers and has fewer entities left to find.
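As a sketch of how Stage 1 could also report what it did, the variant below uses `re.subn` to return per-pattern match counts alongside the redacted text. The patterns are the same three as in `regex_redact`; the name `regex_redact_with_counts` and the `PATTERNS` table are hypothetical additions, not part of the project.

```python
import re

# Same three patterns as regex_redact, paired with their placeholder labels.
PATTERNS = [
    (re.compile(r'\b\d{9}[VvXx]\b'), '[REDACTED_NIC]'),
    (re.compile(r'[\w\.-]+@[\w\.-]+'), '[REDACTED_EMAIL]'),
    (re.compile(r'\b\d{10}\b'), '[REDACTED_PHONE]'),
]

def regex_redact_with_counts(text: str) -> tuple[str, dict[str, int]]:
    """Return the redacted text plus how many matches each pattern replaced."""
    counts = {}
    for pattern, label in PATTERNS:
        text, n = pattern.subn(label, text)
        counts[label] = n
    return text, counts

redacted, counts = regex_redact_with_counts(
    "My client John Doe, NIC 999995678V, reached out via "
    "john.doe@example.com about a data breach. Phone: 0777654321."
)
print(redacted)
print(counts)
```

One quirk worth knowing: the email pattern's character class includes `.`, so a sentence-ending period placed directly after an address is swallowed into the match.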
Stage 2: Gemma 4 Contextual Redaction (run.py)
The partially redacted text goes to Gemma 4:e4b with a short instruction prompt, sent as a single user message:
import ollama

def sanitize_prompt(prompt: str) -> str:
    # Stage 1: strip pattern-based PII before the model sees the text
    partially_redacted = regex_redact(prompt)
    # Instructions and text travel together in one user message
    system_prompt = f"""
You are PromptGuard, a privacy-preserving AI firewall.
Redact sensitive information while preserving readability.
Text:
{partially_redacted}
"""
    # Stage 2: contextual redaction by the local model via Ollama
    response = ollama.chat(
        model="gemma4:e4b",
        messages=[{"role": "user", "content": system_prompt}]
    )
    return response['message']['content']
Gemma 4 handles what regex can't:
- Full names in natural language
- Medical conditions and health data
- Financial details described in prose
- Implicit references to identifiable information
- Sensitive context even without explicit identifiers
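Because the model rewrites free text, it can help to check that it did not undo Stage 1. The sketch below shows one possible approach: a prompt builder with two extra instructions (keep existing placeholders, return only the redacted text) and a check that every Stage 1 placeholder survives Stage 2. Both the extra instruction wording and the helper names are assumptions for illustration, not the project's actual prompt.

```python
import re

PLACEHOLDER = re.compile(r'\[REDACTED_[A-Z_]+\]')

def build_redaction_prompt(partially_redacted: str) -> str:
    """Assemble the instruction text sent to the local model (wording is illustrative)."""
    return (
        "You are PromptGuard, a privacy-preserving AI firewall.\n"
        "Redact sensitive information while preserving readability.\n"
        "Keep existing [REDACTED_*] placeholders exactly as they are.\n"
        "Return only the redacted text, with no commentary.\n"
        "Text:\n"
        f"{partially_redacted}"
    )

def placeholders_preserved(stage1_text: str, stage2_text: str) -> bool:
    """True if every Stage 1 placeholder appears at least as often after Stage 2."""
    before = PLACEHOLDER.findall(stage1_text)
    after = PLACEHOLDER.findall(stage2_text)
    return all(after.count(tag) >= before.count(tag) for tag in set(before))
```

If `placeholders_preserved` returns False, a cautious fallback would be to return the Stage 1 text rather than the model's output.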
Stage 3: FastAPI Endpoint
The backend exposes a single clean endpoint:
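# FastAPI backend (inferred from content.js calling /scan)
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class PromptRequest(BaseModel):
    prompt: str

@app.post("/scan")
async def scan_prompt(payload: PromptRequest):
    safe = sanitize_prompt(payload.prompt)
    return {"safe_prompt": safe}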
The extension POSTs to http://127.0.0.1:8000/scan: purely local, no TLS required, no external network call.
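For testing the endpoint without the browser, a small standard-library client can send the same JSON body that content.js sends. This is a sketch under the assumption that the backend is running at the address above and returns a `safe_prompt` field as shown; `build_scan_request` and `scan` are hypothetical helper names.

```python
import json
import urllib.request

SCAN_URL = "http://127.0.0.1:8000/scan"

def build_scan_request(prompt: str) -> urllib.request.Request:
    """Build the same JSON POST that content.js sends to the backend."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        SCAN_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def scan(prompt: str, timeout: float = 30.0) -> str:
    """Send the prompt to the local backend and return the sanitized text."""
    with urllib.request.urlopen(build_scan_request(prompt), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["safe_prompt"]
```

With the backend started, `scan("My NIC is 999995678V")` should return the sanitized text; the generous default timeout leaves room for model inference.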
The Chrome Extension
The extension (promptguard-extension/) is a Manifest V3 Chrome extension with two files:
manifest.json: declares permissions and injection targets:
{
  "manifest_version": 3,
  "name": "PromptGuard",
  "version": "1.0",
  "permissions": ["activeTab", "scripting"],
  "host_permissions": [
    "https://chatgpt.com/*",
    "https://claude.ai/*"
  ],
  "content_scripts": [
    {
      "matches": ["https://chatgpt.com/*", "https://claude.ai/*"],
      "js": ["content.js"]
    }
  ]
}
content.js: injects a persistent "Sanitize Prompt" button and handles the interception flow:
async function sanitizePrompt() {
  // Find the active prompt input (handles both contenteditable and textarea)
  const inputBoxes = document.querySelectorAll(
    '[contenteditable="true"], textarea'
  );
  let inputBox = null;
  for (let box of inputBoxes) {
    if ((box.innerText?.length > 0) || (box.value?.length > 0)) {
      inputBox = box;
      break;
    }
  }
  if (!inputBox) { alert("No prompt input found"); return; }
  const originalPrompt = inputBox.value || inputBox.innerText;
  // Send to local backend
  const response = await fetch("http://127.0.0.1:8000/scan", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt: originalPrompt })
  });
  const data = await response.json();
  // Replace input with sanitized version
  if (inputBox.value !== undefined) inputBox.value = data.safe_prompt;
  else inputBox.innerText = data.safe_prompt;
  // Trigger React's onChange so the UI recognizes the update
  inputBox.dispatchEvent(new Event('input', { bubbles: true }));
  alert("Prompt sanitized ✓");
}

// Inject the button and keep it alive through dynamic UI re-renders
function createButton() {
  if (document.getElementById("promptguard-btn")) return;
  const button = document.createElement("button");
  button.id = "promptguard-btn";
  button.innerText = "🛡️ Sanitize";
  // ... styling
  button.onclick = sanitizePrompt;
  document.body.appendChild(button);
}

setInterval(createButton, 2000); // Survives React re-renders
The setInterval pattern is intentional: ChatGPT and Claude.ai are React SPAs that frequently re-render the DOM, which can remove injected elements. The interval re-injects the button if it disappears.
The Local Backend
The promptguard/ folder contains the Python backend. To run it:
# 1. Install Ollama (https://ollama.ai), then pull the model
ollama pull gemma4:e4b
# 2. Install Python dependencies
pip install fastapi uvicorn ollama
# 3. Start the backend (main:app means the `app` object defined in main.py)
uvicorn main:app --host 127.0.0.1 --port 8000
The backend stays running in the background. The extension talks to it automatically whenever you click "Sanitize."
Real-World Demo: Legal Document Workflow
Here's the scenario this was built for. A legal professional in Sri Lanka is drafting a submission under the PDPA and wants AI assistance.
GitHub: The full code is available in two folders: promptguard/ (Python backend) and promptguard-extension/ (Chrome extension).
Repository: mohamednizzad/PromptGuard
Raw prompt (what they typed):
My client John Doe, NIC 999995678V, reached out via
john.doe@example.com about a data breach at ABCXYZ Pvt Ltd.
His phone is 0777654321. The breach exposed his health records
including his HIV status from the XYZABC Hospital
admission in March 2024. Draft a letter to the Data Protection
Authority under Section 23 of the PDPA.
After Stage 1 (regex):
My client John Doe, NIC [REDACTED_NIC], reached out via
[REDACTED_EMAIL] about a data breach at ABCXYZ Pvt Ltd.
His phone is [REDACTED_PHONE]. The breach exposed his health records
including his HIV status from the XYZABC Hospital
admission in March 2024. Draft a letter to the Data Protection
Authority under Section 23 of the PDPA.
After Stage 2 (Gemma 4:e4b):
My client [REDACTED_NAME], NIC [REDACTED_NIC], reached out via
[REDACTED_EMAIL] about a data breach at [REDACTED_ORGANIZATION].
His phone is [REDACTED_PHONE]. The breach exposed his health records
including [REDACTED_HEALTH_CONDITION] from a hospital admission in
[REDACTED_TIMEFRAME]. Draft a letter to the Data Protection Authority
under Section 23 of the PDPA.
The cloud AI receives a complete, legally actionable task description. The client's identity, health condition, specific organization, and date are never transmitted. The AI can still draft the letter correctly.
What Gemma 4 caught that regex missed:
- Full name (John Doe): natural language NER
- Organization name (ABCXYZ Pvt Ltd): potential re-identification risk
- Health condition (HIV status): special category data under PDPA Schedule II
- Specific date (March 2024): temporal re-identification marker
What Gets Redacted
| PII Type | Detection Method | Example |
|---|---|---|
| Sri Lanka NIC | Regex | 999995678V → [REDACTED_NIC] |
| Email addresses | Regex | user@mail.com → [REDACTED_EMAIL] |
| Phone numbers | Regex | 0777654321 → [REDACTED_PHONE] |
| Full names | Gemma 4 (NER) | John Silva → [REDACTED_NAME] |
| Health conditions | Gemma 4 (context) | HIV positive → [REDACTED_HEALTH] |
| Financial details | Gemma 4 (context) | Rs. 2.4M salary → [REDACTED_FINANCIAL] |
| Organization names | Gemma 4 (risk assess) | City Hospital → [REDACTED_ORG] |
| Dates + context | Gemma 4 (re-id risk) | March 2024 admission → [REDACTED_TIMEFRAME] |
| Implied references | Gemma 4 (inference) | my usual number → [REDACTED_REFERENCE] |
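Since every redaction leaves a typed placeholder, the sanitized text itself can be summarized by category. The sketch below counts placeholder types with a regular expression; it assumes all labels follow the `[REDACTED_...]` shape shown in the table, and `summarize_redactions` is a hypothetical helper, not part of the backend.

```python
import re
from collections import Counter

PLACEHOLDER = re.compile(r'\[REDACTED_([A-Z_]+)\]')

def summarize_redactions(sanitized: str) -> Counter:
    """Count how many times each placeholder type appears in sanitized text."""
    return Counter(PLACEHOLDER.findall(sanitized))

sample = (
    "My client [REDACTED_NAME], NIC [REDACTED_NIC], reached out via "
    "[REDACTED_EMAIL] about a data breach at [REDACTED_ORGANIZATION]."
)
print(summarize_redactions(sample))
```

A summary like this could feed a short notice such as "4 items redacted" without showing the original values again.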
Limitations and Honest Caveats
This is a v1 proof-of-concept. Here's what it doesn't yet handle well:
False positives. Gemma 4 occasionally over-redacts, removing organizational names that are actually public information and don't need masking. The prompt engineering needs refinement for domain-specific contexts.
Latency. On a mid-range laptop, Gemma 4:e4b takes 2-5 seconds per prompt. For short prompts this is acceptable. For multi-paragraph document pastes, it's noticeable. The regex pre-stage helps, but LLM inference time is the bottleneck.
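To check these numbers on your own hardware, a small timing helper is enough. This is a minimal sketch: `time_call` is a hypothetical name, and it works with any function that takes and returns a string, such as `sanitize_prompt` from the backend.

```python
import statistics
import time
from typing import Callable

def time_call(fn: Callable[[str], str], text: str, repeats: int = 3) -> float:
    """Return the median wall-clock seconds that fn(text) takes over `repeats` runs."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(text)
        samples.append(time.perf_counter() - start)
    return float(statistics.median(samples))

# Stand-in function so the sketch runs without Ollama installed.
elapsed = time_call(lambda s: s.upper(), "My NIC is 999995678V")
print(f"{elapsed:.6f} s")
```

Timing `regex_redact` and `sanitize_prompt` separately on the same text would show how much of the total is model inference.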
No feedback loop. The current version replaces the prompt silently. A diff view showing the user exactly what was changed and why would significantly improve trust and usability.
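One way the backend could supply the data for such a diff view is a word-level comparison with Python's `difflib`. The sketch below is an assumption about how this might be done, not an existing feature; `redaction_diff` is a hypothetical helper.

```python
import difflib

def redaction_diff(original: str, sanitized: str) -> list[tuple[str, str]]:
    """List (original_span, replacement_span) pairs where the two texts differ, word by word."""
    a, b = original.split(), sanitized.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return changes

original = "My client John Doe called from 0777654321 today"
sanitized = "My client [REDACTED_NAME] called from [REDACTED_PHONE] today"
for before, after in redaction_diff(original, sanitized):
    print(f"{before!r} -> {after!r}")
```

Note that any diff containing the original values has to stay on the local machine, so it would be returned only to the extension and never included in the submitted prompt.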
Extension CSP constraints. Some AI interfaces (particularly enterprise versions) implement Content Security Policies that may block content script injection. The extension works on standard chatgpt.com and claude.ai but may not work on enterprise/team deployments.
It requires the backend to be running. If the Ollama server or FastAPI backend isn't started, the extension fails silently. Better error messaging and a backend health check are on the roadmap.
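A health check of the kind mentioned here can be sketched with the standard library alone. The helper below reports whether anything answers HTTP at a given local address; `is_reachable` is a hypothetical name, and treating any HTTP response (even a 404) as "up" is a simplifying assumption.

```python
import urllib.error
import urllib.request

# Ignore any system proxy settings: these checks target localhost only.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))

def is_reachable(url: str, timeout: float = 1.0) -> bool:
    """Return True if a GET to url gets any HTTP response, False if the connection fails."""
    try:
        with _OPENER.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered with an error status, so it is running.
        return True
    except OSError:
        # Connection refused, timeout, or similar: nothing is listening.
        return False
```

For example, `is_reachable("http://127.0.0.1:8000/")` would probe the FastAPI backend, and `is_reachable("http://127.0.0.1:11434/")` would probe Ollama on its default port.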
What's Next
- Diff view: show what changed before the user submits, not just "sanitized ✓"
- Domain profiles: legal, medical, and financial contexts each need different redaction thresholds
- Firefox support: Firefox also supports Manifest V3, so a port should mostly come down to manifest and API differences
- Offline indicator: visual badge showing when the backend is active vs. unavailable
- Fine-tuned Gemma 4: the PDPA document is already loaded into the RAG agent; fine-tuning Gemma 4 on Sri Lankan PII patterns (NIC format, address structures, Sinhala/Tamil name recognition) would significantly improve local-context accuracy
- Auto-submit mode: the app.py variant already implements auto-submit after sanitization; making this a configurable toggle is the next UX step
Key Takeaways
- ✅ 100% local: Gemma 4:e4b runs via Ollama on-device; the original prompt never leaves your machine
- ✅ Two-stage pipeline: regex catches known patterns instantly; Gemma 4 catches contextual PII that regex cannot
- ✅ Model choice was deliberate: the e4b MoE architecture provides near-27B reasoning quality at local inference speeds; 2B/4B over-redacted, 27B was too slow
- ✅ Works on ChatGPT and Claude.ai: the Chrome extension injects into both without modifying their code
- ✅ PDPA-aligned: the redaction taxonomy maps directly to Sri Lanka PDPA definitions: personal data, special categories, data subject identifiers
- ⚠️ Latency is real: 2-5s per prompt on mid-range hardware; acceptable for sensitive workflows, not for casual use
- ⚠️ False positives exist: over-redaction is a known v1 limitation; domain profiles will address this
- The broader implication: as AI becomes embedded in professional workflows, the question isn't "should we use AI?" It's "how do we use AI without creating PDPA/GDPR liability?" PromptGuard is one answer to that question.
Have you dealt with PII leakage in AI workflows? Particularly curious whether legal or healthcare professionals have built their own guardrails β or just accepted the risk. Comments below.
