Nizzad

Posted on May 25

🛡️ PromptGuard: I Built a Local AI Privacy Firewall That Sanitizes Your Prompts Before They Leave Your Machine

#devchallenge #gemmachallenge #gemma

Gemma 4 Challenge: Build With Gemma 4 Submission

This is a submission for the Gemma 4 Challenge: Build with Gemma 4

Every time you paste sensitive data, legal documents, or personal details into ChatGPT or Claude, that data leaves your device. PromptGuard intercepts it first — and Gemma 4 running locally does the redaction before the prompt ever touches a cloud server.

The Problem Nobody Talks About
What I Built
Why Gemma 4 Is the Right Model for This
System Architecture
How It Works: The Full Pipeline
The Chrome Extension
The Local Backend
Real-World Demo: Legal Document Workflow
What Gets Redacted
Limitations and Honest Caveats
What's Next
Key Takeaways

The Problem Nobody Talks About

Every week, professionals across healthcare, law, finance, and government are doing something they probably shouldn't: pasting sensitive documents directly into public AI interfaces.

A lawyer drafting a brief pastes a client's NIC number, phone, and case details into ChatGPT to get a summary. A doctor asks Claude to help structure a patient report — with the patient's full name and health data in the prompt. A developer pastes a production database dump to debug a query. A researcher uploads a compliance document containing employee records.

Each of those prompts is transmitted to a cloud server, processed, potentially logged for safety review, and retained under terms of service that most users haven't read carefully.

This isn't a hypothetical risk. Sri Lanka's Personal Data Protection Act No. 9 of 2022 (PDPA) imposes legal obligations on controllers who process personal data. Section 10 requires appropriate technical and organizational measures. Sections 13–18 guarantee data subject rights that can be violated by unauthorized disclosure. Section 38 sets penalties up to Rs. 10 million per non-compliance.

GDPR, UAE PDPL, and equivalent frameworks carry similar — or higher — obligations.

The problem: there is no guard between the user's clipboard and the AI's cloud API.

PromptGuard is that guard.

What I Built

PromptGuard is a two-component, local-first privacy firewall:

1. promptguard/ — Local Python backend
A FastAPI server running on localhost:8000 that receives raw prompts, runs a two-stage redaction pipeline using Gemma 4 via Ollama, and returns a sanitized version. Zero network calls. Everything on-device.

2. promptguard-extension/ — Chrome Extension (Manifest V3)
Injects a "Sanitize Prompt" button into ChatGPT and Claude.ai. When clicked, it intercepts the current prompt, sends it to the local backend for sanitization, replaces the prompt in the input box with the cleaned version, and only then allows the user to submit.

User types prompt with PII
        ↓
[Chrome Extension intercepts]
        ↓
POST to localhost:8000/scan
        ↓
[Regex pre-redaction: NIC, email, phone]
        ↓
[Gemma 4:e4b on-device LLM redaction]
        ↓
Safe prompt returned
        ↓
Input box updated with sanitized version
        ↓
User submits to ChatGPT / Claude — clean

The entire redaction process happens on your machine. The cloud AI never sees the original.

Why Gemma 4 Is the Right Model for This

This is the question the judges will ask — so I want to answer it directly and honestly.

Why not GPT-4o or Claude for redaction?

Sending sensitive data to a cloud API to redact sensitive data before sending to a cloud API is circular and defeats the purpose entirely. The solution has to be local.

Why not a simple regex approach?

Regex handles known patterns — NIC numbers (\d{9}[VvXx]), emails, phone numbers. But PII is contextual:

"Call me at the usual number" — no regex catches this
"The patient presented at 14:30, John Smith, age 34" — name + age in natural language
"My CNIC is written on the form I mentioned earlier" — reference without the number itself
"Send it to the Gmail I use for work" — implied email without the address

You need a model that understands intent and context, not just patterns. Gemma 4 provides that understanding at a scale that runs locally.

Why Gemma 4 specifically — and why the `e4b` variant?

Gemma 4 model family:
  2B / 4B  → ultra-mobile, browser, edge (Pixel, Raspberry Pi)
  27B      → server-grade, high accuracy
  e4b (MoE) → efficient inference, advanced reasoning, local deployment

gemma4:e4b is the Mixture-of-Experts variant — it activates only the expert subnetworks relevant to the current task. For a redaction task that requires:

Named entity recognition in natural language
Context-aware sensitivity detection
Understanding of legal and medical terminology
Preservation of semantic meaning after redaction

The MoE architecture gives you reasoning quality close to the 27B model at a fraction of the inference cost. It runs comfortably on a machine with 16GB RAM via Ollama. The 2B/4B models were too aggressive — they redacted useful context along with PII. The 27B model was too slow for real-time prompt interception. e4b was the right balance.

This wasn't a default choice. I tested all three and e4b was the only one that preserved readability while catching contextual PII that regex missed.

System Architecture

┌─────────────────────────────────────────────────────────┐
│                     USER'S MACHINE                       │
│                                                         │
│  ┌──────────────────┐     ┌───────────────────────┐     │
│  │  Chrome Browser  │     │   PromptGuard Backend  │     │
│  │                  │     │   (FastAPI :8000)       │     │
│  │  ┌────────────┐  │     │                        │     │
│  │  │ ChatGPT /  │  │     │  ┌──────────────────┐ │     │
│  │  │ Claude.ai  │  │     │  │  Stage 1: Regex   │ │     │
│  │  └─────┬──────┘  │     │  │  NIC / Email /    │ │     │
│  │        │         │     │  │  Phone redaction  │ │     │
│  │  ┌─────▼──────┐  │     │  └────────┬─────────┘ │     │
│  │  │PromptGuard │  │POST │           │            │     │
│  │  │ Extension  ├──┼─────┼──────────▼─────────┐  │     │
│  │  │(content.js)│  │     │  │  Stage 2: Gemma  │  │     │
│  │  └─────┬──────┘  │     │  │  4:e4b via Ollama│  │     │
│  │        │         │◄────┼──┤  Contextual PII  │  │     │
│  │  ┌─────▼──────┐  │JSON │  │  redaction       │  │     │
│  │  │  Input box │  │     │  └──────────────────┘ │     │
│  │  │  (cleaned) │  │     │                        │     │
│  │  └────────────┘  │     └───────────────────────┘     │
│  └──────────────────┘                                   │
│                                                         │
│  ┌──────────────────────────────────────────────────┐   │
│  │  Ollama Runtime  │  Gemma 4:e4b model weights    │   │
│  │  (local process) │  (on-device, no network)      │   │
│  └──────────────────────────────────────────────────┘   │
│                                                         │
└─────────────────────────────────────────────────────────┘
                         │
                    ONLY sanitized
                    prompt leaves
                         │
                         ▼
              ┌──────────────────────┐
              │  ChatGPT / Claude    │
              │  Cloud API           │
              │  (never sees raw PII)│
              └──────────────────────┘

How It Works: The Full Pipeline

Stage 1: Regex Pre-Redaction (`run.py`)

Fast, deterministic, zero-latency redaction of known PII patterns:

def regex_redact(text):
    # Sri Lanka NIC: 9 digits + V/X suffix
    text = re.sub(r'\b\d{9}[VvXx]\b', '[REDACTED_NIC]', text)

    # Email addresses
    text = re.sub(r'[\w\.-]+@[\w\.-]+', '[REDACTED_EMAIL]', text)

    # Phone numbers (10 digits)
    text = re.sub(r'\b\d{10}\b', '[REDACTED_PHONE]', text)

    return text

This catches the easy cases instantly before Gemma 4 even sees the text — reducing both latency and the model's cognitive load.

Stage 2: Gemma 4 Contextual Redaction (`run.py`)

The partially-redacted text goes to Gemma 4:e4b with a precisely engineered system prompt:

def sanitize_prompt(prompt: str) -> str:
    partially_redacted = regex_redact(prompt)

    system_prompt = f"""
You are PromptGuard, a privacy-preserving AI firewall.

Redact sensitive information while preserving readability.

Text:
{partially_redacted}
"""
    response = ollama.chat(
        model="gemma4:e4b",
        messages=[{"role": "user", "content": system_prompt}]
    )
    return response['message']['content']

Gemma 4 handles what regex can't:

Full names in natural language
Medical conditions and health data
Financial details described in prose
Implicit references to identifiable information
Sensitive context even without explicit identifiers

Stage 3: FastAPI Endpoint

The backend exposes a single clean endpoint:

# FastAPI backend (inferred from content.js calling /scan)
@app.post("/scan")
async def scan_prompt(payload: PromptRequest):
    safe = sanitize_prompt(payload.prompt)
    return {"safe_prompt": safe}

The extension POSTs to http://127.0.0.1:8000/scan — purely local, no TLS required, no external network call.

The Chrome Extension

The extension (promptguard-extension/) is a Manifest V3 Chrome extension with two files:

manifest.json — declares permissions and injection targets:

{
  "manifest_version": 3,
  "name": "PromptGuard",
  "version": "1.0",
  "permissions": ["activeTab", "scripting"],
  "host_permissions": [
    "https://chatgpt.com/*",
    "https://claude.ai/*"
  ],
  "content_scripts": [
    {
      "matches": ["https://chatgpt.com/*", "https://claude.ai/*"],
      "js": ["content.js"]
    }
  ]
}

content.js — injects a persistent "Sanitize Prompt" button and handles the interception flow:

async function sanitizePrompt() {
    // Find the active prompt input (handles both contenteditable and textarea)
    const inputBoxes = document.querySelectorAll(
        '[contenteditable="true"], textarea'
    );
    let inputBox = null;
    for (let box of inputBoxes) {
        if ((box.innerText?.length > 0) || (box.value?.length > 0)) {
            inputBox = box;
            break;
        }
    }
    if (!inputBox) { alert("No prompt input found"); return; }

    const originalPrompt = inputBox.value || inputBox.innerText;

    // Send to local backend
    const response = await fetch("http://127.0.0.1:8000/scan", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ prompt: originalPrompt })
    });

    const data = await response.json();

    // Replace input with sanitized version
    if (inputBox.value !== undefined) inputBox.value = data.safe_prompt;
    else inputBox.innerText = data.safe_prompt;

    // Trigger React's onChange so the UI recognizes the update
    inputBox.dispatchEvent(new Event('input', { bubbles: true }));

    alert("Prompt sanitized ✓");
}

// Inject the button and keep it alive through dynamic UI re-renders
function createButton() {
    if (document.getElementById("promptguard-btn")) return;
    const button = document.createElement("button");
    button.id = "promptguard-btn";
    button.innerText = "🛡️ Sanitize";
    // ... styling
    button.onclick = sanitizePrompt;
    document.body.appendChild(button);
}

setInterval(createButton, 2000); // Survives React re-renders

The setInterval pattern is intentional — ChatGPT and Claude.ai are React SPAs that frequently re-render the DOM, which can remove injected elements. The interval re-injects the button if it disappears.

The Local Backend

The promptguard/ folder contains the Python backend. To run it:

# 1. Install Ollama (https://ollama.ai)
ollama pull gemma4:e4b

# 2. Install Python dependencies
pip install fastapi uvicorn ollama

# 3. Start the backend
uvicorn main:app --host 127.0.0.1 --port 8000

The backend stays running in the background. The extension talks to it automatically whenever you click "Sanitize."

Real-World Demo:

Here's the scenario this was built for. A legal professional in Sri Lanka is drafting a submission under the PDPA and wants AI assistance.

GitHub: The full code is available in two folders: promptguard/ (Python backend) and promptguard-extension/ (Chrome extension).

mohamednizzad / PromptGuard

A local-first AI privacy firewall that sanitizes prompts before they reach the cloud.

🛡️ PromptGuard

A local-first AI privacy firewall that sanitizes prompts before they reach the cloud.

PromptGuard intercepts prompts typed into ChatGPT or Claude.ai, runs PII redaction using Gemma 4:e4b entirely on your machine, and replaces the raw prompt with a sanitized version — before anything leaves your device.

Built for the Gemma 4 Challenge on DEV.to

The Problem

Every day, professionals paste sensitive content into public AI interfaces:

Legal documents with client NIC numbers and case details
Medical records with patient health conditions
Financial data with account information and salary details
HR documents with employee personal data

This creates real legal exposure under Sri Lanka's PDPA No. 9 of 2022, GDPR, UAE PDPL, and equivalent frameworks. PromptGuard sits between your clipboard and the cloud — nothing sensitive gets transmitted.

How It Works

You type a prompt with PII
        ↓
[PromptGuard Extension intercepts on click]
        ↓
POST →

…

View on GitHub

Raw prompt (what they typed):

My client John Doe, NIC 999995678V, reached out via 
john.doe@example.com about a data breach at ABCXYZ Pvt Ltd. 
Her phone is 0777654321. The breach exposed her health records 
including her HIV status from the XYZABC Hospital 
admission in March 2024. Draft a letter to the Data Protection 
Authority under Section 23 of the PDPA.

After Stage 1 (regex):

My client John Doe, NIC [REDACTED_NIC], reached out via 
[REDACTED_EMAIL] about a data breach at XYZABC Hospital. 
Her phone is [REDACTED_PHONE]. The breach exposed her health records 
including her HIV status from the XYZABC Hospital 
admission in March 2024. Draft a letter to the Data Protection 
Authority under Section 23 of the PDPA.

After Stage 2 (Gemma 4:e4b):

My client [REDACTED_NAME], NIC [REDACTED_NIC], reached out via 
[REDACTED_EMAIL] about a data breach at [REDACTED_ORGANIZATION]. 
Her phone is [REDACTED_PHONE]. The breach exposed her health records 
including [REDACTED_HEALTH_CONDITION] from a hospital admission in 
[REDACTED_TIMEFRAME]. Draft a letter to the Data Protection Authority 
under Section 23 of the PDPA.

The cloud AI receives a complete, legally actionable task description. The client's identity, health condition, specific organization, and date are never transmitted. The AI can still draft the letter correctly.

What Gemma 4 caught that regex missed:

Full name (John Doe) — natural language NER
Organization name (XYZABC Pvt Ltd) — potential re-identification risk
Health condition (HIV status) — special category data under PDPA Schedule II
Specific date (March 2024) — temporal re-identification marker

What Gets Redacted

PII Type	Detection Method	Example
Sri Lanka NIC	Regex	`999995678V` → `[REDACTED_NIC]`
Email addresses	Regex	`user@mail.com` → `[REDACTED_EMAIL]`
Phone numbers	Regex	`0777654321` → `[REDACTED_PHONE]`
Full names	Gemma 4 (NER)	`John Silva` → `[REDACTED_NAME]`
Health conditions	Gemma 4 (context)	`HIV positive` → `[REDACTED_HEALTH]`
Financial details	Gemma 4 (context)	`Rs. 2.4M salary` → `[REDACTED_FINANCIAL]`
Organization names	Gemma 4 (risk assess)	`City Hospital` → `[REDACTED_ORG]`
Dates + context	Gemma 4 (re-id risk)	`March 2024 admission` → `[REDACTED_TIMEFRAME]`
Implied references	Gemma 4 (inference)	`my usual number` → `[REDACTED_REFERENCE]`

Limitations and Honest Caveats

This is a v1 proof-of-concept. Here's what it doesn't yet handle well:

False positives. Gemma 4 occasionally over-redacts — removing organizational names that are actually public information and don't need masking. The prompt engineering needs refinement for domain-specific contexts.

Latency. On a mid-range laptop, Gemma 4:e4b takes 2–5 seconds per prompt. For short prompts this is acceptable. For multi-paragraph document pastes, it's noticeable. The regex pre-stage helps, but LLM inference time is the bottleneck.

No feedback loop. The current version replaces the prompt silently. A diff view — showing the user exactly what was changed and why — would significantly improve trust and usability.

Extension CSP constraints. Some AI interfaces (particularly enterprise versions) implement Content Security Policies that may block content script injection. The extension works on standard chatgpt.com and claude.ai but may not work on enterprise/team deployments.

It requires the backend to be running. If the Ollama server or FastAPI backend isn't started, the extension fails silently. Better error messaging and a backend health check are on the roadmap.

What's Next

Diff view — show what changed before the user submits, not just "sanitized ✓"
Domain profiles — legal, medical, financial contexts each have different redaction thresholds
Firefox support — MV3 is Chromium-specific; a MV2 variant for Firefox is straightforward
Offline indicator — visual badge showing when the backend is active vs. unavailable
Fine-tuned Gemma 4 — the PDPA document is already loaded into the RAG agent; fine-tuning Gemma 4 on Sri Lankan PII patterns (NIC format, address structures, Sinhala/Tamil name recognition) would significantly improve local-context accuracy
Auto-submit mode — the app.py variant already implements auto-submit after sanitization; making this a configurable toggle is the next UX step

Key Takeaways

✅ 100% local — Gemma 4:e4b runs via Ollama on-device; the original prompt never leaves your machine
✅ Two-stage pipeline — regex catches known patterns instantly; Gemma 4 catches contextual PII that regex cannot
✅ Model choice was deliberate — e4b MoE architecture provides near-27B reasoning quality at local inference speeds; 2B/4B under-redacted, 27B was too slow
✅ Works on ChatGPT and Claude.ai — Chrome extension injects into both without modifying their code
✅ PDPA-aligned — the redaction taxonomy maps directly to Sri Lanka PDPA definitions: personal data, special categories, data subject identifiers
⚠️ Latency is real — 2–5s per prompt on mid-range hardware; acceptable for sensitive workflows, not for casual use
⚠️ False positives exist — over-redaction is a known v1 limitation; domain profiles will address this
🔍 The broader implication — as AI becomes embedded in professional workflows, the question isn't "should we use AI?" It's "how do we use AI without creating PDPA/GDPR liability?" PromptGuard is one answer to that question.

Have you dealt with PII leakage in AI workflows? Particularly curious whether legal or healthcare professionals have built their own guardrails — or just accepted the risk. Comments below.

Top comments (5)

Shan F • May 25

Congrats. A brilliant use case.

Joseph • May 26

Great thinking that the AI tooling space needs more of. The two-stage pipeline with regex for speed, Gemma 4 for context is a clean design decision, and grounding it in PDPA obligations makes the use case concrete rather than hypothetical. Well done!

Andy Stewart • May 27

Bringing "local-first" and MoE architecture to the front lines of data sovereignty is absolutely the right architectural direction. Combining regex with Gemma 4 balances lightweight execution with contextual privacy. This is exactly the on-device AI security boundary we advocate for—keeping determinism local and sending only sanitized data to the cloud.

Olive Aaron • May 25 • Edited

Amazing idea. Can this be used when we upload PDF and other documents?

Ibn Ahamed • May 31

Thank you for sharing this detailed guide

DEV Community

🛡️ PromptGuard: I Built a Local AI Privacy Firewall That Sanitizes Your Prompts Before They Leave Your Machine

Table of Contents

The Problem Nobody Talks About

What I Built

Why Gemma 4 Is the Right Model for This

Why not GPT-4o or Claude for redaction?

Why not a simple regex approach?

Why Gemma 4 specifically — and why the `e4b` variant?

System Architecture

How It Works: The Full Pipeline

Stage 1: Regex Pre-Redaction (`run.py`)

Stage 2: Gemma 4 Contextual Redaction (`run.py`)

Stage 3: FastAPI Endpoint

The Chrome Extension

The Local Backend

Real-World Demo:

mohamednizzad / PromptGuard

A local-first AI privacy firewall that sanitizes prompts before they reach the cloud.

🛡️ PromptGuard

The Problem

How It Works

What Gets Redacted

Limitations and Honest Caveats

What's Next

Key Takeaways

Top comments (5)

Table of Contents

The Problem Nobody Talks About

What I Built

Why Gemma 4 Is the Right Model for This

Why not GPT-4o or Claude for redaction?

Why not a simple regex approach?

Why Gemma 4 specifically — and why the e4b variant?

System Architecture

How It Works: The Full Pipeline

Stage 1: Regex Pre-Redaction (run.py)

Stage 2: Gemma 4 Contextual Redaction (run.py)

Stage 3: FastAPI Endpoint

The Chrome Extension

The Local Backend

Real-World Demo:

mohamednizzad / PromptGuard

A local-first AI privacy firewall that sanitizes prompts before they reach the cloud.

🛡️ PromptGuard

The Problem

How It Works

What Gets Redacted

Limitations and Honest Caveats

What's Next

Key Takeaways

Why Gemma 4 specifically — and why the `e4b` variant?

Stage 1: Regex Pre-Redaction (`run.py`)

Stage 2: Gemma 4 Contextual Redaction (`run.py`)