If you are building LLM wrappers or internal tools, you have probably noticed two things killing your margins (and your sleep):
- Redundant API costs: users ask the same questions over and over, and you pay OpenAI for the same tokens every time.
- Compliance anxiety: the fear of a user accidentally pasting a client's name, email, or tax ID into your chatbot, which then ships it off to a third-party server (OpenAI/DeepSeek).
I looked for solutions, but most were heavy Enterprise Gateways (Java/Docker) or expensive SaaS. So, I decided to engineer my own solution running entirely on the Edge using Cloudflare Workers.
Here is how I built Sanitiza.AI, an open-source gateway that caches requests and scrubs PII before they leave your network.
## 🏗️ The Architecture
The goal was Zero DevOps. No Docker containers to manage, no Redis instances to pay for. Just pure Serverless functions.
- Runtime: Cloudflare Workers (TypeScript)
- Framework: Hono (A lightweight web framework, like Express but for the Edge)
- Storage: Cloudflare KV (Key-Value store for caching)
- Logic: Native Web Crypto API for hashing.
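Wired together, the skeleton is tiny. Here is a minimal sketch (the `CACHE` and `OPENAI_API_KEY` binding names and the route path are my illustrative choices, declared in your wrangler.toml, not necessarily the repo's):

```typescript
import { Hono } from 'hono';

// Bindings declared in wrangler.toml (illustrative names).
// KVNamespace is an ambient type from @cloudflare/workers-types.
type Bindings = {
  CACHE: KVNamespace;     // Cloudflare KV namespace used as the cache
  OPENAI_API_KEY: string; // secret set via `wrangler secret put`
};

const app = new Hono<{ Bindings: Bindings }>();

// OpenAI-compatible endpoint: scrub PII, check the cache, forward on a miss.
app.post('/v1/chat/completions', async (c) => {
  const body = await c.req.json();
  // cachedCompletion is sketched in the caching section below
  return cachedCompletion(c.env, body);
});

export default app;
```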
## 1. The Money Saver: Smart Caching
For RAG (Retrieval-Augmented Generation) apps, redundant prompts are a huge cost driver. To solve this, I implemented a "Smart Cache" mechanism.
Before forwarding a request to OpenAI, the Worker creates a SHA-256 hash of the request body (the prompt + system instructions).
```typescript
// How we create a unique fingerprint for the request
async function generateHash(message: string): Promise<string> {
  const msgBuffer = new TextEncoder().encode(message);
  const hashBuffer = await crypto.subtle.digest('SHA-256', msgBuffer);
  const hashArray = Array.from(new Uint8Array(hashBuffer));
  return hashArray.map((b) => b.toString(16).padStart(2, '0')).join('');
}
```
It then checks Cloudflare KV:
- Hit: the stored JSON is returned instantly (<50ms latency). Cost: $0.
- Miss: the request is forwarded to OpenAI, and the response is saved to KV for 24 hours.

The Result: in my internal tools I achieved a ~30% cache hit rate, effectively cutting my OpenAI bill by roughly a third.
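In code, the hit/miss branch looks roughly like this (a sketch, not the repo's exact implementation; it assumes the `Bindings` type and `generateHash` from above):

```typescript
// Cache-aware upstream call (sketch).
async function cachedCompletion(env: Bindings, body: unknown): Promise<Response> {
  const key = await generateHash(JSON.stringify(body));

  // Hit: serve the stored JSON straight from KV at the edge.
  const hit = await env.CACHE.get(key);
  if (hit !== null) {
    return new Response(hit, { headers: { 'content-type': 'application/json' } });
  }

  // Miss: forward to OpenAI and store the successful response for 24 hours.
  const upstream = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'content-type': 'application/json',
      authorization: `Bearer ${env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify(body),
  });
  const text = await upstream.text();
  if (upstream.ok) {
    await env.CACHE.put(key, text, { expirationTtl: 86_400 }); // 24h TTL
  }
  return new Response(text, {
    status: upstream.status,
    headers: { 'content-type': 'application/json' },
  });
}
```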
## 2. The Shield: PII Sanitization
Sending PII (Personally Identifiable Information) to external LLMs is a GDPR/LGPD nightmare.
I implemented a hybrid sanitization engine that runs before the request leaves the Edge. It uses Regex patterns (and can be extended with NER models) to identify sensitive entities.
Example Input:
"Analyze the contract for john.doe@email.com regarding the debt of $50k."
What OpenAI Receives:
"Analyze the contract for [EMAIL_HIDDEN] regarding the debt of [MONEY_HIDDEN]."
The Worker maintains a mapping of these placeholders. When the LLM responds, the Worker "re-hydrates" the answer, putting the real data back in where needed, so the experience stays seamless for the user.
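Here is a minimal sketch of that scrub-and-restore cycle. The regex patterns are illustrative stand-ins, not the repo's full rule set, and a production version would index placeholders (e.g. [EMAIL_1], [EMAIL_2]) so repeated entities of the same type don't collide:

```typescript
// Illustrative patterns only; the real engine is more thorough.
const PATTERNS: Record<string, RegExp> = {
  EMAIL_HIDDEN: /[\w.+-]+@[\w-]+\.[\w.-]+/g,
  MONEY_HIDDEN: /\$\d[\d,.]*\s?[kKmM]?\b/g,
};

function sanitize(text: string): { clean: string; map: Map<string, string> } {
  const map = new Map<string, string>();
  let clean = text;
  for (const [label, pattern] of Object.entries(PATTERNS)) {
    clean = clean.replace(pattern, (match) => {
      const placeholder = `[${label}]`;
      map.set(placeholder, match); // remember the original value for re-hydration
      return placeholder;
    });
  }
  return { clean, map };
}

// Put the real values back into the LLM's answer before returning it.
function rehydrate(text: string, map: Map<string, string>): string {
  let out = text;
  for (const [placeholder, original] of map) {
    out = out.split(placeholder).join(original);
  }
  return out;
}
```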
## 3. The ROI Dashboard
Engineering is useless if you can't prove value. I built a simple Admin Dashboard (served by the Worker itself via Hono) that tracks every token saved.
(You can see the live calculator in the repo)
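Conceptually the tracking is just a counter. A hypothetical sketch, reusing the KV binding from earlier (the key name and pricing are made up for illustration):

```typescript
// On every cache hit, add the tokens we didn't have to pay for.
// KV is eventually consistent and this read-modify-write isn't atomic;
// good enough for a rough dashboard.
async function recordSavings(env: Bindings, tokensSaved: number): Promise<void> {
  const current = Number((await env.CACHE.get('stats:tokens_saved')) ?? '0');
  await env.CACHE.put('stats:tokens_saved', String(current + tokensSaved));
}

// Dashboard route: turn the counter into dollars.
app.get('/admin/stats', async (c) => {
  const tokensSaved = Number((await c.env.CACHE.get('stats:tokens_saved')) ?? '0');
  const PRICE_PER_1K_TOKENS = 0.002; // example rate, adjust for your model
  return c.json({
    tokensSaved,
    dollarsSaved: (tokensSaved / 1000) * PRICE_PER_1K_TOKENS,
  });
});
```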
## 🚀 Performance on the Edge
The biggest advantage of using Cloudflare Workers over a Python proxy (like LiteLLM) is latency: the code runs in data centers close to the user.
- Cold start: practically 0ms.
- Cache response time: < 50ms.
- Throughput: Cloudflare's network handles the load balancing.
## 💻 The Code (Open Source)
I decided to open-source the core engine under the MIT license. You can deploy it to your own Cloudflare account in about 2 minutes using the "Deploy" button.
It works with OpenAI, DeepSeek, Groq, and any other OpenAI-compatible API.
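Because it exposes the same wire format, pointing an existing client at the gateway is a one-line change. For example, with the official openai Node SDK (the Worker URL below is a placeholder for your own deployment):

```typescript
import OpenAI from 'openai';

// Same SDK, same calls; only the base URL changes.
const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: 'https://your-gateway.your-subdomain.workers.dev/v1',
});

const res = await client.chat.completions.create({
  model: 'gpt-4o-mini',
  messages: [{ role: 'user', content: 'Hello from behind the gateway!' }],
});
```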
Repository: github.com/guimaster97/pii-sanitizer-gateway
I'm currently looking for contributors to help implement Semantic Caching (using Vectorize) to catch prompts that are similar but not identical. If you are into Rust/WASM or Vector Databases, let's talk!
If you found this useful, drop a star on the repo. It helps a lot!
