canonical_url: https://github.com/guimaster97/pii-sanitizer-gateway
If you are building LLM wrappers or internal tools, you have probably noticed two things killing your margins (and your sleep):
- Redundant API Costs: Users asking the same questions over and over, forcing you to pay OpenAI for the same tokens.
- Compliance Anxiety: The fear of a user accidentally pasting a client's Name, Email, or Tax ID into your chatbot, which then gets sent to a third-party server (OpenAI/DeepSeek).
I looked for solutions, but most were heavy Enterprise Gateways (Java/Docker) or expensive SaaS. So, I decided to engineer my own solution running entirely on the Edge using Cloudflare Workers.
Here is how I built Sanitiza.AI, an open-source gateway that caches responses and scrubs PII from prompts before they ever leave your network.
🏗️ The Architecture
The goal was Zero DevOps. No Docker containers to manage, no Redis instances to pay for. Just pure Serverless functions.
- Runtime: Cloudflare Workers (TypeScript)
- Framework: Hono (A lightweight web framework, like Express but for the Edge)
- Storage: Cloudflare KV (Key-Value store for caching)
- Logic: Native Web Crypto API for hashing.
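To make that stack concrete, here is a minimal sketch of how the pieces fit together on a Worker. The route, the binding names (`CACHE`, `OPENAI_API_KEY`), and the upstream URL are my illustrative assumptions, not necessarily what the repo uses:

```typescript
import { Hono } from 'hono';

// Binding names are illustrative; the repo's wrangler.toml may differ.
type Bindings = {
  CACHE: KVNamespace;      // KV namespace for cached responses
  OPENAI_API_KEY: string;  // secret, set via `wrangler secret put`
};

const app = new Hono<{ Bindings: Bindings }>();

// The gateway exposes an OpenAI-compatible endpoint and proxies upstream.
app.post('/v1/chat/completions', async (c) => {
  const body = await c.req.json();
  // In the full gateway: 1) sanitize PII, 2) check the KV cache,
  // 3) forward to the LLM only on a miss (each step is shown below).
  const upstream = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${c.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify(body),
  });
  return c.json(await upstream.json());
});

export default app;
```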
1. The Money Saver: Smart Caching
For RAG (Retrieval-Augmented Generation) apps, redundancy is huge: different users keep asking the same questions against the same documents. To solve this, I implemented a "Smart Cache" mechanism.
Before forwarding a request to OpenAI, the Worker creates a SHA-256 hash of the request body (the prompt + system instructions).
```typescript
// How we create a unique fingerprint for the request
async function generateHash(message: string): Promise<string> {
  const msgBuffer = new TextEncoder().encode(message);
  const hashBuffer = await crypto.subtle.digest('SHA-256', msgBuffer);
  const hashArray = Array.from(new Uint8Array(hashBuffer));
  return hashArray.map((b) => b.toString(16).padStart(2, '0')).join('');
}
```
It then checks Cloudflare KV:
- Hit: Returns the stored JSON instantly (< 50 ms latency). Cost: $0.
- Miss: Forwards the request to OpenAI, gets the response, and saves it to KV for 24 hours.
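Putting the fingerprint and the KV lookup together, the cache layer looks roughly like this sketch (the `CACHE` binding name and the helper's shape are my assumptions; it reuses `generateHash()` from above):

```typescript
type Env = { CACHE: KVNamespace; OPENAI_API_KEY: string };

// Cache-aside flow around the upstream call.
async function cachedCompletion(env: Env, rawBody: string) {
  const key = await generateHash(rawBody);

  // Hit: serve the stored JSON straight from KV (no OpenAI call, no cost).
  const cached = await env.CACHE.get(key);
  if (cached !== null) return { body: cached, hit: true };

  // Miss: forward to the LLM, then cache the raw response for 24 hours.
  const upstream = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${env.OPENAI_API_KEY}`,
    },
    body: rawBody,
  });
  const body = await upstream.text();
  await env.CACHE.put(key, body, { expirationTtl: 86_400 }); // 24h TTL
  return { body, hit: false };
}
```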
The Result: In my internal tools, I achieved a ~30% cache hit rate, effectively cutting my OpenAI bill by roughly a third.
2. The Shield: PII Sanitization
Sending PII (Personally Identifiable Information) to external LLMs is a GDPR/LGPD nightmare.
I implemented a hybrid sanitization engine that runs before the request leaves the Edge. It uses Regex patterns (and can be extended with NER models) to identify sensitive entities.
Example Input:
"Analyze the contract for john.doe@email.com regarding the debt of $50k."
What OpenAI Receives:
"Analyze the contract for [EMAIL_HIDDEN] regarding the debt of [MONEY_HIDDEN]."
The Worker maintains a mapping of these placeholders. When the LLM responds, the Worker "re-hydrates" the response, putting the real data back in if necessary, so the user never knows it was hidden.
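Here is a minimal sketch of that scrub-and-rehydrate cycle. The regex patterns and the indexed placeholder format are simplified illustrations (I index the placeholders so multiple entities of the same type can be restored unambiguously); the actual engine in the repo is more thorough:

```typescript
// Illustrative patterns only; the engine in the repo is more thorough.
const PII_PATTERNS: Array<[RegExp, string]> = [
  [/[\w.+-]+@[\w-]+\.[\w.]+/g, 'EMAIL_HIDDEN'],
  [/\$\s?\d[\d,.]*\s?[kKmM]?/g, 'MONEY_HIDDEN'],
];

// Replace each match with a placeholder and remember the original value.
function sanitize(text: string): { clean: string; map: Map<string, string> } {
  const map = new Map<string, string>();
  let clean = text;
  for (const [pattern, label] of PII_PATTERNS) {
    clean = clean.replace(pattern, (match) => {
      const placeholder = `[${label}_${map.size}]`;
      map.set(placeholder, match);
      return placeholder;
    });
  }
  return { clean, map };
}

// After the LLM responds, swap the real values back into the text.
function rehydrate(text: string, map: Map<string, string>): string {
  let out = text;
  for (const [placeholder, original] of map) {
    out = out.split(placeholder).join(original);
  }
  return out;
}
```

Running `sanitize()` on the example input above produces the redacted prompt OpenAI receives; `rehydrate()` then restores the original values in the response before it reaches the user.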
3. The ROI Dashboard
Engineering is useless if you can't prove value. I built a simple Admin Dashboard (served by the Worker itself via Hono) that tracks every token saved.
(You can see the live calculator in the repo)
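The real dashboard renders HTML, but the idea boils down to aggregating counters. Here is a hypothetical stats endpoint; the `stats:hits` / `stats:misses` key names are invented for illustration:

```typescript
import { Hono } from 'hono';

type Bindings = { CACHE: KVNamespace };
const app = new Hono<{ Bindings: Bindings }>();

// Hypothetical stats route; the real dashboard serves a full HTML page.
app.get('/admin/stats', async (c) => {
  const [hits, misses] = await Promise.all([
    c.env.CACHE.get('stats:hits'),
    c.env.CACHE.get('stats:misses'),
  ]);
  const h = Number(hits ?? 0);
  const m = Number(misses ?? 0);
  return c.json({
    cacheHits: h,
    cacheMisses: m,
    hitRate: h + m > 0 ? h / (h + m) : 0,
  });
});

export default app;
```

Note that KV is eventually consistent, so counters like these are best treated as approximate rather than exact billing data.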
🚀 Performance on the Edge
The biggest advantage of using Cloudflare Workers over a Python Proxy (like LiteLLM) is latency. The code runs in data centers close to the user.
- Cold Start: ~0 ms in practice (Workers run as V8 isolates, so there is no container to boot).
- Cache Response Time: < 50 ms.
- Throughput: Cloudflare's network handles the load balancing.
💻 The Code (Open Source)
I decided to open-source the core engine under the MIT license. You can deploy it to your own Cloudflare account in about 2 minutes using the "Deploy" button.
It works with OpenAI, DeepSeek, Groq, and any other OpenAI-compatible API.
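Because the gateway speaks the OpenAI wire format, pointing an existing client at it is just a base-URL change. A sketch using the official `openai` SDK (the Worker URL and env var name are placeholders):

```typescript
import OpenAI from 'openai';

// Any OpenAI-compatible SDK works; the baseURL below is a placeholder.
const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: 'https://your-worker.your-subdomain.workers.dev/v1',
});

const completion = await client.chat.completions.create({
  model: 'gpt-4o-mini',
  messages: [{ role: 'user', content: 'Hello through the gateway!' }],
});
console.log(completion.choices[0].message.content);
```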
Repository: github.com/guimaster97/pii-sanitizer-gateway
I'm currently looking for contributors to help implement Semantic Caching (using Vectorize) to catch prompts that are similar but not identical. If you are into Rust/WASM or Vector Databases, let's talk!
If you found this useful, drop a star on the repo. It helps a lot!
