Guilherme Ferreira

I built a Serverless OpenAI Gateway to cut costs by 30% and sanitize PII (Open Source)

*Dashboard Preview*

If you are building LLM wrappers or internal tools, you probably noticed two things killing your margins (and your sleep):

  1. Redundant API Costs: Users asking the same questions over and over, forcing you to pay OpenAI for the same tokens.
  2. Compliance Anxiety: The fear of a user accidentally pasting a client's Name, Email, or Tax ID into your chatbot, which then gets sent to a third-party server (OpenAI/DeepSeek).

I looked for solutions, but most were heavy Enterprise Gateways (Java/Docker) or expensive SaaS. So, I decided to engineer my own solution running entirely on the Edge using Cloudflare Workers.

Here is how I built Sanitiza.AI, an open-source gateway that caches requests and scrubs PII before anything leaves your network.

🏗️ The Architecture

The goal was Zero DevOps. No Docker containers to manage, no Redis instances to pay for. Just pure Serverless functions.

  • Runtime: Cloudflare Workers (TypeScript)
  • Framework: Hono (A lightweight web framework, like Express but for the Edge)
  • Storage: Cloudflare KV (Key-Value store for caching)
  • Logic: Native Web Crypto API for hashing (a minimal sketch of how these pieces fit together follows).
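
To make the stack concrete, here is a minimal sketch of the Worker entry point. This is illustrative, not the repo's actual code: the binding names (`CACHE`, `OPENAI_API_KEY`) and the route are assumptions.

```typescript
import { Hono } from 'hono';

// Bindings are configured in wrangler.toml; the names here are hypothetical.
type Bindings = {
  CACHE: KVNamespace;     // Cloudflare KV namespace for cached responses
  OPENAI_API_KEY: string; // set via `wrangler secret put OPENAI_API_KEY`
};

const app = new Hono<{ Bindings: Bindings }>();

// OpenAI-compatible endpoint: hash -> cache check -> sanitize -> forward
app.post('/v1/chat/completions', async (c) => {
  const body = await c.req.text();
  // ...caching and sanitization logic covered in the sections below...
  return c.json({ ok: true }); // placeholder response for the sketch
});

export default app;
```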

1. The Money Saver: Smart Caching

For RAG (Retrieval-Augmented Generation) apps, redundant prompts are extremely common. To solve this, I implemented a "Smart Cache" mechanism.

Before forwarding a request to OpenAI, the Worker creates a SHA-256 hash of the request body (the prompt + system instructions).


```typescript
// How we create a unique fingerprint for the request
async function generateHash(message: string): Promise<string> {
  // Encode the prompt and hash it with the native Web Crypto API
  const msgBuffer = new TextEncoder().encode(message);
  const hashBuffer = await crypto.subtle.digest('SHA-256', msgBuffer);
  // Convert the ArrayBuffer into a lowercase hex string
  const hashArray = Array.from(new Uint8Array(hashBuffer));
  return hashArray.map(b => b.toString(16).padStart(2, '0')).join('');
}
```

The Worker then checks Cloudflare KV:

  • Hit: Returns the stored JSON instantly (<50ms latency). Cost: $0.
  • Miss: Forwards the request to OpenAI, returns the response, and saves it to KV for 24 hours (see the sketch below).
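
Putting the two branches together, the handler logic looks roughly like this. A sketch under the same assumptions as above (the `CACHE` binding and the `generateHash` helper); the repo's actual handler may differ in the details:

```typescript
async function cachedCompletion(
  env: { CACHE: KVNamespace; OPENAI_API_KEY: string },
  body: string
): Promise<Response> {
  const key = await generateHash(body);

  // Hit: serve the stored JSON straight from KV, no OpenAI call.
  const cached = await env.CACHE.get(key);
  if (cached !== null) {
    return new Response(cached, {
      headers: { 'content-type': 'application/json', 'x-cache': 'HIT' },
    });
  }

  // Miss: forward to OpenAI, then store the response for 24 hours.
  const upstream = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'content-type': 'application/json',
      authorization: `Bearer ${env.OPENAI_API_KEY}`,
    },
    body,
  });

  const text = await upstream.text();
  if (upstream.ok) {
    await env.CACHE.put(key, text, { expirationTtl: 86400 }); // 24h TTL
  }
  return new Response(text, {
    status: upstream.status,
    headers: { 'content-type': 'application/json', 'x-cache': 'MISS' },
  });
}
```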

The Result: In my internal tools, I achieved a ~30% cache hit rate, cutting my OpenAI bill by roughly 30%.

2. The Shield: PII Sanitization
Sending PII (Personally Identifiable Information) to external LLMs is a GDPR/LGPD nightmare.

I implemented a hybrid sanitization engine that runs before the request leaves the Edge. It uses Regex patterns (and can be extended with NER models) to identify sensitive entities.

Example Input:

"Analyze the contract for john.doe@email.com regarding the debt of $50k."

What OpenAI Receives:

"Analyze the contract for [EMAIL_HIDDEN] regarding the debt of [MONEY_HIDDEN]."

The Worker maintains a mapping of these placeholders. When the LLM responds, the Worker "re-hydrates" the response, swapping the placeholders back for the real values where needed, so the end user never sees that anything was masked.
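
Here is a minimal sketch of a sanitize/re-hydrate pair in that style. The patterns and placeholder format are simplified for illustration (the repo's rule set is more extensive), and the placeholders are indexed so multiple matches of the same type re-hydrate correctly:

```typescript
// Simplified example patterns; real-world PII detection needs more rules.
const PATTERNS: Array<{ label: string; regex: RegExp }> = [
  { label: 'EMAIL_HIDDEN', regex: /[\w.+-]+@[\w-]+\.[\w.]+/g },
  { label: 'MONEY_HIDDEN', regex: /\$\s?\d[\d,.]*k?/gi },
];

function sanitize(text: string): { clean: string; map: Map<string, string> } {
  const map = new Map<string, string>();
  let clean = text;
  for (const { label, regex } of PATTERNS) {
    let i = 0;
    clean = clean.replace(regex, (match) => {
      const placeholder = `[${label}_${i++}]`;
      map.set(placeholder, match); // remember the original for re-hydration
      return placeholder;
    });
  }
  return { clean, map };
}

// Swap placeholders in the LLM response back to the real values.
// split/join avoids having to regex-escape the brackets in the placeholder.
function rehydrate(text: string, map: Map<string, string>): string {
  let out = text;
  for (const [placeholder, original] of map) {
    out = out.split(placeholder).join(original);
  }
  return out;
}
```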

3. The ROI Dashboard
Engineering is useless if you can't prove value. I built a simple Admin Dashboard (served by the Worker itself via Hono) that tracks every token saved.

(You can see the live calculator in the repo)
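
For the tracking itself, a running counter in KV is enough. A sketch with a hypothetical key name; note that KV is eventually consistent, so a counter like this is approximate under concurrent writes, which is fine for a dashboard but not for billing:

```typescript
// On every cache hit, add the tokens we did NOT pay for to a running total.
async function recordSavings(cache: KVNamespace, tokensSaved: number): Promise<void> {
  const current = Number((await cache.get('stats:tokens_saved')) ?? '0');
  await cache.put('stats:tokens_saved', String(current + tokensSaved));
}
```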

🚀 Performance on the Edge
The biggest advantage of using Cloudflare Workers over a Python Proxy (like LiteLLM) is latency. The code runs in data centers close to the user.

  • Cold start: practically 0ms.
  • Cached response time: < 50ms.
  • Throughput: Cloudflare's network handles the load balancing.

💻 The Code (Open Source)
I decided to open-source the core engine under the MIT license. You can deploy it to your own Cloudflare account in about 2 minutes using the "Deploy" button.

It works with OpenAI, DeepSeek, Groq, and any other OpenAI-compatible API.
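
In practice, pointing an existing client at the gateway should only require changing the base URL. A sketch using the official `openai` SDK; the hostname is a placeholder for your own `workers.dev` deployment, and how the API key is handled (passed through or injected by the gateway) depends on your configuration:

```typescript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY, // or a gateway-level key
  baseURL: 'https://your-gateway.your-subdomain.workers.dev/v1',
});

const reply = await client.chat.completions.create({
  model: 'gpt-4o-mini',
  messages: [{ role: 'user', content: 'Hello from behind the gateway!' }],
});
```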

Repository: github.com/guimaster97/pii-sanitizer-gateway

I'm currently looking for contributors to help implement Semantic Caching (using Vectorize) to catch prompts that are similar but not identical. If you are into Rust/WASM or Vector Databases, let's talk!

If you found this useful, drop a star on the repo. It helps a lot!