<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Guilherme Ferreira</title>
    <description>The latest articles on DEV Community by Guilherme Ferreira (@guilherme_ferreira_87ce22).</description>
    <link>https://dev.to/guilherme_ferreira_87ce22</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3587691%2F2ddb0667-9f69-4114-a59a-c4653086b9a2.png</url>
      <title>DEV Community: Guilherme Ferreira</title>
      <link>https://dev.to/guilherme_ferreira_87ce22</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/guilherme_ferreira_87ce22"/>
    <language>en</language>
    <item>
      <title>I built a Serverless OpenAI Gateway to cut costs by 30% and sanitize PII (Open Source)</title>
      <dc:creator>Guilherme Ferreira</dc:creator>
      <pubDate>Sat, 31 Jan 2026 01:14:51 +0000</pubDate>
      <link>https://dev.to/guilherme_ferreira_87ce22/i-built-a-serverless-openai-gateway-to-cut-costs-by-30-and-sanitize-pii-open-source-5g06</link>
      <guid>https://dev.to/guilherme_ferreira_87ce22/i-built-a-serverless-openai-gateway-to-cut-costs-by-30-and-sanitize-pii-open-source-5g06</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc4khcb83p8s23seu2u12.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc4khcb83p8s23seu2u12.png" alt="Dashboard Preview" width="800" height="564"&gt;&lt;/a&gt;&lt;/p&gt;


&lt;p&gt;If you are building LLM wrappers or internal tools, you probably noticed two things killing your margins (and your sleep):&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Redundant API Costs:&lt;/strong&gt; Users asking the same questions over and over, forcing you to pay OpenAI for the same tokens.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Compliance Anxiety:&lt;/strong&gt; The fear of a user accidentally pasting a client's Name, Email, or Tax ID into your chatbot, which then gets sent to a third-party server (OpenAI/DeepSeek).&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I looked for solutions, but most were heavy Enterprise Gateways (Java/Docker) or expensive SaaS. So, I decided to engineer my own solution running entirely on the Edge using &lt;strong&gt;Cloudflare Workers&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Here is how I built &lt;strong&gt;Sanitiza.AI&lt;/strong&gt;, an open-source gateway that caches requests and scrubs PII before anything leaves your network.&lt;/p&gt;

&lt;h2&gt;
  
  
  🏗️ The Architecture
&lt;/h2&gt;

&lt;p&gt;The goal was &lt;strong&gt;Zero DevOps&lt;/strong&gt;. No Docker containers to manage, no Redis instances to pay for. Just pure Serverless functions.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Runtime:&lt;/strong&gt; Cloudflare Workers (TypeScript)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Framework:&lt;/strong&gt; Hono (A lightweight web framework, like Express but for the Edge)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Storage:&lt;/strong&gt; Cloudflare KV (Key-Value store for caching)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Logic:&lt;/strong&gt; Native Web Crypto API for hashing.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  1. The Money Saver: Smart Caching
&lt;/h3&gt;

&lt;p&gt;For RAG (Retrieval-Augmented Generation) apps, redundancy is huge. To solve this, I implemented a "Smart Cache" mechanism.&lt;/p&gt;

&lt;p&gt;Before forwarding a request to OpenAI, the Worker creates a &lt;strong&gt;SHA-256 hash&lt;/strong&gt; of the request body (the prompt + system instructions).&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// How we create a unique fingerprint for the request
async function generateHash(message: string) {
  const msgBuffer = new TextEncoder().encode(message);
  const hashBuffer = await crypto.subtle.digest('SHA-256', msgBuffer);
  const hashArray = Array.from(new Uint8Array(hashBuffer));
  return hashArray.map(b =&amp;gt; b.toString(16).padStart(2, '0')).join('');
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;It then checks Cloudflare KV:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Hit:&lt;/strong&gt; Returns the stored JSON instantly (&amp;lt;50ms latency). Cost: $0.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Miss:&lt;/strong&gt; Forwards the request to OpenAI, gets the response, and saves it to KV for 24 hours.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Result:&lt;/strong&gt; In my internal tools, I achieved a ~30% cache hit rate, effectively cutting my OpenAI bill by about 30%.&lt;/p&gt;

&lt;h3&gt;2. The Shield: PII Sanitization&lt;/h3&gt;

&lt;p&gt;Sending PII (Personally Identifiable Information) to external LLMs is a GDPR/LGPD nightmare.&lt;/p&gt;

&lt;p&gt;I implemented a hybrid sanitization engine that runs before the request leaves the Edge. It uses Regex patterns (and can be extended with NER models) to identify sensitive entities.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example Input:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;"Analyze the contract for john.doe@email.com regarding the debt of $50k."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What OpenAI Receives:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;"Analyze the contract for [EMAIL_HIDDEN] regarding the debt of [MONEY_HIDDEN]."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The Worker maintains a mapping of these placeholders. When the LLM responds, the Worker "re-hydrates" the response, restoring the real data where needed, so the user never knows it was hidden.&lt;/p&gt;

&lt;h3&gt;3. The ROI Dashboard&lt;/h3&gt;

&lt;p&gt;Engineering is useless if you can't prove value. I built a simple Admin Dashboard (served by the Worker itself via Hono) that tracks every token saved.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;(You can see the live calculator in the repo.)&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;🚀 Performance on the Edge&lt;/h2&gt;

&lt;p&gt;The biggest advantage of using Cloudflare Workers over a Python proxy (like LiteLLM) is latency: the code runs in data centers close to the user.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cold Start:&lt;/strong&gt; 0ms (practically).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cache Response Time:&lt;/strong&gt; &amp;lt;50ms.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Throughput:&lt;/strong&gt; Cloudflare's network handles the load balancing.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;💻 The Code (Open Source)&lt;/h2&gt;

&lt;p&gt;I decided to open-source the core engine under the MIT license. You can deploy it to your own Cloudflare account in about 2 minutes using the "Deploy" button.&lt;/p&gt;

&lt;p&gt;It works with OpenAI, DeepSeek, Groq, and any other compatible API.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Repository:&lt;/strong&gt; &lt;a href="https://github.com/guimaster97/pii-sanitizer-gateway" rel="noopener noreferrer"&gt;github.com/guimaster97/pii-sanitizer-gateway&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I'm currently looking for contributors to help implement Semantic Caching (using Vectorize) to catch prompts that are similar but not identical. If you are into Rust/WASM or Vector Databases, let's talk!&lt;/p&gt;

&lt;p&gt;If you found this useful, drop a star on the repo. It helps a lot!&lt;/p&gt;
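&lt;p&gt;To make the hit/miss flow concrete, here is a minimal sketch of the cache-then-forward step. This is illustrative, not the gateway's actual code: an in-memory Map stands in for Cloudflare KV, and &lt;code&gt;callUpstream&lt;/code&gt; is a hypothetical placeholder for the real OpenAI fetch (a real KV write would also pass a 24-hour expirationTtl).&lt;/p&gt;

```typescript
// Sketch of the cache-then-forward flow. A Map stands in for Cloudflare KV;
// callUpstream is a hypothetical stand-in for the real OpenAI fetch.
const kv = new Map();

// SHA-256 fingerprint of the request body, as hex (same idea as generateHash)
async function sha256Hex(message: string) {
  const msgBuffer = new TextEncoder().encode(message);
  const hashBuffer = await crypto.subtle.digest('SHA-256', msgBuffer);
  return Array.from(new Uint8Array(hashBuffer))
    .map(function (b) { return b.toString(16).padStart(2, '0'); })
    .join('');
}

async function cachedCompletion(body: string, callUpstream: Function) {
  const key = await sha256Hex(body);
  if (kv.has(key)) {
    // Cache hit: no OpenAI call, no cost.
    return { cached: true, response: kv.get(key) };
  }
  // Cache miss: forward, then store. Real KV: put(key, response, { expirationTtl: 86400 })
  const response = await callUpstream(body);
  kv.set(key, response);
  return { cached: false, response: response };
}
```

Identical bodies hash to identical keys, so the second request never reaches the upstream provider.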
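&lt;p&gt;The scrub-and-re-hydrate idea can be sketched as follows. The two regex patterns and the placeholder names are illustrative only (the real engine's rule set is broader), and a production version would number repeated matches of the same type so they don't collide.&lt;/p&gt;

```typescript
// Sketch: regex-based PII scrubbing with a placeholder mapping, plus
// re-hydration of the LLM response. Patterns here are illustrative, not
// the gateway's actual rule set.
const PATTERNS = [
  { label: 'EMAIL_HIDDEN', regex: /[\w.+-]+@[\w-]+\.[\w.]+/g },
  { label: 'MONEY_HIDDEN', regex: /\$\d[\d,.]*k?/g },
];

function sanitize(text: string) {
  const mapping = new Map();
  let out = text;
  for (const p of PATTERNS) {
    out = out.replace(p.regex, function (match) {
      const placeholder = '[' + p.label + ']';
      mapping.set(placeholder, match); // remember what the placeholder hides
      return placeholder;
    });
  }
  return { text: out, mapping: mapping };
}

// Put the real values back into the LLM's response before returning it.
function rehydrate(text: string, mapping: any) {
  let out = text;
  for (const entry of mapping) {
    out = out.split(entry[0]).join(entry[1]);
  }
  return out;
}
```

Only the sanitized text ever leaves the Edge; the mapping stays inside the Worker for the lifetime of the request.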

</description>
      <category>cloudflare</category>
      <category>openai</category>
      <category>serverless</category>
      <category>security</category>
    </item>
  </channel>
</rss>
