Ben Mann

Stop Leaking PII Through Your OpenAI API Calls

Every chat.completions.create call sends your prompt to OpenAI's servers. If that prompt contains user data — support tickets, form inputs, CRM records — there's a good chance it includes names, emails, phone numbers, and worse.

const response = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [
    {
      role: "user",
      content: `Summarize this support ticket:

      From: Sarah Chen <sarah.chen@acme.com>
      Phone: (415) 555-0142
      SSN: 521-44-8832

      My order #38291 hasn't arrived. I live at
      742 Evergreen Terrace, Springfield, IL 62704.`,
    },
  ],
});

That single request just sent a name, email, phone number, SSN, and home address to an external service. Under GDPR, CCPA, or HIPAA, that's a compliance incident waiting to happen.

The problem is invisible

Most teams don't audit what's inside their AI prompts. The Authorization header is your OpenAI key — that's expected. The problem is the request body.

PII shows up in places you don't expect:

  • Support tickets — customer names, emails, account numbers embedded in the text
  • RAG chunks — documents from your vector store may contain PII from the original source
  • Chat history — previous messages in a conversation accumulate identifiers
  • CRM data — customer records pulled into prompts for personalization
  • Code snippets — hardcoded credentials, API keys, database connection strings

And it's not just direct identifiers. Under GDPR, data is personal if it can be combined with other information to identify someone. A user ID + timestamp + location? That's personal data.
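
If you want a rough sense of what is already flowing through your prompts, a small audit pass over the messages array is a reasonable first step. The sketch below is only illustrative: `auditMessages` and the three patterns are hypothetical names I'm introducing here, and the patterns cover nothing beyond emails, US-style phone numbers, and SSNs.

```typescript
// Hypothetical audit helper: counts likely PII matches per category.
type AuditMessage = { role: string; content: string };

// Illustrative patterns only; real detection needs a much broader set.
const AUDIT_PATTERNS: { label: string; pattern: RegExp }[] = [
  { label: "email", pattern: /[\w.+-]+@[\w-]+(?:\.[\w-]+)+/g },
  { label: "usPhone", pattern: /\(?\b\d{3}\)?[ .-]\d{3}[ .-]\d{4}\b/g },
  { label: "ssn", pattern: /\b\d{3}-\d{2}-\d{4}\b/g },
];

function auditMessages(messages: AuditMessage[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const { label, pattern } of AUDIT_PATTERNS) {
    let total = 0;
    for (const message of messages) {
      // String.match with a global regex returns every match, or null.
      total += (message.content.match(pattern) ?? []).length;
    }
    counts[label] = total;
  }
  return counts;
}
```

Logging these counts (never the matched values themselves) next to each outgoing request is enough to show which code paths carry the most identifiers.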

What you can do about it

There are three approaches, from manual to automated:

1. Manual redaction (doesn't scale)

Write regex patterns or use string replacement to strip known PII patterns before each API call. This works for obvious cases (emails, phone numbers) but misses freeform PII like names in unstructured text.

// Fragile and incomplete
const sanitized = input
  .replace(/[\w.-]+@[\w.-]+\.\w+/g, "[EMAIL]")
  .replace(/\d{3}-\d{2}-\d{4}/g, "[SSN]");

Problems: you have to maintain the patterns, they miss edge cases, and you can't restore the original values in the response.
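
To make the gap concrete, here is a small sketch that wraps the same two patterns in a helper and runs them over part of the ticket from the first example. `redactWithRegex` is a hypothetical name used only for this illustration.

```typescript
// Same two patterns as the snippet above, wrapped in a hypothetical helper.
function redactWithRegex(input: string): string {
  return input
    .replace(/[\w.-]+@[\w.-]+\.\w+/g, "[EMAIL]")
    .replace(/\d{3}-\d{2}-\d{4}/g, "[SSN]");
}

const sampleTicket = [
  "From: Sarah Chen <sarah.chen@acme.com>",
  "SSN: 521-44-8832",
  "I live at 742 Evergreen Terrace, Springfield, IL 62704.",
].join("\n");

// The email and SSN are replaced, but the name and street address
// pass through because no pattern describes them.
const regexRedacted = redactWithRegex(sampleTicket);
console.log(regexRedacted);
```

The name and address are exactly the kind of freeform PII that pattern matching alone cannot describe.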

2. NER-based detection (better, but heavy)

Run a Named Entity Recognition model, or a PII detection tool built on one (spaCy, Presidio, etc.), on every prompt before sending it. This is more accurate for names and organizations, but it adds latency and infrastructure complexity.
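
Whatever detector you run, the output usually boils down to labeled character spans that still have to be applied to the text. The sketch below assumes a span shape of `{ entityType, start, end }` with an exclusive `end`; that shape and the `applyEntitySpans` name are my assumptions for illustration, not the exact output of any particular library.

```typescript
// Assumed span shape: an entity label plus character offsets (end exclusive).
interface EntitySpan {
  entityType: string;
  start: number;
  end: number;
}

// Replaces each detected span with a [LABEL] placeholder.
// Assumes spans do not overlap; overlapping spans would need merging first.
function applyEntitySpans(text: string, spans: EntitySpan[]): string {
  // Work from the end of the string so earlier offsets stay valid.
  const ordered = spans.slice().sort((a, b) => b.start - a.start);
  let result = text;
  for (const span of ordered) {
    result = result.slice(0, span.start) + `[${span.entityType}]` + result.slice(span.end);
  }
  return result;
}
```

Replacing from right to left is the important detail: substituting left to right would shift every later offset as soon as a placeholder differs in length from the text it replaces.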

3. Proxy-level redaction

Put a scanning proxy between your app and the AI provider. Every request is inspected and sanitized before it leaves your infrastructure. Your call sites stay the same; only the client configuration changes.

This is the approach I built Grepture around — it's an open-source security proxy that sits in front of any AI API. Here's what the setup looks like:

import OpenAI from "openai";
import { Grepture } from "@grepture/sdk";

const grepture = new Grepture({
  apiKey: process.env.GREPTURE_API_KEY!,
  proxyUrl: "https://proxy.grepture.com",
});

const openai = new OpenAI({
  ...grepture.clientOptions({
    apiKey: process.env.OPENAI_API_KEY!,
    baseURL: "https://api.openai.com/v1",
  }),
});

// Every request is now scanned — your code doesn't change
const response = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: userInput }],
});

clientOptions() reroutes traffic through the proxy. Your OpenAI key is forwarded securely. The proxy scans every request against 50+ detection patterns (80+ on Pro) — emails, phone numbers, SSNs, credit cards, API keys, IBANs, and more.
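
To give a feel for what scanning a request body involves, here is a minimal sketch that walks a chat-style JSON body and runs a redactor over every string `content` field. This is not Grepture's implementation; `redactChatRequestBody` and the `Redactor` type are hypothetical names, and the body shape is assumed to follow the `messages` array used in the examples above.

```typescript
// Hypothetical body scanner, shown only to illustrate the idea.
type Redactor = (text: string) => string;

function redactChatRequestBody(bodyJson: string, redact: Redactor): string {
  const body = JSON.parse(bodyJson) as { messages?: unknown };
  // Leave anything that is not a chat-style request untouched.
  if (!Array.isArray(body.messages)) return bodyJson;

  const messages = body.messages.map((message: unknown) => {
    if (typeof message === "object" && message !== null) {
      const record = message as Record<string, unknown>;
      const content = record["content"];
      if (typeof content === "string") {
        return { ...record, content: redact(content) };
      }
    }
    // Non-string content (for example multi-part arrays) is passed through.
    return message;
  });

  return JSON.stringify({ ...body, messages });
}
```

A real scanner also has to handle multi-part content, tool calls, and streaming responses, which is a large part of why doing this in one central place is attractive.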

Reversible redaction: the key feature

Plain redaction breaks things. If you strip all names from a support ticket, the AI's summary is useless — "The customer [REDACTED] has an issue with [REDACTED]."

Reversible redaction (mask-and-restore) solves this. PII is replaced with consistent tokens:

What OpenAI sees:

Summarize this support ticket:
From: [PERSON_1] <[EMAIL_1]>
Phone: [PHONE_1]
SSN: [SSN_1]
My order #38291 hasn't arrived. I live at [ADDRESS_1].

What your app gets back:

The customer Sarah Chen (sarah.chen@acme.com) is asking about
order #38291 which hasn't been delivered to 742 Evergreen Terrace,
Springfield, IL 62704.

The model processes clean data with consistent entity references. Your application receives the full, personalized response. None of the detected PII reaches OpenAI.
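
The mask-and-restore idea itself fits in a few lines. The sketch below is a simplified illustration under my own assumptions, not the proxy's actual code: `maskPii`, `restorePii`, and the two patterns are hypothetical, and only emails and SSNs are handled. The token format follows the `[EMAIL_1]` style shown above.

```typescript
// Hypothetical mask-and-restore helpers, for illustration only.
interface MaskResult {
  masked: string;
  tokenToValue: Map<string, string>;
}

const MASK_PATTERNS: { label: string; pattern: RegExp }[] = [
  { label: "EMAIL", pattern: /[\w.+-]+@[\w-]+(?:\.[\w-]+)+/g },
  { label: "SSN", pattern: /\b\d{3}-\d{2}-\d{4}\b/g },
];

function maskPii(text: string): MaskResult {
  const tokenToValue = new Map<string, string>();
  const valueToToken = new Map<string, string>();
  let masked = text;
  for (const { label, pattern } of MASK_PATTERNS) {
    let counter = 0;
    masked = masked.replace(pattern, (match) => {
      // Reuse the token when the same value appears again, so the
      // model sees a consistent reference for each entity.
      const existing = valueToToken.get(match);
      if (existing !== undefined) return existing;
      counter += 1;
      const token = `[${label}_${counter}]`;
      valueToToken.set(match, token);
      tokenToValue.set(token, match);
      return token;
    });
  }
  return { masked, tokenToValue };
}

function restorePii(text: string, tokenToValue: Map<string, string>): string {
  let restored = text;
  tokenToValue.forEach((value, token) => {
    // split/join replaces every occurrence without regex escaping.
    restored = restored.split(token).join(value);
  });
  return restored;
}
```

The mapping only needs to live for the duration of one request, and it never has to leave the machine that holds the original text.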

Works with any provider

While I used OpenAI in these examples, the same proxy approach works with any AI provider — Anthropic, Google Gemini, Azure OpenAI, AWS Bedrock, Mistral, Groq. You just change the baseURL and apiKey:

// Anthropic
import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic({
  ...grepture.clientOptions({
    apiKey: process.env.ANTHROPIC_API_KEY!,
    baseURL: "https://api.anthropic.com",
  }),
});

// Google Gemini (OpenAI-compatible endpoint)
const gemini = new OpenAI({
  ...grepture.clientOptions({
    apiKey: process.env.GEMINI_API_KEY!,
    baseURL: "https://generativelanguage.googleapis.com/v1beta/openai",
  }),
});

For non-SDK calls (webhooks, custom HTTP requests), there's a drop-in fetch replacement:

const response = await grepture.fetch("https://api.example.com/data", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify(payload),
});

GDPR angle: why this matters now

If you're processing EU user data through AI APIs, every API call is a data transfer to a third-party processor. GDPR requires:

  • Data minimization — only send what's necessary
  • Data Processing Agreements — signed with every AI provider
  • Transfer Impact Assessments — for cross-border transfers to US providers

The simplest way to satisfy data minimization? Don't send personal data at all. Redact before the API call, restore after.
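
Data minimization can also start one step earlier, before any redaction runs: only copy the fields the prompt actually needs. A minimal sketch, assuming a flat CRM record; the `CrmRecord` type, the field names, and `pickPromptFields` are all hypothetical.

```typescript
// Hypothetical CRM record shape and allowlist, for illustration only.
type CrmRecord = Record<string, string | number>;

const PROMPT_FIELD_ALLOWLIST = ["plan", "orderStatus", "lastTicketTopic"];

// Copies only allowlisted fields, so names, emails, and anything added
// to the record later are excluded by default.
function pickPromptFields(record: CrmRecord, allowlist: string[]): CrmRecord {
  const minimal: CrmRecord = {};
  for (const field of allowlist) {
    const value = record[field];
    if (value !== undefined) minimal[field] = value;
  }
  return minimal;
}
```

An allowlist fails safe: a new column in the CRM stays out of prompts until someone deliberately adds it.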

I wrote a longer guide on this: How to Make AI API Calls GDPR-Compliant.

Getting started

  1. npm install @grepture/sdk
  2. Get an API key at grepture.com — free tier includes 1,000 requests/month
  3. Wrap your AI client with clientOptions() or use grepture.fetch()

The docs have setup guides for every major provider.
