DEV Community

Jay

OpenAI Just Released a Privacy Filter. Here's What It Can't Do.

OpenAI released its Privacy Filter this week: a 1.5 billion parameter open-source model that detects and redacts PII from text before it reaches a language model. It runs locally, it's Apache 2.0, and it scores 96% F1 on the PII-Masking-300k benchmark.

It's genuinely good work, and the timing is notable. The company that builds the models developers are sending sensitive data to is now shipping a tool to help them stop doing that. That's a signal.

But after reading through the release and running the model, I think there's an important gap between what the Privacy Filter does and what a real production deployment actually needs. If you're evaluating privacy infrastructure for an LLM pipeline, this gap matters.

What the OpenAI Privacy Filter does well

The model detects 8 entity types: names, addresses, emails, phone numbers, URLs, dates, account numbers, and secrets (passwords, API keys). It runs locally. No data leaves your machine. At int8 quantization, it's small enough to run on a standard laptop or in the browser. Recall is high (98%), which means it misses very little.

For a quick personal project or a developer who wants a sanity check on a dataset, this is genuinely useful. Running entirely on-device is meaningful. You're not trading one data exposure for another.

Where it stops

Here's what the filter does: it redacts. It replaces detected PII with a blank. What it does not do is give you any way to get that information back.

That's fine if you're cleaning a dataset and you never need the original values again. But most LLM pipelines are not dataset cleaning jobs. They look like this:

  1. User submits a query containing their name, account number, date of birth
  2. You need the LLM to reason over that information
  3. The LLM responds
  4. You need the response to reference the user's actual data, not [REDACTED]

One-way redaction breaks this. The LLM gets "[REDACTED] opened their account on [REDACTED] and their balance is [REDACTED]". It either produces a meaningless response or hallucinates values to fill the gaps. Neither is acceptable in production.
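To make the failure concrete, here's a toy one-way redactor. The regexes are illustrative stand-ins for the real model, but the output is exactly the kind of prompt the LLM ends up seeing:

```python
import re

# Toy patterns standing in for a real PII detector (illustrative only)
PATTERNS = [
    r"\b[A-Z][a-z]+ [A-Z][a-z]+\b",  # naive full-name match
    r"\b\d{4}-\d{2}-\d{2}\b",        # ISO date
    r"\$\d[\d,]*\.\d{2}\b",          # dollar amount
]

def redact(text: str) -> str:
    """One-way redaction: every match becomes the same blank token."""
    for pattern in PATTERNS:
        text = re.sub(pattern, "[REDACTED]", text)
    return text

prompt = "John Smith opened their account on 2021-03-14 and their balance is $4,210.55"
print(redact(prompt))
# → [REDACTED] opened their account on [REDACTED] and their balance is [REDACTED]
```

Nothing in the redacted prompt tells the model which blank was a person, which was a date, and which was a number, let alone what to put back afterwards.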

OpenAI's own documentation acknowledges this directly: "This model is designed to be a redaction aid and should not be considered a safety guarantee." It's a tool for a specific, limited use case. The documentation doesn't position it as infrastructure for a live application.

The three problems redaction can't solve

1. LLM coherence

When you send [REDACTED] to a language model, you're not sending anonymized data. You're sending noise. The model has no signal to reason over.

Fake substitution solves this: replace "John Smith" with "David Park", "555-392-7810" with "555-213-4891". The model reasons naturally over realistic values and produces coherent output. You restore the original values in the response.
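A minimal sketch of that round trip, assuming detection has already happened (the names and numbers are the examples above, not output from any real detector):

```python
def substitute(text: str, replacements: dict) -> tuple:
    """Swap real values for realistic synthetic ones; keep the reverse map."""
    reverse = {}
    for real, synthetic in replacements.items():
        text = text.replace(real, synthetic)
        reverse[synthetic] = real
    return text, reverse

def restore(text: str, reverse: dict) -> str:
    """Map synthetic values in the LLM response back to the originals."""
    for synthetic, real in reverse.items():
        text = text.replace(synthetic, real)
    return text

# Hypothetical detected entities and their synthetic stand-ins
replacements = {"John Smith": "David Park", "555-392-7810": "555-213-4891"}

prompt, reverse = substitute("Call John Smith at 555-392-7810.", replacements)
# The LLM sees: "Call David Park at 555-213-4891."
response = "I've scheduled a call with David Park at 555-213-4891."
print(restore(response, reverse))
# → I've scheduled a call with John Smith at 555-392-7810.
```

The model reasons over "David Park" exactly as it would over a real name, and the user only ever sees their own data.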

This is the difference between a healthcare AI that says "The patient's condition..." and one that says "[REDACTED]'s condition..."

2. Reversibility

Every production system that sanitizes inputs needs to restore outputs. Clinical documentation, financial summaries, and legal drafting all share the same requirement: the LLM's response needs to reference real entities. If you can't map sanitized text back to originals, the pipeline isn't useful.

This requires a session model: map each sanitization operation to a session ID, store the token-to-original mapping, and restore on the way back out. A standalone detection model doesn't include any of this. You'd have to build it yourself.
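A bare-bones version of that session model might look like this (in-memory and unencrypted, so a sketch of the shape rather than a production store):

```python
import uuid

class SessionStore:
    """Token-to-original mappings, keyed by session ID.
    A production version needs persistent, encrypted storage."""

    def __init__(self):
        self._sessions = {}

    def sanitize(self, session_id: str, text: str, entities: list) -> str:
        """Replace each detected (label, value) pair with a stable token."""
        mapping = self._sessions.setdefault(session_id, {})
        for label, value in entities:
            token = f"[{label}_{len(mapping) + 1}]"
            mapping[token] = value
            text = text.replace(value, token)
        return text

    def restore(self, session_id: str, text: str) -> str:
        """Put the originals back into an LLM response."""
        for token, value in self._sessions.get(session_id, {}).items():
            text = text.replace(token, value)
        return text

store = SessionStore()
sid = str(uuid.uuid4())
clean = store.sanitize(sid, "John Smith called about account 8841-22.",
                       [("NAME", "John Smith"), ("ACCOUNT", "8841-22")])
print(clean)  # → [NAME_1] called about account [ACCOUNT_2].
print(store.restore(sid, clean))
```

Because the mapping persists under the session ID and accumulates across calls, an entity mentioned in turn 1 can still be restored in turn 7.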

3. Infrastructure burden

The Privacy Filter is 1.5 billion parameters. At float32 that's roughly 6GB. At int8 quantization, it's around 1.5GB. If you want to run this in a serverless environment (Lambda, Cloud Run, or any auto-scaling compute), you're looking at cold start times of 15-20 seconds and significant memory costs. You need to host it, scale it, version it, and monitor it.

This is not a knock on the model. It's the inherent tradeoff of running local ML inference at scale. But "runs on your laptop" and "runs reliably in production at scale" are different problems.

What actually needs to happen before a prompt reaches an LLM

A production privacy layer needs to do five things:

  1. Detect: identify PII across multiple entity types with high recall
  2. Replace: substitute with either tokens or realistic synthetic values (not blanks)
  3. Store: persist the mapping between replacement and original
  4. Forward: send the sanitized prompt to the LLM
  5. Restore: replace synthetic values in the response with originals

The OpenAI Privacy Filter handles step 1 partially, and step 2 only in the redaction sense. It doesn't touch steps 3, 4, or 5.
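The five steps compose into one small wrapper. This is a sketch with stubbed-out detection and LLM calls, not any real API:

```python
def privacy_pipeline(prompt: str, detect, llm_call) -> str:
    """Sketch of the five steps: detect → replace → store → forward → restore."""
    # 1. Detect: `detect` returns [(original_value, synthetic_value), ...]
    entities = detect(prompt)
    # 2. Replace with synthetic values + 3. Store the mapping
    mapping = {}
    for original, synthetic in entities:
        prompt = prompt.replace(original, synthetic)
        mapping[synthetic] = original
    # 4. Forward the sanitized prompt to the model
    response = llm_call(prompt)
    # 5. Restore originals in the response
    for synthetic, original in mapping.items():
        response = response.replace(synthetic, original)
    return response

# Stub detector and LLM for illustration (not real APIs)
detect = lambda text: [("Jane Doe", "Amy Chen")]
llm_call = lambda text: f"Summary: {text}"
print(privacy_pipeline("Jane Doe requested a refund.", detect, llm_call))
# → Summary: Jane Doe requested a refund.
```

The LLM only ever sees "Amy Chen"; the caller only ever sees "Jane Doe". Steps 2, 3, and 5 are the parts a standalone detector leaves to you.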

The detection layer is actually the easy part

Here's something counterintuitive: building a detector with high F1 score is not the hardest problem in PII-safe LLM pipelines. It's an important problem, but it's tractable.
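As a taste of why it's tractable: exact-format entities like emails and phone numbers fall to deterministic regexes before you need a model at all (a toy layer for illustration, not the actual detector):

```python
import re

# Exact-format PII can be matched deterministically at full confidence;
# fuzzier entity types like names need an NER model on top of this layer.
DETECTORS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "US_PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}

def detect(text: str) -> list:
    """Return (label, matched_value) pairs for every hit."""
    return [(label, m.group()) for label, rx in DETECTORS.items()
            for m in rx.finditer(text)]

print(detect("Reach me at jay@example.com or 555-392-7810."))
# → [('EMAIL', 'jay@example.com'), ('US_PHONE', '555-392-7810')]
```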

The harder problems are:

  • Multi-turn conversations: when a user mentions their name in turn 1 and you need to track it through turn 7, you need a session model that accumulates entity mappings across turns
  • Compliance guarantees: healthcare deployments need a BAA, not just a model with high recall. HIPAA HITECH doesn't care what your F1 score is.
  • HIPAA mode: under HIPAA, you can't send data to a third-party NLP service, even for analysis. All inference has to stay local. This is an architectural requirement, not a configuration toggle.
  • Audit trails: regulated industries need logs of every sanitization operation: what was detected, what was replaced, when, by which customer, with what configuration. Detection alone produces none of this.
  • Selective purge: once data is in your session store, GDPR Article 17 means you need to be able to delete specific values from all historical sessions on request. A detector doesn't touch your storage layer.
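The selective-purge requirement, sketched against a hypothetical in-memory session store (a real one would also have to invalidate restores for the deleted tokens):

```python
# Hypothetical session store: session_id -> {token: original_value}
sessions = {
    "s1": {"[NAME_1]": "John Smith", "[EMAIL_2]": "john@example.com"},
    "s2": {"[NAME_1]": "John Smith", "[PHONE_2]": "555-392-7810"},
}

def purge_value(sessions: dict, value: str) -> int:
    """GDPR Art. 17: delete one original value from every stored session."""
    purged = 0
    for mapping in sessions.values():
        for token in [t for t, v in mapping.items() if v == value]:
            del mapping[token]
            purged += 1
    return purged

print(purge_value(sessions, "John Smith"))  # → 2
print(sessions)
```

The point isn't the ten lines of code; it's that a purge API has to exist, has to reach every historical session, and has to be auditable. None of that lives in a detection model.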

These are the problems that make PII-safe LLM pipelines hard in practice. A high-accuracy detector is the entry ticket, not the finish line.

The right way to think about it

The OpenAI Privacy Filter is a signal that the industry has finally acknowledged that sending raw user data to LLMs is a problem. That acknowledgment matters. The tool is genuinely useful for dataset cleaning, offline processing, and low-stakes applications where one-way redaction is acceptable.

For production LLM pipelines that need to stay coherent, reversible, and compliant, it's a detection component. One layer of a larger system.

We built raipii because we needed the full system, not just the detector. Three detection layers (regex at confidence 1.0, local Presidio/spaCy NER, and AWS Comprehend for paid tiers). Three replacement modes (token, redact, and fake substitution). A session model for multi-turn conversations. HIPAA mode with local-only inference. Audit logs. A purge API for GDPR compliance.

The OpenAI Privacy Filter will likely become one of those detection layers. But the layer is not the pipeline.
