Redaction Is Not Enough: When an LLM Can Still Infer the PII You Stripped Out

#privacy #ai #security #machinelearning

A reader left a sharp question on my last post about redacting PII before sending prompts to an LLM. Paraphrased: If you redact "John from ACME" to "[NAME] from [COMPANY]", can the model still infer who it is?

Yes. And it is worth pulling that apart, because it points to a real limit that a lot of teams miss. Redaction handles the identifiers it can see. It does not stop a model from reasoning about everything you left in.

Two problems people treat as one
When people say "keep PII out of my prompts," they are usually worried about two different things:

1. Direct personal data sitting verbatim in a third party's logs. The literal email, phone number, SSN, or card number ends up in OpenAI's or Anthropic's systems, outside your control. This is the concrete GDPR and HIPAA exposure for most teams.
2. The model inferring or re-identifying someone from what is left, even after the obvious identifiers are gone.

Redaction solves the first problem for every identifier it catches. Swap the direct identifiers for placeholders, and they never appear verbatim in logs you do not own. For a large share of real compliance work, that is the whole job, and it is cheap to do.

The second problem is different and harder. It is not really a redaction problem at all.

How inference leakage actually happens
Three common ways the data you stripped comes back:

Quasi-identifiers. No single field identifies a person, but a combination does. Latanya Sweeney's well-known finding is that ZIP code, date of birth, and sex uniquely identify roughly 87 percent of Americans. Strip the name and email, leave "a 41-year-old cardiologist in Boise who joined in March," and you may have described exactly one person.

Context that gives away the blank. Placeholders do not help if the surrounding text answers the question for the model. "[NAME] runs the electric vehicle company in Austin" leaves very little to guess. The model fills in the blank because you told it how.

Rare details. An unusual job title, a specific dollar amount, a distinctive phrasing lifted from a support ticket. The more unique the detail, the more it points at one person.
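
To make the quasi-identifier and rare-detail risk concrete, here is a minimal sketch that counts how many records share each combination of leftover fields. The records, field names, and helper functions are all made up for illustration. Any row whose combination appears exactly once is effectively singled out, even with the name and email gone.

```python
from collections import Counter

# Hypothetical records with direct identifiers already removed.
records = [
    {"age": 41, "job": "cardiologist", "city": "Boise", "joined": "2024-03"},
    {"age": 41, "job": "nurse", "city": "Boise", "joined": "2024-03"},
    {"age": 35, "job": "nurse", "city": "Boise", "joined": "2024-01"},
    {"age": 35, "job": "nurse", "city": "Boise", "joined": "2024-01"},
]


def group_sizes(rows, fields):
    """Count how many rows share each combination of the given fields."""
    return Counter(tuple(row[f] for f in fields) for row in rows)


def unique_rows(rows, fields):
    """Return rows whose combination of fields appears exactly once."""
    sizes = group_sizes(rows, fields)
    return [row for row in rows if sizes[tuple(row[f] for f in fields)] == 1]


quasi = ["age", "job", "city", "joined"]
singled_out = unique_rows(records, quasi)
print(f"{len(singled_out)} of {len(records)} rows are unique on {quasi}")
```

This only measures uniqueness inside your own dataset. A model may also draw on outside knowledge, so treat a clean result here as a lower bound on the risk, not a clearance.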

What to actually do about it
This is an architecture decision, not a setting you flip. A few practical layers:

Minimize the context, not just the identifiers. Send the model only what the task needs. If you are summarizing a complaint, it probably does not need the customer's age, employer, and city. Data minimization is the highest-leverage move and the one most people skip.

Generalize the quasi-identifiers you do need. When the task allows, bucket the specifics. Exact age becomes an age range. A city becomes a region. "Cardiologist in Boise" becomes "a physician." You keep what the model needs to do the work and drop what re-identifies.
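
Minimizing and generalizing can be sketched together as one step that runs before the prompt is built. Everything here is an assumption for illustration: the task names, the per-task allowlist, the city-to-region table, and the ten-year age bands. The idea is that a field reaches the model only if the task lists it, and quasi-identifiers that do get through are bucketed first.

```python
# Hypothetical per-task allowlist: only these fields may reach the prompt.
TASK_FIELDS = {
    "summarize_complaint": ["complaint_text", "product"],
    "segment_feedback": ["complaint_text", "age", "city"],
}

# Hypothetical lookup table; a real one would come from your own data.
CITY_TO_REGION = {"Boise": "Mountain West", "Austin": "South"}


def age_band(age, width=10):
    """Turn an exact age into a range such as '40-49'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def build_context(record, task):
    """Keep only allowlisted fields, generalizing age and city on the way."""
    out = {}
    for field in TASK_FIELDS[task]:
        if field not in record:
            continue
        value = record[field]
        if field == "age":
            out["age_range"] = age_band(value)
        elif field == "city":
            out["region"] = CITY_TO_REGION.get(value, "unknown region")
        else:
            out[field] = value
    return out


customer = {
    "complaint_text": "The device stopped charging after a week.",
    "product": "charger",
    "age": 41,
    "employer": "ACME",
    "city": "Boise",
}
print(build_context(customer, "summarize_complaint"))
print(build_context(customer, "segment_feedback"))
```

An allowlist is a deliberate choice over a blocklist here: a new field added to the record later stays out of the prompt until someone decides the task needs it.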

Match the protection to the threat. If your only goal is keeping direct PII out of logs you do not control, redaction is enough and you can stop there. If the content itself is sensitive enough that inference matters, that is a deliberate design choice, and you handle it above the redaction layer.

For the strictest cases, keep it off shared models. When inference is genuinely unacceptable, the clean answer is a self-hosted or local model. If the text never reaches a third party, there is nothing to infer in someone else's system.

Restore on your side and check the output. The model can reintroduce details or guess at them. Doing the restore step yourself and reviewing what comes back before it reaches a user keeps that contained.
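
A rough sketch of that review step, assuming bracketed placeholders like `[NAME]` and a mapping you keep on your own side. The function names and the placeholder pattern are illustrative, not the API of any particular tool. The check runs on the raw model output before restoring, and flags placeholders you never issued as well as verbatim appearances of values you chose to withhold.

```python
import re

# Assumed placeholder shape: [NAME], [COMPANY], [EMAIL_2], and so on.
PLACEHOLDER = re.compile(r"\[[A-Z]+(?:_\d+)?\]")


def review(model_output, mapping, withheld):
    """Return a list of problems found in the raw model output."""
    problems = []
    for token in PLACEHOLDER.findall(model_output):
        if token not in mapping:
            problems.append(f"unknown placeholder {token}")
    lowered = model_output.lower()
    for value in withheld:
        if value.lower() in lowered:
            problems.append(f"withheld value present: {value}")
    return problems


def restore(text, mapping):
    """Replace each placeholder we issued with its original value."""
    return PLACEHOLDER.sub(lambda m: mapping.get(m.group(0), m.group(0)), text)


mapping = {"[NAME]": "John", "[COMPANY]": "ACME"}
reply = "[NAME] from [COMPANY] asked about a refund."

if not review(reply, mapping, withheld=["Boise"]):
    print(restore(reply, mapping))
```

A substring match only catches verbatim echoes. A paraphrased guess ("the Idaho cardiologist") would pass, so for sensitive workloads this is a backstop next to human review, not a replacement for it.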

Where redaction fits
Redaction is the necessary first layer. It guarantees the explicit email, phone, SSN, card, and name never sit verbatim in logs you do not control, which is the concrete compliance win and the easiest to ship. Inference protection is a second layer you reach for based on how sensitive the content is. Most teams need the first. Regulated, high-sensitivity workloads need both.

*To be clear about my own tool: the API I built does the first layer. It detects and tokenizes direct identifiers fast and in-process, and restores them in the reply. It does not claim to stop inference, because that lives in how you design the prompt and what context you choose to send. Anyone who tells you a redaction step alone makes an LLM "private" is selling you something.*

LLM Privacy Shield is on RapidAPI if you want the first layer handled. The second layer is on you, and now you know what it is.

Wrapping up
Redaction is necessary, not sufficient. Strip the direct identifiers so they stay out of logs you do not own. Minimize the context so there is less to infer from. Generalize the quasi-identifiers you have to keep. And for the most sensitive work, keep it off shared models entirely. Redaction is the floor, not the ceiling.

If you missed the first post on the redact-then-restore pattern, it is here: How to redact PII before sending prompts to OpenAI, Claude, or Gemini.
