DEV Community

g.okc

How to detect and remove PII from any text payload in Python

PII leaking into logs, LLM prompts, and audit trails is one of the most common and costly compliance failures.
In this post I'll show you how to detect and strip PII from any text payload in Python (names, emails, SSNs, Brazilian CPF numbers, credit card numbers) using a production REST API built in Rust with sub-15ms latency.

The problem

Most teams realize PII is leaking too late — after a breach, after an audit, or after the data lands in an LLM training set.

The solution

One API call before your data reaches logs, prompts, or any other downstream system:

import requests

API_URL = "https://vortex-dfs.onrender.com/v1/shield/anonymize"

def anonymize(text: str, api_key: str) -> dict:
    """Send text to the anonymize endpoint and return the parsed JSON response."""
    response = requests.post(
        API_URL,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        },
        json={"content": text},
        timeout=10,  # don't let a slow call block the pipeline indefinitely
    )
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
    return response.json()

# Example
result = anonymize(
    "John Smith, SSN 123-45-6789, card 4111-1111-1111-1111",
    api_key="your_key_here"
)

print(result["sanitized"])
# → "[NAME] [SSN] [CARD]"

print(result["risk_score"])
# → 0.94

print(result["latency_ms"])
# → 12.3
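Logs are one of the leak points named above, so the same call can sit in front of Python's standard `logging` module. Below is a minimal sketch, assuming you wrap the API call in any `str -> str` function; `PIIRedactingFilter` is a hypothetical name, and the demo uses a stand-in sanitizer so it runs without a network call or API key.

```python
import logging
from typing import Callable


class PIIRedactingFilter(logging.Filter):
    """Rewrite each log record's message with a caller-supplied sanitizer."""

    def __init__(self, sanitize: Callable[[str], str]) -> None:
        super().__init__()
        self.sanitize = sanitize

    def filter(self, record: logging.LogRecord) -> bool:
        # Merge %-style args into the message first so PII passed as an
        # argument is sanitized too, then clear args so it isn't re-applied.
        record.msg = self.sanitize(record.getMessage())
        record.args = ()
        # Note: exception tracebacks (exc_info) are not sanitized here.
        return True  # keep the record; only its text changes


if __name__ == "__main__":
    handler = logging.StreamHandler()
    # Stand-in sanitizer for the demo; in production you might pass
    # lambda s: anonymize(s, api_key)["sanitized"] instead.
    handler.addFilter(PIIRedactingFilter(lambda s: s.replace("123-45-6789", "[SSN]")))
    logging.basicConfig(level=logging.INFO, handlers=[handler])
    logging.info("user ssn %s", "123-45-6789")  # message text: "user ssn [SSN]"
```

Attaching the filter to a handler rather than a logger means records propagated from child loggers are sanitized too. Keep in mind that calling a remote API for every log line adds a network round trip per record.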

Integrate before your LLM pipeline

import openai  # openai>=1.0; the module-level client reads OPENAI_API_KEY

def safe_llm_call(ticket_content: str, api_key: str) -> str:
    # Step 1: strip PII using anonymize() from the snippet above
    clean = anonymize(ticket_content, api_key)

    # Step 2: only the sanitized text is sent to the model
    prompt = f"Summarize this support ticket: {clean['sanitized']}"
    response = openai.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


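One design choice worth making explicit in this pipeline: if anonymization fails, the raw ticket should never be sent instead. Here is a minimal fail-closed sketch; `sanitize_or_block`, `PIIGateError`, and the optional `max_risk` cutoff are hypothetical names, and since the post doesn't define the `risk_score` scale beyond the 0.94 example, any threshold you pick is an assumption to tune.

```python
from typing import Any, Callable, Dict, Optional


class PIIGateError(RuntimeError):
    """Raised when a payload must not be forwarded downstream."""


def sanitize_or_block(
    text: str,
    anonymize_fn: Callable[[str], Dict[str, Any]],
    max_risk: Optional[float] = None,
) -> str:
    """Return sanitized text, or raise instead of ever returning the raw input."""
    try:
        result = anonymize_fn(text)
    except Exception as exc:
        # Fail closed: a network or API error must not leak the raw text.
        raise PIIGateError("anonymization failed; refusing to forward") from exc

    sanitized = result.get("sanitized")
    if not isinstance(sanitized, str):
        raise PIIGateError("response has no 'sanitized' string")

    if max_risk is not None:
        risk = result.get("risk_score")
        # A missing score is treated as too risky rather than as zero.
        if not isinstance(risk, (int, float)) or risk > max_risk:
            raise PIIGateError("risk score missing or above threshold")

    return sanitized
```

Inside `safe_llm_call`, Step 1 could then become `clean_text = sanitize_or_block(ticket_content, lambda t: anonymize(t, api_key))`, with `PIIGateError` routed to a retry queue or human review.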

What gets detected

Names, emails, SSNs, CPF numbers, and credit card numbers.

Why not just use regex?

You can. But:

  • Regex misses context: a pattern matches 123-45-6789 the same way whether it appears alone or inside a sentence that says what it is
  • No risk scoring: you don't know how sensitive a payload is
  • No token map: you can't reverse the anonymization when you need to
  • Maintenance burden: every new PII format means another regex to write and test
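To make those trade-offs concrete, here is a minimal sketch of the do-it-yourself regex approach. The patterns and the function names `luhn_ok` and `regex_redact` are illustrative assumptions, not the API's detection logic; card candidates are checked with the standard Luhn checksum to cut false positives.

```python
import re

# Illustrative patterns only; real-world formats vary far more than this.
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")  # 13-19 digits, optional separators


def luhn_ok(number: str) -> bool:
    """Standard Luhn checksum over the digits in `number`."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    # Double every second digit from the right; subtract 9 if it exceeds 9.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def regex_redact(text: str) -> str:
    def card_sub(m: re.Match) -> str:
        # Only redact digit runs that pass the Luhn check.
        return "[CARD]" if luhn_ok(m.group()) else m.group()

    text = CARD_RE.sub(card_sub, text)
    text = SSN_RE.sub("[SSN]", text)
    return EMAIL_RE.sub("[EMAIL]", text)


print(regex_redact("John Smith, SSN 123-45-6789, card 4111-1111-1111-1111, mail john@example.com"))
# → John Smith, SSN [SSN], card [CARD], mail [EMAIL]
```

The name passes through untouched, there is no risk score, and nothing records what was replaced, which are the gaps listed above.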

The API handles all of this and returns an encrypted token map if you need to deanonymize later.
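The post doesn't show the format of that encrypted token map, so the following is only a local, plaintext illustration of the idea behind reversible anonymization: each distinct value gets a numbered placeholder, and the map lets you substitute the originals back. `pseudonymize` and `restore` are hypothetical names, only emails are handled, and because the map itself contains PII it would need to be encrypted or otherwise protected in real use.

```python
import re
from typing import Dict, Tuple

EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
PLACEHOLDER_RE = re.compile(r"\[EMAIL_\d+\]")


def pseudonymize(text: str) -> Tuple[str, Dict[str, str]]:
    """Replace each distinct email with a numbered placeholder; return text and map."""
    token_map: Dict[str, str] = {}  # placeholder -> original value
    seen: Dict[str, str] = {}       # original value -> placeholder

    def repl(m: re.Match) -> str:
        value = m.group()
        if value not in seen:
            placeholder = f"[EMAIL_{len(seen) + 1}]"
            seen[value] = placeholder
            token_map[placeholder] = value
        return seen[value]

    return EMAIL_RE.sub(repl, text), token_map


def restore(text: str, token_map: Dict[str, str]) -> str:
    """Reverse pseudonymize() in a single pass; unknown placeholders are left as-is."""
    return PLACEHOLDER_RE.sub(lambda m: token_map.get(m.group(), m.group()), text)


masked, mapping = pseudonymize("a@x.com wrote to b@y.org, cc a@x.com")
print(masked)                    # → [EMAIL_1] wrote to [EMAIL_2], cc [EMAIL_1]
print(restore(masked, mapping))  # → a@x.com wrote to b@y.org, cc a@x.com
```

Reusing the same placeholder for repeated values keeps the masked text coherent for an LLM ("[EMAIL_1]" is clearly the same person both times).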

Get your API key

Starts at $9/week. Key delivered instantly after payment.
Here👉
