How I anonymize sensitive data before sharing with AI

#privacy #security #ai #webdev

Like most devs, I use ChatGPT and Claude daily. But I work with configs, logs, and infrastructure code full of real IPs, API keys, database passwords, and customer identifiers. I can't just paste that stuff in.

Manually redacting things was tedious and I'd always miss something. So I built Privatiser — a tool that automatically detects and replaces sensitive data with consistent pseudonyms, and lets you reverse it when you get the AI's response back.

What it does

You paste in something like:

server_ip = "192.168.1.100"
DB_PASSWORD = "super_s3cret_passw0rd!"
API_KEY = "sk-ant-abcdefghijklmnopqrstuv"
admin_email = "jane.doe@acme-corp.com"
ssn = "123-45-6789"

And it becomes:

server_ip = "10.0.0.1"
DB_PASSWORD = "REDACTED_SECRET_1"
API_KEY = "REDACTED_SECRET_2"
admin_email = "user-1@redacted.example.net"
ssn = "078-05-0001"

The pseudonyms are consistent — if the same IP appears 10 times, it gets the same replacement everywhere. So the AI can still reason about relationships in your text without seeing any real values.
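One way to get that consistency is a lookup table that remembers every value it has already replaced. The sketch below is an assumption about how such a table could work, not Privatiser's actual code. The class name, the 10.0.x.y numbering scheme, and the category names are hypothetical, chosen only to mirror the example output above.

```python
import re


class Pseudonymizer:
    """Hypothetical sketch: hands out one stable fake value per distinct real value."""

    def __init__(self) -> None:
        self.mapping: dict[str, str] = {}  # real value -> pseudonym (kept for restoring later)
        self.counts: dict[str, int] = {}   # category -> pseudonyms issued so far

    def pseudonym(self, category: str, value: str) -> str:
        if value in self.mapping:          # seen before: reuse the same replacement
            return self.mapping[value]
        n = self.counts.get(category, 0) + 1
        self.counts[category] = n
        if category == "ipv4":
            fake = f"10.0.{n // 256}.{n % 256}"  # 1 -> 10.0.0.1, 2 -> 10.0.0.2, ...
        elif category == "email":
            fake = f"user-{n}@redacted.example.net"
        else:
            fake = f"REDACTED_{category.upper()}_{n}"
        self.mapping[value] = fake
        return fake


IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def anonymize_ips(text: str, p: Pseudonymizer) -> str:
    # Every IPv4 match goes through the same table, so repeats get the same fake.
    return IPV4.sub(lambda m: p.pseudonym("ipv4", m.group(0)), text)


p = Pseudonymizer()
log = "conn from 192.168.1.100; retry from 192.168.1.100; peer 172.16.0.5"
print(anonymize_ips(log, p))
# → conn from 10.0.0.1; retry from 10.0.0.1; peer 10.0.0.2
```

The same table (p.mapping) is exactly what you would need to hand back later to reverse the substitution.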

When you get the AI's response, paste it into the Deanonymize tab along with the mapping, and the original values are restored.
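Restoring is the same table read backwards. Here is a minimal sketch, assuming the mapping is a plain dict from real value to pseudonym (the real tool's mapping format may differ). It replaces all pseudonyms in a single regex pass, longest first, so that REDACTED_SECRET_10 is never partially rewritten as REDACTED_SECRET_1 followed by a stray 0.

```python
import re


def deanonymize(text: str, mapping: dict[str, str]) -> str:
    """Swap pseudonyms back to real values. `mapping` is real value -> pseudonym."""
    reverse = {fake: real for real, fake in mapping.items()}
    if not reverse:
        return text
    # Longest pseudonyms first: regex alternation takes the first branch that matches,
    # so "REDACTED_SECRET_10" must be tried before its prefix "REDACTED_SECRET_1".
    pattern = re.compile("|".join(re.escape(f) for f in sorted(reverse, key=len, reverse=True)))
    # One pass, so a restored value is never re-scanned and replaced again.
    return pattern.sub(lambda m: reverse[m.group(0)], text)


mapping = {
    "super_s3cret_passw0rd!": "REDACTED_SECRET_1",
    "sk-ant-abcdefghijklmnopqrstuv": "REDACTED_SECRET_10",
    "192.168.1.100": "10.0.0.1",
}
reply = 'Set DB_PASSWORD to "REDACTED_SECRET_1", rotate REDACTED_SECRET_10 on 10.0.0.1.'
print(deanonymize(reply, mapping))
# → Set DB_PASSWORD to "super_s3cret_passw0rd!", rotate sk-ant-abcdefghijklmnopqrstuv on 192.168.1.100.
```

A fuller implementation would probably also add boundary checks, so that 10.0.0.1 inside an unrelated 10.0.0.12 is left alone.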

What it catches

It's not just simple find-and-replace. It uses pattern-based detection across a bunch of categories:

Secrets — API keys (AWS, OpenAI, GitHub, Slack, Anthropic), JWTs, bearer tokens, SSH keys, PEM keys, connection strings, and 100+ keyword patterns like password, api_key, token, client_secret, etc.
Network — IPv4 addresses (with CIDR), domains, emails, MAC addresses
PII — Phone numbers, credit cards (Luhn-validated), SSNs, passports, IBANs
Cloud — AWS ARNs (structure preserved), account IDs, S3 buckets, Azure subscription IDs, GCP project IDs
Identifiers — UUIDs, plus 200+ keyword-based patterns for hostnames, usernames, database names, infrastructure names, endpoints, file paths, and more
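To make the pattern-based part concrete, here is a rough sketch of per-category detectors for a handful of the formats listed above. These regexes are illustrative assumptions, not the ones Privatiser ships. For example, the IPv4 pattern accepts octets above 255, which a production detector would reject.

```python
import re

# Illustrative, simplified detectors (assumptions, not Privatiser's real patterns).
# Note: the IPv4 regex does not check that each octet is <= 255.
DETECTORS: dict[str, re.Pattern] = {
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}(?:/\d{1,2})?\b"),  # optional CIDR suffix
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "mac": re.compile(r"\b(?:[0-9A-Fa-f]{2}[:-]){5}[0-9A-Fa-f]{2}\b"),
    "uuid": re.compile(r"\b[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}\b"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


def detect(text: str) -> list[tuple[str, str]]:
    """Return (category, matched text) pairs, ordered by position in the text."""
    hits = []
    for category, pattern in DETECTORS.items():
        for m in pattern.finditer(text):
            hits.append((m.start(), category, m.group(0)))
    return [(category, value) for _, category, value in sorted(hits)]


sample = "peer 10.20.0.0/16 owner jane.doe@acme-corp.com nic 00:1A:2B:3C:4D:5E"
print(detect(sample))
# → [('ipv4', '10.20.0.0/16'), ('email', 'jane.doe@acme-corp.com'), ('mac', '00:1A:2B:3C:4D:5E')]
```

Because each detector runs independently here, two of them could match overlapping text; some ordering rule is needed on top to decide which one wins.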

It also understands natural-language context. It catches not just password = "value", but also phrasings like password is secret123, token set to abc123, and credentials were admin:pass.
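Natural-language phrasing can be handled by letting a keyword be followed either by an assignment operator or by a short connector phrase. This is a hypothetical sketch: the keyword list, the connector words, and the function name are assumptions for illustration, and the list is far shorter than the 100+ keyword patterns mentioned above.

```python
import re

# Hypothetical, heavily shortened keyword list.
KEYWORDS = r"(?:password|passwd|api_key|token|client_secret|credentials)"
# Either an assignment operator or a short natural-language connector.
CONNECTOR = r"(?:\s*[:=]\s*|\s+(?:is|was|were|set\s+to)\s+)"
# A quoted value, or an unquoted run up to whitespace or punctuation.
VALUE = r"""(?:"([^"]+)"|'([^']+)'|([^\s,;'"]+))"""
# \w* lets prefixed names like DB_PASSWORD or access_token match too.
KEYWORD_VALUE = re.compile(rf"\b\w*{KEYWORDS}\b{CONNECTOR}{VALUE}", re.IGNORECASE)


def find_keyword_secrets(text: str) -> list[str]:
    """Return the value that follows each secret-like keyword."""
    # Exactly one of the three VALUE groups is set per match.
    return [next(g for g in m.groups() if g is not None) for m in KEYWORD_VALUE.finditer(text)]


text = 'DB_PASSWORD = "super_s3cret_passw0rd!"\npassword is secret123, token set to abc123; credentials were admin:pass'
print(find_keyword_secrets(text))
# → ['super_s3cret_passw0rd!', 'secret123', 'abc123', 'admin:pass']
```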

How I built it

The core is a regex-based pattern engine that processes text in two phases.

Patterns are sorted by priority — specific patterns like API key formats run before generic keyword-based patterns. This prevents a broad pattern from stealing a match that a more specific one should handle.
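The priority idea can be sketched as "sort the rules, then let each rule claim only spans nobody has claimed yet." The rule names, priority numbers, and overlap check below are assumptions for illustration. The broad opaque_token rule would happily match the Anthropic-style key on its own; running the specific rule first means the key gets the right label and the generic rule skips it.

```python
import re
from typing import NamedTuple


class Rule(NamedTuple):
    name: str
    priority: int          # lower number = more specific = runs first (assumed convention)
    pattern: re.Pattern


RULES = [
    # Broad fallback: any long opaque token. It would also match the API key below.
    Rule("opaque_token", 90, re.compile(r"\b[A-Za-z0-9_-]{20,}\b")),
    # Specific format: the sk-ant- prefix from the example at the top of the post.
    Rule("anthropic_key", 10, re.compile(r"sk-ant-[A-Za-z0-9_-]+")),
]


def match_by_priority(text: str, rules: list[Rule]) -> list[tuple[str, str]]:
    claimed: list[tuple[int, int]] = []
    hits: list[tuple[int, str, str]] = []
    for rule in sorted(rules, key=lambda r: r.priority):
        for m in rule.pattern.finditer(text):
            start, end = m.span()
            # Skip anything overlapping a span a higher-priority rule already owns.
            if any(start < c_end and c_start < end for c_start, c_end in claimed):
                continue
            claimed.append((start, end))
            hits.append((start, rule.name, m.group(0)))
    return [(name, value) for _, name, value in sorted(hits)]


text = 'API_KEY = "sk-ant-abcdefghijklmnopqrstuv"\nSESSION = "Zx9Qw8Er7Ty6Ui5Op4As3Df"'
print(match_by_priority(text, RULES))
# → [('anthropic_key', 'sk-ant-abcdefghijklmnopqrstuv'), ('opaque_token', 'Zx9Qw8Er7Ty6Ui5Op4As3Df')]
```

If both rules had equal priority, the stable sort would keep list order and opaque_token would claim the key first, which is exactly the stealing the ordering is meant to prevent.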

There are validators to reduce false positives: Luhn algorithm for credit cards, SSN area number validation, skip lists for well-known cloud domains (so s3.amazonaws.com doesn't get redacted as a hostname).
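Validators like these are small post-match checks that turn a regex hit into a yes or no. Below is a sketch of the three mentioned above, with assumed function names and an assumed, deliberately short skip list. It covers the Luhn checksum, the SSA rules that an area number is never 000, 666, or 900 to 999 (group 00 and serial 0000 are also never issued), and a suffix check against known cloud domains.

```python
import re


def luhn_valid(number: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, check mod 10."""
    digits = [int(c) for c in number if c.isdigit()]
    if not 13 <= len(digits) <= 19:  # typical card-number lengths
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


SSN_RE = re.compile(r"^(\d{3})-(\d{2})-(\d{4})$")


def plausible_ssn(value: str) -> bool:
    """Reject SSN-shaped strings the SSA never issues."""
    m = SSN_RE.match(value)
    if not m:
        return False
    area, group, serial = m.groups()
    if area in ("000", "666") or area.startswith("9"):  # 900-999 are not valid areas
        return False
    return group != "00" and serial != "0000"


# Hypothetical skip list; a real one would be much longer.
CLOUD_SUFFIXES = ("amazonaws.com", "azure.com", "googleapis.com")


def is_cloud_domain(host: str) -> bool:
    host = host.lower().rstrip(".")
    return any(host == s or host.endswith("." + s) for s in CLOUD_SUFFIXES)


print(luhn_valid("4111 1111 1111 1111"), luhn_valid("4111 1111 1111 1112"))        # True False
print(plausible_ssn("123-45-6789"), plausible_ssn("666-12-3456"))                  # True False
print(is_cloud_domain("s3.amazonaws.com"), is_cloud_domain("db01.acme-corp.com"))  # True False
```

Checking for a leading dot in the suffix test keeps a lookalike such as notamazonaws.com from slipping through the skip list.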

Everything runs locally in your browser. No servers, no API calls, no telemetry.

Try it

Web tool: privatiser.net — paste text and try it right in your browser
Chrome extension: Chrome Web Store — auto-anonymizes when you paste into ChatGPT, Claude, Gemini, or Copilot
Firefox extension: Firefox Add-ons — same thing for Firefox (currently in review)

The browser extension intercepts paste events on AI chat sites, anonymizes the content, and shows a toast with how many items were redacted. When you copy the AI's response, it automatically restores the original values.

What's next

I'm looking at adding category toggles (so you can disable PII detection if you only care about secrets), an allowlist for values you want to skip, and better handling of .env files.

Would love to hear what patterns you'd want added or if there's something it misses for your use case.
