The Problem: PII in Images Is a Different Beast
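PII detection in structured text is largely a solved problem. Images are harder: scanned IDs, contracts, and medical forms need OCR, detection, and pixel-level redaction, which makes this a pipeline problem.
I built GlobalShield to handle that pipeline: upload an image or PDF and get back either a redacted file or structured JSON listing the PII it found.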
3-Layer Detection
- Layer 1 — Regex: Emails, phones, SSNs, credit cards, IPs, URLs
- Layer 2 — Country Rules: Malaysia NRIC, Singapore NRIC, US SSN, UK NHS, China ID, Japan MyNumber, Taiwan ID, India Aadhaar/PAN, Australia TFN
- Layer 3 — Microsoft Presidio (NER): Person names, locations, organizations
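To make the layering concrete, here is a minimal sketch of how the first two layers could be stacked. It is not GlobalShield's actual rule set: the patterns are deliberately simplified (format checks only, no checksums), the `Finding` type and `detect` function are hypothetical names, and the NER layer is left out.

```python
import re
from typing import NamedTuple


class Finding(NamedTuple):
    entity_type: str
    text: str
    start: int
    end: int


# Layer 1: generic patterns (simplified for illustration).
GENERIC_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

# Layer 2: country-specific formats (assumed YYMMDD-PB-###G layout for Malaysia NRIC).
COUNTRY_PATTERNS = {
    "MY_NRIC": re.compile(r"\b\d{6}-\d{2}-\d{4}\b"),
}


def detect(text: str) -> list[Finding]:
    """Run each layer in turn and return all findings ordered by position."""
    findings = []
    for layer in (GENERIC_PATTERNS, COUNTRY_PATTERNS):
        for entity_type, pattern in layer.items():
            for m in pattern.finditer(text):
                findings.append(Finding(entity_type, m.group(), m.start(), m.end()))
    return sorted(findings, key=lambda f: f.start)


sample = "Contact ali@example.com, NRIC 900101-14-5678, SSN 123-45-6789"
for finding in detect(sample):
    print(finding.entity_type, finding.text)
```

Keeping each layer as its own pattern table means a new country rule can be added without touching the generic patterns.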
For non-English documents, GlobalShield auto-detects the language, re-runs OCR with the matching language pack, translates the text for detection, and maps each finding back to its pixel coordinates in the original image.
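The "map back to pixels" step is the part that makes image redaction different from text redaction. The sketch below shows one way it could work, assuming the OCR engine returns words with bounding boxes and the detection text is built by joining those words with single spaces. The `Word` type and both helper functions are hypothetical, and the translation step is omitted because aligning spans across a translation needs extra bookkeeping.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    box: tuple[int, int, int, int]  # (left, top, width, height) in pixels


def build_index(words):
    """Join OCR words with single spaces and record each word's character range."""
    parts, ranges, pos = [], [], 0
    for w in words:
        ranges.append((pos, pos + len(w.text)))
        parts.append(w.text)
        pos += len(w.text) + 1  # +1 for the joining space
    return " ".join(parts), ranges


def boxes_for_span(words, ranges, start, end):
    """Return the box of every word that overlaps the half-open span [start, end)."""
    return [w.box for w, (s, e) in zip(words, ranges) if s < end and start < e]


words = [
    Word("Email:", (10, 10, 60, 20)),
    Word("ali@example.com", (80, 10, 150, 20)),
    Word("today", (240, 10, 50, 20)),
]
text, ranges = build_index(words)
start = text.index("ali@example.com")
print(boxes_for_span(words, ranges, start, start + len("ali@example.com")))
```

A redactor would then paint over each returned box. A detection that spans several words simply yields several boxes.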
Quick Start
import requests

headers = {
    "X-RapidAPI-Key": "YOUR_KEY",
    "X-RapidAPI-Host": "globalshield.p.rapidapi.com",
}

# Redact an image: the response body is the redacted file
with open("scanned_id.png", "rb") as f:
    resp = requests.post(
        "https://globalshield.p.rapidapi.com/v1/redact",
        headers=headers,
        files={"file": f},
    )
resp.raise_for_status()  # fail loudly instead of saving an error body
with open("redacted.png", "wb") as out:
    out.write(resp.content)

# Detect only: the response is JSON listing each entity found
with open("contract.png", "rb") as f:
    resp = requests.post(
        "https://globalshield.p.rapidapi.com/v1/detect",
        headers=headers,
        files={"file": f},
    )
resp.raise_for_status()
for entity in resp.json()["entities"]:
    print(f"{entity['entity_type']}: {entity['text']}")
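If you only need a summary rather than every match, the detect response can be tallied by entity type. This sketch assumes the response has the shape used above (an `entities` list whose items carry an `entity_type` key). The `count_by_type` helper and the sample payload are illustrative only, not real API output.

```python
from collections import Counter


def count_by_type(payload: dict) -> Counter:
    """Count detected entities per type; a missing 'entities' key counts as empty."""
    return Counter(e["entity_type"] for e in payload.get("entities", []))


# Illustrative payload, not a real API response.
sample_payload = {
    "entities": [
        {"entity_type": "EMAIL", "text": "ali@example.com"},
        {"entity_type": "PERSON", "text": "Ali"},
        {"entity_type": "EMAIL", "text": "ops@example.com"},
    ]
}
for entity_type, count in count_by_type(sample_payload).most_common():
    print(f"{entity_type}: {count}")
```

In practice you would pass `resp.json()` from the detect call instead of the sample payload.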
How It Compares
| | GlobalShield | Presidio | AWS Macie | Google DLP |
|---|---|---|---|---|
| Image/PDF support | Yes | No (text only) | S3 only | Complex |
| Languages | 20+ | ~10 | English | Multi |
| Country IDs | 9 countries | No | Limited | Limited |
| Cost | $0.002-0.005/call | Server costs | $1+/GB | $1-3/GB |
Zero data retention — GDPR/CCPA compliant by design.
GlobalShield on RapidAPI — Free tier: 50 credits.
Dealing with PII in documents? Share your use case!