I Built a PII Detection API So You Don't Have to Parse Aadhaar Numbers with Regex at 2am

The Problem: PII in Images Is a Different Beast

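PII detection in structured text is largely solved. Images are different: scanned IDs, contracts, and medical forms need OCR before any detector can run, and every hit has to be traced back to a region of the image. That's a pipeline problem.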

I built GlobalShield: upload an image or PDF and get back either a redacted file or structured JSON listing all the PII it found.

3-Layer Detection

Layer 1 — Regex: Emails, phones, SSNs, credit cards, IPs, URLs
Layer 2 — Country Rules: Malaysia NRIC, Singapore NRIC, US SSN, UK NHS, China ID, Japan MyNumber, Taiwan ID, India Aadhaar/PAN, Australia TFN
Layer 3 — Microsoft Presidio (NER): Person names, locations, organizations
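
To make the first two layers concrete, here is a minimal sketch of how a regex layer and a country-rule layer can stack. It is not GlobalShield's actual code: the patterns, the `detect` function, and the entity type names (`EMAIL`, `IN_AADHAAR`) are all assumptions for illustration. The one thing it leans on is that Aadhaar numbers carry a Verhoeff check digit, which is why a country rule beats a bare 12-digit regex.

```python
import re

# Layer 1 (regex): a plain pattern is enough for emails.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Layer 2 (country rule): Aadhaar candidates are 12 digits, first digit 2-9,
# often written in 4-4-4 groups. The regex only finds candidates.
AADHAAR_RE = re.compile(r"(?<!\d)[2-9]\d{3}[ -]?\d{4}[ -]?\d{4}(?!\d)")

# Verhoeff multiplication table (dihedral group D5) and permutation table.
_D = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 2, 3, 4, 0, 6, 7, 8, 9, 5],
    [2, 3, 4, 0, 1, 7, 8, 9, 5, 6], [3, 4, 0, 1, 2, 8, 9, 5, 6, 7],
    [4, 0, 1, 2, 3, 9, 5, 6, 7, 8], [5, 9, 8, 7, 6, 0, 4, 3, 2, 1],
    [6, 5, 9, 8, 7, 1, 0, 4, 3, 2], [7, 6, 5, 9, 8, 2, 1, 0, 4, 3],
    [8, 7, 6, 5, 9, 3, 2, 1, 0, 4], [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
]
_P = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 5, 7, 6, 2, 8, 3, 0, 9, 4],
    [5, 8, 0, 3, 7, 9, 6, 1, 4, 2], [8, 9, 1, 6, 0, 4, 3, 5, 2, 7],
    [9, 4, 5, 3, 1, 2, 6, 8, 7, 0], [4, 2, 8, 6, 5, 7, 3, 9, 0, 1],
    [2, 7, 9, 3, 8, 0, 6, 4, 1, 5], [7, 0, 4, 6, 9, 1, 3, 2, 5, 8],
]


def verhoeff_valid(number: str) -> bool:
    """Return True if a digit string (check digit last) passes Verhoeff."""
    c = 0
    for i, ch in enumerate(reversed(number)):
        c = _D[c][_P[i % 8][int(ch)]]
    return c == 0


def detect(text: str) -> list[dict]:
    """Run both layers and return findings sorted by position in the text."""
    found = []
    for m in EMAIL_RE.finditer(text):
        found.append({"entity_type": "EMAIL", "text": m.group(),
                      "start": m.start(), "end": m.end()})
    for m in AADHAAR_RE.finditer(text):
        digits = re.sub(r"\D", "", m.group())
        # Drop 12-digit numbers that fail the checksum (order IDs, phone runs).
        if verhoeff_valid(digits):
            found.append({"entity_type": "IN_AADHAAR", "text": m.group(),
                          "start": m.start(), "end": m.end()})
    return sorted(found, key=lambda e: e["start"])
```

The checksum step is what keeps the false-positive rate down: any 12-digit run matches the pattern, but only about one in ten passes Verhoeff.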

For non-English documents, the pipeline auto-detects the language, re-runs OCR with the matching language pack, translates the text for detection, and maps each hit back to its original pixel coordinates.
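
The "map back to pixels" step is the part people tend to underestimate, so here is a rough sketch of the idea. It assumes the OCR engine returns words with bounding boxes and covers only the simple case where detection runs on the OCR text itself; mapping through a translation step needs an extra alignment layer that is not shown. `OcrWord`, `build_text_index`, and `boxes_for_span` are hypothetical names, not part of the GlobalShield API.

```python
from dataclasses import dataclass


@dataclass
class OcrWord:
    text: str
    box: tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def build_text_index(words: list[OcrWord]) -> tuple[str, list[tuple[int, int]]]:
    """Join words with single spaces and record each word's character span."""
    parts, spans, pos = [], [], 0
    for w in words:
        spans.append((pos, pos + len(w.text)))
        parts.append(w.text)
        pos += len(w.text) + 1  # +1 for the joining space
    return " ".join(parts), spans


def boxes_for_span(words, spans, start, end):
    """Return the pixel boxes of every word overlapping characters [start, end)."""
    return [w.box for w, (s, e) in zip(words, spans) if s < end and e > start]


# Usage: a detector reports a character span, and we recover the boxes to black out.
words = [
    OcrWord("Email:", (10, 10, 70, 30)),
    OcrWord("jane@example.com", (80, 10, 260, 30)),
]
text, spans = build_text_index(words)
start = text.index("jane@example.com")
to_redact = boxes_for_span(words, spans, start, start + len("jane@example.com"))
```

Keeping the character-to-word index next to the text means any detector that reports `start`/`end` offsets can drive redaction without knowing anything about images.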

Quick Start

import requests

headers = {
    "X-RapidAPI-Key": "YOUR_KEY",
    "X-RapidAPI-Host": "globalshield.p.rapidapi.com"
}

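# Redact an image
with open("scanned_id.png", "rb") as f:
    resp = requests.post(
        "https://globalshield.p.rapidapi.com/v1/redact",
        headers=headers,
        files={"file": f},
        timeout=60,  # OCR can be slow; don't hang forever
    )
resp.raise_for_status()  # fail loudly instead of saving an error body as a PNG

# The response body is the redacted image
with open("redacted.png", "wb") as out:
    out.write(resp.content)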

# Detect only (get JSON)
with open("contract.png", "rb") as f:
    resp = requests.post(
        "https://globalshield.p.rapidapi.com/v1/detect",
        headers=headers,
        files={"file": f},
        timeout=60,
    )
resp.raise_for_status()  # surface 4xx/5xx before parsing JSON

# Each entity carries its type and the matched text
for entity in resp.json()["entities"]:
    print(f"{entity['entity_type']}: {entity['text']}")
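
If you want more than a printout, a small helper can group the detect response by entity type. This is a sketch that relies only on the `entities`, `entity_type`, and `text` fields used above; the `sample` payload and its type names are made up for illustration, and the real response may carry more fields.

```python
from collections import defaultdict


def group_by_type(payload: dict) -> dict[str, list[str]]:
    """Collect matched texts per entity type from a /v1/detect-style payload."""
    grouped = defaultdict(list)
    for entity in payload.get("entities", []):
        grouped[entity["entity_type"]].append(entity["text"])
    return dict(grouped)


# Hypothetical payload, shaped like the JSON the loop above iterates over.
sample = {
    "entities": [
        {"entity_type": "EMAIL_ADDRESS", "text": "jane@example.com"},
        {"entity_type": "PERSON", "text": "Jane Doe"},
        {"entity_type": "EMAIL_ADDRESS", "text": "hr@example.com"},
    ]
}

summary = group_by_type(sample)
for entity_type, texts in summary.items():
    print(f"{entity_type}: {len(texts)} found")
```

In a real pipeline you would pass `resp.json()` instead of `sample`, and probably log only the counts so the PII itself never lands in your logs.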

How It Compares

Feature	GlobalShield	Presidio	AWS Macie	Google DLP
Image/PDF support	Yes	Images via presidio-image-redactor	S3 only	Complex
Languages	20+	~10	English	Multi
Country IDs	9 countries	No	Limited	Limited
Cost	$0.002-0.005/call	Server costs	$1+/GB	$1-3/GB
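
Per-call pricing is easier to reason about with a quick calculation. The sketch below just multiplies the $0.002-0.005 range from the table by a monthly call volume; `monthly_cost_range` is a hypothetical helper, and it ignores any plan tiers or minimums, which I have not modeled here.

```python
def monthly_cost_range(calls_per_month: int,
                       low: float = 0.002,
                       high: float = 0.005) -> tuple[float, float]:
    """Return the (min, max) monthly cost in USD for a given call volume."""
    return calls_per_month * low, calls_per_month * high


cheapest, priciest = monthly_cost_range(50_000)
print(f"50,000 calls/month: ${cheapest:,.2f} to ${priciest:,.2f}")
```

At 50,000 documents a month that works out to roughly $100 to $250. The per-GB prices in the other columns depend on file sizes, so they are not directly comparable without knowing your average document size.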