DEV Community

Dave Sng


I Built a PII Detection API So You Don't Have to Parse Aadhaar Numbers with Regex at 2am

The Problem: PII in Images Is a Different Beast

Detecting PII in structured text is largely a solved problem. Images are different: scanned IDs, contracts, and medical forms have to go through OCR before any detection can run, and that makes it a pipeline problem.

I built GlobalShield — upload an image/PDF, get back a redacted file or structured JSON with all PII found.

3-Layer Detection

  • Layer 1 — Regex: Emails, phones, SSNs, credit cards, IPs, URLs
  • Layer 2 — Country Rules: Malaysia NRIC, Singapore NRIC, US SSN, UK NHS, China ID, Japan MyNumber, Taiwan ID, India Aadhaar/PAN, Australia TFN
  • Layer 3 — Microsoft Presidio (NER): Person names, locations, organizations
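To make the layering concrete, here is a minimal sketch of how layers 1 and 2 could be combined. It is not GlobalShield's actual code: the `Finding` type, the `detect` function, the entity labels, and the simplified patterns are all assumptions for illustration. In particular, the Aadhaar pattern assumes 12 digits with a first digit of 2-9 and skips any checksum validation, and the NER layer is left out so the example stays standard-library only.

```python
import re
from typing import NamedTuple


class Finding(NamedTuple):
    entity_type: str
    text: str
    start: int
    end: int
    layer: int


# Layer 1: generic patterns (deliberately simplified, hypothetical labels).
GENERIC_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

# Layer 2: country rules. Assumption: Aadhaar is 12 digits, often written in
# groups of four, with a first digit of 2-9. PAN is 5 letters, 4 digits, 1 letter.
COUNTRY_PATTERNS = {
    "IN_AADHAAR": re.compile(r"\b[2-9]\d{3}[ -]?\d{4}[ -]?\d{4}\b"),
    "IN_PAN": re.compile(r"\b[A-Z]{5}\d{4}[A-Z]\b"),
}


def detect(text: str) -> list[Finding]:
    findings = []
    for layer, patterns in ((1, GENERIC_PATTERNS), (2, COUNTRY_PATTERNS)):
        for entity_type, pattern in patterns.items():
            for m in pattern.finditer(text):
                findings.append(
                    Finding(entity_type, m.group(), m.start(), m.end(), layer)
                )
    # A Layer 3 NER pass would append its findings here before overlap handling.

    # Resolve overlaps: prefer the more specific (higher) layer, then longer matches.
    findings.sort(key=lambda f: (-f.layer, -(f.end - f.start), f.start))
    kept = []
    for f in findings:
        if all(f.end <= k.start or f.start >= k.end for k in kept):
            kept.append(f)
    return sorted(kept, key=lambda f: f.start)


sample = (
    "Contact jane.doe@example.com, Aadhaar 2345 6789 0123, "
    "PAN ABCDE1234F, SSN 123-45-6789."
)
for finding in detect(sample):
    print(f"layer {finding.layer} {finding.entity_type}: {finding.text}")
```

Running the country rules after the generic ones and letting them win on overlap is one way to stop a 12-digit ID from being mislabeled as a phone number by a looser pattern.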

For non-English documents, the pipeline auto-detects the language, re-runs OCR with the correct language pack, translates the text for detection, and maps each detection back to its position in the original image.
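The last step, mapping a detection back to pixels, can be sketched as follows. This is only an illustration under stated assumptions: the OCR engine is assumed to return one bounding box per word, the `Word` type and both helper functions are hypothetical names, and the sketch covers only the offset-to-box mapping, not the alignment between translated and original text.

```python
from typing import NamedTuple


class Word(NamedTuple):
    text: str
    x: int  # left edge in pixels
    y: int  # top edge in pixels
    w: int  # width in pixels
    h: int  # height in pixels


def index_words(words):
    """Join OCR words with single spaces and record each word's character span."""
    spans, parts, pos = [], [], 0
    for word in words:
        spans.append((pos, pos + len(word.text), word))
        parts.append(word.text)
        pos += len(word.text) + 1  # +1 for the joining space
    return " ".join(parts), spans


def boxes_for_span(spans, start, end):
    """Return the pixel boxes of every word overlapping characters [start, end)."""
    return [(w.x, w.y, w.w, w.h) for s, e, w in spans if s < end and e > start]


words = [
    Word("Email:", 10, 20, 60, 14),
    Word("jane@example.com", 75, 20, 150, 14),
    Word("today", 230, 20, 50, 14),
]
text, spans = index_words(words)
start = text.index("jane@example.com")
boxes = boxes_for_span(spans, start, start + len("jane@example.com"))
print(boxes)  # boxes a redactor would paint over
```

Keeping character offsets alongside the boxes means any text-based detector can run on the joined string while redaction still happens on the original image.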

Quick Start

import requests

headers = {
    "X-RapidAPI-Key": "YOUR_KEY",
    "X-RapidAPI-Host": "globalshield.p.rapidapi.com"
}

# Redact an image
with open("scanned_id.png", "rb") as f:
    resp = requests.post(
        "https://globalshield.p.rapidapi.com/v1/redact",
        headers=headers,
        files={"file": f},
        timeout=60,
    )
# Fail loudly on HTTP errors so an error body is never saved as an image
resp.raise_for_status()

with open("redacted.png", "wb") as out:
    out.write(resp.content)

# Detect only (get JSON)
with open("contract.png", "rb") as f:
    resp = requests.post(
        "https://globalshield.p.rapidapi.com/v1/detect",
        headers=headers,
        files={"file": f},
        timeout=60,
    )
resp.raise_for_status()

# Print each detected entity as "TYPE: matched text"
for entity in resp.json()["entities"]:
    print(f"{entity['entity_type']}: {entity['text']}")
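Once the detect call returns, the entities usually need to be grouped before they are useful. A small sketch, assuming only the response shape used above (an `entities` list whose items carry `entity_type` and `text`); the `group_entities` helper, the sample payload, and its type labels are made up for illustration.

```python
from collections import defaultdict


def group_entities(payload):
    """Group detected entity texts by their entity_type."""
    grouped = defaultdict(list)
    for entity in payload.get("entities", []):
        grouped[entity["entity_type"]].append(entity["text"])
    return dict(grouped)


# Hypothetical payload mirroring the shape read in the Quick Start loop.
sample_payload = {
    "entities": [
        {"entity_type": "EMAIL", "text": "jane@example.com"},
        {"entity_type": "PERSON", "text": "Jane Doe"},
        {"entity_type": "EMAIL", "text": "ops@example.com"},
    ]
}

for entity_type, texts in group_entities(sample_payload).items():
    print(f"{entity_type} ({len(texts)}): {', '.join(texts)}")
```

With a real response, `group_entities(resp.json())` would give the same kind of summary.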

How It Compares

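Feature           | GlobalShield      | Presidio       | AWS Macie | Google DLP
------------------|-------------------|----------------|-----------|-----------
Image/PDF support | Yes               | No (text only) | S3 only   | Complex
Languages         | 20+               | ~10            | English   | Multi
Country IDs       | 9 countries       | No             | Limited   | Limited
Cost              | $0.002-0.005/call | Server costs   | $1+/GB    | $1-3/GB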

Zero data retention — GDPR/CCPA compliant by design.

GlobalShield on RapidAPI — Free tier: 50 credits.


Dealing with PII in documents? Share your use case!
