DEV Community

I Built a PII Detection API with Zero AI Cost (Pure Regex)

Most PII detection tools charge per API call because they run your text through an LLM. But for detecting structured patterns like emails, phone numbers, and credit cards, you don't need AI at all.

I built Origrid PII Detect -- a PII scanning API that uses pure regex pattern matching. Zero LLM calls, zero AI cost, sub-500ms response times.

The problem

If you're building any app that handles user text (forms, comments, chat, logs), you probably need to check for accidentally exposed personal data before storing or forwarding it. GDPR's data minimization principle pushes you toward it. Common sense demands it.

The existing options are:

  • Microsoft Presidio -- powerful but requires self-hosting a full NLP pipeline
  • AWS Comprehend -- great but $0.01+ per request adds up fast
  • Google DLP -- enterprise pricing, enterprise complexity

For most use cases, you don't need NLP. Emails look like emails. Phone numbers look like phone numbers. Credit card numbers end in a check digit you can verify with the Luhn algorithm.

The approach: regex with smart deduplication

The API detects 6 entity types using pre-compiled regex patterns:

| Entity | How it's detected |
|---|---|
| Email | RFC 5322 simplified pattern |
| Phone | International formats (US, EU, UK, LATAM) |
| Credit card | Visa/MC/Amex/Discover patterns + Luhn validation |
| SSN | US format XXX-XX-XXXX with range validation |
| IBAN | European format with country code prefix |
| IP address | IPv4 with octet range validation |
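To make the "pattern plus validation" idea concrete, here is a minimal sketch of how a few of these detectors could be wired together. The patterns are deliberately simplified, and the names `PATTERNS`, `VALIDATORS`, `valid_ssn`, `valid_ipv4`, and `scan` are illustrative assumptions, not the API's actual internals.

```python
import re

# Hypothetical, simplified patterns compiled once at import time.
PATTERNS = {
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "ssn": re.compile(r"\b(\d{3})-(\d{2})-(\d{4})\b"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def valid_ssn(match: re.Match) -> bool:
    """Reject area, group, and serial values that are never issued."""
    area, group, serial = match.groups()
    if area in ("000", "666") or area >= "900":
        return False
    return group != "00" and serial != "0000"


def valid_ipv4(match: re.Match) -> bool:
    """Every octet must fall in the range 0..255."""
    return all(int(octet) <= 255 for octet in match.group().split("."))


# Entity types without an entry here are accepted on the regex match alone.
VALIDATORS = {"ssn": valid_ssn, "ipv4": valid_ipv4}


def scan(text: str) -> list[dict]:
    """Return validated matches with their character offsets, in text order."""
    found = []
    for entity_type, pattern in PATTERNS.items():
        check = VALIDATORS.get(entity_type, lambda m: True)
        for m in pattern.finditer(text):
            if check(m):
                found.append({
                    "type": entity_type,
                    "value": m.group(),
                    "start": m.start(),
                    "end": m.end(),
                })
    return sorted(found, key=lambda e: e["start"])
```

The regex only proposes candidates; the validator decides whether a candidate is plausible. That split keeps the patterns short and puts range checks in plain Python, where they are easier to read and test.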

Luhn validation for credit cards

This is the key differentiator from naive regex. A pattern like 4111-1111-1111-1111 matches the Visa format, but we also run the Luhn algorithm to verify that its check digit is valid. Passing Luhn doesn't prove the card was ever issued, only that the number is well-formed:

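def _luhn_check(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    # Ignore separators such as spaces and hyphens.
    digits = [int(d) for d in number if d.isdigit()]
    # Anything shorter than 13 digits is too short to be a card number.
    if len(digits) < 13:
        return False
    checksum = 0
    # Walk right to left, doubling every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            # Subtracting 9 is the same as summing the two digits of the product.
            if d > 9:
                d -= 9
        checksum += d
    return checksum % 10 == 0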

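This filters out most false positives from random number sequences that happen to match card formats. Because the last digit is a check digit, only about one in ten random digit strings of the right length passes.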
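As a worked example, take 4111-1111-1111-1111. Reading right to left and doubling every second digit, the eight undoubled 1s contribute 8, the seven doubled 1s contribute 14, and the doubled leading 4 contributes 8, for a total of 30, which is divisible by 10. Change the last digit to 2 and the total becomes 31, so the number is rejected. The sketch below combines a candidate regex with that check; `CARD_CANDIDATE`, `luhn_ok`, and `find_cards` are hypothetical names, and the pattern is a simplification that ignores brand prefixes.

```python
import re

# Hypothetical candidate pattern: 13 to 19 digits, optionally separated
# by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_ok(number: str) -> bool:
    """Luhn checksum over the digits of `number`, ignoring separators."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            # Double every second digit; fold two-digit products back to one digit.
            d = d * 2 - 9 if d > 4 else d * 2
        total += d
    return total % 10 == 0


def find_cards(text: str) -> list[str]:
    """Return only the candidates whose check digit is valid."""
    return [m.group() for m in CARD_CANDIDATE.finditer(text) if luhn_ok(m.group())]


print(find_cards("card 4111-1111-1111-1111, order 4111-1111-1111-1112"))
# ['4111-1111-1111-1111']
```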

Smart deduplication

When patterns overlap (e.g., a phone number inside an IBAN), the API deduplicates by priority. Credit cards and SSNs have highest priority since they're the most sensitive.
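One way to implement this kind of deduplication is a greedy pass over the matches sorted by priority. This is a sketch under assumptions: only "credit cards and SSNs first" comes from the description above, while the rest of the `PRIORITY` ordering, the tie-break on span length, and the `deduplicate` function itself are hypothetical.

```python
# Hypothetical priority table: a lower number wins when spans overlap.
PRIORITY = {
    "credit_card": 0,
    "ssn": 0,
    "iban": 1,
    "email": 2,
    "phone": 3,
    "ip_address": 4,
}


def deduplicate(entities: list[dict]) -> list[dict]:
    """Keep the highest-priority entity among overlapping spans."""
    # Highest priority first; among equals, prefer the longer span.
    ranked = sorted(
        entities,
        key=lambda e: (PRIORITY[e["type"]], -(e["end"] - e["start"])),
    )
    kept: list[dict] = []
    for e in ranked:
        # Two half-open spans overlap when each starts before the other ends.
        overlaps = any(e["start"] < k["end"] and k["start"] < e["end"] for k in kept)
        if not overlaps:
            kept.append(e)
    # Return the survivors in text order.
    return sorted(kept, key=lambda e: e["start"])
```

With an IBAN at offsets 5 to 32 and a phone match at 10 to 24 inside it, the phone match is dropped and the IBAN survives. The quadratic overlap check is fine for a handful of entities per request; an interval structure would only matter for very large inputs.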

What you get back

Example response:

{
  "pii_found": true,
  "entity_count": 3,
  "entities": [
    {"type": "email", "value": "john@test.com", "start": 6, "end": 19, "confidence": 1.0},
    {"type": "phone", "value": "+34 612 345 678", "start": 26, "end": 41, "confidence": 1.0},
    {"type": "credit_card", "value": "4111-1111-1111-1111", "start": 48, "end": 67, "confidence": 1.0}
  ],
  "redacted_text": "Email [EMAIL], call [PHONE], card [CREDIT_CARD]",
  "risk_level": "high"
}

Key features:

  • Exact positions (start/end) so you can highlight or mask in your UI
  • Redacted text ready to store safely
  • Risk level (high = credit cards or SSNs found)
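The offsets make it straightforward to do your own masking on the client side. Here is a minimal sketch that rebuilds the redacted text from the `entities` array and derives a risk level. The input string is reconstructed from the example response, `redact` and `risk_level` are hypothetical helpers, and the "medium" and "none" labels are assumptions (only the "high" rule is stated above).

```python
def redact(text: str, entities: list[dict]) -> str:
    """Replace each entity span with a [TYPE] placeholder."""
    # Work right to left so earlier offsets stay valid after each replacement.
    for e in sorted(entities, key=lambda e: e["start"], reverse=True):
        placeholder = f"[{e['type'].upper()}]"
        text = text[: e["start"]] + placeholder + text[e["end"]:]
    return text


def risk_level(entities: list[dict]) -> str:
    """'high' if a credit card or SSN is present; other labels are assumed."""
    types = {e["type"] for e in entities}
    if types & {"credit_card", "ssn"}:
        return "high"
    return "medium" if types else "none"


text = "Email john@test.com, call +34 612 345 678, card 4111-1111-1111-1111"
entities = [
    {"type": "email", "start": 6, "end": 19},
    {"type": "phone", "start": 26, "end": 41},
    {"type": "credit_card", "start": 48, "end": 67},
]
print(redact(text, entities))
# Email [EMAIL], call [PHONE], card [CREDIT_CARD]
print(risk_level(entities))
# high
```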

Performance

Because there's no AI model in the loop:

  • Latency: ~100-400ms (network overhead, not compute)
  • Cost per call: $0.00 (no LLM tokens)
  • Reliability: deterministic -- same input always produces same output
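If you want to see for yourself how little of that latency is compute, you can time a compiled pattern locally. This is a rough sketch, not a benchmark of the API: `EMAIL` is a simplified stand-in pattern and `timed_scan` is a hypothetical helper. The numbers will vary by machine, so no timing output is shown.

```python
import re
import time

# Simplified stand-in pattern, compiled once.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def timed_scan(text: str) -> tuple[list[tuple[int, int]], float]:
    """Return the match spans and the elapsed wall-clock time in milliseconds."""
    started = time.perf_counter()
    spans = [m.span() for m in EMAIL.finditer(text)]
    elapsed_ms = (time.perf_counter() - started) * 1000
    return spans, elapsed_ms


sample = "Contact sarah@company.com today. " * 1000
first, ms = timed_scan(sample)
second, _ = timed_scan(sample)

print(f"{len(first)} matches in {ms:.2f} ms")
# Deterministic: scanning the same input twice yields identical spans.
print(first == second)
```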

When you DO need AI

Regex won't catch:

  • Names (without a dictionary)
  • Street addresses (too many formats)
  • Context-dependent PII ("my birthday is next Thursday")

For those, you need an NLP or LLM layer. I'm planning a "deep scan" mode for v2 that adds LLM analysis on top of regex. But by my rough estimate, regex covers what you need for about 80% of compliance use cases.

Try it free

The API is live on RapidAPI with a free tier (50 requests/month):

Origrid PII Detect on RapidAPI

import requests

response = requests.post(
    "https://origrid-pii-detect.p.rapidapi.com/v1/pii/scan",
    headers={
        "X-RapidAPI-Key": "YOUR_KEY",
        "Content-Type": "application/json"
    },
    json={"text": "Contact sarah@company.com or 555-123-4567"},
    timeout=10,  # don't hang indefinitely if the network stalls
)
response.raise_for_status()  # raise on HTTP 4xx/5xx instead of parsing an error body

data = response.json()
print(data["redacted_text"])
# "Contact [EMAIL] or [PHONE]"

Built with FastAPI. Full OpenAPI docs available on the RapidAPI listing.


What PII patterns would you add? I'm considering passport numbers and driver's license formats for v2. Let me know in the comments.
