I Built a Gmail Spoof Detector That Catches Unicode Homoglyph Phishing

#cybersecurity #infosec #security #showdev

## The attack

I got an email from "Wіх.соm" asking me to verify my account. Looked legit — until I checked the actual sender: info@bistro-pub.de.

The trick? The display name swapped Latin letters for Cyrillic lookalikes such as і (U+0456) and о (U+043E). They're visually identical in most
fonts. Gmail didn't flag it because the From address wasn't technically spoofed; only the display name was.

This is called homoglyph spoofing, and it works because Unicode includes hundreds of characters that look exactly like Latin letters:

| Character | Looks like | Actually |
|-----------|-----------|----------|
| а (U+0430) | a | Cyrillic |
| с (U+0441) | c | Cyrillic |
| е (U+0435) | e | Cyrillic |
| о (U+043E) | o | Cyrillic |
| р (U+0440) | p | Cyrillic |
| і (U+0456) | i | Ukrainian Cyrillic |
| Ａ (U+FF21) | A | Fullwidth Latin |
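
To see which characters in a suspicious display name are not plain ASCII, it helps to dump their code points. Here is a minimal sketch; `listNonAscii` is a hypothetical helper (not part of Unspoofer), and the escape sequences are my reading of the spoofed name from the email above:

```js
// Hypothetical helper: list every non-ASCII character in a string
// together with its Unicode code point.
function listNonAscii(str) {
  const found = [];
  // for...of walks the string by code point, not by UTF-16 unit.
  for (const ch of str) {
    const cp = ch.codePointAt(0);
    if (cp > 0x7f) {
      found.push(ch + ' U+' + cp.toString(16).toUpperCase().padStart(4, '0'));
    }
  }
  return found;
}

// Assumed code points for the spoofed display name.
console.log(listNonAscii('W\u0456\u0445.\u0441\u043Em'));
// → [ 'і U+0456', 'х U+0445', 'с U+0441', 'о U+043E' ]
```

A genuine "Wix.com" returns an empty list, so any output at all is a reason to look closer.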

## The fix

I built Unspoofer — a Google Apps Script that runs inside your Gmail account and catches these automatically. No browser extension, no
third-party service.

### How it works

#### 1. Normalize homoglyphs

A map of ~80 Cyrillic, Greek, and fullwidth characters gets applied to every sender display name:


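```js
// Maps lookalike characters to the ASCII letter they imitate.
const HOMOGLYPH_MAP = {
  '\u0430': 'a', // Cyrillic а
  '\u0441': 'c', // Cyrillic с
  '\u0435': 'e', // Cyrillic е
  '\u043E': 'o', // Cyrillic о
  '\u0456': 'i', // Ukrainian і
  // ... ~80 total mappings
};

// Replaces every mapped character, leaves the rest untouched,
// and lowercases the result so later comparisons are case-insensitive.
function normalizeToAscii(str) {
  let result = '';
  for (let i = 0; i < str.length; i++) {
    result += HOMOGLYPH_MAP[str[i]] || str[i];
  }
  return result.toLowerCase();
}
```

So "Wіх.соm" becomes "wix.com".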

#### 2. Match against known brands

The normalized name is checked against ~50 brand domains (Google, PayPal, Apple, Amazon, banks, shipping companies, etc.). Both full domain matches
(wix.com) and standalone brand name matches (paypal) are checked, with word-boundary logic to avoid false positives.
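
The matching step could be sketched roughly as follows. This is not the code from the repo: the three-entry `BRANDS` list and the helper names `containsToken` and `findImpliedDomain` are assumptions made for illustration, and the boundary rule (a match must not touch another letter or digit) is one possible reading of "word-boundary logic".

```js
// Hypothetical subset of the brand list; the real one is said to hold ~50 entries.
const BRANDS = [
  { name: 'wix', domain: 'wix.com' },
  { name: 'paypal', domain: 'paypal.com' },
  { name: 'apple', domain: 'apple.com' },
];

// Escape characters that have a special meaning inside a RegExp.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// True when `token` occurs in `text` without a letter or digit
// directly before or after it.
function containsToken(text, token) {
  const re = new RegExp('(^|[^a-z0-9])' + escapeRegExp(token) + '($|[^a-z0-9])');
  return re.test(text);
}

// Returns the brand domain implied by an already normalized,
// lowercased display name, or null when no brand is implied.
function findImpliedDomain(normalizedName) {
  for (const brand of BRANDS) {
    if (containsToken(normalizedName, brand.domain) ||
        containsToken(normalizedName, brand.name)) {
      return brand.domain;
    }
  }
  return null;
}

console.log(findImpliedDomain('paypal support')); // → paypal.com
console.log(findImpliedDomain('pineapple farm')); // → null
```

The boundary check is what keeps "pineapple" from being read as Apple; a plain substring search would flag it.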

#### 3. Compare domains

If the display name implies a brand, the script extracts the root domain from the actual sender email and compares:

- "Wix.com" <noreply@mail.wix.com>: root domains match → legit
- "Wіх.соm" <info@bistro-pub.de>: wix.com ≠ bistro-pub.de → spoof

Root domain extraction handles compound TLDs like .co.il and .co.uk correctly.
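
One way to approximate the root domain extraction is a small lookup of two-label suffixes. The sketch below is an assumption about how this could work, not the repo's implementation; `COMPOUND_SUFFIXES`, `rootDomain`, and `isSpoof` are hypothetical names, and the suffix set is deliberately tiny (a complete solution would consult the Public Suffix List).

```js
// Hypothetical, incomplete set of two-label public suffixes.
const COMPOUND_SUFFIXES = new Set(['co.uk', 'co.il', 'com.au', 'co.jp']);

// Returns the registrable ("root") domain of an email address.
function rootDomain(email) {
  const host = email.slice(email.lastIndexOf('@') + 1).toLowerCase();
  const labels = host.split('.');
  if (labels.length <= 2) return host;
  // Keep three labels when the last two form a compound suffix, else two.
  const lastTwo = labels.slice(-2).join('.');
  const keep = COMPOUND_SUFFIXES.has(lastTwo) ? 3 : 2;
  return labels.slice(-keep).join('.');
}

// A message is suspicious when the brand implied by the display name
// does not own the domain that actually sent the mail.
function isSpoof(impliedDomain, senderEmail) {
  return rootDomain(senderEmail) !== impliedDomain;
}

console.log(rootDomain('noreply@mail.wix.com'));        // → wix.com
console.log(rootDomain('news@shop.example.co.uk'));     // → example.co.uk
console.log(isSpoof('wix.com', 'info@bistro-pub.de'));  // → true
```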

#### 4. Flag it

Spoofed messages get a SPOOF-ALERT Gmail label and a star. The label shows up as a folder in IMAP mail clients; the star shows as a flag in Apple
Mail.
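
In Apps Script, labeling and starring map onto a handful of `GmailApp` calls. The following is a sketch under assumptions: `flagAsSpoof` is a hypothetical helper name, and the repo may structure this differently. Note that `GmailApp` applies labels to threads, while stars apply to individual messages.

```js
const SPOOF_LABEL_NAME = 'SPOOF-ALERT'; // label name from the article

// Hypothetical helper: `message` is a GmailMessage obtained from GmailApp.
function flagAsSpoof(message) {
  // Reuse the label if it already exists, otherwise create it.
  const label = GmailApp.getUserLabelByName(SPOOF_LABEL_NAME) ||
                GmailApp.createLabel(SPOOF_LABEL_NAME);
  // Labels are applied at thread level, stars at message level.
  message.getThread().addLabel(label);
  message.star();
}
```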

### Architecture

The whole thing runs on a time-driven trigger that fires every 15 minutes:

```
Trigger (every 15 min)
  → Search: in:inbox newer_than:1d
  → For each message:
      → Check processed-ID cache (skip if seen)
      → Parse From header → display name + email
      → Normalize homoglyphs → check brand list
      → Compare implied domain vs actual domain
      → If mismatch: apply label + star
  → Flush cache to PropertiesService
```
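
The "Parse From header" step can be illustrated with a small regular expression. This is a simplified sketch, assuming the header looks like `"Name" <address>` or a bare address; `parseFrom` is a hypothetical name, and full RFC 5322 parsing (comments, encoded words, quoted angle brackets) is out of scope here.

```js
// Hypothetical helper: split a From header into display name and email.
function parseFrom(fromHeader) {
  // Optional quoted or unquoted display name, then <address>.
  const m = fromHeader.match(/^\s*"?(.*?)"?\s*<([^<>]+)>\s*$/);
  if (m) {
    return { displayName: m[1].trim(), email: m[2].trim().toLowerCase() };
  }
  // No angle brackets: treat the whole header as the address.
  return { displayName: '', email: fromHeader.trim().toLowerCase() };
}

console.log(parseFrom('"Wix.com" <noreply@mail.wix.com>'));
// → { displayName: 'Wix.com', email: 'noreply@mail.wix.com' }
```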

The processed-message cache uses a rolling window of 10,000 IDs stored as JSON in ScriptProperties, so messages aren't re-checked across runs. An
execution time guard breaks the loop before hitting the 6-minute Apps Script limit.
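
The rolling window and the time guard can be sketched as two small pure functions. Everything here is an assumption for illustration: the names `rollCache` and `processBatch`, the 5-minute budget, and the injected `now` clock are mine, not the repo's. Reading and writing the ID list through PropertiesService is left out; Apps Script caps the size of a single property value, so a list this long likely has to be split across several keys.

```js
const MAX_CACHED_IDS = 10000;          // rolling window size from the article
const TIME_BUDGET_MS = 5 * 60 * 1000;  // assumed margin under the 6-minute limit

// Append new IDs and drop the oldest so the list never exceeds maxSize.
function rollCache(cachedIds, newIds, maxSize) {
  const merged = cachedIds.concat(newIds);
  return merged.length > maxSize ? merged.slice(merged.length - maxSize) : merged;
}

// Check unseen messages until done or the time budget runs out.
// `messages` is an array of objects with getId(); `checkMessage` is the
// detection callback; `now` is injectable so the guard can be tested.
function processBatch(messages, cachedIds, checkMessage, now = Date.now) {
  const start = now();
  const seen = new Set(cachedIds);
  const processed = [];
  for (const msg of messages) {
    if (now() - start > TIME_BUDGET_MS) break; // execution time guard
    const id = msg.getId();
    if (seen.has(id)) continue;                // already handled in an earlier run
    checkMessage(msg);
    seen.add(id);
    processed.push(id);
  }
  return rollCache(cachedIds, processed, MAX_CACHED_IDS);
}
```

Messages skipped by the guard are simply picked up on the next trigger run, since they never enter the cache.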

## Try it

Setup takes about 2 minutes:

1. Create a new project at https://script.google.com
2. Copy the 5 .gs files from the repo
3. Run testDetection() to verify (9 test cases)
4. Run setup() to activate
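
The body of `setup()` is not shown in this post. For readers curious what activating a 15-minute time-driven trigger usually involves, here is a sketch using the `ScriptApp` trigger builder; `installTrigger` and the handler name `scanInbox` are hypothetical and may not match the repo.

```js
// Hypothetical handler name; the repo's actual entry point may differ.
const HANDLER_NAME = 'scanInbox';

function installTrigger() {
  // Remove any earlier trigger for the same handler so reruns don't stack up.
  ScriptApp.getProjectTriggers()
    .filter(t => t.getHandlerFunction() === HANDLER_NAME)
    .forEach(t => ScriptApp.deleteTrigger(t));

  // Fire the handler every 15 minutes.
  ScriptApp.newTrigger(HANDLER_NAME)
    .timeBased()
    .everyMinutes(15)
    .create();
}
```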

GitHub: https://github.com/yoelf22/unspoofer

Free, open source, MIT licensed. PRs welcome, especially for expanding the homoglyph map or brand list.
