Most profanity filters only check raw input.
That’s the problem.
You can block fuck.
But what about:
fu\u0441k (Cyrillic “с” instead of Latin “c”)
ｆｕｃｋ (fullwidth Unicode characters)
f.u.c.k (separator bypass)
Fr33 m0ney (leet-speak)
fuuuuck (character stretching)
They all bypass typical word-list filters.
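To see why, here's a minimal sketch of a naive word-list filter (illustrative only, not code from any particular library) and how the bypasses above slip past it:

```typescript
// Naive word-list filter: lowercase the input and look for banned substrings.
const banned = ["fuck"];

const naiveFilter = (input: string): boolean =>
  banned.some((word) => input.toLowerCase().includes(word));

console.log(naiveFilter("fuck"));      // true  — the plain form is caught
console.log(naiveFilter("fu\u0441k")); // false — Cyrillic "с" slips through
console.log(naiveFilter("f.u.c.k"));   // false — separators break the match
```

The filter compares raw code points, so any visually identical but byte-different input sails through.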
The issue isn’t your regex.
It’s the order of operations.
Normalize First. Validate Second.
Before checking profanity or spam, input should be normalized:
- Unicode NFKC normalization
- Zero-width character removal
- Separator stripping
- Homoglyph mapping
- Leet-speak normalization
- Repetition reduction
After normalization, all evasions collapse into a canonical form.
Then your profanity/spam logic actually works.
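The pipeline above can be sketched roughly like this. The homoglyph and leet tables are tiny illustrative samples I've made up for the sketch, not the package's actual mappings:

```typescript
// Illustrative homoglyph map: visually identical non-Latin characters → Latin.
const HOMOGLYPHS: Record<string, string> = {
  "\u0441": "c", // Cyrillic с → Latin c
  "\u043e": "o", // Cyrillic о → Latin o
};

// Illustrative leet-speak map: digits commonly substituted for letters.
const LEET: Record<string, string> = {
  "0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "7": "t",
};

function normalize(input: string): string {
  let s = input.normalize("NFKC");                        // 1. Unicode NFKC (folds fullwidth, etc.)
  s = s.replace(/[\u200B-\u200D\uFEFF]/g, "");            // 2. remove zero-width characters
  s = s.replace(/[^\p{L}\p{N}]+/gu, "");                  // 3. strip separators/punctuation
  s = [...s].map((ch) => HOMOGLYPHS[ch] ?? ch).join(""); // 4. map homoglyphs
  s = [...s].map((ch) => LEET[ch] ?? ch).join("");       // 5. fold leet-speak
  s = s.replace(/(.)\1{2,}/g, "$1");                     // 6. collapse 3+ repeated characters
  return s.toLowerCase();
}

console.log(normalize("fu\u0441k")); // "fuck"
console.log(normalize("f.u.c.k"));  // "fuck"
console.log(normalize("fuuuuck"));  // "fuck"
```

Every evasion from the list at the top collapses to the same canonical string, so a single word-list check after `normalize()` covers all of them.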
What I Built
I created @marslanmustafa/input-shield — a zero-dependency TypeScript validation package that:
- Detects Unicode homoglyph attacks
- Catches leet-based spam
- Blocks stretched profanity
- Detects gibberish (e.g. asdfghjkl)
- Supports Zod integration
- Validates HTML email content safely
Example:
```typescript
import { createValidator } from '@marslanmustafa/input-shield';

const validator = createValidator()
  .field('Message')
  .min(2).max(500)
  .noProfanity()
  .noSpam()
  .noGibberish();

validator.validate('fu\u0441k');
// → blocked
```
Why This Matters
Unicode homoglyph attacks are not edge cases.
They’re easy, invisible, and widely ignored.
If you're validating user input in production, normalization isn’t optional. It’s required.