
Muhammad Arslan

Why Your Profanity Filter Fails Against Unicode (And How to Fix It)

Most profanity filters only check raw input.

That’s the problem.

You can block fuck.

But what about:

fu\u0441k (Cyrillic “с” instead of Latin “c”)

ｆｕｃｋ (fullwidth Unicode characters)

f.u.c.k (separator bypass)

Fr33 m0ney (leet-speak)

fuuuuck (character stretching)

They all bypass typical word-list filters.
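To see why, note that a naive word-list check compares raw code points, not visual appearance, so a single look-alike letter defeats it (illustrative snippet, not any particular library):

```typescript
// Naive word-list check: compares code points, not what the user sees.
const banned = ['fuck'];
const input = 'fu\u0441k'; // 'с' is Cyrillic U+0441, visually identical to Latin 'c'

const flagged = banned.some((word) => input.includes(word));
console.log(flagged); // false — the filter never sees a match
```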

The issue isn’t your regex.
It’s the order of operations.

Normalize First. Validate Second.

Before checking profanity or spam, input should be normalized:

  • Unicode NFKC normalization
  • Zero-width character removal
  • Separator stripping
  • Homoglyph mapping
  • Leet-speak normalization
  • Repetition reduction

After normalization, all evasions collapse into a canonical form.
Then your profanity/spam logic actually works.
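The steps above can be sketched in a few lines. This is a minimal illustration, not input-shield's actual implementation — the homoglyph and leet tables here are tiny samples, where a real mapping covers thousands of Unicode confusables:

```typescript
// Illustrative normalize-first pipeline (NOT input-shield's internals).
function normalizeInput(input: string): string {
  // 1. Unicode NFKC: folds fullwidth forms (ｆｕｃｋ → fuck) and other compatibility variants
  let s = input.normalize('NFKC').toLowerCase();

  // 2. Remove zero-width characters (ZWSP, ZWNJ, ZWJ, word joiner, BOM)
  s = s.replace(/[\u200b-\u200d\u2060\ufeff]/g, '');

  // 3. Strip common separators (f.u.c.k → fuck)
  s = s.replace(/[.\-_*+,\s]+/g, '');

  // 4. Map Cyrillic look-alikes to Latin (sample table only)
  const homoglyphs: Record<string, string> = {
    '\u0430': 'a', '\u0435': 'e', '\u043e': 'o', '\u0440': 'p', '\u0441': 'c',
  };
  s = s.replace(/[\u0430\u0435\u043e\u0440\u0441]/g, (ch) => homoglyphs[ch]);

  // 5. Fold leet-speak digits and symbols (sample table only)
  const leet: Record<string, string> = {
    '0': 'o', '1': 'i', '3': 'e', '4': 'a', '5': 's', '7': 't', '@': 'a', '$': 's',
  };
  s = s.replace(/[013457@$]/g, (ch) => leet[ch]);

  // 6. Collapse character runs (fuuuuck → fuck). This is lossy for legitimate
  //    doubles ("assess" → "ases"), so real filters usually match both forms.
  s = s.replace(/(.)\1+/g, '$1');

  return s;
}

console.log(normalizeInput('fu\u0441k'));  // "fuck"
console.log(normalizeInput('f.u.c.k'));   // "fuck"
console.log(normalizeInput('fuuuuck'));   // "fuck"
```

Once every variant collapses to the same canonical string, a plain word-list lookup is enough to catch all of them.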

What I Built

I created @marslanmustafa/input-shield — a zero-dependency TypeScript validation package that:

  • Detects Unicode homoglyph attacks
  • Catches leet-based spam
  • Blocks stretched profanity
  • Detects gibberish (e.g. asdfghjkl)
  • Supports Zod integration
  • Validates HTML email content safely

Example:

import { createValidator } from '@marslanmustafa/input-shield';

const validator = createValidator()
  .field('Message')
  .min(2).max(500)
  .noProfanity()
  .noSpam()
  .noGibberish();

validator.validate('fu\u0441k'); 
// → blocked

Why This Matters

Unicode homoglyph attacks are not edge cases.
They’re easy, invisible, and widely ignored.

If you're validating user input in production, normalization isn’t optional. It’s required.

Links:

GitHub · npm
