DEV Community

Yasser Elgammal


Handling Invisible Characters with PHP

As developers, we often assume that the data users enter into our forms is exactly what we see. In reality, input can be deceptive: a user enters a phone number or ID that looks correct, yet validation still fails. Why?

A client was entering a valid number (let’s say 5​1000000) into a form, but our backend validation kept rejecting it.

  • No errors in the logic.
  • The input looked fine.
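  • Manual testing with the same number passed.

The culprit was a character nobody could see: the pasted value contained a zero-width space (U+200B) between the 5 and the 1, so the string the backend received was not the plain run of digits it appeared to be.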
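A quick way to confirm this kind of problem is to compare the byte length and a hex dump of the two values. The sketch below uses hypothetical variable names and only core PHP functions; the digits-only regex is a stand-in for whatever validation rule the form actually used.

```php
<?php
// Hypothetical values: one typed by hand, one pasted with a hidden
// zero-width space (U+200B) between the "5" and the "1".
$typed  = "51000000";
$pasted = "5\u{200B}1000000";

// They render identically, but they are not the same string.
var_dump($typed === $pasted);             // bool(false)

// strlen() counts bytes; U+200B takes 3 bytes in UTF-8 (E2 80 8B).
echo strlen($typed), "\n";                // 8
echo strlen($pasted), "\n";               // 11

// bin2hex() makes the hidden bytes visible.
echo bin2hex($pasted), "\n";              // 35e2808b31303030303030

// A simple digits-only rule (a stand-in for the real validator) rejects it.
var_dump(preg_match('/^\d+$/', $pasted)); // int(0)
```

If the byte count is larger than the number of characters you can see, something invisible is hiding in the value.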

Why This Happens

Invisible Unicode characters often sneak in when users copy and paste text from:

  • Messaging apps (e.g., WhatsApp, Slack).
  • Formatted documents (e.g., Word, rich text editors).

Common Types of Invisible Characters

1. Whitespace Characters

  • Regular Space (U+0020): The standard space character.
  • Tab (U+0009): Written as \t, adds horizontal spacing.
  • Newlines:
    • Line Feed (U+000A): written as \n
    • Carriage Return (U+000D): written as \r

2. Zero-Width Characters

  • Zero-Width Space (U+200B): Doesn’t show up visually but still exists in the text.
  • Zero-Width Non-Joiner (U+200C): Used in some languages (like Arabic or Persian) to prevent character joining without adding space.
  • Zero-Width Joiner (U+200D): Used to force characters to join without any visible space.

3. Directional Characters

  • Left-to-Right Mark (LRM) (U+200E): Affects text direction but is invisible.
  • Right-to-Left Mark (RLM) (U+200F): Same as above but for right-to-left languages.

4. Formatting Characters and Look-Alike Spaces

  • Soft Hyphen (U+00AD): Invisible in most cases, but may show up if the word is broken across lines.
  • Non-Breaking Space (U+00A0): Looks like a regular space but prevents the line from breaking at that position.
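When you suspect one of these characters but cannot see it, it helps to list what is actually in the string. The following is a minimal debugging sketch, not part of the original post: the function names findInvisibleCharacters() and utf8CodePoint() are hypothetical. It flags anything in the Unicode "Other" (\p{C}) or "Separator" (\p{Z}) categories, which covers every character listed above, including regular spaces, and decodes code points with core PHP only (no mbstring or intl).

```php
<?php
// Hypothetical helper: list every \p{C} or \p{Z} character in $input
// together with its code point and byte offset.
function findInvisibleCharacters(string $input): array
{
    // /u treats the subject as UTF-8; PREG_OFFSET_CAPTURE adds byte offsets.
    if (preg_match_all('/[\p{C}\p{Z}]/u', $input, $matches, PREG_OFFSET_CAPTURE) === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8.');
    }

    $found = [];
    foreach ($matches[0] as [$char, $offset]) {
        $found[] = sprintf('U+%04X at byte %d', utf8CodePoint($char), $offset);
    }
    return $found;
}

// Hypothetical helper: decode one UTF-8 encoded character to its code point.
function utf8CodePoint(string $char): int
{
    $bytes = array_values(unpack('C*', $char));
    switch (count($bytes)) {
        case 1:
            return $bytes[0];
        case 2:
            return (($bytes[0] & 0x1F) << 6) | ($bytes[1] & 0x3F);
        case 3:
            return (($bytes[0] & 0x0F) << 12) | (($bytes[1] & 0x3F) << 6) | ($bytes[2] & 0x3F);
        default:
            return (($bytes[0] & 0x07) << 18) | (($bytes[1] & 0x3F) << 12)
                | (($bytes[2] & 0x3F) << 6) | ($bytes[3] & 0x3F);
    }
}

// Reports: U+200B at byte 1, U+00A0 at byte 11
print_r(findInvisibleCharacters("5\u{200B}1000000\u{00A0}"));
```

Byte offsets are used because that is what PREG_OFFSET_CAPTURE returns, even in UTF-8 mode.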

What Laravel Trims by Default

Laravel automatically trims whitespace (spaces, tabs, newlines) from request input through the TrimStrings middleware, which is enabled by default. However, it only trims the start and end of each string, so anything in the middle of a value passes through untouched, and it is not designed to strip invisible Unicode characters such as:

  • Zero-width spaces (U+200B)
  • Left-to-right marks (U+200E)
  • Other hidden characters
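Trimming, by design, only touches the edges of a string, and the default character set is small. This is easy to see with PHP's built-in trim(); the sketch does not call Laravel itself. trim()'s default list is " \t\n\r\0\x0B", so a zero-width space survives whether it sits in the middle of a value or at its start.

```php
<?php
// trim() strips only " ", "\t", "\n", "\r", "\0" and "\x0B" by default,
// and only from the beginning and end of the string.
$inner = "5\u{200B}1000000";        // zero-width space in the middle
$edge  = "\u{200B}51000000 \t\n";   // zero-width space first, whitespace last

var_dump(trim($inner) === $inner);  // bool(true): nothing was removed

// The trailing whitespace is gone, but the leading U+200B (e2808b) is not.
echo bin2hex(trim($edge)), "\n";    // e2808b3531303030303030
```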

🛠 Solution

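The \p{Cf} Unicode property matches format characters, which include zero-width characters, directional marks and the soft hyphen, so this pattern removes them while leaving digits and ordinary spaces alone: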

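```php
$number = "5\u{200B}1000000"; // "5", a zero-width space (U+200B), then "1000000"
$cleanedNumber = preg_replace('/[\p{Cf}]/u', '', $number); // drop format characters
echo $cleanedNumber; // 51000000
```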
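Because \p{Cf} only covers format characters, it is worth knowing what it leaves behind and what happens with malformed input. A minimal sketch, wrapping the same pattern in a hypothetical helper named removeFormatCharacters(): with the /u modifier, preg_replace() returns null instead of a string when the input is not valid UTF-8, so the helper turns that into an exception rather than silently passing null along.

```php
<?php
// Hypothetical helper: strip Unicode format characters (\p{Cf}) such as
// U+200B, U+200D, U+200E, U+200F and U+00AD.
function removeFormatCharacters(string $value): string
{
    $cleaned = preg_replace('/\p{Cf}/u', '', $value);

    // With the /u modifier, preg_replace() returns null on invalid UTF-8.
    if ($cleaned === null) {
        throw new InvalidArgumentException('Input is not valid UTF-8.');
    }

    return $cleaned;
}

// Format characters are removed...
echo removeFormatCharacters("5\u{200B}1000000"), "\n";        // 51000000

// ...but a tab (\p{Cc}) and a no-break space (\p{Zs}) are kept.
echo bin2hex(removeFormatCharacters("5\t1\u{00A0}0")), "\n";  // 350931c2a030
```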

💡 Another Solution

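This broader pattern removes every character in the Unicode "Other" category (\p{C}: control, format and other invisible code points) plus all whitespace (\s). It suits values such as phone numbers or IDs, where no spaces are expected at all: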

```php
// \p{C}: any Unicode "Other" character (control, format, ...); \s: any whitespace
$cleanedNumber = preg_replace('/[\p{C}\s]+/u', '', $number);
```
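Because \s also matches ordinary spaces, the broad pattern would turn "John Smith" into "JohnSmith". For free-text fields such as names or addresses, one possible approach (a sketch with a hypothetical function name, not the original author's code) is to collapse whitespace into single regular spaces first, then drop the remaining invisible characters, then tidy up. It relies on PHP enabling Unicode-aware \s under the /u modifier, so a no-break space counts as whitespace.

```php
<?php
// Caveat: the broad pattern also deletes ordinary spaces between words.
echo preg_replace('/[\p{C}\s]+/u', '', "John Smith"), "\n"; // JohnSmith

// Hypothetical helper for free-text fields such as names or addresses.
function normalizeFreeText(string $value): string
{
    // 1. Collapse each whitespace run (tab, newline, NBSP, ...) into one space.
    //    This runs first because tabs and newlines are also \p{C} and would
    //    otherwise be deleted, gluing words together.
    $value = preg_replace('/\s+/u', ' ', $value);
    if ($value === null) {
        throw new InvalidArgumentException('Input is not valid UTF-8.');
    }

    // 2. Remove the remaining invisible control and format characters.
    $value = preg_replace('/\p{C}/u', '', $value);

    // 3. Step 2 can leave two spaces side by side; collapse them again.
    $value = preg_replace('/ {2,}/', ' ', $value);

    return trim($value);
}

echo normalizeFreeText("  John\u{00A0}\u{200B} \tSmith\n"), "\n"; // John Smith
```

The order of the steps is the main design choice: deleting \p{C} before handling whitespace would silently merge words separated by a tab or newline.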

The table below shows exactly which characters the second pattern removes:

| Type | Unicode code point | Example value of `$number` | How it looks | Removed because it matches |
|---|---|---|---|---|
| Space | U+0020 | `5 1000000` | Regular space between digits | `\s` |
| Tab | U+0009 | `5\t1000000` | Horizontal gap | `\s` (also `\p{Cc}`) |
| Newline | U+000A, U+000D | `5\n1000000` or `5\r1000000` | Line break | `\s` (also `\p{Cc}`) |
| Zero-Width Space | U+200B | `5\u{200B}1000000` | Invisible | `\p{Cf}` |
| Zero-Width Joiner | U+200D | `5\u{200D}1000000` | Invisible | `\p{Cf}` |
| Right-to-Left Mark (RLM) | U+200F | `5\u{200F}1000000` | Invisible direction marker | `\p{Cf}` |
| Soft Hyphen | U+00AD | `5\u{00AD}1000000` | Normally hidden | `\p{Cf}` |
| Non-Breaking Space (NBSP) | U+00A0 | `5\u{00A0}1000000` | Looks like a regular space | `\s` under `/u` (it is `\p{Zs}`, not `\p{C}`) |
| Control character (e.g. BEL) | U+0007 | `5\u{0007}1000000` | Invisible bell character | `\p{Cc}` (part of `\p{C}`) |
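The rows of this table can be checked mechanically. This sketch rebuilds each example value with PHP escape sequences (the labels and variable names are illustrative) and reports whether the narrower \p{Cf} pattern and the broader [\p{C}\s]+ pattern each reduce it to the plain digits 51000000.

```php
<?php
// Example values from the table, written with escapes so nothing is hidden.
$samples = [
    'Space'             => "5 1000000",
    'Tab'               => "5\t1000000",
    'Line feed'         => "5\n1000000",
    'Carriage return'   => "5\r1000000",
    'Zero-width space'  => "5\u{200B}1000000",
    'Zero-width joiner' => "5\u{200D}1000000",
    'RLM'               => "5\u{200F}1000000",
    'Soft hyphen'       => "5\u{00AD}1000000",
    'NBSP'              => "5\u{00A0}1000000",
    'BEL'               => "5\u{0007}1000000",
];

foreach ($samples as $type => $number) {
    // Does each pattern turn the sample back into the bare digits?
    $formatOnly = preg_replace('/[\p{Cf}]/u', '', $number) === '51000000';
    $broad      = preg_replace('/[\p{C}\s]+/u', '', $number) === '51000000';

    printf('%-18s \p{Cf}: %-3s  [\p{C}\s]+: %s' . PHP_EOL,
        $type, $formatOnly ? 'yes' : 'no', $broad ? 'yes' : 'no');
}
```

Only the zero-width space, zero-width joiner, RLM and soft hyphen rows are cleaned by \p{Cf} alone; the broader pattern cleans every row.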

🧾 Summary

Sometimes, what you don't see in the input is exactly what causes the problem.

It’s a subtle but important reminder to always sanitize and normalize user input before trusting it.
