As developers, we often assume that the data users enter into our forms is exactly what we see. In reality, inputs can be deceiving: a phone number or ID can look perfectly correct and still fail validation. Why?
A client was entering a valid number (let’s say 51000000) into a form, but our backend validation kept rejecting it.
- No errors in the logic.
- The input looked fine.
- Manual testing with the same number passed.
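To make the symptom concrete, here is a minimal PHP reproduction. It assumes the pasted value carried a zero-width space (U+200B) after the first digit, and it stands in for the real backend rule with a hypothetical "exactly eight digits" regex, similar in spirit to Laravel's digits:8 rule. Neither detail comes from the original incident; they only illustrate how two visually identical values can differ.

```php
<?php
// Hypothetical reproduction: the same number typed by hand vs. pasted from a chat app.
$typed  = "51000000";
$pasted = "5\u{200B}1000000"; // assumed: U+200B ZERO WIDTH SPACE after the first digit

// Both strings render as 51000000, yet PHP sees two different values.
var_dump($typed === $pasted); // bool(false)
var_dump(strlen($typed));     // int(8)
var_dump(strlen($pasted));    // int(11): U+200B takes 3 bytes in UTF-8

// Hypothetical stand-in for the backend rule: exactly eight ASCII digits.
$isValid = fn (string $s): bool => preg_match('/^\d{8}$/', $s) === 1;

var_dump($isValid($typed));   // bool(true)
var_dump($isValid($pasted));  // bool(false)
```

Comparing strlen() against the number of characters you can see is often the quickest hint that something invisible is in the string.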
Why This Happens
The culprit was invisible characters hidden inside the value. They often sneak in when users copy text from:
- Messaging apps (e.g., WhatsApp, Slack).
- Formatted documents (e.g., Word, rich text editors).
Common Types of Invisible Characters
1. Whitespace Characters
- Regular Space (U+0020): The standard space character.
- Tab (U+0009): Written as \t, adds horizontal spacing.
- Newlines:
  - Line Feed (U+000A) → \n
  - Carriage Return (U+000D) → \r
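When you suspect one of these characters, it helps to print the value so that nothing stays hidden. A minimal sketch, assuming PHP 8 (where the JSON extension is always available): json_encode() escapes tabs, newlines and carriage returns, writes other control characters such as BEL as \u0007, and by default writes every non-ASCII character as \uXXXX. The helper name reveal() is hypothetical.

```php
<?php
// Hypothetical debugging helper: print a value with nothing left invisible.
// json_encode() escapes \t, \n and \r, writes control characters like BEL as \u0007,
// and (without JSON_UNESCAPED_UNICODE) writes every non-ASCII character as \uXXXX.
function reveal(string $value): string
{
    // JSON_THROW_ON_ERROR turns invalid UTF-8 into a JsonException instead of false
    return json_encode($value, JSON_THROW_ON_ERROR);
}

echo reveal("5 1000000"), PHP_EOL;        // "5 1000000"
echo reveal("5\t1000000"), PHP_EOL;       // "5\t1000000"
echo reveal("5\r\n1000000"), PHP_EOL;     // "5\r\n1000000"
echo reveal("5\u{200B}1000000"), PHP_EOL; // "5\u200b1000000"
```

The same helper works for the zero-width, directional, and formatting characters listed in the following sections, since they are all non-ASCII.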
2. Zero-Width Characters
- Zero-Width Space (U+200B): Doesn’t show up visually but still exists in the text.
- Zero-Width Non-Joiner (U+200C): Used in some languages (like Arabic or Persian) to prevent character joining without adding space.
- Zero-Width Joiner (U+200D): Used to force characters to join without any visible space.
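Because the zero-width non-joiner is meaningful in Persian text, blindly deleting these characters from free-text fields can change what the user wrote. For a numeric field, detecting them and rejecting the value is a reasonable alternative. The sketch below, with a hypothetical helper name and assuming valid UTF-8 input, reports each zero-width character in the U+200B to U+200D range together with its byte offset.

```php
<?php
// Hypothetical helper: report each zero-width character (U+200B..U+200D) and where it is.
// Assumes $value is valid UTF-8 (the u modifier makes matching fail otherwise).
function findZeroWidth(string $value): array
{
    preg_match_all('/[\x{200B}-\x{200D}]/u', $value, $matches, PREG_OFFSET_CAPTURE);

    $found = [];
    foreach ($matches[0] as [$char, $byteOffset]) {
        // bin2hex() shows the UTF-8 bytes, e.g. "e2808b" for U+200B
        $found[] = ['bytes' => bin2hex($char), 'offset' => $byteOffset];
    }
    return $found;
}

// A numeric field could simply be rejected whenever this list is non-empty.
print_r(findZeroWidth("5\u{200B}1000000\u{200D}"));
```

Offsets are in bytes, not characters, which is what substr() expects if you want to inspect the surrounding text.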
3. Directional Characters
- Left-to-Right Mark (LRM) (U+200E): Affects text direction but is invisible.
- Right-to-Left Mark (RLM) (U+200F): Same as above but for right-to-left languages.
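If you only want to drop directional formatting and leave everything else alone, a narrow character class is enough. This is a sketch with a hypothetical function name; besides the two marks above, it also removes the other Unicode bidi controls (U+202A to U+202E embeddings and overrides, U+2066 to U+2069 isolates), which the list above does not mention but which can end up in copied text the same way.

```php
<?php
// Hypothetical helper: remove bidirectional formatting characters only.
// U+200E LRM and U+200F RLM, plus U+202A..U+202E (embeddings/overrides)
// and U+2066..U+2069 (isolates).
function stripBidiControls(string $value): string
{
    $cleaned = preg_replace('/[\x{200E}\x{200F}\x{202A}-\x{202E}\x{2066}-\x{2069}]/u', '', $value);

    // preg_replace() returns null for invalid UTF-8; this sketch then returns the input unchanged.
    return $cleaned ?? $value;
}

echo stripBidiControls("\u{200F}51000000\u{200E}"), PHP_EOL; // 51000000
```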
4. Formatting and Special Space Characters
- Soft Hyphen (U+00AD): Invisible in most cases, but may show up if the word is broken across lines.
- Non-Breaking Space (U+00A0): Looks like a regular space but prevents the line from breaking at that position.
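For text fields, deleting a non-breaking space would merge two words, so converting it to a regular space is usually safer. A minimal sketch, with a hypothetical function name: it deletes soft hyphens and maps every Unicode space separator (category Zs, which includes U+00A0) to an ordinary U+0020 space.

```php
<?php
// Hypothetical normalizer: drop soft hyphens, turn look-alike spaces into a real space.
function normalizeSpaces(string $value): string
{
    // Soft hyphen (U+00AD): delete it, it only marks a possible line-break point.
    $value = str_replace("\u{00AD}", '', $value);

    // \p{Zs} covers U+0020 itself plus NBSP (U+00A0) and other space separators.
    return preg_replace('/\p{Zs}/u', ' ', $value) ?? $value;
}

echo normalizeSpaces("Jean\u{00A0}Pierre"), PHP_EOL; // Jean Pierre
echo normalizeSpaces("co\u{00AD}operate"), PHP_EOL;  // cooperate
```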
What Laravel Trims by Default
Laravel automatically trims whitespace (spaces, tabs, newlines) from request input through the TrimStrings middleware, which is enabled by default. Trimming only touches the start and end of each value, so a character sitting between the digits is never affected. It also does not remove invisible Unicode characters like:
- Zero-width spaces (\u{200B})
- Left-to-right marks
- Other hidden characters
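The exact set of characters TrimStrings removes can differ between Laravel versions, so this sketch uses PHP's own trim() as a stand-in to show the two limits described above: it works only at the ends of the string, and its default character list (" \t\n\r\0\x0B") contains no zero-width characters.

```php
<?php
// PHP's trim() removes " \t\n\r\0\x0B" by default, and only at the two ends of the string.
$input   = " 5\u{200B}1000000\n";
$trimmed = trim($input);

var_dump($trimmed === "5\u{200B}1000000"); // bool(true): the inner U+200B survives
var_dump(trim("5 1000000"));               // string(9) "5 1000000": the inner space survives

// U+200B is not in trim()'s default list, so it survives even at the edges.
var_dump(trim("\u{200B}51000000\u{200B}") === "\u{200B}51000000\u{200B}"); // bool(true)
```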
🛠 Solution
The \p{Cf} class matches Unicode format characters, so it removes zero-width characters, directional marks, and soft hyphens while leaving ordinary spaces alone:
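// Simulated pasted input: a zero-width space (U+200B) hidden after the first digit
$number = "5\u{200B}1000000";
// \p{Cf} = Unicode format characters (zero-width chars, LRM/RLM, soft hyphen)
$cleanedNumber = preg_replace('/[\p{Cf}]/u', '', $number);
echo $cleanedNumber; // 51000000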
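One detail worth handling when this pattern is reused: with the u modifier, preg_replace() returns null instead of a string when the input is not valid UTF-8. The wrapper below is a sketch, assuming PHP 8 (for preg_last_error_msg()); its name is hypothetical, and throwing on bad input is one design choice among several.

```php
<?php
// Hypothetical reusable wrapper around the \p{Cf} pattern.
function stripFormatChars(string $value): string
{
    $cleaned = preg_replace('/\p{Cf}/u', '', $value);

    if ($cleaned === null) {
        // With the u modifier, invalid UTF-8 makes preg_replace() return null.
        throw new InvalidArgumentException('Input is not valid UTF-8: ' . preg_last_error_msg());
    }

    return $cleaned;
}

echo stripFormatChars("\u{200F}5\u{200B}1000000\u{00AD}"), PHP_EOL; // 51000000
```

Note that \p{Cf} does not cover control characters such as BEL (U+0007), which belong to category Cc.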
💡 Another Solution
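This stricter version removes every Unicode "other" character (\p{C}: control, format, private-use, surrogate, and unassigned code points) together with all whitespace. That is exactly what you want for a phone number or ID, but it would also glue together the words of a name or address: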
// Remove runs of Unicode "other" characters (\p{C}) and whitespace (\s), e.g. from a phone number
$cleanedNumber = preg_replace('/[\p{C}\s]+/u', '', $number);
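Since the strict pattern deletes legitimate spaces too, one option is to pick a cleaner per field type. The two functions below are a hypothetical sketch, not a Laravel API: the digits cleaner applies the strict pattern, while the text cleaner removes format characters, collapses any whitespace run into a single space, drops leftover controls, and trims. Both return the input unchanged if it is not valid UTF-8, which you may prefer to turn into a validation error.

```php
<?php
// Hypothetical per-field cleaners (names are illustrative, not a Laravel API).

// Digits-only fields (phone numbers, IDs): drop every invisible character and all whitespace.
function cleanDigitsField(string $value): string
{
    return preg_replace('/[\p{C}\s]+/u', '', $value) ?? $value; // null means invalid UTF-8
}

// Free-text fields (names, addresses): keep word boundaries intact.
function cleanTextField(string $value): string
{
    $value = preg_replace('/\p{Cf}/u', '', $value) ?? $value; // zero-width chars, bidi marks, soft hyphens
    $value = preg_replace('/\s+/u', ' ', $value) ?? $value;   // tabs, newlines, other spaces -> one space
    $value = preg_replace('/\p{C}/u', '', $value) ?? $value;  // leftover controls such as BEL
    return trim($value);
}

var_dump(cleanDigitsField("Ana Maria"));              // string(8) "AnaMaria": words get merged
var_dump(cleanDigitsField(" 5\u{200B}1000000\n"));    // string(8) "51000000"
var_dump(cleanTextField("Ana\tMaria\u{200B}\n"));      // string(9) "Ana Maria"
```

Removing format characters before collapsing whitespace avoids leaving a double space when a zero-width character sits between two spaces.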
The table below shows which characters the [\p{C}\s]+ pattern removes, and which part of the regex catches each one:
Type | Unicode Code | Example input | Notes
---|---|---|---
Space | U+0020 | 5 1000000 ← regular space between digits | Matched by \s
Tab | U+0009 | 5\t1000000 | Matched by \s
Newline | U+000A, U+000D | 5\n1000000 or 5\r1000000 | Matched by \s (also \p{Cc})
Zero-Width Space | U+200B | 5\u{200B}1000000 ← renders as 51000000 | Matches \p{Cf}
Zero-Width Joiner | U+200D | 5\u{200D}1000000 ← renders as 51000000 | Matches \p{Cf}
Right-to-Left Mark (RLM) | U+200F | 5\u{200F}1000000 ← invisible RTL marker | Matches \p{Cf}
Soft Hyphen | U+00AD | 5\u{00AD}1000000 ← doesn’t show normally | Matches \p{Cf} (format character)
Non-Breaking Space (NBSP) | U+00A0 | 5\u{00A0}1000000 ← looks like a space | In \p{Zs}; matched by \s when the u modifier is set
Control Character (e.g. BEL) | U+0007 | 5\u{0007}1000000 ← invisible bell character | Matches \p{C} (category Cc), not \p{Cf}
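To check the table against your own PHP build, you can run both patterns from this article over the same sample inputs. A minimal sketch, assuming PHP 8: each row reports whether the pattern reduces the sample to the clean number.

```php
<?php
// Sample inputs mirroring the table rows (escape sequences make the hidden characters explicit).
$samples = [
    'Space'             => "5 1000000",
    'Tab'               => "5\t1000000",
    'Newline'           => "5\n1000000",
    'Zero-Width Space'  => "5\u{200B}1000000",
    'Zero-Width Joiner' => "5\u{200D}1000000",
    'RLM'               => "5\u{200F}1000000",
    'Soft Hyphen'       => "5\u{00AD}1000000",
    'NBSP'              => "5\u{00A0}1000000",
    'BEL'               => "5\u{0007}1000000",
];

$results = [];
foreach ($samples as $type => $text) {
    // true when the pattern reduces the sample to the clean number
    $results[$type] = [
        'format' => preg_replace('/[\p{Cf}]/u', '', $text) === '51000000',
        'strict' => preg_replace('/[\p{C}\s]+/u', '', $text) === '51000000',
    ];
    printf(
        "%-18s \\p{Cf}: %-4s [\\p{C}\\s]+: %s\n",
        $type,
        $results[$type]['format'] ? 'yes' : 'no',
        $results[$type]['strict'] ? 'yes' : 'no'
    );
}
```

On PHP builds where the u modifier makes \s Unicode-aware, the NBSP row should also report yes for the strict pattern; that is the behaviour the table's NBSP note relies on.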
🧾 Summary
Sometimes, what you don't see in the input is exactly what causes the problem.
It’s a subtle but important reminder to always sanitize and normalize user input before trusting it.