Article originated from https://medium.com/@hafiqiqmal93/normalizing-fancy-text-to-normal-text-in-laravel-7d9ed56d5a78
Text input from users is rarely as simple as it looks. With Unicode support on smartphones, users now have the luxury (and sometimes the whimsy) of entering text in a variety of styles and formats. From emojis to diacritics, ligatures to full-width characters, the range of "fancy text" can be confusing or difficult for a system to process. While visually appealing, these text variations pose a significant challenge, particularly in terms of data consistency, searchability, and user experience.
Here is an example of fancy text:
𝘕𝘦𝘪𝘨𝘩𝘣𝘰𝘳 𝘮𝘢𝘬𝘦 𝘢 𝘯𝘦𝘸 𝘤𝘰𝘯𝘯𝘦𝘤𝘵𝘪𝘰𝘯 𝘶𝘯𝘥𝘦𝘳 𝘵𝘰 𝘰𝘶𝘳 𝘮𝘦𝘵𝘦𝘳 𝘢𝘯𝘥 𝘸𝘦 𝘥𝘪𝘴𝘤𝘰𝘷𝘦𝘳𝘦𝘥 𝘪𝘵 𝘣𝘦𝘤𝘢𝘶𝘴𝘦 𝘵𝘩𝘦𝘺 𝘴𝘸𝘪𝘵𝘤𝘩 𝘰𝘧𝘧 𝘵𝘩𝘦 𝘮𝘢𝘪𝘯 𝘮𝘦𝘵𝘦𝘳 𝘢𝘯𝘥 𝘪 𝘨𝘰 𝘥𝘰𝘸𝘯 𝘵𝘰 𝘤𝘩𝘦𝘤𝘬 𝘢𝘯𝘥 𝘴𝘰𝘮𝘦𝘰𝘯𝘦 𝘨𝘰 𝘥𝘰𝘸𝘯 𝘢𝘭𝘴𝘰 𝘵𝘰 𝘰𝘧𝘧 𝘪𝘵 𝘢𝘨𝘢𝘪𝘯 𝘢𝘯𝘥 𝘤𝘭𝘢𝘪𝘮𝘪𝘯𝘨 𝘵𝘩𝘢𝘵𝘪𝘴 𝘵𝘩𝘦𝘪𝘳 𝘮𝘦𝘵𝘦𝘳, 𝘪𝘵 𝘰𝘯𝘭𝘺 𝘩𝘢𝘱𝘱𝘦𝘯𝘴 𝘵𝘩𝘪𝘴 𝘸𝘦𝘦𝘬..𝘯𝘦𝘷𝘸𝘳 𝘪𝘯 𝘵𝘩𝘦 𝘱𝘢𝘴𝘵. 𝘛𝘩𝘦 𝘺𝘦𝘭𝘭𝘰𝘸 𝘩𝘰𝘴𝘦 𝘪𝘴 𝘫𝘶𝘴𝘵 𝘯𝘦𝘸𝘭𝘺 𝘤𝘰𝘯𝘯𝘦𝘤𝘵𝘦𝘥
It looks like italic text, but it is not italic formatting. These characters actually belong to the Unicode Mathematical Alphanumeric Symbols block.
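To see the difference for yourself, you can inspect one of these letters. The snippet below is a small sketch, assuming the mbstring and intl extensions are installed; the character is written as an escape sequence so the result does not depend on how the file is saved.

```php
<?php
// Requires the mbstring and intl extensions.
// U+1D622 is the styled "a" used in the sample text.
$char = "\u{1D622}";

// Print the code point and its official Unicode name.
printf("U+%X\n", mb_ord($char, 'UTF-8'));
echo IntlChar::charName(mb_ord($char, 'UTF-8')), PHP_EOL;

// One visible letter, but four bytes in UTF-8.
var_dump(strlen($char));             // int(4)
var_dump(mb_strlen($char, 'UTF-8')); // int(1)

// It is a different code point from the ASCII letter, so comparisons fail.
var_dump($char === 'a');             // bool(false)
```

This is why a search for "a" will not match the styled letter unless the text is normalized first.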
Problem in PHP
The most obvious problem is that PHP's json_encode() fails on malformed UTF-8. These characters take four bytes each in UTF-8, so they can be damaged on the way in, for example by a byte-oriented string function or by a storage layer that cannot hold 4-byte characters. In modern web development, where APIs and frontend frameworks use JSON to transport data, this is a real problem. If handled incorrectly, such malformed characters can result in data corruption, crashes, or angry users.
Our goal is simple: come up with a solution that converts fancy text into normal, readable text.
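The failure mode can be reproduced in a few lines. This is a minimal sketch: the truncation with substr() is only one assumed way a 4-byte character might get damaged, chosen here because it is easy to demonstrate.

```php
<?php
// U+1D622 as raw UTF-8 bytes (four bytes for one character).
$fancyA = "\xF0\x9D\x98\xA2";

// Byte-oriented functions such as substr() can cut a character in half.
$truncated = substr($fancyA, 0, 2);

// The intact character encodes fine.
var_dump(json_encode($fancyA));

// The damaged one makes json_encode() return false.
var_dump(json_encode($truncated));
echo json_last_error_msg(), PHP_EOL;

// JSON_INVALID_UTF8_SUBSTITUTE (PHP 7.2+) replaces bad bytes with U+FFFD
// instead of failing, which avoids the crash but not the data loss.
var_dump(json_encode($truncated, JSON_INVALID_UTF8_SUBSTITUTE));
```

Note that the intact fancy character is valid UTF-8 and encodes without error; the trouble starts once the bytes have been mangled somewhere along the way.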
PHP Normalizer
Normalization forms are central to understanding the normalization process, and they serve different linguistic and technical needs. NFC combines characters into their composed forms, whereas NFD does the opposite and decomposes composed characters into their constituent parts. NFKC and NFKD go further: they also apply compatibility mappings, folding stylistic variants of a character (such as full-width letters, ligatures, and mathematical alphanumeric symbols) into a plain equivalent. Applied consistently, these forms make text comparison, searching, and storage consistent and reliable.
The Solution
The snippet below shows how PHP can solve this problem with very little code. Let's dissect the solution and understand its components:
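The difference between the forms is easiest to see on a few single characters. The following is a small illustrative sketch, assuming the intl extension is available; the sample characters are my own picks and are written as escapes to keep the file encoding out of the picture.

```php
<?php
// Requires the intl extension (it provides the Normalizer class).

$composed   = "\u{00E9}";  // "é" as a single code point
$decomposed = "e\u{0301}"; // "e" followed by COMBINING ACUTE ACCENT

// NFC composes and NFD decomposes; both spell the same visible text.
var_dump(Normalizer::normalize($decomposed, Normalizer::FORM_C) === $composed);  // bool(true)
var_dump(Normalizer::normalize($composed, Normalizer::FORM_D) === $decomposed);  // bool(true)

$ligature  = "\u{FB01}";  // LATIN SMALL LIGATURE FI
$fullWidth = "\u{FF21}";  // FULLWIDTH LATIN CAPITAL LETTER A
$mathA     = "\u{1D622}"; // MATHEMATICAL SANS-SERIF ITALIC SMALL A

// The canonical forms leave compatibility characters untouched...
var_dump(Normalizer::normalize($ligature, Normalizer::FORM_C) === $ligature);    // bool(true)

// ...while the compatibility forms fold them into plain letters.
var_dump(Normalizer::normalize($ligature, Normalizer::FORM_KC));  // string(2) "fi"
var_dump(Normalizer::normalize($fullWidth, Normalizer::FORM_KC)); // string(1) "A"
var_dump(Normalizer::normalize($mathA, Normalizer::FORM_KC));     // string(1) "a"
```

For fancy text of the kind shown earlier, it is the compatibility forms (NFKC or NFKD) that do the actual work.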
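public static function normalizeText(?string $text): ?string
{
    // Treat null and the empty string as "nothing to normalize".
    if ($text === null || $text === '') {
        return null;
    }

    // Forms are tried in this order. The short names (NFC, NFD, NFKC, NFKD,
    // NFKC_CF) are aliases of the FORM_* constants, so each form is listed once.
    $forms = [
        \Normalizer::FORM_C,
        \Normalizer::FORM_D,
        \Normalizer::FORM_KC,
        \Normalizer::FORM_KC_CF, // requires PHP 7.3+
        \Normalizer::FORM_KD,
    ];

    foreach ($forms as $form) {
        // Convert to the first form the text is not already normalized in.
        if (!\Normalizer::isNormalized($text, $form)) {
            $normalized = \Normalizer::normalize($text, $form);

            // normalize() returns false on failure (e.g. invalid UTF-8).
            return $normalized === false ? null : $normalized;
        }
    }

    return $text;
}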
The usage is simple:
$normalText = Utils::normalizeText($YOUR_FANCY_STRING);
You can also wrap it in a global helper function to make it easier to use. For example:
if (! function_exists('normalize_text')) {
    // Nullable types mirror Utils::normalizeText(), which may return null.
    function normalize_text(?string $text): ?string
    {
        return Utils::normalizeText($text);
    }
}

// USAGE
$normalText = normalize_text($YOUR_FANCY_STRING);
At its core, this function leverages PHP's **Normalizer** class, part of the Internationalization (intl) extension, to perform the normalization. The **Normalizer** class offers several normalization forms, each suited to different needs. The function iterates through these forms and uses **isNormalized** to check whether the text is already normalized in a given form. At the first form where it is not, the function normalizes the text to that form and returns the result. For the mathematical italic sample above, that form is NFKC, which maps the styled letters back to plain ASCII.
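If you would rather always get the same output form, a simpler variant is to normalize straight to NFKC. The sketch below is an alternative to the loop, not the article's method; `to_plain_text` is a hypothetical name, and the intl extension is assumed.

```php
<?php
// Hypothetical helper name; requires the intl extension.
function to_plain_text(string $text): string
{
    // NFKC folds compatibility characters and then composes the result.
    $normalized = Normalizer::normalize($text, Normalizer::FORM_KC);

    // normalize() returns false for invalid input; fall back to the original.
    return $normalized === false ? $text : $normalized;
}

// "Neighbor" written with Mathematical Sans-Serif Italic letters.
$fancy = "\u{1D615}\u{1D626}\u{1D62A}\u{1D628}\u{1D629}\u{1D623}\u{1D630}\u{1D633}";

echo to_plain_text($fancy), PHP_EOL;              // Neighbor
echo to_plain_text("\u{FF28}\u{FF49}"), PHP_EOL;  // Hi (full-width input)
echo to_plain_text("caf\u{00E9}"), PHP_EOL;       // accent kept, still composed
```

The trade-off is that the loop picks a form based on the input (an already composed "é" fails the NFD check first and comes back decomposed), whereas a single fixed form gives predictable output for storage and comparison.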
Conclusion
While fancy text may add visual appeal to user input, it poses significant challenges for data processing and system interoperability. However, with the adoption of PHP's Normalizer class and the implementation of normalization forms, developers can overcome these challenges and ensure that their applications maintain data consistency and reliability in the face of diverse text inputs.
Do you have any experiences or challenges related to handling fancy text in your projects? How do you currently address such issues, and do you find PHP's Normalizer class useful in your workflow? Let's continue the conversation and share our insights to help each other navigate the complexities of modern web development.