DEV Community

Discussion on: Converting UTF-8 strings to ASCII using the ICU Transliterator

Lito

Do you use Laravel? How does Transliterator compare with str_slug in terms of performance? And what about the converted string results? Thanks!

Lito

Here's the test: 10,000 iterations over 2 strings:

// slugify() is the "bonus tip" function from the article; str_slug() is Laravel's helper
$string1 = '<?php François😎: _+ / Стравинский`😜.';
$string2 = 'Daniël Renée François Bjørn in’t Veld';

$time = microtime(true);

for ($i = 0; $i < 10000; $i++) {
    slugify($string1);
    slugify($string2);
}

echo 'slugify: '.round(microtime(true) - $time, 3).' seconds - '.slugify($string1).' - '.slugify($string2)."\n";

$time = microtime(true);

for ($i = 0; $i < 10000; $i++) {
    str_slug($string1);
    str_slug($string2);
}

echo 'str_slug: '.round(microtime(true) - $time, 3).' seconds - '.str_slug($string1).' - '.str_slug($string2)."\n";

And results:

slugify: 12.817 seconds - php-francois-stravinskij - daniel-renee-francois-bjorn-int-veld
str_slug: 0.151 seconds - php-francois-stravinskii - daniel-renee-francois-bjorn-int-veld

Laravel's str_slug function has great performance, but the result is not the same (stravinskij vs. stravinskii).
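For repeated comparisons, the timing loop could be wrapped in a small reusable helper. This is only a minimal sketch: `benchmark()` is a hypothetical name (not a Laravel or PHP built-in), and it assumes PHP 7.3+ for the monotonic `hrtime()` clock.

```php
<?php

/**
 * Hypothetical helper (not from Laravel or the article): runs $fn over every
 * input string $iterations times and returns the elapsed wall-clock seconds.
 */
function benchmark(callable $fn, array $inputs, int $iterations = 10000): float
{
    $start = hrtime(true); // monotonic clock in nanoseconds (PHP >= 7.3)

    for ($i = 0; $i < $iterations; $i++) {
        foreach ($inputs as $input) {
            $fn($input);
        }
    }

    return (hrtime(true) - $start) / 1e9; // nanoseconds -> seconds
}

// Usage, assuming slugify() and str_slug() are loaded as in the test above:
// printf("slugify:  %.3f s\n", benchmark('slugify', [$string1, $string2]));
// printf("str_slug: %.3f s\n", benchmark('str_slug', [$string1, $string2]));
```

Using a monotonic clock avoids skewed timings if the system clock is adjusted mid-run, which `microtime(true)` does not protect against.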

Bart van Raaij

That’s a great question, Lito, and one you’ve answered yourself :-)
Because the PHP Transliterator is a wrapper around the native ICU library (written in C/C++), I’m not surprised it performs a lot worse than Laravel’s pure-PHP str_slug.

I’ll take a look at Laravel’s implementation tomorrow. 👍🏻 Very curious how they do it.

Lito

For me, performance is always a MUST. I work with a lot of data and always need an efficient solution for every problem :)

Bart van Raaij

I've taken a look at Laravel's str_slug. Under the hood it uses voku\helper\ASCII::to_ascii.
That library uses a rather clever runtime in-memory cache, in which every character is cached in an array:
github.com/voku/portable-ascii/blo...
Subsequent transformations are therefore much faster, because characters that have already been converted don't need to be transformed again.
This is, of course, highly beneficial for performance.
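To make the caching idea concrete, here is a minimal sketch of a per-character runtime cache. It is written in the spirit of what portable-ascii does but is NOT its actual code: `CachingTransliterator` is a hypothetical class, and the injected `$convertChar` callable stands in for any expensive single-character conversion (it could call ICU, for example).

```php
<?php

/**
 * Hypothetical sketch: wraps an expensive per-character converter and
 * remembers every result, so each distinct character is converted only once.
 */
final class CachingTransliterator
{
    /** @var array<string, string> original character => converted text */
    private array $cache = [];

    /** @var callable(string): string */
    private $convertChar;

    public function __construct(callable $convertChar)
    {
        $this->convertChar = $convertChar;
    }

    public function toAscii(string $str): string
    {
        // preg_split with the /u modifier splits a UTF-8 string into characters.
        $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
        if ($chars === false) {
            throw new InvalidArgumentException('Input is not valid UTF-8');
        }

        $out = '';
        foreach ($chars as $char) {
            if (!isset($this->cache[$char])) {
                // Cache miss: run the expensive conversion once for this character.
                $this->cache[$char] = ($this->convertChar)($char);
            }
            $out .= $this->cache[$char];
        }

        return $out;
    }
}

// Usage with a toy converter (hypothetical mapping, for illustration only):
$t = new CachingTransliterator(
    fn (string $c): string => strtr($c, ['ë' => 'e', 'é' => 'e', 'ç' => 'c', 'ø' => 'o'])
);
echo $t->toAscii('Daniël Renée'), "\n"; // Daniel Renee
```

Over 10,000 iterations on the same two strings, only the first pass pays the conversion cost; every later pass is just array lookups and string concatenation.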

The output difference between my slugify() and voku's to_ascii is explained by the fact that the latter takes a locale into account (English by default).
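A locale parameter can change the output simply by letting language-specific entries override a default table. The sketch below is a hypothetical illustration: `toAsciiForLanguage()` and both tiny maps are made up, chosen only to mirror the two "Stravinsk..." outputs in the benchmark results; they are not voku's or ICU's actual data.

```php
<?php

/**
 * Hypothetical illustration: per-language overrides are consulted before
 * a default character table. The maps are made up for this example.
 */
function toAsciiForLanguage(string $str, string $language = 'en'): string
{
    $default = [
        'С' => 'S', 'т' => 't', 'р' => 'r', 'а' => 'a', 'в' => 'v',
        'и' => 'i', 'н' => 'n', 'с' => 's', 'к' => 'k', 'й' => 'j',
    ];
    $overrides = [
        'en' => ['й' => 'i'], // made-up English-style override
    ];

    // Array union keeps the left-hand keys, so overrides win over defaults.
    $map = ($overrides[$language] ?? []) + $default;

    return strtr($str, $map);
}

echo toAsciiForLanguage('Стравинский', 'en'), "\n";      // Stravinskii
echo toAsciiForLanguage('Стравинский', 'generic'), "\n"; // Stravinskij (no override for this key)
```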

That being said: my "bonus tip" slugify() example was never meant to be production code. It's just another example of what the ICU Transliterator can do. Of course there are other libraries out there that do the same kind of thing, and they may well be better or faster at it, because a lot of development has gone into them.
I hope you liked my article anyway, even if it's not directly usable for you. 🤞🏻

Lito

Oh! caches 😅

Your article is great! And, as the title says, it's perfect for understanding how UTF-8 to ASCII conversion works.