Mark Railton

Posted on • Originally published at markrailton.com

Comparing strings that may or may not contain diacritics in PHP

Today I ran into something that really had me scratching my head. I had to compare a string from a form against a string from the database. Clearly that's not where the issue was, as it's a pretty simple thing in PHP. What had me scratching my head was that I needed to account for diacritics possibly being in one string but not in the other.

I spent quite some time looking online but eventually took to Twitter and asked the wondrous PHP community for help:

Ok, taking a complete blank and need some #php help. Need to compare 2 strings that may or may not contain diacritics. Example, Seán matches Sean. Don't know why I can't figure this one out, anyone any ideas?

— Mark Railton (@railto) August 6, 2021

Within minutes I had a couple of people offering suggestions, and a healthy conversation ensued. I settled on a solution by Derick Rethans that uses the Collator class from the intl extension. I took the example provided by Derick and tweaked it just a bit to suit how I wanted it; a snippet of the result is below:

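// Requires the intl extension.
$c = new Collator( 'en' );
// PRIMARY strength compares base letters only, so accents are ignored.
$c->setStrength( Collator::PRIMARY );

// Bail out if the names still differ once diacritics are ignored.
if ($c->getSortKey($newUser['firstname']) !== $c->getSortKey($existingUser->firstname)) {
    return null;
}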

To give a bit more context on this, let's say we have a user called Sean. Some people with that name spell it Sean, while others spell it Seán, with the Irish diacritic known as a fada. Both spellings are seen as correct, but a direct comparison with the equals operator in PHP (or most other languages) treats them as different strings and reports a mismatch. Setting the collator's strength to Collator::PRIMARY makes it compare base letters only, so differences in accents (and in letter case) are ignored. For the task I've been working on, it was important that we allow for the same person having the fada in their name in the database but leaving it out when filling in a different form.
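
As a rough illustration of the difference between a direct comparison and a collator-based one, here is a minimal sketch. It assumes the intl extension is loaded. namesMatch() is a hypothetical helper name, not something from Derick's example, and it uses Collator::compare() instead of comparing sort keys, which should give the same answer for a simple equality check.

```php
<?php
declare(strict_types=1);

// Hypothetical helper for illustration only; requires the intl extension.
function namesMatch(string $a, string $b, string $locale = 'en'): bool
{
    $collator = new Collator($locale);
    // PRIMARY strength compares base letters only: accents and case are ignored.
    $collator->setStrength(Collator::PRIMARY);

    // compare() returns 0 when the two strings are equal at this strength.
    return $collator->compare($a, $b) === 0;
}

var_dump('Seán' === 'Sean');            // bool(false): the bytes differ
var_dump(namesMatch('Seán', 'Sean'));   // bool(true): same base letters
var_dump(namesMatch('Sean', 'Shane'));  // bool(false): different letters
```

One thing to keep in mind with this approach is that primary strength ignores letter case as well as accents, so 'SEÁN' and 'sean' would also be treated as a match.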

Thanks to Derick, Ben and the others who posted possible solutions on the Twitter thread. It really helped, and thankfully I was able to move on with the task.
