DEV Community

Katz Ueno
Katz Ueno

Posted on

Search UTF8mb4 text of MySQL in PHP

MySQL has utf8 & utf8mb4 charcter collation.

Although UTF8 is one encoding type, MySQL's (MariaDB) utf8 only allows to store upto 3-byte characters.

Therefore, some new emojis and Japanese and Chinese characters are not able to store it unless you use utf8mb4.

My CTO at concrete5 Japan, Inc and I came up with the solution to export mysqldump and search the txt in PHP.

It was difficult to search the text only by using SQL.

Export MySQL Database with MySQLDump

We used --skip-extended-insert option so that each record has one line

mysqldump -h {$host} -u {$db_username} --password="{$db_password}" --default-character-set=utf8mb4 --single-transaction --skip-extended-insert {$db_name} {$db_tables} > {$SQL_FILENAME}
Enter fullscreen mode Exit fullscreen mode

If you need forgot to change your DB collation to utf8mb4, and need to re-work on the existing data, you may want to mysqldump using --replace --no-create-info options.

mysqldump -h {$host} -u {$db_username} --password="{$db_password}" --default-character-set=utf8mb4 --single-transaction --replace --no-create-info --skip-extended-insert {$db_name} {$db_tables} > {$SQL_FILENAME}
Enter fullscreen mode Exit fullscreen mode

Search UTF8MB4 characters in PHP

Save this as textsearch.php and run it on your local or server.

<?php
$searchRegex = '/[\x{10000}-\x{10FFFF}]/iu';
$matches = [];

$handle = fopen("SQL_FILENAME", "r");
if ($handle) {
    while (!feof($handle)) {
        $buffer = fgets($handle);
        if (preg_match($searchRegex, $buffer)) {
            $matches[] = $buffer;
        }
        /*
        // If you want to print out which characters are really UTF8MB4, use this pref_match_all instead.
        if (preg_match_all($searchRegex, $buffer, $matchesChar)) {
            echo "Character(s): " . implode(',', $matchesChar[0]) . "\n";
            print_r($buffer);
        }
        */
    }
    fclose($handle);
} else {
    echo "nope! cant open it\n";
}
Enter fullscreen mode Exit fullscreen mode
$ php textsearch.php > {$result.sql}
Enter fullscreen mode Exit fullscreen mode

It will filter you the utf8mb4 SQL.

Tested with PHP 7.4.20 with mbstring extension.

Top comments (0)