DEV Community

Cover image for String Operations On Phone Numbers In SQL
Arctype Team for Arctype

Posted on • Originally published at arctype.com

String Operations On Phone Numbers In SQL

When dealing with SQL databases, users will have to interact with different data types and schemas depending on the project's requirements. Since databases are used in many different applications and support multiple platforms, devices, and users, there may be differences in how data is entered, even in a simple field like a phone number.

For example, one set of users will enter the phone number with the country code, while the others may simply ignore the country code. Therefore, it is essential for database users to know how to clean data in a SQL database. In this post, we will see how to utilize SQL string operations on phone number fields to properly clean, format, and extract data.

Why do we care about data cleaning?

Data comes in all forms and sizes. Most of the time, it will be raw data, without proper formatting. Data cleaning allows users to filter, sort, and format this raw data to extract meaningful information from it. Data cleaning also makes data analysis a much more efficient process as an additional benefit.

Let's assume we have a text field that captures user information and asks users to enter valuable information like phone numbers. We need to clean this text field to make the phone numbers usable. How would we go about doing that? To begin with, we need to understand SQL string operations.

What are SQL string operations?

String operations are functions that can be used to manipulate strings. These string operators (also called string functions) take a string as input, process it according to the specified function, and return a string as the output. The availability of these functions will differ depending on the database. For example, the following is a list of commonly used string functions in MySQL:

Common Types of String Operations

Function Usage
LEFT / RIGHT Extract a specified number of characters from a string (starting from left or right)
TRIM Remove leading or trailing spaces
POSITION / STRPOS Analogous to the Location function, where a numerical value of the position of the specified substring is provided.
SUBSTR Extract the specified substring from a specific location to match the specified length
CONCAT Combine two or more strings to make a single string
COALESCE Replace null values with a specified value
FORMAT Formats the value to the specified format
REGEXP Matches string to a regular expression
REGEXP_SUBSTR Extracts value for the matching regular expression
STR Convert numerical value to string
REVERSE Reverses the specified string
REPLACE Replaces the specified string with the given value.

The full list of available string functions and operators is available in the official documentation.

How to use string operations with phone numbers

Now we have a basic understanding of string operations and related functions in SQL! So, let's look at how exactly we can utilize some of these functions on phone numbers to extract and format data. For all the examples provided here, we will be using the Arctype SQL client with a MySQL database in a Windows environment with the following dataset.

Properly format a phone number

We can utilize the SUBSTR command to format phone numbers. It will extract parts of the number and use the CONCAT to join the specific parts together. In the following example, we have broken down the phone number into three distinct sections and combined them together with the formatting as a new field called num_formatted:

SELECT
    phone_num,
    CONCAT(
        '(',
        SUBSTR(phone_num, 1, 3),
        ') ',
        SUBSTR(phone_num, 4, 3),
        '-',
        SUBSTR(phone_num, 7)
    ) AS num_formatted
FROM
    EmpDetails
Enter fullscreen mode Exit fullscreen mode

The result of the operation is show below:
Cleaning phone numbers - output of example.
You can create a new column to save the formatted number using the ALTER TABLE command. It will create a new column and update it with the formatted numbers.

ALTER TABLE EmpDetails
ADD formatted_number VARCHAR(255);

UPDATE
    EmpDetails
SET
    formatted_number = (
        SELECT
            CONCAT(
                '(',
                SUBSTR(phone_num, 1, 3),
                ') ',
                SUBSTR(phone_num, 4, 3),
                '-',
                SUBSTR(phone_num, 7)
            ) AS num_formatted
    )
Enter fullscreen mode Exit fullscreen mode

The result of the above operation is shown below:
Screeshot of a query in Arctype.
Screenshot of data from a query to convert phone numbers to formatted data.
We can modify our SQL command to include the REPLACE function if there are additional characters in the phone number field, such as dashes and brackets. This function can be used to remove unnecessary characters and whitespaces in a string. Now let's further format the formmated_number field to include the country code.

SELECT
    formatted_number,
    CONCAT(
        '+1 '
        '(',
        SUBSTR(num_cleansed, 1, 3),
        ') ',
        SUBSTR(num_cleansed, 5, 3),
        '-',
        SUBSTR(num_cleansed, 7)
    ) AS num_formatted
FROM
    (
        SELECT
            formatted_number,
            REPLACE(REPLACE(REPLACE(REPLACE(formatted_number, '(', ''), ')', ''), '-', ''), ' ','') as num_cleansed
        FROM
            EmpDetails
    ) FormattedPhoneNumbers
Enter fullscreen mode Exit fullscreen mode

In the above statement, the REPLACE function is used to remove the brackets, dashes, and whitespaces from the formatted_number field before undergoing the formatting process.
Screenshot of query results.

Extracting parts of a phone number

We can use the LEFT and RIGHT functions to specify and extract different parts of the phone number. In the following example, we will extract the area code of the phone number using the LEFT function and the last four digits using the RIGHT function based on the num_formatted field.

SELECT
    REPLACE(LEFT(formatted_number, 4), '(', '') As 'Area Code',
    RIGHT(formatted_number, 4) AS 'Last Four Digits'
FROM
    EmpDetails
Enter fullscreen mode Exit fullscreen mode

Here, the REPLACE function is used to remove the bracket from the selected number block. The result of the query is shown below.
Query result in Arctype.

Extracting phone numbers from a text block

One of the most common tasks in the world of data cleaning is extracting phone numbers from a text block. The complexity and feasibility of this task will mostly depend on the composition of the text.

Extracting a phone number with regular expressions

The easiest way to extract phone numbers is to utilize regular expressions to target the specific phone number formats. Extracting data has become far simpler with the introduction of functions like REGEXP_SUBSTR in MySQL 8.0. We will be populating the details column with some phone numbers in different formats, as shown below.
Arctype showing query results.

Identifying Rows with Matching Phone Numbers

First, we need to figure out which rows consist of data matching our requirements. In this case, the following regular expressions will be used on the details field.

  • Any consecutive 10 digits
  • Formatted number (XXX-XXX-XXXX)
SELECT
    *
FROM
    EmpDetails
WHERE
    # Any 10 Digits
    details REGEXP '[[:digit:]]{10}'
    # Formatted Number (XXX-XXX-XXXX)
    OR details REGEXP '[0-9]{3}-[0-9]{3}-[0-9]{4}';
Enter fullscreen mode Exit fullscreen mode

Results of the query can be seen below:
Regexp query results screenshot.

Extracting the phone numbers

Since we have identified the rows, the next step is to extract the phone numbers. It can be done using the REGEXP_SUBSTR function to extract the substring which matches the given regular expression. As we need to query two different regular expressions, we will be using the CONCAT_WS function to combine the results of both expressions into a single column.

SELECT
    emp_id,
    name,
    email,
    CONCAT_WS(
        '',
        REGEXP_SUBSTR(details, '^[0-9]+$', 1, 1, 'm'),
        REGEXP_SUBSTR(details, '[0-9]{3}-[0-9]{3}-[0-9]{4}', 1, 1, 'm')
    ) AS extracted_phone_numbers
FROM
    (
        SELECT
            *
        FROM
            EmpDetails
        WHERE
            details REGEXP '[[:digit:]]{10}'
            OR details REGEXP '[0-9]{3}-[0-9]{3}-[0-9]{4}'
) ExtractedDetails
Enter fullscreen mode Exit fullscreen mode

The result of this operation can be seen below:
Screenshot of phone number extraction.

Handling multiple phone numbers in a single field

To query results from a single field with multiple numbers, we need to create a stored procedure that loops through the desired field to capture all matching regex patterns. For instance, let's see how to extract multiple phone numbers from the details field of emp_id 1702 (Dan).

DELIMITER $$
CREATE PROCEDURE get_number_of_matches_full()
BEGIN
  DECLARE regex_match INT DEFAULT 1;
  DECLARE current_number INT DEFAULT 1;
  WHILE regex_match >= 1 DO
    CREATE TABLE IF NOT EXISTS extracted_numbers_table (
        `emp_id` int NOT NULL,
        `name` varchar(255) DEFAULT NULL,
        `email` varchar(255) DEFAULT NULL,
        `extracted_phone_num` varchar(255) DEFAULT NULL
    );
    INSERT INTO extracted_numbers_table (emp_id, name, email, extracted_phone_num)
    SELECT emp_id, name, email, REGEXP_SUBSTR(details, '[0-9]{3}-[0-9]{3}-[0-9]{4}', 1, current_number, 'm') FROM EmpDetails WHERE emp_id = 1702;
    SET current_number = current_number + 1;
    IF ((SELECT REGEXP_SUBSTR(details, '[0-9]{3}-[0-9]{3}-[0-9]{4}', 1, current_number, 'm') FROM EmpDetails WHERE emp_id = 1702) IS NULL) THEN
        SET regex_match = 0;
    END IF;
  END WHILE;
END $$

DELIMITER;
Enter fullscreen mode Exit fullscreen mode

The result of this operation is shown below.
Query result in Arctype.
In the above code block, we have created a stored procedure called get_number_of_matches_full, which loops through the details field until all the regex matches are found on the specified row of the EmpDetails table. We are using the REGEXP_SUBSTR function with the position argument to extract different matches. It updates by one at each loop run and saves the results on a newly created extracted_numbers_table. Finally, the loop exits when a NULL result is found.

We can call this procedure and view the results by querying the extracted_numbers_table as shown below.

CALL get_number_of_matches_full;

SELECT * FROM extracted_numbers_table;
Enter fullscreen mode Exit fullscreen mode

And, shown again in Arctype:
Query results in Arctype.

Conclusion

String operations in SQL are vital functions to clean and format data within a database. Moreover, string operations are the core component when dealing with valuable information such as phone numbers as they allow users to extract or manipulate phone numbers according to their requirements. However, it's important to remember that the exact functions available will change depending on the database type and version. Therefore, always remember to refer to the documentation to find out the available string operations and implement them to your heart's content.

Top comments (0)