Zara Ziad

Posted on Feb 23, 2022

A codeless solution for cleaning and verifying your address data

#data #dataquality #addressverification

Today, data has become one of an organization's greatest assets. Whether you want to design customer journeys or forecast business performance, data is the main ingredient for a successful outcome. This is why business owners invest in custom solutions for keeping their data clean, especially their customer and contact databases.

But since multiple employees at a company view, edit, and reuse the contacts dataset, it soon fills up with inconsistencies and inaccuracies. The company's IT staff is then expected to build an in-house solution that magically gets rid of every error in the database.

Coding every solution from scratch

Although it is possible to write code for cleaning and standardizing datasets, it is usually an inefficient approach, considering the resources (time, people, and money) required to implement it. And after factoring in the cost of annual maintenance and upgrades, it can end up 2-3 times more expensive than adopting an existing solution.

This reminds me of something one of my coder friends told me recently: At some point in every developer’s life, they realize how unproductive it is to code every solution by hand. Sometimes it is more efficient to adopt existing solutions available in the market – open-source libraries or commercial products – rather than coding solutions from scratch.

In this post, I will explain some common terms and the steps involved in cleaning and validating the addresses in a customer database. This should help you understand what to look for when choosing an existing solution on the market. Let's get started.

Common terminologies involved

Before we get into the specifics of the process, let's go over some terms that are commonly used in this domain and see what they mean.

Address standardization

Address standardization (also known as address normalization) means updating the format of an address according to an authoritative standard (such as the USPS addressing standard in the US). This process makes sure that addresses are in an acceptable format: they use correct spelling and standard abbreviations, and they are enriched with geocodes and ZIP+4 values.
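To make the idea concrete, here is a minimal sketch of one small piece of standardization: abbreviating street suffixes and directionals. The lookup tables and the `standardize_street` function are hypothetical and deliberately tiny; a real tool relies on far more complete reference data.

```python
import re

# Hypothetical, deliberately tiny lookup tables for illustration only.
# USPS Publication 28 lists many more suffixes and abbreviations.
SUFFIXES = {"STREET": "ST", "AVENUE": "AVE", "BOULEVARD": "BLVD", "ROAD": "RD"}
DIRECTIONALS = {"NORTH": "N", "SOUTH": "S", "EAST": "E", "WEST": "W"}


def standardize_street(line: str) -> str:
    """Uppercase a street line, drop punctuation, and abbreviate known words."""
    # Replace punctuation with spaces, then split on whitespace.
    words = re.sub(r"[^\w\s]", " ", line).upper().split()
    if not words:
        return ""
    # Abbreviate the suffix only when it is the last word of the line.
    if words[-1] in SUFFIXES:
        words[-1] = SUFFIXES[words[-1]]
    # Abbreviate a directional only when it directly follows the house number
    # and is not itself the street name (e.g. "45 North Street" keeps NORTH).
    if len(words) > 3 and words[0].isdigit() and words[1] in DIRECTIONALS:
        words[1] = DIRECTIONALS[words[1]]
    return " ".join(words)


print(standardize_street("123 north Main Street."))
```

Even this toy version needs special cases, which hints at why a maintained product is often the cheaper option.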

Address verification

Address verification (also known as address validation) is the process of running standardized addresses against an authoritative database (such as the USPS database in the US) and making sure that these addresses are real, meaning that they are valid, mailable locations within the country.

Difference between the two

These two terms are sometimes used interchangeably, but there is a difference between them. Addresses should first be standardized to follow an acceptable format. Once standardized, they are ready to be verified, which checks whether they are real and valid.

Process of standardizing and validating addresses

The following steps are involved in this process:

1. Profiling addresses

Before any activity can be performed on the address database, it is important to assess its current state. This is where address profiling can be very helpful. It identifies the records that contain incomplete or missing address information, as well as the ones that don’t follow a standardized pattern.

Address profiling highlights potential cleansing and standardization opportunities in your dataset. The profile report is usually generated again at the end of the process, so that the initial and final reports can be compared to see whether any errors remain in the dataset.
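As a rough illustration of what a profile report contains, the sketch below counts missing fields and malformed ZIP codes. The field names (`street`, `city`, `state`, `zip`) and the `profile_addresses` function are assumptions made for this example, not part of any particular tool.

```python
import re
from collections import Counter

# Assumed format: a 5-digit US ZIP code with an optional 4-digit add-on.
ZIP_PATTERN = re.compile(r"^\d{5}(-\d{4})?$")
# Hypothetical column names; adjust them to match your own dataset.
REQUIRED_FIELDS = ("street", "city", "state", "zip")


def profile_addresses(records):
    """Count missing fields and malformed ZIP codes in a list of dicts."""
    report = Counter()
    for record in records:
        for field in REQUIRED_FIELDS:
            # Treat None, empty strings, and whitespace-only values as missing.
            if not (record.get(field) or "").strip():
                report[f"missing_{field}"] += 1
        zip_code = (record.get("zip") or "").strip()
        if zip_code and not ZIP_PATTERN.match(zip_code):
            report["malformed_zip"] += 1
    report["total"] = len(records)
    return dict(report)


sample = [
    {"street": "1 Main St", "city": "Boston", "state": "MA", "zip": "02108"},
    {"street": "", "city": "Austin", "state": "TX", "zip": "7870"},
    {"street": "9 Elm Rd", "city": "Reno", "state": "NV", "zip": None},
]
print(profile_addresses(sample))
```

Running the same function before and after cleaning gives the two reports to compare.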

2. Parsing addresses

Address standardization starts by parsing every address into its sub-components. This is important because addresses are mostly stored as a single field in a dataset, and running validation checks on the entire field is not as accurate as running them on its individual pieces. For this reason, a single address is usually parsed into street number, street name, directional (such as N or SW), city, county, state, and ZIP (postal) code.
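The sketch below shows the general idea with a single regular expression. It assumes a comma-separated, single-line US address with an uppercase directional, and the `parse_address` function and its group names are hypothetical. Real-world addresses vary far more than one pattern can handle, which is why dedicated parsers exist.

```python
import re

# Assumed input shape: "<number> [directional] <street>, <city>, <ST> <ZIP>".
ADDRESS_PATTERN = re.compile(
    r"""^\s*
    (?P<street_number>\d+)\s+
    (?:(?P<directional>N|S|E|W|NE|NW|SE|SW)\s+)?
    (?P<street_name>[^,]+?)\s*,\s*
    (?P<city>[^,]+?)\s*,\s*
    (?P<state>[A-Za-z]{2})\s+
    (?P<zip_code>\d{5}(?:-\d{4})?)
    \s*$""",
    re.VERBOSE,
)


def parse_address(address: str):
    """Split a single-line address into named parts, or return None."""
    match = ADDRESS_PATTERN.match(address)
    return match.groupdict() if match else None


print(parse_address("123 N Main St, Springfield, IL 62704-1234"))
```

Addresses that do not fit the assumed shape come back as `None`, so they can be routed to manual review instead of being silently mangled.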

3. Geocoding

In this step, the latitude and longitude coordinates (geocodes) are computed for every address. Based on the computed geocodes, you can also find out the 5-digit ZIP code and the 4-digit add-on code (the "+4" in ZIP+4), which narrows an address down to a specific delivery segment within the ZIP code area.

4. Reconstructing addresses

Once all this information is computed and standardized, it is now time to reconstruct the addresses in the required format. The reconstructed addresses can be saved in the database or, if needed, generated in real time whenever they are requested.

An example of such formatting is the USPS addressing standard that requires the delivery address to cover three lines – the first one contains the recipient’s name, the second one contains the street address, and the third one contains the city, state, and zip code.
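A minimal sketch of rebuilding that three-line layout from parsed components might look like this. The `format_mailing_address` function is hypothetical, and writing everything in uppercase without a comma between city and state is an assumption based on common USPS guidance rather than something a given tool is guaranteed to do.

```python
def format_mailing_address(recipient, street, city, state, zip_code):
    """Build a three-line mailing address from already-parsed components."""
    lines = [
        recipient.strip().upper(),  # line 1: recipient's name
        street.strip().upper(),     # line 2: street address
        # line 3: city, state, and ZIP code
        f"{city.strip().upper()} {state.strip().upper()} {zip_code.strip()}",
    ]
    return "\n".join(lines)


print(format_mailing_address(
    "Jane Doe", "123 N Main St", "Springfield", "IL", "62704-1234"
))
```

Because the function only formats text, it can be applied once before saving or called on demand each time an address is displayed.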

5. Verifying addresses

When an address has all the necessary components, you can now verify its validity against any authoritative database to find out whether the address is an actual, mailable location. In addition to verification, such databases can also tell the type of address – residential or business – as well as some other secondary details.
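Conceptually, verification is a lookup of the standardized address in reference data. The sketch below uses a small in-memory dictionary as a stand-in for an authoritative database; the `REFERENCE` data and the `verify_address` function are invented for illustration, and a real service would query official postal data instead.

```python
# Hypothetical stand-in for an authoritative postal database.
# Keys are (street, city, state, 5-digit ZIP); values are the address type.
REFERENCE = {
    ("123 N MAIN ST", "SPRINGFIELD", "IL", "62704"): "residential",
    ("500 MARKET ST", "SPRINGFIELD", "IL", "62701"): "business",
}


def verify_address(street, city, state, zip_code):
    """Report whether an address exists in the reference data, and its type."""
    key = (
        street.strip().upper(),
        city.strip().upper(),
        state.strip().upper(),
        zip_code.strip()[:5],  # compare on the 5-digit ZIP only
    )
    address_type = REFERENCE.get(key)
    return {"valid": address_type is not None, "type": address_type}


print(verify_address("123 N Main St", "Springfield", "il", "62704-1234"))
```

The exact-match lookup only works because the input was standardized first, which is the reason the earlier steps come before verification.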

Conclusion

And there you have it: a 5-step process for cleaning and verifying your address data that you can run without writing code, provided you pick the right tool. Implementing such a solution from scratch can be very challenging, and it can take years to bring its accuracy up to an acceptable level.

There are many address verification tools in the industry today, including some that are CASS-certified. CASS (Coding Accuracy Support System) is a certification that the USPS grants to software vendors whose address standardization and verification services meet its accuracy requirements.

Such tools can definitely improve your team’s operational efficiency and enable them to design exceptional experiences for customers by using correct and accurate location information.

DEV Community