Geoapify

Ways to parse postal address from string

Almost every large web project has parts that work with customer or postal addresses. Very often, these addresses are typed in by customers, so the system has to parse and standardize them.

In most cases, a location arrives as a single string that has to be split into smaller components (country, locality, postal code, house number, etc.). Parsers bring such strings to a common format, normalize their components, and help check that the address is unambiguous.

There are several ways to normalize postal addresses. Below we describe three of them, along with the pros and cons of each.

RegEx address parser

If you don’t need to normalize addresses and they all follow the same form and syntax, a RegEx address parser will fit your purposes. It is the most accessible tool: you define a regular expression once and apply it to every address string.

For example, if a string looks like “POSTCODE-CITY, STREET_NAME HOUSE_NUMBER”, an expression will divide it into the components “POSTCODE”, “CITY”, “STREET_NAME”, and “HOUSE_NUMBER”. Here is an example of a regular expression that can parse such addresses:

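const address = '45000-Ufa, Mendeleev Str 100';
// Named groups capture each component; exec() returns null if the string does not match.
const match = /(?<postcode>\d{5})-(?<city>\w+),\s(?<street>.+)\s(?<housenr>\d+)/u.exec(address);
// Fall back to an empty object so the destructuring does not throw on a failed match.
const { postcode, city, street, housenr } = match?.groups ?? {};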

To build and test a regular expression, you can use an online tool such as RegExr.
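
When some inputs may not follow the expected format, it helps to wrap the expression in a function that reports a failed match instead of throwing. Here is a minimal sketch, assuming the same “POSTCODE-CITY, STREET_NAME HOUSE_NUMBER” layout. The names parseAddress and ADDRESS_PATTERN and the anchored pattern are illustrative choices, not part of any library:

```javascript
// Hypothetical helper for "POSTCODE-CITY, STREET_NAME HOUSE_NUMBER" strings.
// Assumption: the ^ and $ anchors reject strings with extra text around the address.
const ADDRESS_PATTERN =
  /^(?<postcode>\d{5})-(?<city>[^,]+),\s+(?<street>.+)\s+(?<housenr>\d+)$/u;

function parseAddress(input) {
  const match = ADDRESS_PATTERN.exec(input.trim());
  // exec() returns null when the string does not follow the expected layout.
  if (match === null) {
    return null;
  }
  // Copy the named groups into a plain object.
  return { ...match.groups };
}

console.log(parseAddress('45000-Ufa, Mendeleev Str 100'));
console.log(parseAddress('Mendeleev Str 100, Ufa'));
```

Returning null for strings that do not match lets the calling code decide whether to skip them, log them, or pass them to another parser.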

Pros

  • Simple to apply and easy to keep separate from the rest of the code
  • Highly configurable and flexible
  • Works without external libraries and parsing APIs

Cons

  • Difficult to debug and to change later
  • Complex expressions can be slow and are hard to read
  • Works only with strings that follow a standardized format

NPM packages

If you need to work with addresses from a particular country, NPM packages may fit your purposes. There are numerous libraries with different characteristics. Some of them work with certain countries, and others support special formats.

One of the most popular libraries is parse-address, which parses US street addresses into components. Other frequently used libraries include addresser for property addresses and address-parse for Chinese addresses.

Before settling on one library, try several of them on your own data to find the most suitable one.

Pros

  • Works without external services and APIs
  • Open source
  • Convenient to add

Cons

  • You have to find a library that fully covers your use case
  • For a commercial project, you need to check each package's license carefully to avoid legal problems

Geocoding API to parse, normalize and verify addresses

When a big project works with many locations, a Geocoding API becomes a must. It parses and standardizes addresses and checks that they are correct.

The many geocoding services on the market differ in price, terms and conditions, and quality of results. However, they fall into two big groups: geocoders based on proprietary data and geocoders based on open data.

The first group tends to provide more reliable and precise results but has strict rules regarding data storage. Open-data-based geocoders, on the other hand, are more permissive, so you can parse and validate addresses and store the results on your side.

For example, Geoapify Geocoding API is based on open data sources like OpenStreetMap, OpenAddresses, GeoNames, etc. It returns a parsed address and the corresponding location. Here is an example of the result object for "36 Glasshouse St, London W1B 5DL, United Kingdom":

{"type":"FeatureCollection","features":
[{"type":"Feature","geometry":{"type":"Point","coordinates":[-0.1370947,51.5104927]},
"properties":{
  "housenumber":"36",
  "street":"Glasshouse Street",
  "country":"United Kingdom",
  "datasource":{"sourcename":"mixed"},
  "country_code":"gb",
  "state":"Greater London",
  "city":"London",
  "district":"Westminster",
  "suburb":"Soho",
  "lon":-0.1370947,
  "lat":51.5104927,
  "postcode":"W1B 5DR",
  "formatted":"36 Glasshouse Street, London W1B 5DR, United Kingdom",
  "address_line1":"36 Glasshouse Street",
  "address_line2":"London W1B 5DR, United Kingdom",
  "result_type":"building",
  "rank":{
    "popularity":8.988490181891963,
    "confidence":1,
    "confidence_city_level":1,
    "confidence_street_level":1,
    "match_type":"full_match"
  },
  "place_id":"51303b7ab2518cc1bf5962ac25d357c14940c00203"}
}],
"query":{
  "text":"36 Glasshouse St, London W1B 5DL, United Kingdom",
  "parsed":{
    "housenumber":"36",
    "street":"glasshouse st",
    "postcode":"w1b 5dl",
    "city":"london",
    "country":"united kingdom",
    "expected_type":"building"
  }
}}
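
A request that produces a response like the one above could be sketched as follows. The endpoint and the text and apiKey parameter names reflect our understanding of the Geoapify documentation, so check them against the current API reference. YOUR_API_KEY is a placeholder, and the helper names buildGeocodeUrl and extractAddress are hypothetical:

```javascript
// Assumed endpoint; verify it against the current Geoapify documentation.
const GEOCODE_ENDPOINT = 'https://api.geoapify.com/v1/geocode/search';

// Builds the request URL for a free-form address string.
function buildGeocodeUrl(text, apiKey) {
  const url = new URL(GEOCODE_ENDPOINT);
  url.searchParams.set('text', text);
  url.searchParams.set('apiKey', apiKey);
  return url.toString();
}

// Picks selected address components from the first feature of a
// FeatureCollection response. Returns null when there are no features.
function extractAddress(response) {
  const feature = response.features?.[0];
  if (!feature) {
    return null;
  }
  const { housenumber, street, city, postcode, country, formatted } = feature.properties;
  return { housenumber, street, city, postcode, country, formatted };
}

// Usage (needs network access and a valid key; fetch is global in browsers and Node.js 18+):
// const res = await fetch(buildGeocodeUrl('36 Glasshouse St, London W1B 5DL, United Kingdom', 'YOUR_API_KEY'));
// console.log(extractAddress(await res.json()));
```

Keeping the URL building and the response handling in separate functions makes both easy to check without calling the service.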

Moreover, the Geoapify Geocoding API helps you judge how trustworthy the data is. It does not just parse the string; it also returns the most suitable location for it.
Every address is validated and receives confidence values on three levels. Each value ranges from 0 to 1, where 1 means 100% confidence:

  • confidence corresponds to the complete address. When confidence = 1, the address was found and verified;
  • confidence_street_level corresponds to the street level of the address. When confidence_street_level = 1, the address is verified at least up to the street level;
  • confidence_city_level corresponds to the city level. When confidence_city_level = 1, the address is verified at least up to the city level.
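
The three confidence values can drive a simple acceptance rule. Here is a minimal sketch, assuming a rank object shaped like the one in the response above. The 0.9 threshold, the function name verificationLevel, and the returned labels are illustrative assumptions, not values recommended by the API:

```javascript
// Hypothetical threshold: how confident a level must be to count as verified.
const VERIFIED_THRESHOLD = 0.9;

// Returns the most precise level at which the address can be trusted.
function verificationLevel(rank, threshold = VERIFIED_THRESHOLD) {
  if (rank.confidence >= threshold) {
    return 'address';
  }
  if (rank.confidence_street_level >= threshold) {
    return 'street';
  }
  if (rank.confidence_city_level >= threshold) {
    return 'city';
  }
  // A missing or low value at every level means the address needs a manual check.
  return 'unverified';
}

console.log(verificationLevel({ confidence: 1, confidence_city_level: 1, confidence_street_level: 1 }));
console.log(verificationLevel({ confidence: 0.4, confidence_city_level: 1, confidence_street_level: 0.95 }));
```

Checking the levels from most to least precise means a fully verified address is never downgraded to a street-level or city-level match.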

With these confidence values, you can decide which parsed addresses to accept as they are and which ones need a manual check.

Pros

  • Allows forward and reverse geocoding (location by address and address by location)
  • Supports different countries and languages
  • Cross-platform and configurable
  • Has a Free Tier

Cons

  • Requires a paid plan if you work with a large number of addresses
  • Requires extra effort to handle addresses that could not be verified

Which one is the best?

We have listed the ways of working with addresses from the simplest to the most advanced, and each one fits a different purpose. Geoapify Geocoding API is probably the most versatile one: use it to standardize postal addresses, find their locations, and work with them. If all strings follow the same regular format, choose RegEx. For other purposes, such as country-specific formats, try NPM packages for address parsing.

We hope you have found the best fitting way of parsing addresses from our article. Keep reading to learn more tips from geocoders!
