loading...

The SoundEx Algorithm

lefebvre profile image Paul Lefebvre ・3 min read

Soundex is a phonetic algorithm for indexing names by sound, as pronounced in English. It is commonly used with databases to help with searching and is built-in to many database engines such as PostgreSQL and MySQL. SoundEx is not included with SQLite by default and there may be situations when you want to use it when searching.

Fortunately the algorithm is not all that difficult. You can read more about SoundEx on Wikipedia, but here are the general steps:

  1. Retain the first letter of the name and drop all other occurrences of a, e, i, o, u, y, h, w.
  2. Replace consonants with digits as follows (after the first letter):
    • b, f, p, v → 1
    • c, g, j, k, q, s, x, z → 2
    • d, t → 3
    • l → 4
    • m, n → 5
    • r → 6
  3. If two or more letters with the same number are adjacent in the original name (before step 1), only retain the first letter; also two letters with the same number separated by ‘h’ or ‘w’ are coded as a single number, whereas such letters separated by a vowel are coded twice. This rule also applies to the first letter.
  4. If you have too few letters in your word that you can’t assign three numbers, append with zeros until there are three numbers. If you have more than 3 letters, just retain the first 3 numbers.

In general, implementing this algorithm is pretty straightforward. This is an an example function for the SoundEx algorithm using Xojo, which should be pretty easy to translate to other languages as needed:

Private Function SoundEx(word As Text) As Text
  Const kLength As Integer = 4

  Dim value As Text

  Dim size As Integer = word.Length

  ' Make sure the word is at least two characters in length
  If (size > 1) Then
    word = word.Uppercase

    ' Convert the word to a character array for faster processing
    Dim chars() As Text = word.Split

    ' For storing the SoundEx character codes
    Dim code() As Text

    ' The current and previous character codes
    Dim prevCode As Integer = 0
    Dim currCode As Integer = 0

    ' Add the first character
    code.Append(chars(0))

    Dim loopLimit As Integer = size - 1
    ' Loop through all the characters and convert them to the proper character code
    For i As Integer = 0 To loopLimit
      Select Case chars(i)
      Case "H", "W"
        currCode = -1
      Case "A", "E", "I", "O", "U", "Y"
        currCode = 0
      Case "B", "F", "P", "V"
        currCode = 1
      Case "C", "G", "J", "K", "Q", "S", "X", "Z"
        currCode = 2
      Case "D", "T"
        currCode = 3
      Case "L"
        currCode = 4
      Case "M", "N"
        currCode = 5
      Case "R"
        currCode = 6
      End Select

      If i > 0 Then
        ' two letters With the same number separated by 'h' or 'w' are coded as a single number
        If currCode = -1 Then currCode = prevCode

        ' Check to see if the current code is the same as the last one
        If currCode <> prevCode Then
          ' Check to see if the current code is 0 (a vowel); do not proceed
          If currCode <> 0 Then
            code.Append(currCode.ToText)
          End If
        End If
      End If

      prevCode = currCode

      ' If the buffer size meets the length limit, then exit the loop
      If (code.Ubound = kLength - 1) Then
        Exit For
      End If
    Next

    ' Pad the code if required
    size = code.Ubound + 1
    For j As Integer = size To kLength - 1
      code.Append("0")
    Next

    ' Set the return value
    value = Text.Join(code, "")
  End If

  ' Return the computed soundex
  Return value
End Function

You call the SoundEx function like this:

Dim result As Text
result = SoundEx("Robert")   ' R163
result = SoundEx("Rupert")   ' R163 
result = SoundEx("Rubin")    ' R150 
result = SoundEx("Ashcraft") ' A261
result = SoundEx("Ashcroft") ' A261
result = SoundEx("Tymczak")  ' T522
result = SoundEx("Pfister")  ' P236

By saving the SoundEx results (in SQLite, JSON or wherever) you can use them again to compare with SoundEx results on other values for better searching.

Discussion

pic
Editor guide