

Soundex Demystified

Ever wonder what's up with Soundex, and why it's so useful for genealogy? It was created for researching difficult surnames, or surnames that have been spelled or pronounced differently over the decades. If you're still having difficulty understanding the concept after reading this information, I have provided a link to a surname-to-Soundex converter in the sources at the end. First, the official definition:

Soundex is a phonetic algorithm for indexing names by sound, as pronounced in English. The goal is for homophones to be encoded to the same representation so that they can be matched despite minor differences in spelling. The algorithm mainly encodes consonants; a vowel will not be encoded unless it is the first letter. Soundex is the most widely known of all phonetic algorithms, as it is a standard feature of MS SQL and Oracle, and is often used (incorrectly) as a synonym for "phonetic algorithm". Improvements to Soundex are the basis for many modern phonetic algorithms.

Soundex was developed by Robert C. Russell and Margaret K. Odell and patented in 1918 and 1922. A variation called American Soundex was used in the 1930s for a retrospective analysis of the US censuses from 1890 through 1920. The Soundex code came to prominence in the 1960s when it was the subject of several articles in the Communications and Journal of the Association for Computing Machinery, and especially when described in Donald Knuth's The Art of Computer Programming, vol. 3: Sorting and Searching, Addison-Wesley Professional (1973), pp. 391-392.

The Soundex code for a name consists of a letter followed by three numerical digits: the letter is the first letter of the name, and the digits encode the remaining consonants. Similar-sounding consonants share the same digit; for example, the labial consonants B, F, P, and V are each encoded as the number 1. Vowels can affect the coding, but are not coded themselves except as the first letter. However, if "h" or "w" separates two consonants that have the same Soundex code, the consonant to the right of the "h" or "w" is not coded.


Soundex Coding Guide

Number    Represents the Letters
1         B, F, P, V
2         C, G, J, K, Q, S, X, Z
3         D, T
4         L
5         M, N
6         R

Disregard the letters A, E, I, O, U, H, W, and Y unless one of them is the first letter of the name.


The correct Soundex value can be found as follows:

1. Keep the first letter of the name. Ignore every later "h" and "w", so that two consonants separated only by "h" or "w" are treated as adjacent.
2. Replace consonants with digits but do not change the first letter (see the coding guide above).
3. Collapse adjacent identical digits into a single digit of that value. A consonant directly after the first letter is also dropped if it has the same code as the first letter. Consonants with the same code that are separated by a vowel are each coded.
4. Remove all non-digits after the first letter.
5. Return the starting letter and the first three remaining digits. If needed, append zeroes to make it a letter and three digits.

Using this algorithm, both "Robert" and "Rupert" return the same string "R163" while "Rubin" yields "R150". "Ashcraft" yields "A261".
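
For readers who like to see the steps in code, here is a minimal Python sketch of the procedure described above. It is one possible reading of the American Soundex rules, not the code behind any particular converter or database. The names `CODES` and `soundex` are my own, and the handling of hyphens, spaces, and accented letters (they are simply dropped) is an assumption, since the steps above do not cover them.

```python
# Letter-to-digit table taken from the coding guide above.
CODES = {}
for digit, group in {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
                     "4": "L", "5": "MN", "6": "R"}.items():
    for letter in group:
        CODES[letter] = digit


def soundex(name):
    """Return a letter plus three digits for a surname (illustrative sketch)."""
    # Assumption: keep only the letters A-Z; anything else is dropped.
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""

    first = letters[0]
    digits = []
    # The first letter is kept as a letter, but its code still counts
    # when deciding whether the next consonant is a repeat.
    previous = CODES.get(first, "")

    for ch in letters[1:]:
        if ch in "HW":
            # H and W are skipped without resetting `previous`, so equal
            # codes on either side of them collapse into one digit.
            continue
        code = CODES.get(ch, "")  # vowels and Y have no code
        if code and code != previous:
            digits.append(code)
        # A vowel sets `previous` to "", so the same code after a vowel is kept.
        previous = code

    # Pad with zeroes, then cut to one letter and three digits.
    return (first + "".join(digits) + "000")[:4]


for surname in ("Robert", "Rupert", "Rubin", "Ashcraft"):
    print(surname, soundex(surname))
```

Running it prints the code for each of the four example surnames, so the results can be checked against the values quoted above.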


Sources:

Surname to Soundex Converter, on the Ancestor Search website:
http://www.searchforancestors.com/soundex.html

Eastman's Online Genealogy Newsletter: Soundex Explained
http://blog.eogn.com/eastmans_online_genealogy/2010/08/soundex-explained.html

National Archives and Records Administration (NARA): The Soundex Indexing System
http://www.archives.gov/publications/general-info-leaflets/55.html

Wikipedia: Soundex
http://en.wikipedia.org/wiki/Soundex

 
