# An Algorithm for Finding Anagrams

###
Brandon
*
Originally published at
skerritt.blog
on
*
ă»3 min read

Finding anagrams of words does not look like a difficult problem but has an interesting solution.

An anagram is a word or sentence that can be transformed into another word or sentence. Elvis has all the same letters as Lives, so Elvis is an anagram of Lives.

The way most people would immediately solve this problem would be to take a word, go through every word in the dictionary and see if the combinations of letters match exactly.

The way to do this will use a *multiset*. *Sets* are like arrays where the order doesnât matter and repetitions arenât allowed. With an array, [a, b] is not the same as [b, a]. But with a set, (a, b) is the same as (b, a).

Setâs donât allow repetitions. So (a, a, a, a, a, b) is the same as (b, a)âââbecause the first set would be turned into (a, b).

A *multiset* is a set which allows repetitions but the order doesnât matter. For this example, letâs start small.

With our example, we have a list called a dictionary where each item in that dictionary is a word. We want to find out which of these words are anagrams of âElvisâ, so what we do is for loop through the dictionary like so:

Youâre right. This is really slow. It would work, but we have 2 problems. The first problem is that capitalisation wonât make the multisets equal. The second problem is that more spaces can appear in the outputted sentence than in the original word.

As an example, âroast beefâ is an anagram of âeat for BSEâ.

Given 2 sentences such as âroast beefâ and âeat for BSEâ, if we turned these into multisets they wouldnât be equal due to the differing amount of spaces. Thereâs another way we can calculate if 2 sentences are anagrams of each other. The fundamental theorem of arithmetic states:

Every integer either is a prime number itself or can be represented as the product of prime numbers and that, moreover, this representation is unique, the order of the factors.

Prime numbers are numbers which only have 2 factorsâââthe number itself and 1.

If we assign each letter in the alphabet to a prime number, like so:

And so on, then compute the product of these numbers this number is uniqueâââbecause of the fundamental theorem of arithmetic.

That means that for a multiset of letters, the product of prime numbers for each letter in that multiset is unique. If two words or sentences have the same number, these two words or sentences are anagrams of each other. Letâs see a quick example, is âBACâ an anagram of âA Bcâ?

Determining which words are anagrams of other words would be as simple as creating a dictionary of {Word: Prime factorisation} and then grouping all the prime factorisation up.

Now, given 2 sentences we can easily tell if they are anagrams of each other.

## We can go faster

This prime factorisation trick is interesting, but it doesn't scale very well. Prime numbers grow exponentially, so the longer the message the larger the numbers - meaning the slower it becomes.

A much quicker way to check for anagrams is to sort the message, ascending from A to Z while removing all punctuation / spaces. There are many ways to sort data, for example Timsort is a sorting algorithm.

Once you've sorted the two sentences / words you want to compare, if each list is equal to eachother then they are anagrams.