Solving Anagrams Programmatically: Hash Maps, Sorting, and Combinatorial Explosions

#javascript #beginners #programming #tutorial

"Astronomer" is an anagram of "moon starer." "Listen" rearranges to "silent." "Dormitory" becomes "dirty room." These feel like they must be intentional, but they're statistical inevitabilities -- with 26 letters and enough words in the English language, meaningful anagram pairs emerge by sheer combinatorial volume.

But here's the interesting question: how do you find them algorithmically? Given an arbitrary string, how do you efficiently find all valid English words that can be formed from its letters? This is a classic computer science problem that touches on hashing, sorting, tries, and combinatorial complexity.

The brute force approach (and why it fails)

The most obvious approach: generate all permutations of the input letters and check each one against a dictionary.

function* permutations(arr) {
  if (arr.length <= 1) { yield arr; return; }
  for (let i = 0; i < arr.length; i++) {
    const rest = [...arr.slice(0, i), ...arr.slice(i + 1)];
    for (const perm of permutations(rest)) {
      yield [arr[i], ...perm];
    }
  }
}

The problem is combinatorial. A 7-letter word has 7! = 5,040 permutations. A 10-letter word has 3,628,800. A 15-letter word has over 1.3 trillion. The number of permutations grows factorially, which is worse than exponential. For any input longer than about 10 characters, brute force permutation is computationally intractable.

And this only finds full anagrams (rearrangements using all letters). If you want partial anagrams (words formed from a subset of the letters), the search space explodes further.

The sorting trick

Here's the elegant insight: two words are anagrams of each other if and only if they contain the same letters in the same quantities. And the simplest way to normalize letters into a canonical form is to sort them alphabetically.

function sortKey(word) {
  return word.toLowerCase().split('').sort().join('');
}

sortKey('listen');  // 'eilnst'
sortKey('silent');  // 'eilnst'
sortKey('enlist');  // 'eilnst'

All three words produce the same sorted key. This means you can preprocess your entire dictionary by computing the sort key for every word and grouping words by their keys.

// Build the anagram index
const anagramIndex = new Map();

for (const word of dictionary) {
  const key = sortKey(word);
  if (!anagramIndex.has(key)) {
    anagramIndex.set(key, []);
  }
  anagramIndex.get(key).push(word);
}

// Lookup is now O(k log k) where k is the word length
function findAnagrams(input) {
  const key = sortKey(input);
  return anagramIndex.get(key) || [];
}

The preprocessing is O(n * k log k) where n is the dictionary size and k is the average word length. After that, each lookup is O(k log k) -- essentially instant. This is the approach used by most practical anagram solvers.

The hash approach: even faster lookups

Sorting each word takes O(k log k). You can reduce this to O(k) by using a frequency count instead:

function freqKey(word) {
  const freq = new Array(26).fill(0);
  for (const ch of word.toLowerCase()) {
    freq[ch.charCodeAt(0) - 97]++;
  }
  return freq.join(',');
}

freqKey('listen');  // '0,0,0,0,1,0,0,0,1,0,0,1,0,1,0,0,0,0,1,1,0,0,0,0,0,0'
freqKey('silent');  // '0,0,0,0,1,0,0,0,1,0,0,1,0,1,0,0,0,0,1,1,0,0,0,0,0,0'

The frequency vector uniquely identifies an anagram class. Two words with the same frequency vector are anagrams. The key generation is O(k) instead of O(k log k), though for typical word lengths the difference is negligible.

An even more clever approach uses prime numbers. Assign each letter a unique prime number (a=2, b=3, c=5, d=7, ...) and compute the product of the primes for each word. By the fundamental theorem of arithmetic, two words have the same prime product if and only if they have the same letter frequencies.

const primes = [2,3,5,7,11,13,17,19,23,29,31,37,41,
                43,47,53,59,61,67,71,73,79,83,89,97,101];

function primeKey(word) {
  return word.toLowerCase().split('').reduce(
    (product, ch) => product * primes[ch.charCodeAt(0) - 97], 1
  );
}

This is elegant but has a practical problem: the products get very large for long words, exceeding JavaScript's safe integer range. BigInt solves this but introduces performance overhead. For real implementations, the sort or frequency approach is more practical.

Multi-word anagrams: the hard problem

Finding that "astronomer" rearranges to "moon starer" (two words) is significantly harder than finding single-word anagrams. You need to find combinations of words from the dictionary whose combined letter frequencies match the input.

This is a subset-sum variant and it's NP-complete in the general case. Practical approaches use pruning:

Build a dictionary filtered to only words whose letters are a subset of the input's letters.
Recursively try each word, subtract its letters from the remaining pool, and recurse with the reduced pool.
Prune branches where the remaining letters can't form any dictionary word.

function multiWordAnagrams(letters, dictionary, current = []) {
  if (letters.length === 0) return [current];

  const results = [];
  for (const word of dictionary) {
    if (isSubset(word, letters)) {
      const remaining = subtractLetters(letters, word);
      results.push(
        ...multiWordAnagrams(remaining, dictionary, [...current, word])
      );
    }
  }
  return results;
}

Even with pruning, multi-word anagram solving for long inputs can be slow. A 15-letter phrase might have thousands of valid multi-word anagram combinations.

Anagrams in the real world

Beyond word games, anagram detection has practical applications:

Plagiarism detection sometimes uses anagram-like fingerprinting to detect rearranged text.

Cryptography history includes ciphers based on letter transposition, which are essentially anagram encryption.

Bioinformatics uses similar algorithms for comparing DNA sequences, where the "alphabet" is {A, T, G, C} and researchers look for sequences with identical nucleotide compositions.

Data deduplication uses normalized representations (like our sort key) to identify records that contain the same data in different orders.

Tips for word game players

1. Look for common prefixes and suffixes. If your letters contain I-N-G or E-D or T-I-O-N, try building words around those patterns first.

2. Count vowels and consonants. A set of 7 letters with 5 consonants and 2 vowels will form different words than one with 4 vowels and 3 consonants. The ratio constrains your options.

3. Identify high-value rare combinations. Q almost always needs U. X often pairs with E. J and Z are consonants that severely limit word options.

For solving anagrams instantly -- whether for word games, puzzles, or just satisfying curiosity about what words hide inside other words -- I built an anagram solver at zovo.one/free-tools/anagram-solver that uses the indexed approach described above for near-instant lookups.

There's something satisfying about the intersection of wordplay and algorithms. Anagrams feel like a purely creative, linguistic challenge. But the fastest anagram solvers are built on hash maps, prime factorization, and combinatorial search -- computer science making the creative problem tractable.

I'm Michael Lip. I build free developer tools at zovo.one. 350+ tools, all private, all free.