Algorithm Case Study: How to find anagrams!

#algorithm #casestudy #javascript #es6

I've learned some fairly interesting things about permutation and deduplication/pruning trees by doing this problem, so I figured I'd do a little writeup for y'all!

The challenge itself is as follows:

Given a string, return an array containing all of its possible anagrams
ex: anagrams("abc") returns ["abc", "acb", "bac", "cab", "cba"]
Bonus: remove duplicates from your array without using uniq

Let's get a sense of the scope of our problem, shall we?

What we have here is a problem of permutation. We've got a limited number of items (in this case, characters), and want to figure out every possible order in which we can arrange them. If we don't have duplicate characters, this will result in n! (n factorial) results, where n is the number of items that we're arranging. "abc" is a 3 character long string, so the results array should contain 6 items (3*2*1). We can use this to help check if our anagram generator works.

So, how do we actually start making permutations of our string?

I chose to visualize it as a tree.

This may look a little obscure, but the logic is based in the way that I would go about generating combinations by hand.

If we start with the string "abc", we can choose "a", "b", or "c" first.

If we choose "a", we have a choice between "b" and "c" remaining. If we choose "b", we have "c" left, or if we choose "c", we have "b" left. In the way I've drawn out the tree, you simply follow your choices down in order to get the final permutation. "a"->"b"->"c", giving you "abc" or "a"->"c"->"b", giving you "acb".

Traversing

So, we can use strategies similar to traversing a Trie in order to make sure that we hit every possible permutation. We'll use a recursive DFS traversal to do so.

We'll traverse down the tree until we hit a leaf (a node with no children), at which point we'll know that we've finished creating a permutation, based on the choices that we made to get to that point.

This is not enough to finish our function, but it's a lot of the core functionality, so we'll start with the traversal.

function traverse(string){
  for (let i = 0; i < string.length; i++){
    traverse(string.slice(0,i) + string.slice(i+1));
  }
}

Essentially, if we pick "a" first, we want to call traverse with the string "bc". In order to do that, we're using the native slice method to copy and concatenate everything besides the character at our current index, then we'll recursively call our traverse.

This alone isn't enough. There are still two things we need:

to keep track of the permutation that we're creating
to maintain an array of all permutations that we've finished

let's handle tracking our permutation. We'll simply add a second parameter that will start as an empty string. As we select each character, we'll concat that character to the end of the string for the next step of our traversal.

function traverse(string, perm = ''){
  for (let i = 0; i < string.length; i++){
    traverse(string.slice(0,i) + string.slice(i+1), perm + string[i]);
  }
}

There are some subtasks that come with maintaining and returning the array of outputs. We need to

create and return an output array
push to our output array when we reach a leaf node

We'll handle creating and returning our output array. Our output array won't be part of the recursion, so we'll put it in an outer function that will wrap our recursive traverse function.

function anagram(string){
  const output = [];
  function traverse(string, perm = ''){
    for (let i = 0; i < string.length; i++){
      traverse(string.slice(0,i) + string.slice(i+1), perm + string[i]);
    }
  }
  return output
}

Now, we have to maintain our array by pushing when our traversal hits a leaf node. Given that we're cutting down our string by a character at each step, we'll eventually reach a point where there aren't any more characters in string. It's at that point that we'll want to push to our output array.

function anagram(string){
  const output = [];
  function traverse(string, perm = ''){
    if (!string) output.push(perm)
    for (let i = 0; i < string.length; i++){
      traverse(string.slice(0,i) + string.slice(i+1), perm + string[i]);
    }
  }
  return output
}

now, we've got an array, we've got a way of implicitly generating and traversing a tree from a string, maintaining each possible permutation along the way, and an array to store it in and return. We simply need to invoke our traversal.

function anagram(string){
  const output = [];
  function traverse(string, perm = ''){
    if (!string) output.push(perm)
    for (let i = 0; i < string.length; i++){
      traverse(string.slice(0,i) + string.slice(i+1), perm + string[i]);
    }
  }
  traverse(string)
  return output
}

So, this works perfectly for strings that have no repeated characters, like "abc". But what's this about duplicates? Let's have a look.

If we traverse to "a", both of the subtrees are the same! we get "abb" and "abb". If we traverse to "b", we get "bab" and "bba", which are the same results if we traverse to the final character, another "b".

Our formula for calculating the number of permutations of unique items is n!. If we want to calculate permutations of a collection of items that includes repeated items, we simply take the factorial of each subset of repeated items, multiply them by each other, and divide n! by it. It looks something like this: n!/(a!*b!*c!....). In our case, we have 3 characters, and 2 characters are repeated. So, the expected number of combinations is 3!/2!, which is (3*2*1)/(2*1), or 6/2, leaving 3. "abb", "bab", "bba".

So, how do we get rid of duplicates? One way would be to put all of our permutations into an object, and use Object.keys() to return the unique keys. This works, but it's extra work, which takes up extra time and space.

We can do better.

Doing Better

The best thing we could do at this point is to actually notice when we're going to generate a repeated sub-tree from our string and avoid traversing that path.

How do we do this? The answer is in the description; we avoid repetition. Let's look at "abb". The "b"s are identical, aren't they? Choosing one "b" is the same as choosing the other, so if we've generated the sub-tree from one, we can completely ignore the other.

Let's use a Set to keep track of what characters we've already seen. If we've seen it before, we can skip traversing it since we've already done the work.

function anagram(string){
  const output = [];
  function traverse(string, perm = ''){
    const seen = new Set();
    if (!string) output.push(perm)
    for (let i = 0; i < string.length; i++){
      if (!seen.has(string[i])){
        seen.add(string[i]);
        traverse(string.slice(0,i) + string.slice(i+1), perm + string[i]);
      }
    }
  }
  traverse(string)
  return output
}

It's not too complicated, but there are a couple of details worth mentioning about this solution. We're putting our seen inside of traverse very intentionally. A "b" at our top level node is different than a "b" one or two levels deep, so keeping distinct sets for each stack frame is vital.

The other is that we're nearly doubling our space complexity at worst case (no repetition), but as soon as we have even two of the same character, we're reducing our work by nearly half, as well as saving space on opening new stack frames with our recursion. We could optimize this by doing a preliminary scan of our string to check for duplicates before sending it off to a traversal that either does or doesn't maintain the deduplicating set.

slice also isn't an ideal operation, and we could simply pass down i to the next invocation of traverse and skip that character.

Thanks for following along!