Theofanis Despoudis

Posted on Jun 17, 2020

Solving Algorithm Challenges in Haskell: Anagrams

#haskell #learning #algorithms #functional

When I learn a new Programming language, I tend to spend a lot of time in the documentation.

However, a lot of times I find this approach very tiring and soon I lose interest.

I think there is a better way.

In this series of articles I'm going to write down the though process and the journey of learning Haskell in practice by solving common algorithmic challenges as listed in various places like Leetcode, HackerEarth or GeeksForGeeks.

The idea is to learn the basics of the language in a practical way. I will also provide a sample implementation in a different language like Typescript for comparison.

Let's start with a typical challenge you may find in programming interview sites:

Find minimum number of characters so that two strings become anagram

There are lots of variations on this problem as seen in the following list:

We are given two strings a and b and we are asked to find the number of additions or removals that we need to perform one the first string so it becomes an anagram of the other.

For example given the strings:

a = cde
b = abc

The operations to turn a into b are the following:

remove d
add b
remove e
add c

So the result would be 4. The character c exists in both strings so it does not count. So if we have two occurrences of the letter b in the first string and only one in the second then we only have one extra operation as we will only have to add one extra b.

This is the hint for the solution. We have to just calculate the count of characters of both strings and find the difference of their char counts.

For example in those two strings we have:

aCount = [(c, 1), (d, 1), (e, 1)]
bCount = [(a, 1), (b, 1), (c, 1)]

Then the minimum number of characters to become anagrams are:

c-c + (a - 0)+ (d - 0) + (e - 0) + (b - 0) = 1 - 1 + 1 + 1 + 1 + 1 = 4

Coding it in Haskell

The easy part is finding the algorithm here. The hardest part is porting it into Haskell.

In the beginning, I had a hard time finding the way to perform basic string and list manipulation operations.

You can't just start mapping and folding things that have different types. Every-time you will have type errors. You need to be aware each type of the current types that you are holding into.

Also you cannot just print an int using putStrLn. You need to convert into string first.

Let's start by building a small toolbox of functions:

First we read the number of test cases:

 -- Read num of cases
    n <- getLine

n is String here, not an Int. So in order to use it as an integer you need to convert it. Let's create a function that does that:

-- Converts int to string
strToInt :: String -> Int
strToInt x = read x :: Int

The notation here is for documentation and clarity. It simply says: I'm a function named strToInt that takes a String and returns an Int.

Then we need to repeat N times a function. Haskell has no for loop for that so we have to create our own using recursion:

-- Repeat function n times
repeatNTimes 0 _ = return ()
repeatNTimes n action =
 do
  action
  repeatNTimes (n-1) action

I already spent at least 30 min trying to figure out how to do a simple for loop like that!

TIP: As mentioned by Sam, we can use replicateM_ to achieve the same thing!

import Control.Monad
...
replicateM_ (strToInt n) anagramCheck

Now we have all the pieces for the main method:

main = do
    -- Read num of cases
    n <- getLine
    -- Repeat n times check
    repeatNTimes (strToInt n) anagramCheck

We only need to define the anagramCheck function which is the solution to the problem:

First we need to read the two strings:

first <- getLine
second <- getLine

Then we need to calculate their character frequencies. For simplicity we need a function that takes a string and returns a list of tuples (Char, Int) for each char in the alphabet
wordFrequencies :: String -> [(Char, Int)]

Using list comprehensions we can generate this like that:

-- Find word frequency of string
wordFrequencies :: String -> [(Char, Int)]
wordFrequencies word = [ (x,c) | x<-['A'..'z'], let c = (length.filter (==x)) word, c>=0 ]

We create a list of tuples (x,c) when x is the range of characters from 'A' to 'z' and c is the count of char occurrences within the vocabulary. Awesome!

So we get those char frequencies:

-- calculate frequency of each character in first string
   let count1 = wordFrequencies first

-- calculate frequency of each character in second string
   let count2 = wordFrequencies second

Then we need to calculate the sum of their absolute differences.

This is tricky because we need to iterate over two lists comparing indexes.

Do do that we need to iterate over the list of tuples, grab their second element, compute their abs difference and return the result. Then we take the sum.

To take the second item in a tuple:

snd count1

To map in a list:

(map snd count1)

To take the abs difference of two numbers:

absDiff :: Int -> Int -> Int
absDiff x y = abs (x - y)

To iterate over the first item on each list and combine them together:

zipWith (\x y -> absDiff x y) (map snd count1) (map snd count2)

To take the sum of the differences:

let diff :: Int = sum $ zipWith (\x y -> absDiff x y) (map snd count1) (map snd count2)

Finally we print the result:

putStrLn (show diff)

Here is the complete code of the solution:

{-# LANGUAGE ScopedTypeVariables #-}

strToInt :: String -> Int
strToInt x = read x :: Int

-- Find word frequency of string
wordFrequencies :: String -> [(Char, Int)]
wordFrequencies word = [ (x,c) | x<-['A'..'z'], let c = (length.filter (==x)) word, c>=0 ]

-- Find absolute diff of 2 numbers
absDiff :: Int -> Int -> Int
absDiff x y = abs (x - y)

-- Repeat function n times
repeatNTimes 0 _ = return ()
repeatNTimes n action =
 do
  action
  repeatNTimes (n-1) action

{-  
    function to calculate minimum numbers
    of characters to be removed to make
    two strings anagram
-}
anagramCheck = do
    first <- getLine
    second <- getLine

    -- calculate frequency of each character in first string
    let count1 = wordFrequencies first

    -- calculate frequency of each character in second string
    let count2 = wordFrequencies second

    -- calculate absolute difference of their freq count
    let diff :: Int = sum $ zipWith (\x y -> absDiff x y) (map snd count1) (map snd count2)
    putStrLn (show diff)
    return ()

main = do
    -- Read num of cases
    n <- getLine
    -- Repeat n times check
    repeatNTimes (strToInt n) anagramCheck

Coding it in Typescript

It's important so review the same code written in Typescript. Here is rough example:

function anagramCheck(word1: string, word2: string): number {
  let result = 0;
  const count1 = wordFrequency(word1);
  const count2 = wordFrequency(word2);
  for (let c in count1) {
    if (count2.hasOwnProperty(c)) {
      result += Math.abs(count1[c] - count2[c]);
    } else {
      result += 1;
    }
  }

  for (let c in count2) {
    if (count1.hasOwnProperty(c)) {
      result += Math.abs(count1[c] - count2[c]);
    } else {
      result += 1;
    }
  }

  return result;
}

function wordFrequency(word: string) {
  const freq = {};
  for (let i = 0; i < word.length; i++) {
    let c = word[i];
    if (freq.hasOwnProperty(c)) {
      freq[c]++;
    } else {
      freq[c] = 1;
    }
  }

  return freq;
}

This is more straightforward imperative code. It has it's merits but also it's downsides.

Next Steps

Stay put for more algorithmic challenges written in Haskell.

Top comments (6)

Brent Yorgey • Jun 18 '20

Nice work! Challenge: rewrite wordFrequencies to return a Map Char Int (using Data.Map.fromListWith), and then combine the maps.

Theofanis Despoudis • Jun 18 '20

TIL Data.Map.fromListWith

sam • Jun 18 '20

You could use Control.Monad.replicateM_ instead of your hand-rolled repeatNTimes function.

Theofanis Despoudis • Jun 18 '20

Thank you for the tip. I will update the article.

Aritra Dattagupta • Jun 25 '21

Why not just sort the two strings and check if they are equal?

Theofanis Despoudis • Jun 14 '22

That will also work but the complexity will be higher in general. O(nlogn) vs O(n) using this approach and you will be asked how you can improve it from the interviewer lol.