DEV Community

Cover image for Write a script to identify an anagram
Peter Kim Frank
Peter Kim Frank

Posted on

Write a script to identify an anagram

Inspired by this tweet from the always awesome Fermat's Library.

Challenge

In the language of your choice, write a script that will determine if two strings are an anagram for each other. You can use the approach outlined in the tweet, or — for bonus points — come up with your own solution 🤓.

Via Wikipedia:

An anagram is word or phrase formed by rearranging the letters of a different word or phrase, typically using all the original letters exactly once.

For our challenge, we'll use the strict definition — each letter must be used exactly once.

Expected Behavior:

isAnagram(stressed, desserts) // TRUE
isAnagram(happy, sad) // FALSE
Enter fullscreen mode Exit fullscreen mode

Have fun!

Latest comments (38)

Collapse
 
lyon profile image
Lyon

I feel we need SPOILER warnings for the comments

Collapse
 
scimon profile image
Simon Proctor

So there was a perl6 version of the using this method posted on twitter. But it had a wee bug (it was summing not multiplying the value). Here's the fixed version:

sub ana(*@words) {
    state %primes = 'A'..'Z' Z=> (^Inf).grep(*.is-prime);
    [==] @words.map({ [*] %primes{$_.comb} });
}

I realise any version of Perl can look like line noise so I'll break it done.

Firstly the *@words indicates that all the arguments passed into the function should be put into an Array called @words.

state %primes state variables are defined once and then reused each time the function is called so the first time you call the ana function the primes hash is populated.

(^Inf).grep(*.is-prime) this gives a lazy list of integers from 0 to Infinity (but then only returns those that are prime).

'A'..'Z' gives the Range of characters from A to Z (inclusive).

The Z meta operator applies the infix operator on it's right (in this case the => Pair constructor) to each element in the Positional objeect on it's left and right until one is used up. (That important, don't use a Z operator on a couple of lazy lists... it won't go well).

So that gives us a hash starting { A => 2, B => 3, C => 5 ... and so on.

@words.map applies the code block inside the map call to each item in @words. Inside the block $_ is assign the current value.

$_.comb splits the word into individual characters and %primes{$_.comb} references each of the characters and gets it's value in the prime array.

[*] and [==] are cases of applying the [] meta operator to an infix operator. It acts like taking the list on the right and putting the operator between each item.

So for the string "ABC" %primes{$_.comb} returns the list (2 3 5) and then you have [*] (2 3 5) which can be thought of as 2 * 3 * 5 or 30.

The [==] takes our list of products and puts == between them. In this case it makes use of Perl6 operator chaining which allow you to write something like 30 == 30 == 30 or 5 < 10 > 7 in both the cases each individual Boolean check is made and then these are all combined into a sets of AND checks.

Hope that makes sense. Note that this function only works on uppercase strings. It would be easy enough to make it case insensitive and to also add any Unicode characters for accents found in the expected strings.

Collapse
 
vinaypai profile image
Vinay Pai • Edited

This algorithm is actually an excellent illustration of why clever algorithms aren't always better.

In fact the clever algorithm here is worse in basically every way imaginable. It's more complicated, limited in scope, error prone, can fail in unexpected ways, and possibly less efficient than the obvious solution.

  • You need to store a lookup table of letters to primes. For the 26 letters, this 26 element lookup table is already more code than it would take to iterate over the characters and increment values in a map.
  • If you typo one of the values you'll get wrong answers all the time. Or you need code to compute the first n primes which comes with its own performance penalty and extra code.
  • The list of primes gets pretty huge if you want to support any kind if i18n and you're dealing with more than just 26 letters.
  • In most languages on modern processors integers are 64 bits long. The 26th prime is 101. In the worst case you could overflow the integers in as few as 10 letters. Integer overflow is DEFINITELY an issue if you're dealing with i18n and therefore more characters and larger primes, or if you want to support phrases rather than individual words, and therefore longer input. Once you overflow and wrap around you could get false positives. Or you could use arbitrary precision arithmetic, but that's drastically less efficient.
Collapse
 
danidee10 profile image
Osaetin Daniel

I'm surprised that no one has talked about using ord in python (It Returns the Unicode code point for a string) and comparing the sum

>>> sum(map(ord, 'stressed')) == sum(map(ord, 'desserts'))
True

It even performs better than sorting

>>> timeit.timeit("sorted('stressed') == sorted('desserts')")
1.3272779149992857
>>> timeit.timeit("sum(map(ord, 'stressed')) == sum(map(ord, 'desserts'))")
1.2404647049988853
Collapse
 
vinaypai profile image
Vinay Pai

It performs better but it's wrong. Comparing sums can give you false positives.

>>> sum(map(ord, 'false positive'))
1438
>>> sum(map(ord, 'farce positivo'))
1438
>>>
Collapse
 
jay profile image
Jay

Rust way, could be done better.
This just compares the char vector directly after sorting them.

fn is_anagram(a: &str ,b : &str) ->  bool {
    if a.len() != b.len() {    // Length check 
        return false;
    }
    let mut a:Vec<char> = a.chars().collect();
    a.sort();
    let mut b:Vec<char> = b.chars().collect();
    b.sort();
    if a == b {
        return true
    }
    false
}
Collapse
 
jselbie profile image
John Selbie • Edited

C++

using namespace std;
bool isAnagram(const string& s1, const string& s2)
{
    unordered_map<string::value_type, int> themap;
    if (s1.size() != s2.size())
        return false;
    for (auto c : s1)
        themap[c] += 1;
    for (auto c : s2)
        themap[c] -= 1;
    for (auto& p : themap) {
        if (p.second != 0)
            return false;
    }
    return true;
}
Collapse
 
heikodudzus profile image
Heiko Dudzus • Edited

I like the solution to "subtract" all characters of the shorter word from the (possibly) longer word, and check if there is some character left:

import Data.List(delete,sortBy)
import Data.Function(on)

-- deletes the first character of word from str
del str word = delete (head word) str

-- deletes all chars of word from str
str `without` word = foldl del str (map (:[]) word)

-- are there any remaining characters? 
isAnagram word1 word2 = null $ longer `without` shorter
  where [shorter,longer] = sortBy (compare `on` length) [word1,word2]

Ok, this is more fun, but it somehow reimplemented the library function ( \ \ ) (which can be read as the difference operation in set theory)

import Data.List((\\),sortBy)
import Data.Function(on)
isAnagram word1 word2 = null $ longer \\ shorter
  where [shorter,longer] = sortBy (compare `on` length) [word1,word2]
Collapse
 
heikodudzus profile image
Heiko Dudzus • Edited

I'm sorry, it took me a dinner out to see, this was unnecessarily complicated. :) The delete functions perfectly fits a right fold. And if the words aren't of equal length, they are not anagrams.

-- deletes the first occurance of element x from a list
delete _ [] = []
delete x (y:ys) = if x == y then ys else y : delete x ys

-- deletes all elements of the second list from the first
without :: Eq a => [a] -> [a] -> [a]
without = foldr delete

-- are there any remaining characters? 
isAnagram word1 word2 = null $ word1 `without` word2

delete could also be imported from Data.List

Collapse
 
heikodudzus profile image
Heiko Dudzus

This is close to cheating and must have terrible space and runtime complexity:

import Data.List(permutations)
isAnagram word1 word2 = word1 `elem` permutations word2

The solution based on sort:

import Data.List(sort)
isAnagram word1 word2 = sort word1 == sort word2

The library functions sort and permutations seem to be such an integral part of the solution, they should be implemented, here. :)

Collapse
 
heikodudzus profile image
Heiko Dudzus • Edited

I stick with mappings to prime numbers. Haskell, without considerations of performance:

import Data.Char(toLower)
import Data.List(lookup)

primes = sieve [2..]
  where sieve (p:xs) = p : sieve [x | x <- xs, x `mod` p /= 0]

charToPrime c = lookup c $ zip ['a'..'z'] primes

eval word = product <$> sequence (map (charToPrime . toLower) word)

isAnagram word1 word2 = eval word1 == eval word2

lookup and charToPrime return a Maybe value, without catching the eye. Umlauts or digits result in a Nothing value for the character and in a product of 1 for the word. The handling of those cases could be improved.

Collapse
 
joshavg profile image
Josha von Gizycki

Always happy to provide Clojure answers ;)

(defn anagram? [w1 w2]
  (= (sort w1) (sort w2)))

And of course I snooped the algorithm from all the other answers here.

Collapse
 
smartyalgo profile image
Michael Ho • Edited

You don't need sort or multiply or any fancy stuff

int[26] wordcount; //array containing encounter count of each character.

for each char:
    charnum <- ConvertCharToNum(char)
    wordcount[charnum]++

return wordcount

Check if the wordcount array of the two strings.

Computation
Does not require to sort, which is nlogn. This only requires a single pass to create wordcount array, thus n.
You can also optimize on aborting upon first index mismatch with this solution (You could also do this with the prime hashing, by aborting on first division yielding a decimal number).

Memory
The 26th prime is 97. Assume it's 100, 6 'z's would be 1006 = 1012. It would take 39 bits, or 5 bytes to store (1012 /log2). This is much less efficient to store a hash. If you want to be stricter, you could store only 1 wordcount, and have the second string count down instead of up.

Feature extension
If I want to accept numbers, the 36th prime is 367 (See Memory). With the array, it's just allocating another 10 index. It is also possible to add this feature without redeploying code, by replacing a specifically allocated array with a map of char to count

Collapse
 
heikodudzus profile image
Heiko Dudzus • Edited

This is good. I adapted it to Haskell, recording letter frequencies in Maps/dictionaries. The Map data structure is already an instance of the Eq typeclass, so it was a one-liner to add the equality check:

import qualified Data.Map.Strict as M

-- incrementor for Map entries (used by Map's alter function)

inc Nothing = Just 1
inc (Just x) = Just (x+1)

-- produces a Map containing frequencies of elements in a given list (or Foldable)
-- e.g. a Map of frequencies of characters in a given word

frequencies :: (Foldable t, Ord k, Num a) => t k -> M.Map k a
frequencies = foldr (M.alter inc) M.empty

-- equality is already defined for Maps :)

isAnagram word1 word2 = frequencies word1 == frequencies word2
Collapse
 
misterysin profile image
MTD

Best answer here. The prime trick is nice but as you said it gets way too expensive very fast.

Collapse
 
kspeakman profile image
Kasey Speakman • Edited

Edit: Added sort-based solution for Elm.

import String
import List

sortLetters word =
  List.sort (String.toList word)

isAnagram word1 word2 =
  sortLetters word1 == sortLetters word2

I think the primes thing was actually harder than what I did. Here is my 15-line Elm solution where you could also reuse the breakdown of how many of which letters were used. I could have turned it into 1 long line with the 3 import statements, but I won't do that to you. :)

Read this code from the bottom up. I got used to writing it like this in F# because of its single-pass compiler. This is not a requirement of Elm, though.

import Dict exposing (Dict)
import String
import Char


updateCount countM =
  case countM of
    -- no previous count of letter, set to 1
    Nothing ->
      Just 1

    -- saw same letter again, increment
    Just i ->
      Just (i + 1)

updateAnalysis char dict =
  Dict.update (Char.toLower char) updateCount dict

analyzeWord word =
  String.foldr updateAnalysis Dict.empty word

isAnagram word1 word2 =
  -- built-in equality superpowers
  analyzeWord word1 == analyzeWord word2

Don't take my word for it.

Here is a full program with tests that you can run over at elm-lang.org/try. Paste the code below in the left window and hit the Compile button.

I included type annotations for clarity, but they are not necessary.

import Html exposing (Html, text, div, span, u)
import Html.Attributes exposing (style)
import Dict exposing (Dict)
import String
import Char


-- anagram functions

type alias WordAnalysis = Dict Char Int

updateCount : Maybe Int -> Maybe Int
updateCount countM =
  case countM of
    -- no previous count of letter, set to 1
    Nothing ->
      Just 1

    -- saw same letter again, increment
    Just i ->
      Just (i + 1)

updateAnalysis : Char -> WordAnalysis -> WordAnalysis
updateAnalysis char dict =
  Dict.update (Char.toLower char) updateCount dict

analyzeWord : String -> WordAnalysis
analyzeWord word =
  String.foldr updateAnalysis Dict.empty word

isAnagram : String -> String -> Bool
isAnagram word1 word2 =
  analyzeWord word1 == analyzeWord word2


-- test functions

anagramTests : List (String, String)
anagramTests =
  [ ("stressed", "desserts")
  , ("stressed", "deserts")
  , ("marching", "charming")
  , ("happy", "sad")
  ]

renderTest : (String, String) -> Html msg
renderTest (word1, word2) =
  div []
    [ u [] [ text word1 ]
    , text " and "
    , u [] [ text word2 ]
    , if isAnagram word1 word2 then
        span
          [ style [ ( "color", "green" ), ("font-weight", "bold") ] ]
          [ text " are anagrams" ]
      else
        span [] [ text " are not anagrams" ]
    ]

main =
  div [] ( List.map renderTest anagramTests )
Collapse
 
nektro profile image
Meghan (she/her) • Edited
function isAnagram(a,b) {
    const vals = [a,b].map(x => x.split('').sort());
    return vals[0].every((v,i) => v === vals[1][i]);
}

Alternatively,

function isAnagram(a,b) {
    return new Set([a,b].map(x => x.split('').sort().join(''))).size === 1;
}
Collapse
 
ripsup profile image
Richard Orelup • Edited

My quick PHP one

<?php

function isAnagram($word1, $word2) {
  $word1Array = str_split($word1);
  $word2Array = str_split($word2);
  sort($word1Array);
  sort($word2Array);

  return $word1Array === $word2Array;
}

var_dump (isAnagram("test","etst")); //returns bool(true)
var_dump (isAnagram("test","ftst")); //returns bool(false)

?>


Collapse
 
mkgiles profile image
Conor James Giles • Edited

Scheme R5RS using srfi-13:
First we define a LUT of the first 26 primes.

(define primes '(2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101))

Next we define our function:

(define (isAnagram a b) (let ((mult (lambda (l) (apply * (map (lambda (x) (list-ref primes (- (char->integer x) 97))) (string->list (string-downcase l))))))) (= (mult a) (mult b))))

To break it down, our function takes two string arguments a and b, and applies the internal function 'mult' to both.
Taking a closer look at 'mult':

(mult (lambda (l) (apply * (map (lambda (x) (list-ref primes (- (char->integer x) 97))) (string->list (string-downcase l))))))

It's a lambda expression which takes a string 'l', downcases it, converts it to a list of chars, and maps an anonymous lambda function across that list, then multiplies the resulting list of integers.
Taking a closer look at the anonymous lambda:

lambda (x) (list-ref primes (- (char->integer x) 97))

It quite simply finds the prime number corresponding to the code-point of each character in the list, minus 97 (the code-point for 'a', we can do this because of the earlier downcasing), a number which corresponds to each letter's ordinal value in the alphabet if a corresponds with 0.