I started a series on JavaScript solutions to common algorithms and javascript problems. In case you missed the first one, here's a link to it. Earlier this week, I wrote an article on the Big O notation. If you are not familiar with it, you might want to read it as some concepts are used in this article. Let's go straight to the problem statement.
Finding Anagrams - the problem
Anagrams are words that have the same characters in the same quantity. This means that two strings are anagrams if we can rearrange one to get the other.
Here are some examples of words that are anagrams.
- “listen” and “silent”
- “rail safety” and “fairy tales”
- “dormitory” and “dirty room”
- “the eyes” and “they see”
To solve this problem, we will assume the following:
- That we ignore extra characters like “!”, “@”, etc. and whitespaces.
- We only want to work with lowercase characters.
Let us look at some solutions to this problem. Then we'll compare each of them based on their time complexity.
Solution 1 - Create a character map of both strings and compare maps
A character map in this context is a map or object that contains each unique character in the string. It stores the character as a key and the number of times it occurs in that string as the value.
function anagrams(stringA, stringB) {
/*First, we remove any non-alphabet character using regex and convert
convert the strings to lowercase. */
stringA = stringA.replace(/[^\w]/g, "").toLowerCase()
stringB = stringB.replace(/[^\w]/g, "").toLowerCase()
//Get the character map of both strings
const charMapA = getCharMap(stringA)
const charMapB = getCharMap(stringB)
/* Next, we loop through each character in the charMapA,
and check if it exists in charMapB and has the same value as
in charMapA. If it does not, return false */
for (let char in charMapA) {
if (charMapA[char] !== charMapB[char]) {
return false
}
}
return true
}
function getCharMap(string) {
// We define an empty object that will hold the key - value pairs.
let charMap = {}
/*We loop through each character in the string. if the character
already exists in the map, increase the value, otherwise add it
to the map with a value of 1 */
for (let char of string) {
charMap[char] = charMap[char] + 1 || 1
}
return charMap
}
The runtime complexity of a for loop is linear i.e O(n). In this case, there are 3 consecutive forloops which are not nested. Ignoring constants and other factors, the time complexity is approximately linear i.e. O(n).
2. Sort Strings and check if they are the same
This is a shorter and neater way of checking if two strings are anagrams.
In this case, we convert the string to an array, use the Array.sort()
method to sort it and convert it back to a string. Then we compare both strings and check if they are the same.
function anagrams(stringA, stringB) {
/*First, we remove any non-alphabet character using regex and convert
convert the strings to lowercase. */
stringA = stringA.replace(/[^\w]/g, '').toLowerCase()
stringB = stringB.replace(/[^\w]/g, '').toLowerCase()
return sortString(stringA) === sortString(stringB)
}
/*This function sorts the strings*/
function sortString(string) {
return string.split('').sort().join('');
}
Array.sort uses merge sort so its time complexity is O(nlogn).
3. Using Array.splice()
This is yet another solution. In this case, we convert string B to an array, loop through each character in string A and check if it exists in an array of string B, arrB
. If it exists, we use the Splice method to remove it from the array. We do this so that characters that occur more than once in arrB
are not checked twice.
function anagrams(stringA, stringB) {
/*First, we remove any non-alphabet character using regex and convert
convert the strings to lowercase. */
stringA = stringA.replace(/[^\w]/g, '').toLowerCase()
stringB = stringB.replace(/[^\w]/g, '').toLowerCase()
/*Next, we check if the lengths of the strings are equal.
If they are anagrams, they will have the same length. */
if (stringA.length !== stringB.length) {
return false
}
let arrB = stringB.split("")
for (let char of stringA ){
if (!arrB.includes(char)) {
return false
break;
} else {
arrB.splice(arrB.indexOf(char), 1)
}
}
return true
}
So let's consider the time complexity of this solution. In this case, there are three loops that run. The for
loop, the includes
loop and the splice
loop. Since the splice
loop and the includes
are not nested, the time complexity tends to O(n^2 ).
Conclusion
We have seen the solutions and their approximate time complexities. Comparing their time complexities, the first solution seems to have better performance. It has an approximate time complexity of O(n). The second solution, however, is more concise. So you can choose any solution depending on what is more important to you.
Got any question or addition? Please leave a comment.
Thank you for reading.
Top comments (33)
It's Python (so, off-topic), but I couldn't help myself:
O(n), if you're wondering.
Flawed, would fail for strings like ('aabb', 'aaab')
Good catch. I'd overlooked that edge case. Let me ponder....
AHA! The most obvious solution is...
sorted()
employs Timsort, which has a worst case ofO(n log n)
. Everything else is still onlyO(n)
.Naturally, I'm sure there's a more efficient solution if I chew on this longer, but that'd be the one I'd probably use in production; reasonable performance, readable, maintainable, etc.
EVEN BETTER! I just thought of a completely
O(n)
solution that relies on the commutative property of addition, and the fact that any letter would have a unique numeric equivalent (retrieved viaord()
).Ahhh, I love a good hack.
Nope, this is flawed too. There are multiple ways to express a sum. 3+2=5 and 4+1=5 as well.
So ord('a') + ord('z') = ord('b') + ord('y')
Would fail for strings such as ('az', 'by').
Hmm, hadn't considered that angle....
Prime factorizations are unique so if you map each letter to a prime and then take the product of those primes, you'd have a unique representation of the characters in the word. See this classic tweet for a better explanation.
Not very efficient or fast, however.
I mean its time complexity is
O(n)
-ish but yeah the large number multiplication would get you for longer words.26th prime is 101 so I imagine zzzzzzzzzzzzz et all would take more time
It's not efficient, but I wonder if a Cantor pairing of the
ord()
values would work. It would certainly be optimal if all the strings were two letters. The trouble is, those numbers get very big, very fast.What do you think?
Copyright (c) 2019 Caleb Pitan
😇😁😁
I think that function expressions using ! are there just to make the code less readable.
My bad! I won't lie to you I really don't know the use of the "!", but I think it's to indicate an
iife
inside the function, hence aid readability. I put it because in my early years I saw a lot of these around, and I thought having it around or not affects nothing. Didn't know it affects readability, although I don't understand how. I wouldn't mind if you told me.I prefer explicit to implicit (comes from enjoying Python too much...) and
!(function(){...}();
is just... well, if you haven't seen it before you have no idea what the gosh darnit is going on.digitalfortress.tech/js/exclamatio...
I initially thought about that approach, but discarded it for an obvious reason:
Smart 😃.
It's just too obvious. Didn't take time to analyze it. It struck my mind as I saw this post and I put it down.
I thought about using \W instead of ^w as you wrote)
There's really no difference, it's just left to choice.
Know it. But I think this way is more elegant and perfomance
Solution 1 might not be completely correct if you don't check that the length of s and t are the same otherwise return false like so:
Otherwise this test will return true even though it should be false
My first solution (ES2019) looks like this:
Here's an O(∞) solution inspired by bogosort. If it halts,
a
andb
are anagrams.My intuition, before i saw any of the solutions, went into the 2nd one because it feels like it would be the easiest to understand and the fastest. I wonder if thats the case ;)
Thank you for the brain teaser, good beginning of the day :)
In order to remove spaces, you are calling string.replace once for every occurrence of " " (space) in each string. An easier way to remove whitespace in a string is to use a regular expression object with a global modifier, which will replace all matching characters in the string. Then, you can get rid of your while-loops, and your code becomes slightly easier to read. I hope it will help you to complete your Anagram checker project .
You can also use the /\s/g regular expression object to replace all whitespace including tabs, newlines, etc.
const same = (s1, s2) => { const str1 = s1.toLowerCase(); const str2 = s2.toLowerCase(); if (str1 === str2) return true; if (str1.length !== str2.length) return false; const obj1 = {}; const obj2 = {}; str1.split('').forEach((s, i) => { obj1[str1[i]] = (obj1[str1[i]] || 0) + 1; obj2[str2[i]] = (obj2[str2[i]] || 0) + 1; }); const arr1 = Object.entries(obj1).map(String).sort().toString(); const arr2 = Object.entries(obj2).map(String).sort().toString(); return arr1 === arr2; }; same('textwisttime', 'timewisttext');
tried all the above and the below solutions but the least execution time was taken by this one.
Great article Sarah! Mind sharing any resources for leveling up on Data Structures and Algorithms. Thanks
I'm taking a course by Stephen Grider on Udemy. Another recommended course is by Colt Steele - JavaScript Algorithms and Data Structures Masterclass.