<< Week 4: Binary Search | View Solution on Github | Week 6: Misc 3 >>
(Image Source: Buzzfeed) Despite everything, I'm still just a worm on a string.
Happy Monday, and welcome back to Whiteboarding in Python. I hope last week I didn't scare you away with algorithms and math, so this week, we'll look at an easier problem involving strings. Let's jump right in.
# Determine if a string has all unique characters.
# You cannot use additional data structures.
Pretty simple, right? That's because it's the first question listed in the "Arrays and Strings" section of Cracking the Coding Interview. There are many different approaches we can take that differ in time and space complexity, so we'll walk through a couple of them.
First Approach: Brute Force
The brute force is usually the simplest way to solve any whiteboarding problem, so I find it helpful to come up with that solution first before moving onto more sophisticated methods. At least then, we have a solution to fall back on if the fancier methods don't work. And, it's good practice for computer science fundamentals. Reading through this solution will help give some clarity about how to start approaching any problem you may come across.
"Brute force" is exactly as it sounds. Imagine you had a jigsaw puzzle where all the pieces were the same color. You'd have to blindly try every combination until you found two that fit together. Here, we're going to check every character to every other character in the string to see if it's a match. Let's start by defining the method (either in Repl.it, or you could try your hand at old fashioned pencil and paper). It will take one string as a parameter, which we'll call s
.
def is_unique(s):
pass
Next, we'll define the first for
loop. There will be nested loops, and this one will loop through every character in the string until the second to last. Why not until the last character? We'll be comparing each character to the ones following it, and there are none after the last character in the list.
def is_unique(s):
for i in range(len(s) - 1):
Now let's make a for
loop inside that one. It will starts with the next spot in the list, which we'll call index j
. Index j
will traverse the list until the end, comparing it to the character at the first index, i
. We'll return False if they match, meaning the characters in the string are not unique.
def is_unique(s):
for i in range(len(s) - 1):
for j in range(i + 1, len(s)):
if s[j] == s[i]:
return False
Finally, we need to return True somewhere. If the nested for
loop successfully compares each letter together and never finds a match, that means the strings are unique. Thus, we'll return True outside the nested loop.
def is_unique(s):
for i in range(len(s) - 1):
for j in range(i + 1, len(s)):
if s[j] == s[i]:
return False
return True
And that's it! Calling is_unique("Hello")
should return False, and is_unique("Bye")
should return True.
Time and Space Complexity of the Brute Force Method
For one, our solution fills the requirement of "no additional data structures." We simply loop through the string without saving information to a new data structure. That gives us a space complexity of O(1), having no relation to the length of the string.
How about time complexity? To imagine the worst case, there are no unique characters, so the entirety of the nested loop has to run. This we can assume to be about O(N2), although we saved some time by checking each pair only once. I did the math, and it's technically O( ∑N-1i=1 i ), but this is not a typical time complexity, so for now we'll say it's O(N2). If you remember the graph from last week, O(N2) is pretty terrible. However, this is about the best we can do if we can make no new additional data structures or modify the original string.
More Optimal: Sort the String
Now that have brute force out of the way, let's try to come up with some more elegant solutions. We started to learn about searching and sorting last week, so let's think about how we could use those concepts here. If we sort the string, we could loop over every character and make sure it doesn't match its previous character. Let's get started.
The Python .sort()
method only works on lists. So, our first task will be to turn the string into a list of characters. I'll reiterate, but for whiteboarding problems in Python, this is really something you'll want to know by heart.
s_as_list = [char for char in s]
The for
loop iterates each character in the string s
, and returns that character. We cast the final product to a list by wrapping it in square brackets (a couple of square boys, as my students call them).
Once we turn the string into a list, we can call sort()
on it.
def is_unique2(s):
s_as_list = [char for char in s]
s_as_list.sort()
Now, we can loop through the list. We'll compare each letter to the last, and we can do this in a couple of ways. We can iterate over each index, starting at index 1 (if it exists) until the end, or, if we're not keeping track of the index, we can save the previous letter to a variable. We can initialize it to an empty string.
prev = ""
for letter in s_as_list:
For each iteration, we want to do one of two things:
- If the character is the same as the previous one, return False.
- Otherwise, make that character the new previous character.
We can turn this into an if
statement. I'm using the variable name letter
instead of char
just so we don't confuse it with where we used it earlier, when we made the string into a list.
prev = ""
for letter in s_as_list:
if letter == prev:
return False
else:
prev = letter
Finally, if the for
loop was executed successfully without finding a match, we'll return True outside of the loop. Altogether:
def is_unique2(s):
s_as_list = [char for char in s]
s_as_list.sort()
prev = ""
for letter in s_as_list:
if letter == prev:
return False
else:
prev = letter
return True
Time Complexity for Method Using Sort
The complexity of our method depends on the time complexity of Python's .sort()
method. Python uses Timsort, which is a hybrid between a merge and insertion sort, if you're familiar with those. The average and worst case complexity is O(N log N). Additionally, we loop through the list N times, although we neglect this since O(N log N) is larger. However, since we have to make a new list to sort the string, we no longer fill the requirement that additional data structures cannot be used.
Even More Optimal Time Complexity: Use a Dictionary
Let's throw out the "no additional data structures" rule altogether. What solution could we come up with? In Cracking the Coding Interview, the first solution suggests using an array of length 128, the lenght of the Unicode alphabet. However, if you've used Python once or twice, by now you probably know to use a dictionary.
In our third week, we used the Python library default dictionary. If you remember, defaultdict
allows us to set a default type for values contained in a dictionary, so if we call a key that doesn't exist, it will give us the falsey version of that type. This saves us the hassle of having to check if a key is in the list first before checking its value. We can set our dictionary dd
to type bool
so that the default value is always False.
from collections import defaultdict
def is_unique3(s):
dd = defaultdict(bool)
Next, let's write the for
loop. We'll iterate over every character in the string, and if its value is not False, meaning that character has been seen before, we'll return False. Otherwise, we'll set the dictionary value at that character to be True.
for char in s:
if dd[char]:
return False
dd[char] = True
Finally, we'll return True outside of the loop if every character appears only once. Altogether:
def is_unique3(s):
dd = defaultdict(bool)
for char in s:
if dd[char]:
return False
dd[char] = True
return True
Time Complexity for the Dictionary Method
What is our time complexity? We loop through the list N times, and it takes O(1) to access each item in the dictionary, so our complexity is O(N), even better than our previous approach. However, now we have a dictionary in our storage that wasn't there before. You'll find that time and space are tradeoffs when it comes to computer science, and making the best decision depends on the situation.
That's all for today, see you next week! If you have questions, or any ideas for what problem to tackle next, feel free to leave a comment.
<< Week 4: Binary Search | View Solution on Github | Week 6: Misc 3 >>
Sheamus Heikkila is formerly a Teaching Assistant at General Assembly Seattle. This blog is not associated with GA.
Top comments (2)
Awesome article! Here's a performance golf challenger. O(1) space complexity and O(n) runtime complexity like the dictionary solution, but it avoids the setup cost and hashing cost of a dictionary. It'll be a lot faster if we're checking many small strings. This is biased towards more bare-bones languages like C/++, but should work well for Python too. Of course, it assumes ASCII strings - the same solution for unicode would require too much space and a dictionary would be faster in that case.
Another approach, throwing out the "no other data structures" rule, is to strip the string down to letters-only, convert to a set, and compare lengths: