Posted on May 20, 2020 • Originally published at leandrotk.github.io

Algorithms Problem Solving: Jewels and Stones

#algorithms #python #computerscience #interview

This post is part of the Algorithms Problem Solving series. And it was originally published at TK's blog.

Problem description

This is the Jewels and Stones problem. The description looks like this:

You're given strings J representing the types of stones that are jewels, and S representing the stones you have. Each character in S is a type of stone you have. You want to know how many of the stones you have are also jewels.

The letters in J are guaranteed distinct, and all characters in J and S are letters. Letters are case sensitive, so "a" is considered a different type of stone from "A".

Examples

# input: J = "aA" | S = "aAAbbbb"
# output: 3

# input: J = "z", S = "ZZ"
# output: 0

Solution

At first, I tried to understand some corner cases. For example, what should I return if the J or the S is an empty string?

As my first solution, the brute force solution, I needed to look through the string. So I didn't need to care about the empty strings. For empty strings, it doesn't loop, it just returns the default counter, in this case, 0.

I just needed to loop through the J and for each character in the J string, I need to compare to each character of S. If they match, I increment the counter.

After looping through each character, just return the final counter.

def num_jewels_in_stones(J, S):
    jewels = 0

    for j in J:
        for s in S:
            if s == j:
                jewels += 1

    return jewels

print(num_jewels_in_stones("aA", "aAAbbbb")) # 3
print(num_jewels_in_stones("z", "ZZ")) # 0

This is a O(N^2) solution in terms of time complexity. Or more precisaly: O(len(J) * len(S)). For the space complexity, it is O(1) as we just store the value in a counter. If len(J) or len(S) increase, the used space keeps constant.

Just to iterate this solution, we can make it O(N) in terms of time complexity by using a hash table to store all characters as key and the counter as value.

def num_jewels_in_stones_opt(J, S):
    chars_counter = {}
    counter = 0

    for char in J:
        if char in chars_counter:
            chars_counter[char] += 1
        else:
            chars_counter[char] = 1

    for char in S:
        if char in chars_counter:
            counter += chars_counter[char]

    return counter

print(num_jewels_in_stones_opt("aA", "aAAbbbb")) # 3
print(num_jewels_in_stones_opt("z", "ZZ")) # 0

So, for each J's character. Verify if it is already in the chars_counter. If it is, just increment it. If not, initialize the counter.

Then, for each S's character, verify if this character is in the chars_counter. If it is, get it and add to the counter variable. If not, do nothing.

After this iteration, we need the final value of the counter. Just return it.

As we said before, it runs in O(N). Better than the first solution. But, for the worst-case scenario, the space complexity is O(N), worse than the first approach.

For more stories and posts about my journey learning & mastering software engineering, take a look at what I've been writing and documenting.

My Twitter & Github. ☺

Resources

Top comments (6)

Musale Martin • Jun 18 '20

Nice article. I wonder what the impact of using the collections.Counter package has on the second solution? So instead of initializing the chars_counter to a dict, you initialize it to the collections.Counter in-built for J's characters.

ArcticSpaceFox • May 30 '20

Cool series, but I wanna solve the problem before reading your solution. Hope to see more coming but no stress 👍

TK • May 30 '20 • Edited

Thanks! I have 10 more posts to publish. I just need to organize them and I'll definitely publish them here :)

ArcticSpaceFox • May 31 '20

Hmm weird, during testing of both algorithms they take the same amount of time, do you see why that is? I test them on 10 Million random chars of aAbB

TK • May 31 '20

I think it is because you are running with short length strings (aAbB). The first solution will be very slow for the worst-case scenario, where the string has a long length.

Imagine this example: strings with 10,000 chars.

The first solution has a nested for. The combination of each character would be 10,000 * 10,000 = 100,000,000.

The second solution has two for, but not nested. It would be 10,000 + 10,000 = 20,000 for the worst-case scenario.

The curve increases differently for each solution. Did it make sense to you?

Just for curiosity: how are you testing the amount of time spent by each algorithm?

ArcticSpaceFox • May 31 '20

I use timeit and measure time before and after. Thank you I will try to modify my file which is tested on