Bob Lied

Posted on

# PWC 234 Same Circus, Different Clowns

Perl Weekly Challenge 234 is with us. This time we have two complementary challenges: the first is to find things that are the same, and the second is to count things that are different.

``````You are given an array of words made up of alphabetic
characters only.

Write a script to return all alphabetic characters that
show up in all words including duplicates.
``````

#### Example 1

Input: @words = ("java", "javascript", "julia")
Output: ("j", "a")

#### Example 2

Input: @words = ("bella", "label", "roller")
Output: ("e", "l", "l")

#### Example 3

Input: @words = ("cool", "lock", "cook")
Output: ("c", "o")

### Thoughts

This turned out to be a surprisingly rich problem.

At first glance, it's set intersection. A language with native support for sets would be keen for this. Perl is not such a language, but it's easy to do set-like things with hashes, and there are modules, of course. However, the repetition of letters and the requirement to show the letters in the order they're encountered will be at odds with the usual definition of sets, where elements are unique and unordered.

We note that the answer depends on the order of words in the input. In example 2, the result should be `qw(e l l)` because we start with "bella". If the list of words started with "label", the result should be `qw(l e l)`.

#### Cute, but probably not

There is a cute way to do set intersection using a cross product of logical operations. For instance, to find the common letters between "perl" and "rasp", we can treat each as a vector and make a matrix that shows whether each pair of letters is the same. Any row that has a sum of 1 will be a common letter.

r a s p Ʃ
p 0 0 0 1 1
e 0 0 0 0 0
r 1 0 0 0 1
l 0 0 0 0 0

This approach is going to be rejected for two reasons: first, it won't work for repeated letters -- deal breaker -- and second, it's O(N^2) for every pair, which has a bad smell.

#### String world

There's an implementation which follows the problem statement literally. For each letter in the first word, check if it exists in each subsequent word. If it does, stash it aside and delete it from the subsequent words.

Let's sketch this out in code. First, let's pick off our start word and reduce the `@words` list to what's left.

``````my \$start = shift @words;
``````

Then, we loop over the letters in `\$start` with a common idiom that uses `split`. (Javascript and Python stole split from Perl, but I'm not bitter.)

``````for my \$letter ( split(//, \$start) ) { . . . }
``````

Now, let's take the description literally and see if the letter occurs in all the remaining words. `List::Util::all` is right there for us, and it has the nice effect that it stops once the condition fails, so it won't necessarily scan the entire list every time.

``````    if ( all { index(\$_, \$letter) >= 0 } @words )
``````

I've used the `index` function here, because it's fast and simple. I briefly considered using a regular expression match, but that's using a tank where a bicycle would do.

So, now we know whether the letter is common. If it isn't, we can go on to the next letter. If it is, then we've used it up in every word, so let's delete it. The `substr` has an optional fourth argument that does string substitution, which I for one found surprising when I first learned of it.

``````    push @result, \$letter;
for my \$w ( 0 .. \$#words)
{
my \$pos = index(\$words[\$w], \$letter);
substr(\$words[\$w], \$pos, 1, ""); # Remove the letter
}
``````

There's another little optimization we can add: if we've used up all the letters in any word, then we can stop looking for any more common letters. I'm going to do that by labeling the `for` loop and breaking out with a `last` statement. The whole function comes together like this:

``````sub commonCharacters(@words)
{
use List::Util qw/all/;

return [] unless @words;
my \$start = shift @words;
my @result;
LETTER:
for my \$letter ( split(//, \$start) )
{
if ( all { index(\$_, \$letter) >= 0 } @words )
{
# This letter occurs in all words.
push @result, \$letter;

# Remove the letter from each word
for my \$w ( 0 .. \$#words)
{
my \$pos = index(\$words[\$w], \$letter);
substr(\$words[\$w], \$pos, 1, "");
last LETTER if \$words[\$w] eq "";
}
}
}
return \@result;
}
``````

I'm a little annoyed here that `index` gets called again in the deletion step -- we just did that! But trying to cache that somehow seems like it will add more overhead than just making the function call.

#### Counting common characters

Another approach is to check the frequency of characters in each word. If our start word has two 'a' characters, then to be common, every other word must have at least one, but no more than two. It answers the question of commonality, but the ordering will have to be figured out.

For this solution, we'll need to determine letter frequency. Last week, I used `List::MoreUtils::frequency`, but the problem really is almost trivial, so this week, let's write our own subroutine for a frequency hash:

``````sub letterFreq(\$word)
{
my %freq;
\$freq{\$_}++ for split(//, \$word);
return \%freq;
}
``````

Our solution will have a similar structure to the previous one. We'll use the first element of `@words` to find the initial set of possible letters. Then we'll use each subsequent word to retain or delete the letter. We'll either run out of letters or out of words, and what's left will be the answer.

The starting point will be a little different, though. We'll put the letters into a frequency hash, and the possible letters will be the keys of the hash.

``````    my \$start = shift @words;
my \$letters = letterFreq(\$start);
while ( @words and keys %{\$letters} )
``````

Now we'll iterate over the words. We'll convert each one to a frequency hash of its own, and then compare the letter counts in the start word to the letter counts in the word. If the letter doesn't even exist in the word, it can be removed from the possibilities. If it does exist, the number of times it's common will be the minimum of the two words.

``````    my \$start = shift @words;
my \$letters = letterFreq(\$start);
while ( @words and keys %{\$letters} )
{
# Use up words, make frequency hash
my \$wfreq = letterFreq(shift @words);

for my \$char ( keys %{\$letters} )
{
if ( exists \$wfreq->{\$char} )
{
\$letters->{\$char} = min(\$letters->{\$char}, \$wfreq->{\$char});
}
else
{
delete \$letters->{\$char};
}
}
}
``````

I feel better about this solution. We touch every word only once, and we reduce the problem to a smaller set of letters and words with every loop iteration; performance should be linear in proportion to the number of words.

C++ programmers will sniff a bad smell here: we've deleted from the `\$letters` hash while iterating over it. In C++, that would invalidate the iterator, and lead to an afternoon or evening of frustrating debugging (ask me how I know). Perl, however, has no such problem when `keys` is used to set up a list of things to iterate over.

There is a remaining problem, though. When we finish the `while` loop, the common characters are in the `\$letters` hash, but they're unordered. To solve the problem, we need to put the common characters in the same order as the start word. Good thing we saved `\$start` at the beginning.

To mix it up, let's use array operations instead of `for` loops this time. We're going to start by mapping each character in `\$start` to whether or not it exists in `\$letters`. Then we'll select (think `grep`) the ones that exist. In code, that will read right to left.

``````    return [ grep { defined }
map { ( exists \$letters->{\$_} && \$letters->{\$_} > 0 )
? do { \$letters->{\$_}--; \$_ }
: undef  }
split(//, \$start) ]
``````

The repeated letters raise their complicated little heads here. The `\$letters` frequency hash has the number of recurrences that can happen. We can only accept that many from the `\$start` word. Decrementing the `\$letters` value to zero takes care of that.

The somewhat-rare `do BLOCK` statement shows up here because we want to execute two statements in a place where ordinarily only a single expression could appear. Don't confuse the `do` in Perl with the `do-while` in other languages.

``````You are given an array of positive integers.

Write a script to find the number of triplets
(i, j, k) that satisfies num[i] != num[j],
num[j] != num[k] and num[k] != num[i].
``````

#### Example 1

Input: @ints = (4, 4, 2, 4, 3)
Output: 3

(0, 2, 4) because 4 != 2 != 3
(1, 2, 4) because 4 != 2 != 3
(2, 3, 4) because 2 != 4 != 3

#### Example 2

Input: @ints = (1, 1, 1, 1, 1)
Output: 0

#### Example 3

Input: @ints = (4, 7, 1, 10, 7, 4, 1, 1)
Output: 28

triplets of 1, 4, 7 = 3x2×2 = 12 combinations
triplets of 1, 4, 10 = 3×2×1 = 6 combinations
triplets of 4, 7, 10 = 2×2×1 = 4 combinations
triplets of 1, 7, 10 = 3x2x1 = 6 combinations

### Thoughts

The first thought is brute force: let's make every combination of three things and see if they're different. For arrays as small as the example, that would be adequate, but that's going to be an O(N^3) algorithm.

Example 3 gives the game away. We don't need to enumerate the combinations, we just have to calculate the number of possibilities. This just got a lot easier.

Again we're going to have a frequency hash. Not only will that help with the calculation, the keys of the hash will all be different, trivially giving us n[i] ≠ n[j] ≠ n[k].

Let's go to the code, Bob. First, we already saw above how to make a frequency hash. Let's do that, and then make an array of the keys.

``````    my %freq; \$freq{\$_}++ for @num;
my @n = keys %freq;
``````

Getting triplets out of `@n` can be done with an idiomatic triple loop that covers every combination of indices once.

``````      0 | 1 | 2 | 3 | 4 | 5 | . . . | n-3 | n-2 | n-1
i: --------i----------------------->:
j:             j------------------------->:
k:                 k--------------------------->;
``````

The i index starts at zero and goes not quite up to the end, leaving one for j and one for k. The j index starts to the right of i, and ends where it leaves one element for k. The k index moves to the right of j and ends at the end of the array.

With i, j, and k in hand, we merely need to multiply the counts from the frequency hash.

``````    my \$count = 0;
for ( my \$i = 0; \$i <= \$#n - 2 ; \$i++ )
{
for ( my \$j = \$i+1; \$j <= \$#n-1; \$j++ )
{
for ( my \$k = \$j+1 ; \$k <= \$#n; \$k++ )
{
\$count += \$freq{\$n[\$i]} * \$freq{\$n[\$j]} * \$freq{\$n[\$k]};
}
}
}
return \$count;
``````