DEV Community

Bob Lied

Posted on • Updated on

PWC 208 Smells like teen SQL

Minimum Index Sum

Perl Weekly Challenge Task 1, Minimum Index Sum, asks us to match elements from a pair of lists and decide which matches have a minimum sum.

You are given two arrays of strings.
Write a script to find out all common strings in the
given two arrays with minimum index sum. If no common
strings found returns an empty list.

``````Example 1

Input: @list1 = ("Perl", "Raku", "Love")
@list2 = ("Raku", "Perl", "Hate")

Output: ("Perl", "Raku")

There are two common strings "Perl" and "Raku".
Index sum of "Perl": 0 + 1 = 1
Index sum of "Raku": 1 + 0 = 1
``````

This reminds me of the sorts of things that programmers do when they don't want to write SQL. "JOIN what to what now? And where does the MIN function go? If I just had those lists in variables ... I'm already using DBI to talk to the database, what the heck, I'll just fetch the data and do it myself." Ah well, we seem to be past that point here and have already rationalized our way to data in lists, so let's begin from there.

``````-- Nah, too complicated
SELECT a.word
FROM LIST1 AS a JOIN LIST2 AS b USING (word)
WHERE a.idx + b.idx =
( SELECT MIN(c.idx + d.idx)
FROM LIST1 AS c JOIN LIST2 AS d USING (word)
);
``````

First of all, the arrays give us a handy mapping from index to value, but we're going to need the other direction. Value to index is going to be a hash. We could go explore CPAN for modules that implement sets (what we're looking for here, after all, is set intersection), but let's stick to core Perl.

Let's invert an array into a hash, where the keys are the elements of the array, and the values are the corresponding indexes. There's a cute way to exploit hash slices to do it:

``````my %h ; @h{@list} = ( 0 .. \$#list );
``````

That's concise, but what if there are duplicate values in the list? It puts the higher index into the resulting hash, and that's clearly not going to be the right answer to get a minimum sum. To handle the possibility that the element is repeated in the list, we need to be a bit more explicit.

``````sub asHash(\$list)
{
my %h;
while ( my (\$i, \$val) = each @\$list )
{
\$h{\$val} = \$i unless exists \$h{\$val};
}
return \%h; # NB: returns reference
}
my (\$h1, \$h2) = ( asHash(\$list1), asHash(\$list2) );
``````

Here I'm using the `each` function to attach names to pairs of index and value, and therefore avoid repeatedly having to de-reference `\$list` within the loop. `each` can be fragile (check the warnings and advice in `perldoc -f each`), but it's fine here in a well-contained context. I've also been explicit with the logic here by using `unless`, but for added obfuscatory fun, I could have used the `//=` operator instead.

On to the matching of parts, then. We can select things that are in both lists by walking one hash and looking it up in the other. If we were dealing with big lists, it might be worth optimizing to loop over the shorter list, but we'll skip the complication of figuring out which that is.

``````grep { exists \$h2->{\$_} } keys %\$h1;
``````

Now we have a list of candidate matches, so let's make a table of the possible sums for each word.

``````my %indexSum = map { \$_ => ( \$h1->{\$_} + \$h2->{\$_} ) } # grep result goes here
``````

Finally, we need the minimum value from this hash, and then to select the keys of the hash that have that minimum. I'm throwing in a `sort` to have a predictable order that can be tested easily. If the `sort` is too painful in production, the test can be upgraded to use a bag as the expected value, so that ordering doesn't matter.

``````my \$min = min(values %indexSum);
return [ sort grep { \$indexSum{\$_} == \$min } keys %indexSum ];
}
``````

The whole thing together:

``````sub minIndexSum(\$list1, \$list2)
{
my (\$h1, \$h2) = ( asHash(\$list1), asHash(\$list2) );
my %indexSum = map { \$_ => ( \$h1->{\$_} + \$h2->{\$_} ) }
grep { exists \$h2->{\$_} } keys %\$h1;

my \$min = min(values %indexSum);
return [ sort grep { \$indexSum{\$_} == \$min } keys %indexSum ];
}
``````

I rather like this solution because of the complete lack of obvious loops. All the things that might have been `for` loops or `while` loops have dissolved away into `grep` and `map` and `min`; and with that, so have all the testing for boundary conditions and empty lists.

Duplicate and Missing

The second task, Duplicate and Missing, asks us to find an anomaly in a list:

You are given an array of integers in sequence with one
missing and one duplicate.

Write a script to find the duplicate and missing integer
in the given array. Return -1 if none found.

For the sake of this task, let us assume the array
contains no more than one duplicate and missing.

``````Example 1: Input: @nums = (1,2,2,4)
Output: (2,3)
Duplicate is 2 and Missing is 3.
Example 2: Input: @nums = (1,2,3,4)
Output: -1
No duplicate and missing found.
Example 3: Input: @nums = (1,2,3,3)
Output: (3,4)
Duplicate is 3 and Missing is 4.</blockquote>
``````

As so often happens, the specification is a little ambiguous. The implication from the examples is that one number in the list has been replaced by duplicating its neighbors. This is what would happen if we were listing the rankings of a league and two of the teams were tied: we would give both of them the same ranking and leave a hole in the sequence.

But it could also be taken literally: there is one integer that is missing, and somewhere else in the list a different integer is duplicated. I didn't find a simple way to handle both interpretations simultaneously, so I wrote separate functions for the two cases.

First, the team-ranking interpretation. In this case, we will expect consecutive integers, and if we find they are not consecutive, then we've found our duplicate-and-missing and can stop right there.

``````sub dupAndMissing(@list)
{
my \$current = shift @list;
while ( my \$next = shift @list )
{
if ( \$next == \$current )
{
return [ \$current, \$current+1 ];
}
\$current = \$next;
}
return [];
}
``````

I've chosen to use `shift` to handle getting consecutive elements of the list. This way seems to terminate the list naturally, and is less cluttered than using a `for` loop and index variables.

The alternative interpretation requires a little more logic, but is not much more difficult. The problem statement has let us off the hook for the complications of more than one duplicate, or a gap bigger than one, so we're going to take that win to keep it simple.

``````sub dupAndMissing_B(@list)
{
my (\$dup, \$missing);
my \$current = shift @list;
while ( my \$next = shift @list )
{
if ( \$current == \$next )
{
\$dup = \$current;
}
elsif ( \$next > \$current + 1 )
{
\$missing = \$current + 1;
}
\$current = \$next;
}
return [] unless defined \$dup && defined \$missing;
return [ \$dup, \$missing ];
}
``````

This loop could also have been terminated early once both the missing and the duplicate have been found. It also raised the possibility of having one and not the other. I chose to require both.

Both of these functions consistently return array references, whether answers are found or not. The problem specifies returning `-1` if not found, so when these functions are actually called, there must be logic around the function call to transform an empty array `[]` to `-1`.

``````my \$retval = dupAndMissing(@ARGV);
say "-1" if @\$retval == 0;
say '(', join(',', \$retval->@*), ')';
``````