If your girlfriend has been kidnapped by Gollum and the Evil One, and there's nothing you can do, perhaps console yourself with a small programming problem while you listen to Ramble On or Ramblin' Man'.
PWC 370 Task 2: Scramble String
The Task
You are given two strings A and B of the same length. Write a script to return true if string B is a scramble of string A otherwise return false. String B is a scramble of string A if A can be transformed into B by a single (recursive) scramble operation. A scramble operation is:
- If the string consists of only one character, return the string.
- Divide the string X into two non-empty parts.
- Optionally, exchange the order of those parts.
- Optionally, scramble each of those parts.
- Concatenate the scrambled parts to return a single string.
Examples
-
Input:
$str1 = "abc",$str2 = "acb"Output:true- split: ["a", "bc"]
- split: ["a", ["b", "c"]]
- swap: ["a", ["c", "b"]]
- concatenate: "acb"
-
Input:
$str1 = "abcd",$str2 = "cdba"Output:true- split: ["ab", "cd"]
- swap: ["cd", "ab"]
- split: ["cd", ["a", "b"]]
- swap: ["cd", ["b", "a"]]
- concatenate: "cdba"
-
Input:
$str1 = "hello",$str2 = "hiiii"Output:false -
Input:
$str1 = "ateer",$str2 = "eater"Output:true -
Input:
$str1 = "abcd",$str2 = "bdac"Output:false
The Ramblin' Scramblin' Thinkin' Part
This could go two ways: (1) generate every possible scramble of str1 and then look for str2; or (2) do a breadth-first or depth-first search for str2.
The possible number of scrambles could explode. For a length of n, there will be n-1 splits, and then the n-1 string will have n-2 splits, and so on, so we're looking at factorial complexity. Let's not generate every possible scramble unless we have to.
For a search, we'd be trying to match the front or back halves of the two strings, and then recursing on the parts that didn't match. There are finite possibilities for partial matches.
- A simple swap gets us from
str1tostr2:
str1: --- ++++++++++
str2: ++++++++++ ---
- The head or tail of
str1matches the corresponding head or tail of str2, requiring us to recurse on the shorter string that didn't yet match.
str1: === [??????????] || [???] ==========
| scramble scramble |
| | | |
str2: === [.*#.*#.*#.] || [.*#] ==========
- The head or tail of
str1matches the back or front ofstr2. Similar to the case above, except that a swap is required before we recurse.
str1: === [??????????] || [???] =========
/ \
scramble scramble
/ \
str2: [.*#.*#.*#.] === || ========= [.*#]
A Bold Strategy, Cotton. Let's see if that works out for him.
After an embarrassing amount of trial and error (you might say I was scrambling), my solution converged on this longer-than-usual recursive function.
sub isScramble($str1, $str2, $depth = "")
{
state %remember;
%remember = () if $depth eq "";
my $key = ":$str1:$str2:";
if ( exists $remember{$key} )
{
$logger->debug("${depth}CACHED $key = ", ($remember{$key} ? "true":"false"));
return $remember{$key}
}
my $len = length($str1);
return ( $remember{$key} = false) if $len != length($str2);
return ( $remember{$key} = true) if $str1 eq $str2;
for ( 1 .. $len-1 )
{
# $_ is the length of the left substring (head), $len-$_ is length of right (tail)
my ( $head, $tail) = ( substr($str1, 0, $_), substr($str1, $_) );
my ($s2head, $s2tail) = ( substr($str2, 0, $_), substr($str2, $_) );
my $s2front = substr($str2, 0, $len-$_); # length of $tail, on left side
my $s2back = substr($str2, -$_); # length of $head, on right side
$logger->debug("${depth}Compare [$head/$tail] <> s2head/$s2tail] ($s2front/$s2back)");
if ( "$tail$head" eq $str2 )
{
$logger->debug("${depth}FOUND $str2");
return $remember{$key} = true;
}
elsif ( $head eq $s2head )
{
$logger->debug("${depth}HH, compare $tail <> $s2tail");
return true if $remember{$key} = isScramble($tail, $s2tail, " $depth");
}
elsif ( $head eq $s2back )
{
$logger->debug("${depth}HB, compare $tail <> s2front");
return true if $remember{$key} = isScramble($tail, $s2front, " $depth");
}
elsif ( $tail eq $s2tail )
{
$logger->debug("${depth}TT, compare $head <> s2head");
return true if $remember{$key} = isScramble($head, $s2head, " $depth");
}
elsif ( $tail eq $s2front )
{
$logger->debug("${depth}TF, compare $head <> s2back");
return true if $remember{$key} = isScramble($head, $s2back, " $depth");
}
else
{
$logger->debug("${depth}No pairs, recurse with $head<>$s2head and $tail<>$s2tail");
return $remember{$key} = true if (
( isScramble($head, $s2head, " $depth")
&& isScramble($tail, $s2tail, " $depth") )
|| ( isScramble($head, $s2back, " $depth")
&& isScramble($tail, $s2front, " $depth") ) );
}
}
$logger->debug("${depth}NOT FOUND $key = false (caching)");
return $remember{$key} = false;
}
Notes:
-
The function starts out with setting up a cache for partial results. During development, I added this near the end of the process, but it shows up at the top of the function, so let's discuss it.
- Recursive search algorithms usually benefit from caching known results. Our worst case is that
str1can't be scrambled intostr2, in which case we'll end up doing the whole exponential tree of possibilities. I expect caching partial results would pay off. - The
%rememberhash is my cache, and it's astatevariable so that it persists across recursive calls. But that means it also persists across invocations for different test cases, so it could yield wrong answers unless I initialize it at the top level of the recursion. - The cache is filled by using in-line assignment every time a result is returned, which is kind of cutely readable, I think.
- Should I have used
Memoize? Maybe. But that caches values based on the arguments passed to the function, and one of my arguments is thedepthto which I've recursed. That means cache misses depending on how I got down the search tree, so I opted for do-it-yourself memo-ization.
- Recursive search algorithms usually benefit from caching known results. Our worst case is that
The function begins by dispensing with the easy cases: having a result in the cache, matching strings, and strings of the wrong length.
Then, for each division of
str1, let's have at hand the substrings that we might be dealing with from bothstr1andstr2. There's a little tour-de-force ofsubstruses here, splitting the strings at a midpoint, and using negative offsets.Most of the body of the function is testing which pairs match up, if any, and doing the appropriate recursion of what remains.
The last case, where neither the head nor the tail of
str1has a matching complement instr2, requires us to recurse on both pieces, and in either order.I used logging liberally, because the function is only obvious in hindsight and took me longer than I'd like to admit. The
$depthargument is mostly a convenience for indenting the log statements appropriately -- notice that it gets longer by two spaces at each recursion. My setup for logging usesLog::Log4perlin easy mode. It's part of my boilerplate for PWC solutions and looks like this:
use Getopt::Long;
my $Verbose = false;
my $DoTest = false;
GetOptions("test" => \$DoTest, "verbose" => \$Verbose);
my $logger;
{
use Log::Log4perl qw(:easy);
Log::Log4perl->easy_init({ level => ($Verbose ? $DEBUG : $INFO ),
layout => "%d{HH:mm:ss.SSS} %p{1} %m%n" });
$logger = Log::Log4perl->get_logger();
}
Top comments (0)