It's been a while since I commented on a Weekly Challenge solution, but here we are at week 360. Such a useful number. So divisible, so circular. It deserves twenty minutes.
Task 2: Word Sorter
The task
You are given a sentence. Write a script to order words in the given
sentence alphabetically but keep the words themselves unchanged.
# Example 1 Input: $str = "The quick brown fox"
# Output: "brown fox quick The"
# Example 2 Input: $str = "Hello World! How are you?"
# Output: "are Hello How World! you?"
# Example 3 Input: $str = "Hello"
# Output: "Hello"
# Example 4 Input: $str = "Hello, World! How are you?"
# Output: "are Hello, How World! you?"
# Example 5 Input: $str = "I have 2 apples and 3 bananas!"
# Output: "2 3 and apples bananas! have I"
The thoughts
This should be quite simple: split the words, sort them, put them back together. The sort should be case-insensitive.
join " ", sort { lc($a) cmp lc($b) } split(" ", $str);
Creeping doubt #1
Is converting to lowercase with lc the right way to do case-insenstive compares? Not really. Perl has the fc -- fold-case -- function to take care of subtleties in Unicode. We won't see those in simple ASCII text, but for the full rabbit hole, start with the documentation of fc.
Creeping doubt #2
Doing the case conversion inside the sort means that we will invoke that every time there's a string comparison, which will be quite redundant. We could (probably?) speed it up by pre-calculating the conversions once.
The solution
sub sorter($str)
{
return join " ",
map { $_->[0] }
sort { $a->[1] cmp $b->[1] }
map { [$_, fc($_)] }
split(" ", $str);
}
This solution uses the idiom of Schwartzian transformation. Every word gets turned into a pair of [original_word, case_folded_word]. That array of pairs gets sorted, and then we select the original words out of the sorted pairs. This is best read bottom-up.
-
split(" ", $str)-- turns the string into an array of words, where words are loosely defined by being white-space separated. -
map { [$_, fc($_)] }-- every word turns into a pair: the original, and its case-folded variation. The result is a list of array references. -
sort { $a->[1] cmp $b->[1] }-- sort by the case-folded versions. The result is still a list of array references. -
map { $_->[0] }-- select the original word from each pair -
join " "-- Return a single string, where the words are separated by one space.
Does it blend?
A quick benchmark shows that it is indeed faster to pre-calculate the case folding. This example used a text string of about 60 words.
Rate oneline pre_lc
oneline 32258/s -- -45%
pre_lc 58140/s 80% --
My intuition says that when the strings are much shorter, the overhead of the transform might not offset the gains in the sort, but as is so often true, my intuition is crap. This is the result for a string of five words:
oneline 470588/s -- -59%
pre_lc 1142857/s 143% --
Top comments (0)