Ruby boasts an impressive set of built-in methods for working with arrays. That's great, but it can lead to some deep rabbit-holes when you expect to find a method that solves your problem and one isn't readily available.
In this post, I'm going to describe my journey going down one such rabbit-hole as Ruby beginner.
I'll touch on a number of array methods, do some benchmarking, and finish with an attempt at creating a custom enumerator.
Interspersing arrays
The problem I was looking to solve is easy to state: how do you intersperse or pad a certain separator element between the elements of an array?
With a string, this is easy.
str = 'ruby'
str.split('').join('|')
# => "r|u|b|y"
We've easily inserted a separator character '|' between every pair of consecutive elements. We did so by splitting the original string into an array of individual characters, then chained a call to #join
which combines the elements back into a string, inserting its argument along the way.
How do you do the same thing to arrays? (A simple use case is needing to print an array with a separator character between the elements. But it's also an interesting problem in its own right.)
We would like to have a method that operates on an array and returns an a new array with the elements in the same order, just separated by a separator-element which we pass as a parameter.
arr = ['ruby', 'javascript', 'python', 'c']
arr.intersperse([0,1])
#=> ['ruby', [0,1], 'javascript', [0,1], 'python', [0,1], 'c']
I felt sure that there must be a method doing exactly this out of the box.
Generalizing the problem
Let's stipulate that the solution to the problem should be capable of operating on any kind of array, with any kind of separator. The array can be nested to any level, as can the separator. Additionally, the separator may have the same value as one of the elements: we can't rule out that the separator is not also an element.
And even though it's not strictly part of the problem as stated, we should also think about how to generalize the problem to different intervals. What if we wanted the separator every third element, or every fourth, or fifth? What if we want the gaps between separators to increase by one each time like so [,sep,,sep,,,sep,,,,sep....]
?
In its full generality, I think of the problem as a kind of nested enumeration. You are enumerating through your original array while also enumerating through another array that determines your separator placement.
But let's get back to the original problem.
Some solutions
Putting first things first, we can start with the simpler problem of interspersing a separator element between consecutive elements of an array.
I dove into the Ruby docs.
Here the wealth of array and enumerable methods was a bit of a red herring. There were so many that it was easy to believe that the perfect method was out there if I just kept digging.
For all I know, it is still out there. But I pretty quickly turned to StackOverflow.
This thread was helpful insofar as it appears to confirm that the one-stop, out-of-the-box method doesn't exist.
It did offer a number of helpful ways to approach the problem. As of this writing, solutions including using #flat_map
, #product
, #inject
, #each
, and #zip
, with #flat_map
getting the most votes.
My own naive solution
Lest you think that I am a slave to StackOverflow, I did try conjuring my own solution out of thin air.
arr, sep = [3,5,7,9], "|"
(0..2*arr.length-2).map {|i| i.even? arr[i/2] : sep}
#=> [3, "|", 5, "|", 7, "|", 9]
The core idea is that our output array has the original elements at even numbered indices and the separator element at odd-numbered indices. The output array has length 2*arr.length-1
, hence its last index is 2*arr.length-2
. We simply need to iterate over the indices and pull in the appropriate value depending on whether i
is even or not.
Other one-line solutions
The other solutions on StackOverflow fall basically into two families.
Create
[elem, sep]
sub-arrays, then flatten one level down to remove the nesting that was introduced.Create an array, then iterate through the original array, shoveling in both the
elem
and the separator as you go.
Most of these methods also require a penultimate step of removing the last element since arrays should not end with a separator.
This is achieved using using the ...
operator to construct a range of indices up to but excluding the last index, then using bracket notation to select all elements with index in the range.
Sub-array type solutions
Let's dive into each family of solutions in a bit more detail.
- #flat_map ```
arr, sep = [3,5,7,9], "|"
arr.flat_map {|elem| [elem, sep]}[0...-1]
=> [3, "|", 5, "|", 7, "|", 9]
- #product
arr, sep = [3,5,7,9], "|"
arr.product([sep]).flatten(1)[0...-1]
=> [3, "|", 5, "|", 7, "|", 9]
- #zip
arr, sep = [3,5,7,9], "|"
arr.zip([sep]*arr.length).flatten(1)[0...-1]
=> [3, "|", 5, "|", 7, "|", 9]
# Sub-array solutions: takeaways
These three sub-array methods are all doing basically the same thing. In the context of our example `arr`, they all pass through the intermediate step of creating an array of sub-arrays.
[[3, "|"], [5, "|"], [7, "|"], [9, "|"]]
The final step is to flatten down one level and lop of the final separator. This all happens explicitly in the `#product` and `#zip` methods, and under the hood in the `#flat_map` method.
The `#product` and `#zip` methods are almost identical down to the syntax. The principal difference here is that product allows us to pass in a single element array as an argument, while we need to pass in a longer array to `#zip`. How long exactly? It turns out it doesn't matter, so long as it contains at least as many separators we need in the final array.
For example, passing in a MUCH longer array of separators does nothing to the final result.
arr.zip([sep]*100*arr.length)
=> [[3, "|"], [5, "|"], [7, "|"], [9, "|"]]
My final takeaway is that while `#flat_map` performs its flattening under the hood, it doesn't go overboard. It concatenates the results of the block but without calling flatten on those results first. So we can use `#flat_map` even when our elements and separators are themselves nested.
nested_sep = [[sep]]
nested_arr = arr.map {|elem| [[elem]]}
nested_arr.flat_map {|elem| [elem, nested_sep]}
=> [[[3]], [[""]], [[5]], [[""]], [[7]], [[""]], [[9]], [[""]]]
# Shovel solutions
Moving on to the second family of solutions, which involve iterating once through the array and using the shovel operator.
- #inject (#reduce)
arr, sep = [3,5,7,9], "|"
arr.inject([]) {|memo, elem| memo << elem << sep}[0...-1]
=> [3, "|", 5, "|", 7, "|", 9]
- #each
arr, sep = [3,5,7,9], "|"
memo = []
arr.each {|elem| memo << elem << sep}
memo[0...-1]
=> [3, "|", 5, "|", 7, "|", 9]
Both solutions in this family are doing roughly the same thing: iterating through and shoveling in both the element and the separator. With inject or reduce the accumulator variable memo is created inline, while the each method acts on an existing variable.
The solutions in this family feel a little less clever and seem to garner less support on StackOverflow, but they are brutally efficient, as we'll see.
#Benchmarking
So which of these solutions is best? We can use the `benchmark` gem to do some basic benchmarking.
Somewhat surprisingly, the shovel family of solutions seems to have significant performance benefits at scale.
![Image description](https://dev-to-uploads.s3.amazonaws.com/uploads/articles/xj4jvpckopob9a0czm8z.png)
Note that for benchmarking purposes, I used an array with 10 million elements, each of which was an array of 3 single-digit integers.
#Adaptability
Aside from raw performance, the other important question is which of the solutions can be adapted to inserting separators every three, four or five elements?
With the sub-array family, this is pretty straightforward. We simply chain the `#each_slice` method to the front and add a bit of destructuring in our block.
arr, sep = [3,5,7,9,11,13], "|"
arr.each_slice(2).flat_map {|elem| [*elem, sep]}[0...-1]
=> [3, 5, "|", 7, 9, "|", 11, 13]
The `#each_slice` methods creates an array of sub-arrays each with a passed-in length. To avoid an extra layer of nesting in the call to the flat_map block we destructure before adding in the separator element.
For the shovel family, we would need to add some kind of check on the index to determine when we should shovel in the separator. I used the modulo operator `%`.
arr, sep = [3,5,7,9,11,13], "|"
memo = []
arr.each.with_index do |elem, idx|
if idx%2 == 1
memo << elem << sep
else
memo << elem
end
end
arr.length%2==0 ? memo[0...-1] : memo
=> [3, 5, "|", 7, 9, "|", 11, 13]
This is a bit more awkward as it involves more calculation and may be more prone to off-by-one errors. I also needed to do an extra check at the end to make sure the return vale does not end in a separator.
# Conclusion
While the sub-array solutions are slightly less performant on large arrays, they generalize more easily. It's hard to say that there is a single approach that works best in all situations, for all use cases. But I suppose this is nothing new to most programmers.
# Postscript: a custom enumerator
As a postscript, here is my attempt to build a custom enumerator. Why would we want a custom enumerator? I suppose one reason is that we might require an enumerator outside of a context where we're working with arrays. It generalizes the problem to a further level, beyond the array class.
Here then is my amateur-ish attempt.
def nested_enum (arr, sep, every)
Enumerator.new do |y|
main = arr.to_enum
internal = Enumerator.produce(1) {|x| (x+1)% every}
a = main.next
b = internal.next
loop do
if b == 0
y << sep
else
y << a
a = main.next
end
b = internal.next
end
end
end
The method will return an enumerator object. When `.next` is called on it, it will yield from the `main` enumerator, except at regular intervals when it will instead yield value `sep` according to the value of the `internal` enumerator. It will then go back to yielding from the main without missing a beat.
myEnum = nested_enum((0..8), 'hi', 3).to_a
=>[0, 1, "hi", 2, 3, "hi", 4, 5, "hi", 6, 7, "hi", 8]
I also considered implementing this using a counter variable instead of an internal enumerator, even though I like the idea of drawing from two enumerators in one.
def nested_enum_counter (arr, sep, every)
Enumerator.new do |y|
main = arr.to_enum
#internal = Enumerator.produce(1) {|x| (x+1)% every}
a = main.next
b = 1
loop do
if b % every == 0
y << sep
else
y << a
a = main.next
end
b += 1
end
end
end
When I benchmarked the custom enumerator solution, I found it was actually significantly less performant than the others discussed above. So I would say that the problem of interspersing arrays in Ruby is still to be completely solved, at least by yours truly.
Top comments (0)