mattIshida

Posted on Feb 3, 2023

Interspersing arrays in Ruby

Ruby boasts an impressive set of built-in methods for working with arrays. That's great, but it can lead to some deep rabbit-holes when you expect to find a method that solves your problem and one isn't readily available.

In this post, I'm going to describe my journey going down one such rabbit-hole as Ruby beginner.

I'll touch on a number of array methods, do some benchmarking, and finish with an attempt at creating a custom enumerator.

Interspersing arrays

The problem I was looking to solve is easy to state: how do you intersperse or pad a certain separator element between the elements of an array?

With a string, this is easy.

str = 'ruby'
str.split('').join('|')
# => "r|u|b|y"

We've easily inserted a separator character '|' between every pair of consecutive elements. We did so by splitting the original string into an array of individual characters, then chained a call to #join which combines the elements back into a string, inserting its argument along the way.

How do you do the same thing to arrays? (A simple use case is needing to print an array with a separator character between the elements. But it's also an interesting problem in its own right.)

We would like to have a method that operates on an array and returns an a new array with the elements in the same order, just separated by a separator-element which we pass as a parameter.

arr = ['ruby', 'javascript', 'python', 'c']
arr.intersperse([0,1])
#=> ['ruby', [0,1], 'javascript', [0,1], 'python', [0,1], 'c']

I felt sure that there must be a method doing exactly this out of the box.

Generalizing the problem

Let's stipulate that the solution to the problem should be capable of operating on any kind of array, with any kind of separator. The array can be nested to any level, as can the separator. Additionally, the separator may have the same value as one of the elements: we can't rule out that the separator is not also an element.

And even though it's not strictly part of the problem as stated, we should also think about how to generalize the problem to different intervals. What if we wanted the separator every third element, or every fourth, or fifth? What if we want the gaps between separators to increase by one each time like so [,sep,,sep,,,sep,,,,sep....]?

In its full generality, I think of the problem as a kind of nested enumeration. You are enumerating through your original array while also enumerating through another array that determines your separator placement.

But let's get back to the original problem.

Some solutions

Putting first things first, we can start with the simpler problem of interspersing a separator element between consecutive elements of an array.

I dove into the Ruby docs.

Here the wealth of array and enumerable methods was a bit of a red herring. There were so many that it was easy to believe that the perfect method was out there if I just kept digging.

For all I know, it is still out there. But I pretty quickly turned to StackOverflow.

This thread was helpful insofar as it appears to confirm that the one-stop, out-of-the-box method doesn't exist.

It did offer a number of helpful ways to approach the problem. As of this writing, solutions including using #flat_map, #product, #inject, #each, and #zip, with #flat_map getting the most votes.

My own naive solution

Lest you think that I am a slave to StackOverflow, I did try conjuring my own solution out of thin air.

arr, sep = [3,5,7,9], "|"
(0..2*arr.length-2).map {|i| i.even? arr[i/2] : sep}
#=> [3, "|", 5, "|", 7, "|", 9]

The core idea is that our output array has the original elements at even numbered indices and the separator element at odd-numbered indices. The output array has length 2*arr.length-1, hence its last index is 2*arr.length-2. We simply need to iterate over the indices and pull in the appropriate value depending on whether i is even or not.

Sub-array type solutions

Let's dive into each family of solutions in a bit more detail.

#flat_map

arr, sep = [3,5,7,9], "|"
arr.flat_map {|elem| [elem, sep]}[0...-1]
#=> [3, "|", 5, "|", 7, "|", 9]

#product

arr, sep = [3,5,7,9], "|"
arr.product([sep]).flatten(1)[0...-1]
#=> [3, "|", 5, "|", 7, "|", 9]

#zip

arr, sep = [3,5,7,9], "|"
arr.zip([sep]*arr.length).flatten(1)[0...-1]
#=> [3, "|", 5, "|", 7, "|", 9]

Sub-array solutions: takeaways

These three sub-array methods are all doing basically the same thing. In the context of our example arr, they all pass through the intermediate step of creating an array of sub-arrays.

[[3, "|"], [5, "|"], [7, "|"], [9, "|"]]

The final step is to flatten down one level and lop of the final separator. This all happens explicitly in the #product and #zip methods, and under the hood in the #flat_map method.

The #product and #zip methods are almost identical down to the syntax. The principal difference here is that product allows us to pass in a single element array as an argument, while we need to pass in a longer array to #zip. How long exactly? It turns out it doesn't matter, so long as it contains at least as many separators we need in the final array.

For example, passing in a MUCH longer array of separators does nothing to the final result.

arr.zip([sep]*100*arr.length)
#=> [[3, "|"], [5, "|"], [7, "|"], [9, "|"]]

My final takeaway is that while #flat_map performs its flattening under the hood, it doesn't go overboard. It concatenates the results of the block but without calling flatten on those results first. So we can use #flat_map even when our elements and separators are themselves nested.

nested_sep = [[sep]]
nested_arr = arr.map {|elem| [[elem]]}
nested_arr.flat_map {|elem| [elem, nested_sep]}
 => [[[3]], [["*"]], [[5]], [["*"]], [[7]], [["*"]], [[9]], [["*"]]]

Shovel solutions

Moving on to the second family of solutions, which involve iterating once through the array and using the shovel operator.

#inject (#reduce)

arr, sep = [3,5,7,9], "|"
arr.inject([]) {|memo, elem| memo << elem << sep}[0...-1]
#=> [3, "|", 5, "|", 7, "|", 9]

#each

arr, sep = [3,5,7,9], "|"
memo = []
arr.each {|elem| memo << elem << sep}
memo[0...-1]
#=> [3, "|", 5, "|", 7, "|", 9]

Both solutions in this family are doing roughly the same thing: iterating through and shoveling in both the element and the separator. With inject or reduce the accumulator variable memo is created inline, while the each method acts on an existing variable.

The solutions in this family feel a little less clever and seem to garner less support on StackOverflow, but they are brutally efficient, as we'll see.

Benchmarking

So which of these solutions is best? We can use the benchmark gem to do some basic benchmarking.

Somewhat surprisingly, the shovel family of solutions seems to have significant performance benefits at scale.

Note that for benchmarking purposes, I used an array with 10 million elements, each of which was an array of 3 single-digit integers.

Adaptability

Aside from raw performance, the other important question is which of the solutions can be adapted to inserting separators every three, four or five elements?

With the sub-array family, this is pretty straightforward. We simply chain the #each_slice method to the front and add a bit of destructuring in our block.

arr, sep = [3,5,7,9,11,13], "|"
arr.each_slice(2).flat_map {|elem| [*elem, sep]}[0...-1]
#=> [3, 5, "|", 7, 9, "|", 11, 13]

The #each_slice methods creates an array of sub-arrays each with a passed-in length. To avoid an extra layer of nesting in the call to the flat_map block we destructure before adding in the separator element.

For the shovel family, we would need to add some kind of check on the index to determine when we should shovel in the separator. I used the modulo operator %.

arr, sep = [3,5,7,9,11,13], "|"
memo = []
arr.each.with_index do |elem, idx| 
  if idx%2 == 1
    memo << elem << sep
  else 
    memo << elem
  end
end
arr.length%2==0 ? memo[0...-1] : memo
# => [3, 5, "|", 7, 9, "|", 11, 13]

This is a bit more awkward as it involves more calculation and may be more prone to off-by-one errors. I also needed to do an extra check at the end to make sure the return vale does not end in a separator.

Conclusion

While the sub-array solutions are slightly less performant on large arrays, they generalize more easily. It's hard to say that there is a single approach that works best in all situations, for all use cases. But I suppose this is nothing new to most programmers.

Postscript: a custom enumerator

As a postscript, here is my attempt to build a custom enumerator. Why would we want a custom enumerator? I suppose one reason is that we might require an enumerator outside of a context where we're working with arrays. It generalizes the problem to a further level, beyond the array class.

Here then is my amateur-ish attempt.

def nested_enum (arr, sep, every)
    Enumerator.new do |y| 
        main = arr.to_enum
        internal = Enumerator.produce(1) {|x| (x+1)% every}

        a = main.next
        b = internal.next
      loop do
        if b == 0 
            y << sep
        else 
            y << a
            a = main.next
        end
        b = internal.next
      end
    end
end

The method will return an enumerator object. When .next is called on it, it will yield from the main enumerator, except at regular intervals when it will instead yield value sep according to the value of the internal enumerator. It will then go back to yielding from the main without missing a beat.

myEnum = nested_enum((0..8), 'hi', 3).to_a
#=>[0, 1, "hi", 2, 3, "hi", 4, 5, "hi", 6, 7, "hi", 8]

I also considered implementing this using a counter variable instead of an internal enumerator, even though I like the idea of drawing from two enumerators in one.

def nested_enum_counter (arr, sep, every)
    Enumerator.new do |y| 
        main = arr.to_enum
        #internal = Enumerator.produce(1) {|x| (x+1)% every}

        a = main.next
        b = 1
      loop do
        if b % every == 0 
            y << sep
        else 
            y << a
            a = main.next
        end
        b += 1
      end
    end
end

When I benchmarked the custom enumerator solution, I found it was actually significantly less performant than the others discussed above. So I would say that the problem of interspersing arrays in Ruby is still to be completely solved, at least by yours truly.