mattIshida

Posted on Feb 3, 2023

Interspersing arrays in Ruby

#ruby #beginners

Ruby boasts an impressive set of built-in methods for working with arrays. That's great, but it can lead to some deep rabbit-holes when you expect to find a method that solves your problem and one isn't readily available.

In this post, I'm going to describe my journey going down one such rabbit-hole as Ruby beginner.

I'll touch on a number of array methods, do some benchmarking, and finish with an attempt at creating a custom enumerator.

Interspersing arrays

The problem I was looking to solve is easy to state: how do you intersperse or pad a certain separator element between the elements of an array?

With a string, this is easy.



str = 'ruby'
str.split('').join('|')
# => "r|u|b|y"

We've easily inserted a separator character '|' between every pair of consecutive elements. We did so by splitting the original string into an array of individual characters, then chained a call to #join which combines the elements back into a string, inserting its argument along the way.

How do you do the same thing to arrays? (A simple use case is needing to print an array with a separator character between the elements. But it's also an interesting problem in its own right.)

We would like to have a method that operates on an array and returns an a new array with the elements in the same order, just separated by a separator-element which we pass as a parameter.



arr = ['ruby', 'javascript', 'python', 'c']
arr.intersperse([0,1])
#=> ['ruby', [0,1], 'javascript', [0,1], 'python', [0,1], 'c']

I felt sure that there must be a method doing exactly this out of the box.

Generalizing the problem

Let's stipulate that the solution to the problem should be capable of operating on any kind of array, with any kind of separator. The array can be nested to any level, as can the separator. Additionally, the separator may have the same value as one of the elements: we can't rule out that the separator is not also an element.

And even though it's not strictly part of the problem as stated, we should also think about how to generalize the problem to different intervals. What if we wanted the separator every third element, or every fourth, or fifth? What if we want the gaps between separators to increase by one each time like so [,sep,,sep,,,sep,,,,sep....]?

In its full generality, I think of the problem as a kind of nested enumeration. You are enumerating through your original array while also enumerating through another array that determines your separator placement.

But let's get back to the original problem.

Some solutions

Putting first things first, we can start with the simpler problem of interspersing a separator element between consecutive elements of an array.

I dove into the Ruby docs.

Here the wealth of array and enumerable methods was a bit of a red herring. There were so many that it was easy to believe that the perfect method was out there if I just kept digging.

For all I know, it is still out there. But I pretty quickly turned to StackOverflow.

This thread was helpful insofar as it appears to confirm that the one-stop, out-of-the-box method doesn't exist.

It did offer a number of helpful ways to approach the problem. As of this writing, solutions including using #flat_map, #product, #inject, #each, and #zip, with #flat_map getting the most votes.

My own naive solution

Lest you think that I am a slave to StackOverflow, I did try conjuring my own solution out of thin air.



arr, sep = [3,5,7,9], "|"
(0..2*arr.length-2).map {|i| i.even? arr[i/2] : sep}
#=> [3, "|", 5, "|", 7, "|", 9]

The core idea is that our output array has the original elements at even numbered indices and the separator element at odd-numbered indices. The output array has length 2*arr.length-1, hence its last index is 2*arr.length-2. We simply need to iterate over the indices and pull in the appropriate value depending on whether i is even or not.

Sub-array type solutions

Let's dive into each family of solutions in a bit more detail.

#flat_map ```

arr, sep = [3,5,7,9], "|"
arr.flat_map {|elem| [elem, sep]}[0...-1]

=> [3, "|", 5, "|", 7, "|", 9]

- #product

arr, sep = [3,5,7,9], "|"
arr.product([sep]).flatten(1)[0...-1]

=> [3, "|", 5, "|", 7, "|", 9]

- #zip

arr, sep = [3,5,7,9], "|"
arr.zip([sep]*arr.length).flatten(1)[0...-1]

=> [3, "|", 5, "|", 7, "|", 9]

# Sub-array solutions: takeaways
These three sub-array methods are all doing basically the same thing. In the context of our example `arr`, they all pass through the intermediate step of creating an array of sub-arrays.

[[3, "|"], [5, "|"], [7, "|"], [9, "|"]]

The final step is to flatten down one level and lop of the final separator. This all happens explicitly in the `#product` and `#zip` methods, and under the hood in the `#flat_map` method.

The `#product` and `#zip` methods are almost identical down to the syntax. The principal difference here is that product allows us to pass in a single element array as an argument, while we need to pass in a longer array to `#zip`. How long exactly? It turns out it doesn't matter, so long as it contains at least as many separators we need in the final array. 

For example, passing in a MUCH longer array of separators does nothing to the final result.

arr.zip([sep]*100*arr.length)

=> [[3, "|"], [5, "|"], [7, "|"], [9, "|"]]


My final takeaway is that while `#flat_map` performs its flattening under the hood, it doesn't go overboard. It concatenates the results of the block but without calling flatten on those results first. So we can use `#flat_map` even when our elements and separators are themselves nested.

nested_sep = [[sep]]
nested_arr = arr.map {|elem| [[elem]]}
nested_arr.flat_map {|elem| [elem, nested_sep]}
=> [[[3]], [[""]], [[5]], [[""]], [[7]], [[""]], [[9]], [[""]]]


# Shovel solutions

Moving on to the second family of solutions, which involve iterating once through the array and using the shovel operator. 

- #inject (#reduce)

arr, sep = [3,5,7,9], "|"
arr.inject([]) {|memo, elem| memo << elem << sep}[0...-1]

=> [3, "|", 5, "|", 7, "|", 9]

- #each

arr, sep = [3,5,7,9], "|"
memo = []
arr.each {|elem| memo << elem << sep}
memo[0...-1]

=> [3, "|", 5, "|", 7, "|", 9]

Both solutions in this family are doing roughly the same thing: iterating through and shoveling in both the element and the separator. With inject or reduce the accumulator variable memo is created inline, while the each method acts on an existing variable.

The solutions in this family feel a little less clever and seem to garner less support on StackOverflow, but they are brutally efficient, as we'll see.

#Benchmarking

So which of these solutions is best? We can use the `benchmark` gem to do some basic benchmarking. 

Somewhat surprisingly, the shovel family of solutions seems to have significant performance benefits at scale. 

![Image description](https://dev-to-uploads.s3.amazonaws.com/uploads/articles/xj4jvpckopob9a0czm8z.png)

Note that for benchmarking purposes, I used an array with 10 million elements, each of which was an array of 3 single-digit integers.

#Adaptability

Aside from raw performance, the other important question is which of the solutions can be adapted to inserting separators every three, four or five elements?

With the sub-array family, this is pretty straightforward. We simply chain the `#each_slice` method to the front and add a bit of destructuring in our block.

arr, sep = [3,5,7,9,11,13], "|"
arr.each_slice(2).flat_map {|elem| [*elem, sep]}[0...-1]

=> [3, 5, "|", 7, 9, "|", 11, 13]

The `#each_slice` methods creates an array of sub-arrays each with a passed-in length. To avoid an extra layer of nesting in the call to the flat_map block we destructure before adding in the separator element. 

For the shovel family, we would need to add some kind of check on the index to determine when we should shovel in the separator. I used the modulo operator `%`.

arr, sep = [3,5,7,9,11,13], "|"
memo = []
arr.each.with_index do |elem, idx|
if idx%2 == 1
memo << elem << sep
else
memo << elem
end
end
arr.length%2==0 ? memo[0...-1] : memo

=> [3, 5, "|", 7, 9, "|", 11, 13]

This is a bit more awkward as it involves more calculation and may be more prone to off-by-one errors. I also needed to do an extra check at the end to make sure the return vale does not end in a separator.

# Conclusion

While the sub-array solutions are slightly less performant on large arrays, they generalize more easily. It's hard to say that there is a single approach that works best in all situations, for all use cases. But I suppose this is nothing new to most programmers. 

# Postscript: a custom enumerator

As a postscript, here is my attempt to build a custom enumerator. Why would we want a custom enumerator? I suppose one reason is that we might require an enumerator outside of a context where we're working with arrays. It generalizes the problem to a further level, beyond the array class.

Here then is my amateur-ish attempt.

def nested_enum (arr, sep, every)
Enumerator.new do |y|
main = arr.to_enum
internal = Enumerator.produce(1) {|x| (x+1)% every}

    a = main.next
    b = internal.next
  loop do
    if b == 0 
        y << sep
    else 
        y << a
        a = main.next
    end
    b = internal.next
  end
end

end

The method will return an enumerator object. When `.next` is called on it, it will yield from the `main` enumerator, except at regular intervals when it will instead yield value `sep` according to the value of the `internal` enumerator. It will then go back to yielding from the main without missing a beat.

myEnum = nested_enum((0..8), 'hi', 3).to_a

=>[0, 1, "hi", 2, 3, "hi", 4, 5, "hi", 6, 7, "hi", 8]


I also considered implementing this using a counter variable instead of an internal enumerator, even though I like the idea of drawing from two enumerators in one.

def nested_enum_counter (arr, sep, every)
Enumerator.new do |y|
main = arr.to_enum
#internal = Enumerator.produce(1) {|x| (x+1)% every}

    a = main.next
    b = 1
  loop do
    if b % every == 0 
        y << sep
    else 
        y << a
        a = main.next
    end
    b += 1
  end
end

end

When I benchmarked the custom enumerator solution, I found it was actually significantly less performant than the others discussed above. So I would say that the problem of interspersing arrays in Ruby is still to be completely solved, at least by yours truly.

Amplify your impact where it matters most — building exceptional apps.

Leave the infrastructure headaches to us, while you focus on pushing boundaries, realizing your vision, and making a lasting impression on your users.

Get Started