Jeff Kreeftmeijer for AppSignal

Posted on May 28, 2019 • Edited on Jun 7, 2019 • Originally published at blog.appsignal.com

Inside Enumeration in Ruby

#ruby #beginners #programming

Welcome back to another edition of Ruby Magic! A year ago, we learned about Ruby’s Enumerable module, which provides the methods you use when working with enumerable objects like arrays, ranges, and hashes.

Back then, we created a LinkedList class to show how to make an object enumerable by implementing the #each method on it. By including the Enumerable module, we were able to call methods like #count, #map and #select on any linked list without having to implement them ourselves.

We've learned how to use enumerables, but how do they work? Part of the magic in enumerables in Ruby comes from their internal implementation, which is all based on the single #each method, and even allows chaining enumerators.

Today, we’ll learn how the methods in the Enumerable class are implemented and how Enumerator objects allow chaining enumeration methods.

As you've become accustomed to, we'll dive in deep by implementing our own versions of the Enumerable module and Enumerator class. So, put on your over-engineering helmet and let's go!

Linked Lists

Before we begin, let’s start with a new version of the linked list class we wrote previously.

class LinkedList
  def initialize(head = nil, *rest)
    @head = head

    if rest.first.is_a?(LinkedList)
      @tail = rest.first
    elsif rest.any?
      @tail = LinkedList.new(*rest)
    end
  end

  def <<(head)
    @head ? LinkedList.new(head, self) : LinkedList.new(head)
  end

  def inspect
    [@head, @tail].compact
  end

  def each(&block)
    yield @head if @head
    @tail.each(&block) if @tail
  end
end

Unlike the previous version, this implementation allows empty lists to be created, as well as lists with more than two items. This version also allows passing a linked list as the tail when initializing another.

irb> LinkedList.new
=> []
irb> LinkedList.new(1)
=> [1]
irb> LinkedList.new(1, 2)
=> [1,[2]]
irb> LinkedList.new(1, 2, 3)
=> [1,[2,[3]]]
irb> LinkedList.new(1, LinkedList.new(2, 3))
=> [1,[2,[3]]]
irb> LinkedList.new(1, 2, LinkedList.new(3))
=> [1,[2,[3]]]

Previously, our LinkedLIst class included the Enumerable module. When mapping over an object using one of Enumerable's methods, the result are stored in an array. This time, we'll implement our own version to make sure our methods return new linked lists instead.

Enumerable Methods

Ruby's Enumerable module comes with enumeration methods like #map, #count, and #select. By implementing the #each method and including the Enumerable module in our class, we'd be able to use those methods directly on our linked lists.

Instead, we'll implement DIYEnumerable and import that instead of Ruby's version. This isn't something you'd typically do, but it will give us a clear insight into how enumeration works internally.

Let's start with #count. Each of the importable methods in the Enumerable class uses the #each method we implemented in our LinkedList class to loop over the object to calculate their results.

module DIYEnumerable
  def count
    result = 0
    each { |element| result += 1 }
    result
  end
end

In this example, we've implemented the #count method on a new DIYEnumerable module that we’ll include in our linked list. It starts a counter at zero and calls the #each method to add one to the counter for every loop. After looping over all elements, the method returns the resulting counter.

module DIYEnumerable
  # ...

  def map
    result = LinkedList.new
    each { |element| result = result << yield(element) }
    result
  end
end

The #map method is implemented similarly. Instead of keeping a counter, it uses an accumulator, which starts as an empty list. We'll loop over all elements in the list, and yield the passed block on each element. The result of each yield is appended to the accumulator list.

The method returns the accumulator after looping over all of the elements in the input list.

class LinkedList
  include DIYEnumerable

  #...
end

After including the DIYEnumerable in our LinkedList, we can test our newly added #count and #map methods.

irb> list = LinkedList.new(73, 12, 42)
=> [73, [12, [42]]]
irb> list.count
=> 3
irb> list.map { |element| element * 10 }
=> [420, [120, [730]]]

Both methods work! The #count method correctly counts the items in the list, and the #map method runs a block for each item and returns an updated list.

Reversed lists

However, the #map method seems to have reverted the list. That’s understandable, as the #<< method on our linked list class prepends items to the list instead of appending them, which is a feature of the recursive nature of linked lists.

For situations where it’s essential that the order of the list is retained, we need a way to reverse the list when mapping over it. Ruby implements Enumerable#reverse_each, which loops over an object in reverse. That which sounds like an excellent solution to our problem. Sadly, we can’t use an approach like that because our list is nested. We don’t know how long the list is until we loop over it entirely.

Instead of running the block over the list in reverse, we’ll add a version of #reverse_each that does this two steps. It first loops over the list to reverse it by creating a new list. After that, it runs the block over the reversed list.

module DIYEnumerable
  # ...

  def reverse_each(&block)
    list = LinkedList.new
    each { |element| list = list << element }
    list.each(&block)
  end

  def map
    result = LinkedList.new
    reverse_each { |element| result = result << yield(element) }
    result
  end
end

Now, we'll use #reverse_each in our #map method, to make sure it's returned in the correct order.

irb> list = LinkedList.new(73, 12, 42)
=> [73, [12, [42]]]
irb> list.map { |element| element * 10 }
=> [730, [120, [420]]]

It works! Whenever we call our #map method on a linked list, we'll get a new list back in the same order as the original.

Chaining Enumeration with Enumerators

Through the #each method implemented on our linked list class and the included DIYEnumerator, we can now loop both ways and map over linked lists.

irb> list.each { |x| p x }
73
12
42
irb> list.reverse_each { |x| p x }
42
12
73
irb> list.reverse_each.map { |x| x * 10 }
=> [420, [120, [730]]]

However, what if we need to map over a list in reverse? Since we now reverse the list before mapping over it, it always returns in the same order as the original list. We've already implemented both #reverse_each and #map, so we should be able to chain them together to be able to map backwards. Luckily, Ruby's Enumerator class can help with that.

Last time, we made sure to call Kernel#to_enum if the LinkedList#each method was called without a block. This allowed for chaining enumerable methods by returning an Enumerator object. To find out how the Enumerator class works, we’ll implement our own version.

class DIYEnumerator
  include DIYEnumerable

  def initialize(object, method)
    @object = object
    @method = method
  end

  def each(&block)
    @object.send(@method, &block)
  end
end

Like Ruby's Enumerator, our enumerator class is a wrapper around a method on an object. By bubbling up to the wrapped object, we can chain enumeration methods.

This works because a DIYEnumerator instance is enumerable itself. It implements #each by calling the wrapped object, and includes the DIYEnumerable module so all enumerable methods can be called on it.

We’ll return an instance of our DIYEnumerator class if no block is passed to the LinkedList#each method.

class LinkedList
  # ...

  def each(&block)
    if block_given?
      yield @head
      @tail.each(&block) if @tail
    else
      DIYEnumerator.new(self, :each)
    end
  end
end

Using our own enumerator, we can now chain enumeration to get the result in the original order without having to pass an empty block to the #reverse_each method call.

irb> list = LinkedList.new(73, 12, 42)
=> [73, [12, [42]]]
irb> list.map { |element| element * 10 }
=> [420, [120, [730]]]

Eager and lazy enumeration

This concludes our peek into the implementation of the Enumerable module and the Enumerator class for now. We've learned how some of the enumerable methods work and how an enumerator helps chaining enumeration by wrapping an enumerable object.

There are some issues with our approach, though. By its nature, enumeration is eager, meaning it loops over the list as soon as one of the enumerable methods is called on it. While that's fine in most cases, mapping over a list in reverse reverses the list twice, which should be unnecessary.

To lower the number of loops, we could employ Enumerator::Lazy to delay looping to the last moment, and have duplicate list reversing cancel itself out.

We'll have to save that for a future episode, though. Don't want to miss that, and further expeditions into Ruby's magical inner workings? Subscribe to the Ruby Magic e-mail newsletter, to get new articles delivered to your inbox as soon as they're published.

DEV Community