Ryan Palo

Posted on Oct 28, 2018 • Originally published at assertnotmagic.com

It's Ruby, There Must Be a Better Way

#ruby #exercism #challenge

I was recently doing a challenge on Exercism, on the Ruby track, and I struggled a lot, but when I ended up with a final solution, I was amazed at how much the power of Ruby let me simplify my code.

It's a pretty cool challenge, so, I'll lay out the premise first, and then you can try it out. If you're really into it, the Ruby tests for this exercise are in the Exercism Ruby repo. Most of the repos for the other languages have similar test suites for this exercise as well. This exercise is called Robot Name.

The Premise

You are creating a line of robots that each have a unique name. Their names all follow the pattern letter letter number number number. A couple of valid names might be AB123, YE801, or RP100. This means that there are 26 * 26 * 10 * 10 * 10 possible names. So here's the challenge.

Write a class that creates Robots.
Each Robot created must have a unique name that matches the pattern above.
Robots must be able to be reset, wiping their memory and forgetting their name, receiving a new one that is still unique. Don't worry about recycling their name and returning it to the pool. We'll assume that once a name is taken, it's used up.
The name sequence must be random. That means the sequence must not be predictable.
The Robot Class must have a forget method that makes it forget the current state of robots, resetting any shared state it may have.

Make sense? OK, if you're going to attempt this challenge, off you go. I'm going to share my journey below.

My Journey

1. Naive Guess and Check

The difficulty of this exercise is mainly in the fact that there are so many possibilities for names. Any attempt to build all of those strings and combinations through looping or appending to a list are just waaaay too slow. I tried a bunch of different approaches, and there were actually several false-start versions before I reached version 1. My first thought was to naively keep track of the names used, generate a random name, and check if it's in the list:

class Robot
  LETTERS = ('A'..'Z').to_a
  NUMBERS = ('0'..'9').to_a

  @@taken_names = []

  def self.forget
    @@taken_names = []
  end

  attr_reader :name

  def initialize
      reset
  end

  def reset
    name = ""
    loop do
      name = LETTERS.sample(2) + NUMBERS.sample(3)
      break if ! @@taken_names.include?(name)
    end
    @name = name
  end
end

This works great for the first bunch of robots created, but as soon as you get high enough that there start to be collisions, you quickly realize you've got an algorithm that could, in theory, run infinitely long searching for a needle in a 676,000-strand haystack!

But this is Ruby! There's got to be a better way! Maybe we should do it the other way, starting with a list of possible names and then popping them out, guaranteeing no collisions.

2. Popping Names Off a List

So that's a great thought, but how to build the list of names? Something like this?

@@names = LETTERS
  .product(LETTERS, NUMBERS, NUMBERS, NUMBERS)
  .map(&:join)
# => @@names = ['AA000', 'AA001' ... 'ZZ998', 'ZZ999']

OK, that works. The product method creates a "Cartesian Product" between all of its arguments. For example:

[1, 2, 3].product([4, 5])
# => [
# [1, 4],
# [1, 5],
# [2, 4],
# [2, 5],
# [3, 4],
# [3, 5],
# ]

That giant product above creates a list like this:

[
  ['A', 'A', '0', '0', '0'],
  ['A', 'A', '0', '0', '1'],
  ...

That's why we join them all together into single strings via the .map(&:join) method.

Startup time for our class (as well as forget run time) is pretty long, but maybe that's allowable, seeing how much time it saves us on our algorithm. Right? Right?

Wrong. When our list is huge, randomly choosing an index and then popping that out takes FOR. EVER. Because, each time we pop an item out, all of the items behind that item have to figure out what to do with the gap that it left. This list of names is so huge, it's like trying to turn the Titanic. And how'd that work out for Leo?! (Too soon?)

I even tried generating a giant list of integers instead and converting each integer to my own custom numbering system that was (base 26, base 26, base 10, base 10, base 10), but that was more confusing and not any better.

class Robot
  LETTERS = ('A'..'Z').to_a
  NUMBERS = ('0'..'9').to_a

  @@possible_names = LETTERS
    .product(LETTERS, NUMBERS, NUMBERS, NUMBERS)
    .map(&:join)

  def self.forget
    @@possible_names = LETTERS
      .product(LETTERS, NUMBERS, NUMBERS, NUMBERS)
      .map(&:join)
  end

  def initialize
    reset
  end

  def reset
    next_index = rand(0...@@possible_names.length)
    @name = @@possible_names.pop(next_index)
  end
end

This is Ruby! There must be a better way!

The Final Solution

As it turns out, there is. My fundamental idea of working from a pre-built list was the right idea, but I wasn't letting Ruby work for me enough. There were a lot of things I could improve upon.

First, the building of the list of names. I forgot how awesome ranges in Ruby are.

@@names = 'AA000'..'ZZ999'

That's right. Ruby knows how to increment each character in the string (even the letters) to fill in the gaps. I'll be honest, when this was pointed out to me by the Ruby Track Mentor after all of that work, I only cried for like 12 minutes.

Next, random access. Rather than randomly selecting each time, why not shuffle once, up front? But you can't shuffle a range in Ruby! Not a problem!

@@names = ('AA000'..'ZZ999').to_a.shuffle

This crying session only lasted 7 minutes.

Lastly, dealing with modifying this great giant list over and over. The solution? Don't. The best way to deal with a large dataset isn't to pop off of it. It's to let Ruby work for you and use an Enumerator. This lazily yields each element. It's similar to having a pointer point at the element you want and then moving the pointer to the next element, but way less work.

class Robot
  POSSIBLE_NAMES = 'AA000'..'ZZ999'

  @@names = POSSIBLE_NAMES.to_a.shuffle.each

  def self.forget
    @@names = POSSIBLE_NAMES.to_a.shuffle.each
  end

  attr_reader :name

  def initialize
    reset
  end

  def reset
    @name = @@names.next
  end
end

This way, you walk through the names until you run out. Handily, when the @@names Enumerator runs out of names, a call to @@names.next will raise a StopIteration exception, telling the user that you're out of names. If you want, you could catch that exception and raise your own RobotNamesDepletedError too. And a call to Robot.forget renews the list of names with a new random shuffle.

What did you come up with? Did you try the exercise in another language? How did it turn out? Let me know.

Originally posted on assert_not magic?

Top comments (19)

Eugene Gilburg • Oct 28 '18

Instead of:

  @@names = POSSIBLE_NAMES.to_a.shuffle.each

  def self.forget
    @@names = POSSIBLE_NAMES.to_a.shuffle.each
  end

You can just write:

  def self.forget
    @@names = POSSIBLE_NAMES.to_a.shuffle.each
  end

  forget

Ryan Palo • Oct 28 '18

I wondered about that. That totally makes sense. Thanks!

Tom Lord • Oct 29 '18 • Edited

As yet another alternative approach... Taking either of your basic ideas (either storing all names in an Array, or an Enumerator, or keeping a track of taken_names), you could use the regexp-examples ruby gem that I wrote a few years back to abstract the problem of "getting a valid-formatted robot name" -- for example:

/[A-Z]{2}\d{3}/.random_example

Then, if you have different robots with other naming rules, the rest of the cost would work out of the box. (And as a bonus, this regex will likely already exist in your code, to validate that the robot name is ok!)

Ryan Palo • Oct 29 '18

Hi Tom! Thanks for sharing! How does this Gem handle collisions? One of the test cases generates all 676,000 robot names, so if calling random_example provides duplicates, we run into the same "theoretical infinite time complexity" issue, calling random_example repeatedly until it provides the one robot name we haven't encountered yet.

Tom Lord • Oct 29 '18 • Edited

You could use Regexp#random_example in conjunction with the taken_names, as per your first solution in this blog post.

Or -- with the potential performance issue of storing a large array in ruby memory (as with all your other examples that use to_a!) -- you could also use Regexp#examples to end up with very similar solutions. (See the documentation.) For example:
.

@names = /[A-Z]{2}\d{3}/.examples(max_group_results: 26, max_results_limit: 676000).shuffle.each

...Note that this would be a bad idea if the pattern got longer, so the number of possible results was much larger. All of your examples that use to_a, just like my Regexp#examples code above, would soon freeze the system as the array length grows exponentially.

Using Regexp#random_example - similar to your original implementation - would scale fine though.

Curtis Fenner • Oct 29 '18 • Edited

Because, each time we pop an item out, all of the items behind that item have to figure out what to do with the gap that it left.

You can fix this by swapping the element with the last element, and then popping off the last element, which is always cheap:

@@possible_names[next_index] = @@possible_names[-1]

# Drop off the last value. That's OK, because the last value just replaced
# the value-to-be-used-up.
@@possible_names.pop()

This only works because you don't actually care about the order of values in @@possible_names!

Another much more involved approach would be some kind of sparse Fenwick tree. That way, startup time is instantaneous, and you only use lots of memory when you've roughly used up every-other ID.

Ryan Palo • Oct 29 '18

Woah cool! I’ll look into that. Thanks for the ideas! 😃

Phil Nash • Oct 28 '18

I really enjoyed reading this journey and seeing the trade offs you had to face. Thanks for sharing!

Elizabeth Jenerson • Oct 29 '18

I agree w/ Phil.

Further, I've had the same cry sessions for similar reasons: I worked hard and long to do it one way and here comes a Ruby method or something that not only does it easier but with less lines of code.
🤦🏾‍♀️

Ryan Palo • Oct 29 '18

Glad you liked it!

Matt Jones • Oct 29 '18

There's another, more direct solution to the "running include? on an Array takes longer and longer as the array gets bigger" problem: use a Set.

Add require 'set' to the top of your file; the library ships with Ruby but isn't loaded by default.

Then replace @@taken_names = [] with @@token_names = Set.new.

Under the hood, Set uses a Hash to get constant-time lookups - source. See your favorite Ruby internals book for details on this - I'm a fan of Ruby Under A Microscope.

BTW, I was curious about the pitfalls of representing names as integers and then converting them - so I put together some code. Note: I've been writing a lot of Elixir lately, and the style reflects some of that.

module NameConverter
  DIGITS = ('0'..'9').to_a
  LETTERS = ('A'..'Z').to_a

  AVAILABLE = [LETTERS, LETTERS, DIGITS, DIGITS, DIGITS]
  BACKWARDS = AVAILABLE.reverse

  module_function

  def to_name(x)
    BACKWARDS.reduce(['', x]) do |(acc, n), digits|
      base = digits.length
      idx = n%base
      [acc+digits[idx], n/base]
    end.first.reverse
  end

  def from_name(name)
    AVAILABLE.reduce([name, 0]) do |(name, acc), digits|
      base = digits.length
      char = name[0]
      [name[1..-1], base*acc+digits.index(char)]
    end.last;
  end
end

irb(main):117:0> NameConverter.from_name('AZ700')
=> 25700
irb(main):090:0> NameConverter.to_name(25700)
=> "AZ700"

This would make using the Set approach even more efficient, since Integer objects are 8 bytes instead of 40 bytes for Strings.

tleish • Nov 1 '18 • Edited

Be careful... since self.names is a class method, then this implementation of @names is also a class variable.

So both of these are using class level variables and writing to the global namespace:

class Robot
    def self.names
      @@names = POSSIBLE_NAMES.to_a.shuffle.each
    end
end
Robot.names
Robot.class_variable_get(:@@names)

class Robot
    def self.names
      @names ||= POSSIBLE_NAMES.to_a.shuffle.each
    end
end
Robot.names
Robot.instance_variable_get(:@names)

In this last case, the Robot class is a singleton instance itself.

Class variables are not 'evil' as long as you know what you are doing. Best rule of thumb would be to consider a class variable generally as immutable. In this example, if we had 2 separate processes sharing the same service of make robot names... one could reset the names while the other is running. This would cause a bad side effect. However, if you simple used the class variable to store the initial full set of names and then randomized it with each instance would be safer. Class variable is being used as a constant, immutable value... while random is per instance.

Also, since the built array already exists inside the enumerable, reusing the array is slightly faster than rebuilding from a range each time:

  def self.forget
    @@names = @@names.to_a.shuffle.each
  end

Nick Cinger • Oct 28 '18 • Edited

Not even a Ruby dev, but this was a good read. Interesting challenge and nifty solution :D

Ryan Palo • Oct 28 '18

Thanks! Yeah, when I first read through the challenge, I didn’t think it was going to be that hard. I was wrong.

Ben Halpern • Oct 28 '18

I think Ruby is fundamentally interesting, simple and fun. If it weren't for Rails it would still be known as more of a delightful hobbyist language.

Ryan Palo • Oct 28 '18

Yeah, and a pretty powerful scripting/automation language too, for server-side stuff. A lot more readable and maintainable than Bash, a lot of times.

Federico Ramirez • Oct 28 '18

Pretty cool! I didn't know that about ranges :)

Jordan Running • Nov 2 '18 • Edited

This is a fun challenge. I wanted to see if I could figure out a way to do this without shuffle. The solution I settled on after much googling is to use a linear congruential generator algorithm, which is a psuedorandom number generator with the property that, given the right inputs, its output is a sequence of all integers in a range (0 .. m-1) in pseudorandom order with no repeats (until it exhausts the range, whereupon it begins the sequence again). In other words, it can be used as a "lazy shuffle" algorithm. Here's the implementation I came up with, as a Ruby Enumerator:

The "increment" value C could have been any number coprime with M; I chose 77777 purely for aesthetics.

Note that if we changed M.times do to loop do, the sequence would cycle infinitely.

Once you have this, all that's left is to format each number as a robot name, and then some bookkeeping for the forget! and reset! methods. Here's the final result:

Jordan Running • Nov 2 '18

Just for kicks, here's the same thing in JavaScript: beta.observablehq.com/@jrunning/ro... (Too bad dev.to doesn't embed Observable notebooks.)

View full discussion (19 comments)