Mohamed Abd El-Azeem

Posted on Jun 24, 2017

My First Open-Source Tinkering - On Code Refactoring

#ruby #opensource #refactoring #faker

I've been coding for a couple of years now, but I've rarely touched any open-source code and played with it. Having seen many experienced developers account their successes (partially, of course) to open-source contributions, and believing that diving into that world would ultimately bump up my skills with cool tricks I wouldn't learn otherwise, I decided to give it a shot.

So without further ado, let's get right into this.

Faker Gem

Faker is a handy gem when it comes to testing your code. It generates fake data quickly and easily, and it supports a wide variety of data, such as names, emails, phone numbers and even Game of Thrones characters. Make sure to visit the repo to learn more about it.

Faker Modules

As mentioned above, faker has different Modules for the types of data that you wish to use. You can use something like Faker::Address.street_name, or Faker::Internet.public_ip_v4_address to get a random street name or IP address, respectively. You can find other pretty cool modules on the repo.

The Problem

I've been using the Faker::Time module, which (as you might have guessed), generates random timestamps for you to use in your tests. This module has a between method that takes from and to dates, and generates a date inside the range.

The use case was to generate a time between yesterday and today, and this could be achieved by:

Faker::Time.between(DateTime.now - 1, DateTime.now)

A sample output would be:

=> 2017-06-24 01:00:32 +0200

The problem is that we include DateTime.now as the upper bound, so it could possibly generate a date that corresponds to today. However, it doesn't take into account the hours, minutes or seconds of the day that still haven't come yet. Meaning, if it's 24/6 today and it's 10:00:00 AM, it could still generate a date that is on 24/6 but at 10:30:00 AM (or some time past 10:00:00 AM in general). That was an issue because it led to randomly failing test cases. We particularly didn't want dates in the future to be generated.

The Solution (the fun part, hopefully)

This issue of randomly failing test cases troubled me a lot, so I decided to read the implementation code of the Faker::Time module to see what was causing the problem. And so I did, and I kind of got an intuition about what could be done to work around the problem. So, I forked the repo, cloned it onto my machine, and started playing around.

As the title of the article suggests, I'm not writing this to demonstrate the solution. I'm demonstrating why tinkering with open-source projects is useful. I worked on this to solve a problem, and I ended up learning more about code refactoring in Ruby. And I decided I might make use of this to try to demonstrate the flexibility of this beautiful language.

First, let's look at the source of the problem by inspecting the code:

module Faker
   class Time < Date
     TIME_RANGES = {
       :all => (0..23),
       :day => (9..17),
       :night => (18..23),
       :morning => (6..11),
       :afternoon => (12..17),
       :evening => (17..21),
       :midnight => (0..4)
     }

     class << self
       def between(from, to, period = :all, format = nil)
         time = period == :between ? rand(from..to) : date_with_random_time(super(from, to), period)
         time_with_format(time, format)
       end

       # more code ...

       private

       def date_with_random_time(date, period)
         ::Time.local(date.year, date.month, date.day, hours(period), minutes, seconds)
       end

       # more code ...

       def hours(period)
         raise ArgumentError, 'invalid period' unless TIME_RANGES.has_key? period
         sample(TIME_RANGES[period].to_a)
       end

       def minutes
         seconds
       end

       def seconds
         sample((0..59).to_a)
       end
     end
  end
end

Breakdown

The Faker::Time.between(DateTime.now - 1, DateTime.now) call would cause the between method to call the date_with_random_time method.
This method generates a date whose hours, minutes and seconds are generated from the methods hours, minutes and seconds respectively. And that was precisely the problem.
It is clear that the seconds method (which minutes depends on as well) generates an array of all of the seconds in a day, and takes a sample off of it (picks one at random), regardless of whether it has actually passed or not (its effect is more prominent in minutes than in seconds).
The hours method does essentially the same thing, it only utilizes a hash which takes a period as the key, which is by default :all, and this returns all of the hours of the day.

Having identified where the problem is, I'm going to demonstrate how my code evolved, showing different versions of it one at a time. This will hopefully give you an insight of how I incrementally refactored my code.

Version 1

The idea is simple. During the array generation step and before sampling, I would only let inside the arrays the hours/minutes/seconds that have passed already. Abusing the fact that everything in Ruby is an object, I could call select on the array, and pass it a block which filters the dates that are not in the future.

So the code looks something like this:

def hours(period)
  raise ArgumentError, 'invalid period' unless TIME_RANGES.has_key? period
  sample(TIME_RANGES[period].to_a.select { |t| t <= DateTime.now.hour })
end

def minutes
  sample((0..59).to_a.select { |t| t <= DateTime.now.min })
end

def seconds
  sample((0..59).to_a.select { |t| t <= DateTime.now.sec })
end

It's clear why minutes has its own implementation now. It has to be bound by DateTime.now.min.

Version 2

There are two concerns regarding this first version:

My modification has overridden the default behaviour of the gem. What if a part in our code can tolerate dates that are in the future? Plus, This gem is used across our codebase and I don't want to go around modifying people's work to accommodate the changes I made.
Both minutes and seconds methods look very similar, and it's not preferable to have duplicate code.

So now I have to keep the default behaviour for those who use it, and also make sure that the modification (that only I will use for now) wouldn't require changing people's work.

Tackling the 1st concern

I introduced a new boolean parameter to the between method, namely bound_by_now, which defaults to false. Adding this extra parameter with a default value solves the issue of having other people change their implementation.

Moreover, I would separate the default behaviour from my new implementation using a fancy Ruby ternary operator.

def between(from, to, bound_by_now = false, period = :all, format = nil)
  time = period == :between ? rand(from..to) : date_with_random_time(super(from, to), bound_by_now, period)
  time_with_format(time, format)
end

# more code ...

private

def date_with_random_time(date, bound_by_now, period)
  ::Time.local(date.year, date.month, date.day, hours(bound_by_now, period), minutes(bound_by_now), seconds(bound_by_now))
end

# more code ...

def hours(bound_by_now, period)
  raise ArgumentError, 'invalid period' unless TIME_RANGES.has_key? period
  bound_by_now == true ? sample(TIME_RANGES[period].to_a.select { |t| t <= DateTime.now.hour }) : sample(TIME_RANGES[period].to_a)
end

def minutes(bound_by_now)
  bound_by_now == true ? sample((0..59).to_a.select { |t| t <= DateTime.now.min }) : sample((0..59).to_a)
end

def seconds(bound_by_now)
  bound_by_now == true ? sample((0..59).to_a.select { |t| t <= DateTime.now.sec }) : sample((0..59).to_a)
end

Tackling the 2nd concern

One way to do this is to move the logic to a separate method, namely get_min_sec, and call it with an identifier (for now, a string) to differentiate between minutes and seconds.

def minutes(bound_by_now)
  get_min_sec('min', bound_by_now)
end

def seconds(bound_by_now)
  get_min_sec('sec', bound_by_now)
end

def get_min_sec(type, bound_by_now)
  case type
    when 'min' do limit = DateTime.now.min
    when 'sec' do limit = DateTime.now.sec
  end
  bound_by_now == true ? sample((0..59).to_a.select { |t| t <= limit }) : sample((0..59).to_a)
end

So the final modified version looks something like:

def between(from, to, bound_by_now = false, period = :all, format = nil)
  time = period == :between ? rand(from..to) : date_with_random_time(super(from, to), bound_by_now, period)
  time_with_format(time, format)
end

# more code ...

private

def date_with_random_time(date, bound_by_now, period)
  ::Time.local(date.year, date.month, date.day, hours(bound_by_now, period), minutes(bound_by_now), seconds(bound_by_now))
end

# more code ...

def hours(bound_by_now, period)
  raise ArgumentError, 'invalid period' unless TIME_RANGES.has_key? period
  bound_by_now == true ? sample(TIME_RANGES[period].to_a.select { |t| t <= DateTime.now.hour }) : sample(TIME_RANGES[period].to_a)
end

def minutes(bound_by_now)
  get_min_sec('min', bound_by_now)
end

def seconds(bound_by_now)
  get_min_sec('sec', bound_by_now)
end

def get_min_sec(type, bound_by_now)
  case type
    when 'min' do limit = DateTime.now.min
    when 'sec' do limit = DateTime.now.sec
  end
  bound_by_now == true ? sample((0..59).to_a.select { |t| t <= limit }) : sample((0..59).to_a)
end

Version 3

Obviously this separation has come at the cost of a lengthier code. I had to introduce control statements, such as the ternary operator and the case statement. But again, Ruby is very flexible, and this too can be fixed.

Removing the ternary operator

The block passed to the select method eventually evaluates to a boolean, so why not inject bound_by_now in their as well? Check this code below:

select { some_boolean_expression || !bound_by_now })

If bound_by_now is true, then the block depends on the first expression (as anything ORed with false is itself), which is the one that will select only the dates that have passed.
If 'bound_by_now' is false, then all elements of the array will be selected, since now the the whole block evaluates to true.

So, applying this to hours and get_min_sec, we get:

def hours(bound_by_now, period)
  raise ArgumentError, 'invalid period' unless TIME_RANGES.has_key? period
  sample(TIME_RANGES[period].to_a.select { |t| t <= DateTime.now.hour || !bound_by_now })
end

def minutes(bound_by_now)
  get_min_sec('min', bound_by_now)
end

def seconds(bound_by_now)
  get_min_sec('sec', bound_by_now)
end

def get_min_sec(type, bound_by_now)
  case type
    when 'min' do limit = DateTime.now.min
    when 'sec' do limit = DateTime.now.sec
  end
  sample((0..59).to_a.select { |t| t <= limit || !bound_by_now })

Removing the case statement

As explained in Version 2, minutes and seconds pass identifiers to get_min_sec so that it can decide on the bounding limit. But how can we remove this case statement? We can make use of 2 Ruby tricks:

Ruby supports symbols, which are similar to strings in some sense, except that there can only be one copy of a symbol at a time. This is a good use case of identifiers. A symbol is declared by a colon followed by a name, such as :id.
Objects in Ruby respond to a method called send, which takes as a parameter an identifier of a method, and it simply calls that method on the object (if the object responds to that method, of course). We can easily create :min and :sec symbols here and pass them to send invoked on DateTime.now.

Now we can modify the code to look something like:

def minutes(bound_by_now)
  get_min_sec(:min, bound_by_now)
end

def seconds(bound_by_now)
  get_min_sec(:sec, bound_by_now)
end

def get_min_sec(type, bound_by_now)
  sample((0..59).to_a.select { |t| t <= DateTime.now.send(type) || !bound_by_now })
end

So finally after refactoring both control statements, Version 3 looks as follows:

def hours(bound_by_now, period)
  raise ArgumentError, 'invalid period' unless TIME_RANGES.has_key? period
  sample(TIME_RANGES[period].to_a.select { |t| t <= DateTime.now.hour || !bound_by_now })
end

def minutes(bound_by_now)
  get_min_sec(:min, bound_by_now)
end

def seconds(bound_by_now)
  get_min_sec(:sec, bound_by_now)
end

def get_min_sec(type, bound_by_now)
  sample((0..59).to_a.select { |t| t <= DateTime.now.send(type) || !bound_by_now })
end

Full Code & Testing

Now that the code is complete, let's review it one last time, and also check how we can test it.

Full Code

Here's the full modified code after Version 3:

def between(from, to, bound_by_now = false, period = :all, format = nil)
  time = period == :between ? rand(from..to) : date_with_random_time(super(from, to), bound_by_now, period)
  time_with_format(time, format)
end

  # more code ...

  private

def date_with_random_time(date, bound_by_now, period)
  ::Time.local(date.year, date.month, date.day, hours(bound_by_now, period), minutes(bound_by_now), seconds(bound_by_now))
end

# more code ...

def hours(bound_by_now, period)
  raise ArgumentError, 'invalid period' unless TIME_RANGES.has_key? period
  sample(TIME_RANGES[period].to_a.select { |t| t <= DateTime.now.hour || !bound_by_now })
end

def minutes(bound_by_now)
  get_min_sec(:min, bound_by_now)
end

def seconds(bound_by_now)
  get_min_sec(:sec, bound_by_now)
end

def get_min_sec(type, bound_by_now)
  sample((0..59).to_a.select { |t| t <= DateTime.now.send(type) || !bound_by_now })
end

Testing

Now, in order to test the new behaviour, I could do the following:

Create an empty array.
Push a number of randomly generated dates using Fake::Time.between
Select those who are generated in the future (> DateTime.now) and make sure it prints out an empty array.

This can be translated into code easily:

a = []
for i in 1..500 do
  a << Faker::Time.between(DateTime.now - 1, DateTime.now, true)
end
res = a.select { |t| t > Time.now }

puts res

and it would output:

=> nil

However, this too can be refactored! It can be achieved using a single line:

Array.new(500){ Faker::Time.between(DateTime.now - 1, DateTime.now, true) }.select { |t| t > Time.now  }

and this would output:

=> []

Using the same code, I can omit that third true parameter to activate the default behaviour of the gem. This would possibly print some dates in the future from the array, as it is likely to find some since I'm generating 500 dates.

Final Thoughts

Contributing to open-source projects definitely contributes to your progress as a developer. It pushes you to learn because you actually get to solve real problems. And I really hope I was able to demonstrate that by giving an example.

Finally, I am yet to become a refactoring master. So if anyone has any suggestions, please let me know!

Top comments (3)

cagatay-y • Feb 23 '19

Wouldn’t the code above also filter 13:42 when the time is 14:34 because the minute part is bigger, even though the time is actually earlier because of the hour part?
(It is weird to comment on a 2-year-old post but I am working on my Pocket list :) )