Binary Search with Ruby

#algorithms #ruby

When programming we often want to consider the efficiency with which our code will run and we use Big O notation to talk about that efficiency. How efficient a program runs can be the difference between happy users with a good user experience and an absolutely unusable app. The binary search is one of those algorithms that we can use to make a program more efficient.

The binary search is a search algorithm that finds the position of a target within a sorted array with a runtime of O(log(n)) and for this reason is also called a logarithmic search. It accomplishes this by cutting the problem in half every time that it goes through an iteration.

What does this actually mean? Let's imagine you have a phone book with all the names listed alphabetically and you want to see if Liliana is in the book. You open up the phone book exactly in the middle and you check to see if the name you land on is the name you are looking for, or if it comes before or after alphabetically. You land on Mercedes and since it is farther in the alphabet than Liliana, you know you can discard all of the names to the right of Marcos. So now you pick the midpoint between the beginning of the phonebook and Mercedes and land on Fernando. Well Fernando comes before Liliana so you discard all names to the left of Fernando and your next midpoint will be between Fernando and Mercedes. You keep doing this checks and discarding half of your problem until you either land on Liliana or you discover that Liliana is not in the phonebook at all.

This is the essence of the binary search, it compares the target value to the middle element of an array. If they are not equal, the half in which the target cannot lie is eliminated and the search continues on the remaining half, again taking the middle element to compare to the target value, and repeating this until the target value is found. If the search ends with the remaining half being empty, the target is not in the array.

Let's take a look at the advantage of this approach versus traversing the array or phonebook linearly.

Imagine we have an array of 100 names ordered alphabetically. If I use a linear function to check if a name is present, what is the worst case scenario of how many checks I'd have to do?

100 - because worst case it is not present and I looked at every name in the list.

If I use the phone book method and I split the array in half each time, how many checks would I do?

7 - Check the 50th name, then the 25th, 12th, 6th, 3rd, 2nd, then the 1st. That's 7 checks.

So the difference between 100 and 7 is pretty sweet. But 100 checks isn't realistically all that bad, however what happens as we increase the size of the array?

Let's say the array contains 200 people. We'd have to do 200 checks for the linear approach but we'd only have to do 8 for the phonebook check. For 400 people it'd be 400 and 9. Every time we double, the search runtime for linear doubles but increases by one operation for the phone book search. Let's take this to the extreme. Say we have an array of 500 people and we double that 100 times. For the phonebook search we would do 110 checks worst case scenario, but the linear search would require 6.43 * 10^32 checks. (That's 634 Nonillion. I didn't even know that was a number.) More than the number of stars in our galaxy. This is the difference between an algorithm that can be completed relatively quickly and one that is currently impossible for our technology.

Now that we have a basic understanding of what this particular search algorithm can be so important, let's take a look at how we can make one!

You can make a ruby file such as binary_search.rb in your code editor of choice and run the algorithm by running in your terminal the command "ruby binary_search.rb".

arr_size = 1000
r = Random.new

search_arr = Array.new(arr_size) do
  r.rand(arr_size)
end.sort

search_num = r.rand(arr_size)

First we use a random number generator to populate an array of 1000 numbers and we sort it because this is required for the binary search. By making the array this way and using it to generate the number we are searching for, there is good probability that the value is inside the array but there is also an ok chance that it is not. This means by running the scenario multiple times we will see both worst and non-worst case scenarios.

Next we can use the Ruby given index method, which returns the index of the first object in an array where it matches the passed in argument or nil if it does not find one. We won't be doing actual check tests on it, but if we make the arr_size much larger we will be able to see the difference in how long the linear takes to run versus the binary.

puts "Looking for #{search_num}"
puts "Ruby #index O(n)"
resi = search_arr.index(search_num)
puts resi.nil? ? "Could not find #{search_num}" : "Found #{search_arr[resi]} at index #{resi}"

Now let's build an iterative binary search method where by continuously dividing the problem in half we get that O(lg(n)) runtime.

def binary_search_iter(arr, el)
  max = arr.length - 1
  min = 0
end

The method takes in two arguments, the array that we are searching through and the specific number we are searching for.
We set the max and min such that they are reflections of the size of the array, these represent the right and left hand side of the phonebook method for choosing which side to discard.

def binary_search_iter(arr, el)
  max = arr.length - 1
  min = 0

  while min <= max 
       mid = (min + max) / 2

  end
  return nil
end

We can use a while loop that compares the min and the max, we know that if the size difference between the min and the max is 0 or negative, we've gone through the whole array and did not find the element, so it will exit the loop and return nil. We also need to set a variable for the midpoint. Ruby uses integer division and we can use that to our advantage since it will discard any decimal values. We can check our math to make sure that it returns an integer that we are expecting.

Let's say the array length is 3, the midpoint would be at index one. So our min = 0, and max = 2, (0 + 2) / 2 = 1, so the midpoint is at index one which is expected. With even length arrays it's a little trickier because there are two possible midpoints, in our case we don't particularly care which midpoint it grabs, as long as it is one of the two middle elements. If our array is length 3, we would expect index 1 or 2 to be our midpoint. So min = 0, max = 3, (0 + 3) / 2 = 1 because the integer division drops the .5 from 1.5. This gives us one of the expected midpoints.

We can then check if the array at the index of the midpoint has the same value as the number we are searching for and if it does, we want to break the while loop and return true.

def binary_search_iter(arr, el)
  max = arr.length - 1
  min = 0

  while min <= max 
    mid = (min + max) / 2
    if arr[mid] == el
      return mid
    end
  end

  return nil
end

There are two other logical possibilities. The midpoint is greater than or less than the number we are searching for. If the number at the midpoint is greater than the element we are looking for, we discount the right side and focus on left. We do this by resetting the max to be one less than the midpoint. The last possibility is that the mid is less than the element we are searching for, so we need to discard the left side and focus on the elements to the right of the mid. We do this by resetting the min to be one greater than the current midpoint.

def binary_search_iter(arr, el)
  max = arr.length - 1
  min = 0

  while min <= max
    mid = (min + max) / 2
    if arr[mid] == el
      return mid
    elsif arr[mid] > el
      max = mid - 1
    else
      min = mid + 1
    end
  end

  return nil
end

Now we have covered all the scenarios, we return the mid if it's value matches the element, return nil if it is not found, and we have statements that change the conditions of the loop the push it closer to breaking out of the loop. Each iteration lowers the max or raises the min value which prevents us from being stuck in an infinite loop. So eventually we will break the loop and return nil or return the mid value.

We can now use our binary search!

puts "Binary Search Iterative O(lg(n))"
resbi = binary_search_iter(search_arr, search_num)
puts resbi.nil? ? "Could not find #{search_num}" : "Found #{search_arr[resbi]} at index #{resbi}"

Now we can run ruby binary_search.rb and see the results printed in our terminal.

We can now see that both methods are returning the correct number! If you run it enough times odds are you will be told that it could not find the number and that worst case is to be expected occasionally. You may also notice that the linear and binary search methods may not always return the same index where the number was found. This could be because there are duplicates of the number and they found the number in the different indices which is a side effect of using the random number generator. The other reason it might show at different indices is because of the way the two methods search through the array. Linear searches left to right, and binary jumps around as it cuts out half of the array, so you might find the correct number at a different index. This was the case for me in the second run in the screen shot I took up above. This is acceptable because both found a valid index with that number in it and what we care about is the number, not the index.

You may notice that when you run this, it happens pretty quickly. However if you increase the size of the array to 200,000,000 you will notice it takes the computer much longer to run the code with the linear function, but the binary search is still almost instantaneous. This is because of it's O(lg(n)) runtime!

To recap, we use the binary search for sorted datasets (which you can think of as a phone book approach to solving problems) because its O(lg(n)) is significantly faster than a linear search which is O(n). We accomplish this by finding the midpoint, and changing the max and min values based on if the mid value is larger or smaller than the number we are searching for.

For your easier viewing, here is the code in its entirety:

arr_size = 1000
r = Random.new

search_arr = Array.new(arr_size) do
  r.rand(arr_size)
end.sort

search_num = r.rand(arr_size)

def binary_search_iter(arr, el)
  max = arr.length - 1
  min = 0

  while min <= max #know that if the size difference between the min and the max is 0 or negative, we've gone through the whole array and did not find the element
    mid = (min + max) / 2
    if arr[mid] == el
      return mid
    elsif arr[mid] > el #if the mid is greater than element looking for, discount right side and focus on left
      max = mid - 1
    else #covers remaining logic, if mid is less than element looking for, discount left and focus on right
      min = mid + 1
    end
  end

  return nil
end

puts "Looking for #{search_num}"
puts

puts "Ruby #index O(n)"
resi = search_arr.index(search_num)
puts resi.nil? ? "Could not find #{search_num}" : "Found #{search_arr[resi]} at index #{resi}"

puts

puts "Binary Search Iterative O(lg(n))"
resbi = binary_search_iter(search_arr, search_num)
puts resbi.nil? ? "Could not find #{search_num}" : "Found #{search_arr[resbi]} at index #{resbi}"

Happy coding!

Resources:
How to Program Binary Search by Micah Shute

Top comments (2)

Lars Kanis • Jan 5 '23

Thank you for this nice description of binary searches! However Ruby has builtin binary search since many years in form of the Array#bsearch and #bsearch_index methods: ruby-doc.org/core-3.0.0/Array.html...