Binary Search

#computerscience #algorithms #data #python

Binary search is a blazingly-fast search algorithm that has a worst case Big O of Ο(log n), the average case is O(log n) and the best case is O(1). In short it's a search algorithm that is about as fast as you can get.

Binary Search utilises divide and conquer to reduce much of the potential runtime. In fact, it is a lot like merge sort in how it functions but of course one algorithm is used to sort data and this one is used to find data.

For the algorithm to run properly, the data must be pre-sorted and be have elements of the same or similar types held within. So if your collection contains strings and integers, it won't work, you must have elements using a consistent data type withinin your collection.

When the binary search is executed, it will take an input collection and a value to find and if the value exists we will return the index of that value within the collection and if it doesn't we will return -1 out of convention to represent that the value doesn't exist in the collection after all.

Tests

from main import binarySearch;

def test_strings():
  assert binarySearch(["Dave", "James"], "Dave") == 0;
  assert binarySearch(["Dave", "James"], "abc") == -1;

def test_integers():
  assert binarySearch([1, 2, 3,4], 3) == 2;
  assert binarySearch([1, 2, 3, 4], 5) == -1;

def test_floats():
  assert binarySearch([3.14], 3.14) == 0;
  assert binarySearch([3.141], 3.14) == -1;

def test_booleans():
  assert binarySearch([False, True], True) == 1;
  assert binarySearch([True], False) == -1;

def test_lists():
  assert binarySearch([[3.14]], [3.14]) == 0;
  assert binarySearch([[3.141]], [3.14]) == -1;

def test_nested_lists():
  assert binarySearch([[3.14, [1]]], [3.14, [1]]) == 0;
  assert binarySearch([[3.141]], [3.14]) == -1;

def test_sets():
  assert binarySearch([set([1])], set([1])) == 0;
  assert binarySearch([set([1])], set([1,2])) == -1;

def test_tuples():
  assert binarySearch([tuple([0,2]), tuple([1])], tuple([1])) == 1;
  assert binarySearch([tuple([1])], tuple([1, 2])) == -1;

In python, as with many other languages, some elements can't compare with one another in a non-hacky manner, even if they are both the same type. An example of this is that types such as dict cannot run comparisons like {'a': 1} > {'a': 1}.

For binary search we do need to use comparison operators like that though and so I represented only the types that are comparible with the ==, >, >=, < and <= operators that binary search utilises.

Here is a repl to run and play with the tests:

Implementation

There are a few ways to skin this particular cat. For that reason, I will show you two implementations for the Binary Search algorithm. One will be using an imperative approach and another using Recursion.

Both implementations expect uniform data and are typed for this reason. We expect data to contain consistent types in the input collection so that the search is guaranteed to be able to compare the items in the collection against one another. If we didn't do this we may get unexpected errors such as this:

TypeError: '>' not supported between instances of 'list' and 'str'

Just remember to normalise and sort your data before you begin using the implementations that will be outlined below or if you decide to build your own implementation in the future as these traits are a requirement of the algorithm.

The imperative approach

Imperative programming is a programming paradigm which describes how a programme should run. This paradigm uses statements to alter a programmes state and generate values.

from typing import List, TypeVar;
from operator import eq as deep_equals, gt as greater_than, ne as not_equal, le as less_than_or_equal;
from math import floor;

T = TypeVar('T');

def binarySearch(collection: List[T], value: T) -> int:
  start = 0;
  end = len(collection) - 1;
  middle = floor((start + end) / 2);

  while(not_equal(collection[middle], value) and less_than_or_equal(start, end)):
    if greater_than(value, collection[middle]): start = middle + 1;
    else: end = middle - 1;

    middle = floor((start + end) / 2);

  return middle if deep_equals(collection[middle], value) else -1;

We begin by importing some helpers from the python standard library which has a whole swath of helper functions for us to use in our code.

A point worth noting is the use of TypeVar from the typing module to align our input types for the collection and value.

All this means is that when we recieve the collection, it should have items of consistent type T and the value we want to search for should also be of type T.

So if we have the collection of type List[str] for example then value must also be an str but instead of hard-coding this, we let TypeVar do the work for us.

Our binarySearch function will recieve a collection to search within and a value to look for in that collection. If the value is found then we return it's location (index) in the collection.

To find the index we setup the start, end and middle indexes of the collection. While the value is not equal to the value in the middle of the collection and the start index is less than or equal to the end index, we check if the value is bigger than the middle value, if it is, we set the start index to begin on the next iteration from the middle, otherwise we set the end to be the middle.

We do this because if the value is greater than the value in the middle of the collection, the value must be on the right half of the collection and vice versa for the opposite case.

Either way the condition falls we will then reset the middle index to be in between the new start and end indexes. We continue with this process until the condition on the while loop is no longer true.

When the while loop exits, we check if the middle indexes value is the value we are looking for. If it is, we return the middle index and if it isn't then the value wasn't in the input collection and we return -1 to represent this case.

The recursive approach

from typing import List, TypeVar, Union;
from operator import eq as deep_equals, gt as greater_than, ne as not_equal, le as less_than_or_equal, ge as greater_than_or_equal;
from math import floor;

T = TypeVar('T');

def binarySearch(collection: List[T], value: T, start: int = 0, end: Union[int, None] = None) -> int:
  if deep_equals(end, None): 
    end = len(collection) - 1;

  if greater_than_or_equal(end, start): 
    middle = floor((start + end) / 2);

    if deep_equals(collection[middle], value):
      return middle;

    if greater_than(collection[middle], value):
      return binarySearch(collection, value, start, middle - 1);

    return binarySearch(collection, value, middle + 1, end);

  return -1;

Exactly like the imperative approach outlined above, we begin by importing some helpers from the python standard library which has a whole swath of helper functions for us to use in our code.

A point worth noting is the use of TypeVar from the typing module to align our input types for the collection and value.

All this means is that when we recieve the collection, it should have items of consistent type T and the value we want to search for should also be of type T.

So if we have the collection of type List[str] for example then value must also be an str but instead of hard-coding this, we let TypeVar do the work for us.

The recursive version naturally calls itself again and again and requires a couple more parameters to be provided on each call than the imperative approach does. Namely we need to know where to the start index and end index are for the current recursive call of the binarySearch function.

Initially the start will be at 0 which makes sense since we want to begin at the start of the collection and the end initially will be set to None but will be set to the length of the collection mins one once the function first runs.

Note: We set end to initialise itself with a value of None because in python you can't reference other parameters to use as default values of other parameters. To get around this, we set it's type to Union[None, int] which just means this can be of type None or int. Once the function first executes we set the end value to the length of the collection mins one because if a collection contains 5 items the last index would be 4 since lists/arrays are always 0 indexed.

If the end index is greater than or equal to the starting index we calculate where the middle index of the current collection is and consider 3 cases:

If the item at the middle index equals the value we are looking for, return the middle index
If the item at the middle index is greater than the value we are looking for then run a recursive call to binarySearch to check the left half of the collection for the value
If the item at the middle index is less than the value we are looking for then run a recursive call to binarySearch to check the right half of the collection for the value

If none of these recursive calls manage to find the value in the collection then we return -1 by default.

Conclusions

The Binary Search algorithm is always a fantastic choice to find values within sorted and type-consistent data sets.

Personally I try to always keep data within applications as uniform as possible and thus, I find myself using this algorithm relatively often in my day to day work. I highly recommend looking into it further and seeing how it can benefit you and your work too!

I hope you learned something new today and if you have any questions, ideas or comments, feel free to comment them below!