Michael Lip

Posted on Mar 20 • Originally published at zovo.one

Binary Search Is Simple Until You Try to Implement It Correctly

#algorithms #programming #computerscience #beginners

There is a famous statistic from Jon Bentley's "Programming Pearls": when professional programmers were given the specification for binary search and asked to implement it, only about 10% got it right. Binary search has been known since 1946, and the first correct published implementation did not appear until 1962. Even Java's Arrays.binarySearch had an overflow bug that went undetected for nine years.

Binary search is the canonical example of an algorithm that is trivially easy to describe and surprisingly hard to implement correctly.

The Algorithm in Plain English

Given a sorted array and a target value:

Look at the middle element.
If it equals the target, you are done.
If the target is smaller, repeat on the left half.
If the target is larger, repeat on the right half.
If the search space is empty, the target is not in the array.

That is it. A child can understand the concept. The devil is entirely in the details.

The Classic Implementation

def binary_search(arr, target):
    low = 0
    high = len(arr) - 1

    while low <= high:
        mid = low + (high - low) // 2

        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            low = mid + 1
        else:
            high = mid - 1

    return -1

Every single line of this function contains a decision that can go wrong. Let me walk through the bugs that lurk in each choice.

Bug 1: The Overflow Problem

The natural way to compute the midpoint is (low + high) / 2. This is what the original Java implementation used. The problem: if low + high exceeds the maximum integer value, you get integer overflow. In Java, this wraps to a negative number, which becomes a negative array index, which throws an exception.

The fix is low + (high - low) / 2. This computes the same result without the addition that can overflow. In Python this is not an issue because integers have arbitrary precision, but in C, C++, Java, and most systems languages, this bug will bite you.

Bug 2: The Boundary Conditions

Should high start at len(arr) or len(arr) - 1? Should the loop condition be low < high or low <= high? Should we set high = mid or high = mid - 1?

These choices are interdependent. There are two internally consistent conventions:

Convention A (inclusive high):

high = len(arr) - 1
while low <= high:
    high = mid - 1  (or low = mid + 1)

Convention B (exclusive high):

high = len(arr)
while low < high:
    high = mid  (or low = mid + 1)

Mixing elements from both conventions creates infinite loops or off-by-one errors. Pick one and stick with it. I prefer Convention A because it maps more naturally to how I think about array indices.

Bug 3: The Infinite Loop

If you write high = mid instead of high = mid - 1 in Convention A, and the array has two elements, and the target is smaller than both, the loop never terminates. low stays at 0, high stays at 0, mid stays at 0, forever.

This is the most common binary search bug in the wild. It does not show up in small test cases (single-element arrays work fine) and it does not show up in cases where the target is found. It only appears in specific "not found" scenarios with specific array sizes.

Bug 4: Wrong Comparison Direction

This one is embarrassing but it happens. If you write arr[mid] > target when you mean arr[mid] < target, the algorithm searches the wrong half every time. It will always return "not found" for targets that exist and may infinite-loop or return wrong indices in other cases.

Why O(log n) Matters

Binary search cuts the search space in half with every step. For an array of 1 million elements, binary search needs at most 20 comparisons. For 1 billion elements, it needs at most 30.

Compare this to linear search, which checks every element: 1 million comparisons in the worst case for 1 million elements.

Array Size     Linear Search    Binary Search
100            100              7
10,000         10,000           14
1,000,000      1,000,000        20
1,000,000,000  1,000,000,000    30

This is the power of logarithmic time complexity. It is why binary search shows up everywhere: database indices (B-trees are generalized binary search), version control bisection (git bisect), debugging (bisecting commits to find a regression), and system design (consistent hashing).

Variants Worth Knowing

Lower bound: Find the first element greater than or equal to the target. This is std::lower_bound in C++ and is arguably more useful than standard binary search because it handles duplicates and "find the insertion point" queries.

Upper bound: Find the first element strictly greater than the target. Combined with lower bound, you can find the range of all elements equal to a given value.

Search on answer: Use binary search on a monotonic function rather than an array. "What is the minimum speed that lets me finish the trip in under 5 hours?" If the feasibility function is monotonic (more speed always means less time), binary search finds the answer.

Visualizing Makes It Click

The reason binary search is hard to implement is that the state transitions are hard to keep in your head. You need to track low, high, mid, and the remaining search space across multiple iterations while making sure the boundaries converge correctly.

I built a binary search visualizer at zovo.one that animates each step: the current bounds, the midpoint selection, the comparison result, and the narrowing of the search space. Watching the algorithm run on different inputs -- especially the tricky edge cases like empty arrays, single elements, and targets not in the array -- builds the intuition that makes correct implementation feel natural rather than error-prone.

I am Michael Lip. I build free developer tools at zovo.one. 350+ tools, all private, all free.

DEV Community