There is a famous statistic from Jon Bentley's "Programming Pearls": when professional programmers were given the specification for binary search and asked to implement it, only about 10% got it right. Binary search has been known since 1946, and the first correct published implementation did not appear until 1962. Even Java's Arrays.binarySearch had an overflow bug that went undetected for nine years.
Binary search is the canonical example of an algorithm that is trivially easy to describe and surprisingly hard to implement correctly.
The Algorithm in Plain English
Given a sorted array and a target value:
- Look at the middle element.
- If it equals the target, you are done.
- If the target is smaller, repeat on the left half.
- If the target is larger, repeat on the right half.
- If the search space is empty, the target is not in the array.
That is it. A child can understand the concept. The devil is entirely in the details.
The Classic Implementation
def binary_search(arr, target):
low = 0
high = len(arr) - 1
while low <= high:
mid = low + (high - low) // 2
if arr[mid] == target:
return mid
elif arr[mid] < target:
low = mid + 1
else:
high = mid - 1
return -1
Every single line of this function contains a decision that can go wrong. Let me walk through the bugs that lurk in each choice.
Bug 1: The Overflow Problem
The natural way to compute the midpoint is (low + high) / 2. This is what the original Java implementation used. The problem: if low + high exceeds the maximum integer value, you get integer overflow. In Java, this wraps to a negative number, which becomes a negative array index, which throws an exception.
The fix is low + (high - low) / 2. This computes the same result without the addition that can overflow. In Python this is not an issue because integers have arbitrary precision, but in C, C++, Java, and most systems languages, this bug will bite you.
Bug 2: The Boundary Conditions
Should high start at len(arr) or len(arr) - 1? Should the loop condition be low < high or low <= high? Should we set high = mid or high = mid - 1?
These choices are interdependent. There are two internally consistent conventions:
Convention A (inclusive high):
high = len(arr) - 1
while low <= high:
high = mid - 1 (or low = mid + 1)
Convention B (exclusive high):
high = len(arr)
while low < high:
high = mid (or low = mid + 1)
Mixing elements from both conventions creates infinite loops or off-by-one errors. Pick one and stick with it. I prefer Convention A because it maps more naturally to how I think about array indices.
Bug 3: The Infinite Loop
If you write high = mid instead of high = mid - 1 in Convention A, and the array has two elements, and the target is smaller than both, the loop never terminates. low stays at 0, high stays at 0, mid stays at 0, forever.
This is the most common binary search bug in the wild. It does not show up in small test cases (single-element arrays work fine) and it does not show up in cases where the target is found. It only appears in specific "not found" scenarios with specific array sizes.
Bug 4: Wrong Comparison Direction
This one is embarrassing but it happens. If you write arr[mid] > target when you mean arr[mid] < target, the algorithm searches the wrong half every time. It will always return "not found" for targets that exist and may infinite-loop or return wrong indices in other cases.
Why O(log n) Matters
Binary search cuts the search space in half with every step. For an array of 1 million elements, binary search needs at most 20 comparisons. For 1 billion elements, it needs at most 30.
Compare this to linear search, which checks every element: 1 million comparisons in the worst case for 1 million elements.
Array Size Linear Search Binary Search
100 100 7
10,000 10,000 14
1,000,000 1,000,000 20
1,000,000,000 1,000,000,000 30
This is the power of logarithmic time complexity. It is why binary search shows up everywhere: database indices (B-trees are generalized binary search), version control bisection (git bisect), debugging (bisecting commits to find a regression), and system design (consistent hashing).
Variants Worth Knowing
Lower bound: Find the first element greater than or equal to the target. This is std::lower_bound in C++ and is arguably more useful than standard binary search because it handles duplicates and "find the insertion point" queries.
Upper bound: Find the first element strictly greater than the target. Combined with lower bound, you can find the range of all elements equal to a given value.
Search on answer: Use binary search on a monotonic function rather than an array. "What is the minimum speed that lets me finish the trip in under 5 hours?" If the feasibility function is monotonic (more speed always means less time), binary search finds the answer.
Visualizing Makes It Click
The reason binary search is hard to implement is that the state transitions are hard to keep in your head. You need to track low, high, mid, and the remaining search space across multiple iterations while making sure the boundaries converge correctly.
I built a binary search visualizer at zovo.one that animates each step: the current bounds, the midpoint selection, the comparison result, and the narrowing of the search space. Watching the algorithm run on different inputs -- especially the tricky edge cases like empty arrays, single elements, and targets not in the array -- builds the intuition that makes correct implementation feel natural rather than error-prone.
I am Michael Lip. I build free developer tools at zovo.one. 350+ tools, all private, all free.
Top comments (0)