Binary Search Tree Explained: All Operations From One Rule

#algorithms #career #leetcode #datastructures

Most BST tutorials teach search, insert, and delete as three procedures. You memorise one, then the next, then the three deletion cases. Then deletion shows up in an interview and the whole stack collapses, because what got memorised was the sequence of steps and not the rule the steps were trying to preserve.

TL;DR: A binary search tree is defined by a single ordering invariant: every node's left subtree contains smaller values and every right subtree contains larger ones. Search, insert, and delete are all derivable from that rule. Treat the invariant as primary and the procedures stop being three things to memorise.

The one rule the whole data structure hangs on

A BST is a binary tree where every node's left subtree contains only smaller values and every right subtree contains only larger values. That is the entire data structure. One ordering invariant, nothing else.

The invariant gives the tree something arrays and linked lists can't give: a way to eliminate half the remaining work at every step. When you're looking for a value, you compare it to the current node and pick a direction. You never look at the other side. The same halving idea that makes binary search fast on sorted arrays, except the tree structure also makes insertion and deletion cheap because nothing has to shift.

A useful framing: every BST operation is the same recursive walk. The walk asks one question at each node (target vs current), takes one of two branches, and stops at some terminal condition. What changes between operations is the terminal condition and what happens when you reach it.

Search and insert: one walk, two endings

Start with a BST holding [8, 3, 10, 1, 6, 14] and search for 6. From the root, 6 < 8, go left to 3. Then 6 > 3, go right and land on 6. Three comparisons, three levels. The right subtree of 8 was never touched.

Now insert 7 into the same tree. From the root, 7 < 8, go left. Then 7 > 3, go right to 6. Then 7 > 6, go right again and land on a null pointer. Create node 7 there. Same walk, different ending. Search stops when the target matches or the pointer is null. Insert stops when the pointer is null and writes a new leaf.

def search(node, target):
    if node is None or node.value == target:
        return node
    if target < node.value:
        return search(node.left, target)
    return search(node.right, target)

def insert(node, value):
    if node is None:
        return Node(value)
    if value < node.value:
        node.left = insert(node.left, value)
    elif value > node.value:
        node.right = insert(node.right, value)
    return node

Read the two functions side by side. The structure is identical. The terminal action differs by a single line. That is the derivation.

Insert always creates a leaf. It never restructures the existing tree, which is what makes it cheap. It is also what eventually makes the tree fragile, since nothing pushes back when an input order produces an unbalanced shape.

Deletion: three cases collapsed to one question

Deletion is where most candidates lose the thread, because removing a node can violate the invariant. The standard explanation gives you three cases and tells you which procedure to run for each. The better framing is one question with three answers.

The question is: how do I fill the gap left by the removed node without breaking left < node < right?

If the node is a leaf, the answer is nothing. Drop the pointer to it. Done.
If the node has one child, the answer is to promote that child. The child's subtree already satisfies the invariant relative to the deleted node's parent, so promotion preserves the rule.
If the node has two children, the answer is to find a replacement that is structurally allowed to sit in that slot. That replacement is the inorder successor (the smallest value in the right subtree), or symmetrically the inorder predecessor.

The two child case is the one that trips people up. You can't promote one child arbitrarily, because the other child's subtree would end up misplaced. The successor works because it is the smallest value larger than the node being deleted, which is what the position requires after the node is gone. Copy its value into the deleted node's slot, then delete the original successor from its own location. The successor has at most one child (it is the leftmost node in a subtree, so its left pointer is null by construction), and removing it falls into case 1 or case 2.

Trace deleting node 3 from [8, 3, 10, 1, 6, 14, 4]:

Step 1: Find node 3. It has two children (1 and 6). Two child case.
Step 2: Find inorder successor of 3.
        Go right to 6, then left to 4.
        Node 4 is the smallest value in 3's right subtree.
Step 3: Copy 4's value into node 3's slot.
Step 4: Delete the original node 4 from its previous position.
        Node 4 was a leaf. Drop the pointer.

Result:
        8
       / \
      4   10
     / \    \
    1   6   14

Verify the invariant holds: 1 < 4 < 6 in the left subtree, 4 < 8 < 10 across the root, 10 < 14 on the right. The deletion preserved the rule.

def delete(node, value):
    if node is None:
        return None
    if value < node.value:
        node.left = delete(node.left, value)
    elif value > node.value:
        node.right = delete(node.right, value)
    else:
        if node.left is None:
            return node.right
        if node.right is None:
            return node.left
        successor = node.right
        while successor.left is not None:
            successor = successor.left
        node.value = successor.value
        node.right = delete(node.right, successor.value)
    return node

The recursive call inside the two child branch is the only piece of the procedure that isn't immediately obvious. It is the same delete walking into the right subtree to remove the successor, which is guaranteed to be a leaf or a single child node and so resolves into one of the first two branches.

Why deletion is where interviews break

Search and insert can be passed by sketching the walk on paper and following the comparisons. Deletion can't, because the two child case requires you to articulate why the inorder successor specifically is the right replacement. That articulation reveals whether you understand the invariant or just memorised three procedures.

I keep seeing the same failure mode. The candidate handles the leaf case and the single child case, then stalls on the two child case. They reach for an arbitrary replacement, realise something is off, and try to patch the resulting violation downstream. The patch usually breaks a different invariant somewhere else, the trace gets longer, and the clock runs out.

The fix isn't to memorise the inorder successor trick harder. The fix is to be able to answer "why this specific replacement?" If you can say "the inorder successor is the smallest value larger than the deleted node, which is what the slot requires after deletion," you'll derive the procedure from the rule in real time. If you can't, you'll keep losing the thread the moment the interviewer adds a constraint.

When the invariant holds but the speed doesn't

Every BST operation runs in O(h) time where h is the tree's height. The marketing claim is O(log n), but that only holds when the tree stays balanced. A plain BST has no mechanism to enforce that.

Insert [1, 2, 3, 4, 5] into an empty BST in that order and you get this:

The invariant is technically satisfied. Every left subtree is empty, every right subtree contains larger values. But the tree is structurally a linked list. Search degrades to O(n). The performance promise comes from balance, and balance comes from input order. Sorted input is the worst possible input for a plain BST.

The fix is a self balancing variant such as AVL or Red-Black trees, which add rotation logic that restructures the tree after each insertion to guarantee O(log n) height regardless of input order. In an interview, if you state "BST operations are O(log n)" without qualifying balance, you'll lose points. The honest answer is "O(log n) on a balanced tree, O(n) worst case, and balance is not automatic on a plain BST."

Where candidates trip up under pressure

These are the patterns I see derail BST questions in interviews, roughly in the order they show up.

Checking only immediate children for the invariant. The rule is not left child < node < right child. It is every node in the left subtree is smaller. A value can pass the immediate-children check and still violate the constraint against a grandparent. The BST validator lesson covers why you have to pass upper and lower bounds down through the recursion to catch this.
Confusing BST ordering with heap ordering. A BST orders left-to-right (left < node < right). A heap orders parent-to-child (parent > children in a max heap). Mixing them up sends you walking the wrong direction during search.
Stalling on two child deletion. Two child deletion is the case interviewers probe deliberately. If you've practiced naming the inorder successor and stating why it works, you've already reduced the case to a previously solved one.
Asserting O(log n) without qualifying balance. The interviewer doesn't want to hear "BST operations are logarithmic." They want to hear "logarithmic when balanced, linear in the degenerate case, and here's what causes the degenerate case." That sentence reveals whether you've thought about the invariant or just memorised the headline.

The BST work was where the "derive from the invariant, don't memorise the procedure" framing first clicked for me when I was building Codeintuition's learning path. Most candidates I see practice the procedures three times and the invariant zero. Flipping that ratio is the part that compounds, because once the invariant is the unit of memory, every operation that respects the invariant becomes derivable on the spot.

I wrote a longer version on my own blog that walks the inorder traversal proof and the recursive construction of a balanced BST from a sorted array.

Which BST operation finally clicked for you only after sketching the tree on paper instead of reading another explanation?