loading...

Searching & Sorting w/ Binary Search Trees

blake profile image Blake Khan Originally published at blog.blakekhan.com on ・5 min read

Introduction

Binary Search Trees are an incredibly useful data structure that holds comparable values in a particular order.

With this particular ordering, it allows for fast lookups and a straightforward way to sort the values in the tree.

Regarding use cases, Binary Search Trees are one of the bases for more abstract data structures such as sets and maps.

Definitions

Before we talk about searching and sorting with Binary Search Trees (BSTs), let's make sure we're on the same page regarding terminology.

A Node holds

  • A value.
  • References to other nodes.

A Tree is a data structure that has

  • One parent node.
  • Zero, one, or multiple child nodes.
  • No duplicate nodes.
  • No node that references previously defined nodes (i.e. no looping).

A tree is implemented as one root node, consisting of children of subtrees that have the properties above.

  • A parent node is considered as the root node of that specific subtree.
  • The root node is the starting node for the entire tree.

When you hear the term "Binary Tree", you might feel a little intimidated! But worry not, it's a straightforward concept to grasp.

A Binary Tree is a tree with

  • At most two child nodes (one left, one right) for any parent node.

Now let's define what this entire post is about - Binary Search Trees.

A Binary Search Tree is a particular type of binary tree where

  • The left child node is less than the parent node.
  • The right child node is greater than the parent node.

Implementation

The following code samples are written in C.

We can represent a Node as a structure containing fields holding a value (in this case an integer), and pointers to the left and right child nodes.

// Node Structure
typedef struct node {
    struct node *left; // Pointer pointing to left child node.
    struct node *right; // Pointer pointing to right child node.
    int val; // The value the node is holding.
} node_t;

Sorting

There are different ways that you can traverse a tree. There is pre-order, in-order, out-order, post-order, etc. As this post is specifically about Binary Search Trees, we're not going to go into too much detail about generic tree traversals. However, in-order traversal plays a significant role in BSTs.

In-order traversal follows the pattern LEFT PARENT RIGHT.
Let's take the following BST as an example:

BST: 5, 3, 7

If you traverse the BST using in-order traversal, you'll start with the left node (3), move back to the parent node (5) and end with the right node (7), resulting in 3, 5, 7.

Take a look at that result, does anything stick out to you? In case you didn't notice, it returned a sorted result!

This property of the in-order traversal result is no coincidence.

Try it with any BST; you will always end up with a sorted result. Not convinced? Let's try it one more time with a more complex BST.

BST: 50, 27, 99, 7, 42, 84

Remember that in-order traversal goes like this: Left, Parent, Right.

So starting at 50, go left to 27. 27 has a left child (7), so go there next. 7 doesn't have a left child so that's our first entry. Then move up back to 7's parent (27) which is our second entry, then go to 27's right child (42) which is now our third entry.

So far, our result is:
7, 27, 42

Now, we're done with the subtree where 27 is the root. 50 is the parent of 27, so that's our fourth entry. We then head to 50's right child, 99. However, 99 has a left child (84), so we go there instead. 84 is now our fifth entry. We then go to 84's parent (99) which is our sixth entry. Then we go to 99's right child, which doesn't exist - so we're done!

Our result from the in-order traversal is:
7, 27, 42, 50, 84, 99

Would you look at that, it's completely sorted!

Here's a C implementation of in-order traversal:

void printInOrder(node_t *parent) {
    // Safety Check.
    if (parent == NULL) {
        return;
    }

    // Print Left Subtree.
    printInOrder(parent->left);

    // Print Parent.
    printf("%d, ", parent->val);

    // Print Right Subtree.
    printInOrder(parent->right);
}

In this recursive implementation, we check if the provided Node is NULL or not. This is our base case.

We then recursively call our in-order function on the left child. This will continue calling until there are no more left-children. Then, it will print the current (left) node value. Afterwards, it will again recursively call our in-order function for the right-children.

What we end up with is the values of our provided BST printed out sorted.

Searching

If we want to search for a provided value in a BST, how would we do it?

Searching takes advantage of three features of BSTs:

  • There are no duplicates.
  • Values can be compared to each other.
  • Values to the left are smaller than the parent, and values to the right are greater than the parent.

We start searching at the root node. If the provided value matches the value of the root node, we can simply return because the value has been found.

But if the value isn't a match, we have two options: traverse left or traverse right.

Which should we do?

Well, if the provided value is less than the current node we are looking at, we search left. Otherwise, we search right. This property of having lesser values as the left-child and greater values as the right-child unqiuely defines a Binary Search Tree, so take advantage of it!

In C, this contains function is implemented as so:

bool contains(node_t *parent, const int *value) {
    // Safety Check.
    if (parent == NULL) {
        return false;
    }

    // Found match!
    if (parent->val == *value) {
        return true;
    }

    // Check left subtree.
    if (*value < parent->val) {
        return contains(parent->left, value);
    }

    // Check right subtree.
    return contains(parent->right, value);
}

Not only is this data structure favored for its search efficiency, it's also preferred for its ease of implementation. The search function can be implemented recursively.

As you can see in the implementation, we disregard the right subtree if the value we are looking for is less than that subtree's parent value. Likewise, we disregard the left subtree if the value we are searching for is greater than the subtree's parent value.

Since we either search left or right until the provided value is found, we on average search only half of the nodes in the BST. This makes the average efficiency of searching in a BST O(log n).The worst case is O(n) where all nodes need to be searched (this is where self-balancing trees would be useful).

Full Demo

If you'd like to try out or view the full C implementation demo that includes some testing, click here!

Wrapping things Up

Binary Search Trees are useful data structures when it comes to searching and sorting. By taking advantage of in-order traversal and the properties of this special tree, you can implement both searching and sorting functions not only efficiently, but also easily.

Thanks for reading!

Discussion

pic
Editor guide