Demystifying Depth-First Search
Vaidehi Joshi Apr 3 '17
Once you’ve learned enough about different data structures, you start to think to yourself: right, so…what’s the point, again? Why do we have all of these structures in the first place?
When you get down into the trees, it can be very easy to lose any sense of the forest. But it’s time for us to zoom out a little bit, because we’re at the point where we can finally — finally! — going to start getting into the super fun stuff. And by super fun, I mean: algorithms!
I started off this series wanting to learn more about all of these algorithms I had always heard about (and occasionally would find myself googling for in the middle of the night before a technical interview, frantically trying to prepare by memorizing terms that the internet told me I ought to know). But, as it turns out, before you can get into the algorithms, you have to know the data structures! And now we do. We talked about the differences between linear and non-linear data structures, and when one type of structure can be more useful than the other. We dove into the differences between graphs and trees, and all of the hidden places that they exist on the internet and inside of our machines.
Now, it’s time for the good stuff: making use of our data structures in order to understand what on earth they’re good for. And there’s no better place to start than the algorithm that was the source of so much confusion for me, for such a long time: depth first search.
A tiny taste of tree traversal
Before we can really get into the intricacies of depth first search, we need to answer one important question first: what does it even mean to traverse a tree? We know a little bit about walking and traversing through graphs, but what about trees?
Well, if your memory is better than mine, you’ll remember that trees are really just limited versions of graphs — which is to say, trees are graphs with a much more strict set of rules to follow. We already know that there are many different ways to walk a graph: we could start at one node, and end at another, or we could start and end in the same place. We could find a simple path that involves us never repeating the same node or edge twice, or we could find a path that allows us to repeat nodes and edges.
Yet, despite their similarities, trees and graphs are definitely different. It’s important for us to understand what exactly we’re talking about when we talk about traversing a tree. So let’s see what we’re dealing with here.
Since trees are a type of graph, tree traversal is, logically enough, a type of graph traversal. Tree traversal is also sometimes referred to as tree search. However, the process of traversing through a tree is a little different than the more broad process of traversing through a graph. When we search through a tree, we’re usually doing it to serve the purpose of either checking all the nodes in the tree structure, or updating all the nodes in the structure. Whichever of these two is the case, there’s one important thing to note here: we’re not going to search through the nodes of a tree more than once. If we’re trying to check or update every single node in a tree, we wouldn’t want to repeat ourselves by visiting a node more than once!
Therefore, when we traverse through a tree, what we’re really trying to do is walk through the tree without ever repeating ourselves.
But it’s not just visiting each node only once that counts — order matters, too! It turns out that, when it comes to trees, there are really only two main techniques that we can lean on when it comes to traversing and visiting each node in the tree only once. Ultimately, we have two choices: we can go wide, or we can go deep.
The more common terms to describe these two options are breadth-first search and depth-first search, and they are probably exactly what you’d expect them to be.
In breadth-first search (BFS), we search through all the nodes in the tree by casting a wide net, so to speak. What this means is that we would search through the nodes from one level to the next, and traverse through all the children of a node before moving on to visit the grandchildren nodes (and we’d visit the grandchildren nodes before visiting the great-grandchildren nodes…you get the idea!).
But we won’t talk about breadth-first search just yet. Instead, let’s turn to the second of the two options: depth-first search (DFS).
In depth-first search, once we start down a path, we don’t stop until we get to the end. In other words, we traverse through one branch of a tree until we get to a leaf, and then we work our way back to the trunk of the tree.
In the graph above, we can see that, instead of traversing level by level, we’re instead traversing through the tree by visiting all of the children, grandchildren, great-grandchildren (and so on) until we come to the end of a path. Then — and only then — do we make our way back up the levels and start on a new path. We walk the path and visit all of the nodes in red first, and then move on to visit the nodes in orange next.
This tree was so deep, I almost drowned
Of course, nothing is that simple in the world of computer science. Even though we’ve broken down our tree traversal options into two possible tracks — BFS and DFS — it turns out that we can go even deeper into depth-first search! Who would’ve guessed.
Once we’ve narrowed down our tree traversal approach to use depth-first search, we’re still only halfway there. Even within the realm of DFS, there are a few different options in terms of which depth-first strategy we want to implement in our tree search!
There are a few different way that we could search through the children, grandchildren, and great-grandchildren nodes of a tree. And really, it all comes down to the order in which we decide to do things.
You might remember that, in addition to containing some piece of data, a node in a binary tree can only ever have two references: a reference to the node on its left (which will be smaller in its data), and a reference to the node on its right (which will be larger in its data). We already know that whenever we search through a tree, we’re trying to either check or update all the nodes in the structure.
In both of these cases, we need to do three things:
- Read the data of the node that we’re checking or updating.
- Check the node to the left of the node (the left reference) that we’re on currently.
- Check the node to the right of the node (the left reference) that we’re on currently.
The different depth-first strategies all revolve around the order in which we do these three things.
Since there are three things we have to do each time we visit/check a node, there are six possible permutations for the order in which we can do these things, which I’ve drawn out in the image to the left.
However, of these six permutations, the first three are the most popular — and also the most common. They’re so ubiquitous, in fact, that they have special names!
The first of these common DFS strategies goes something like this: a) read the data of the node that we’re on, b) visit node that is referenced to the left, if it exists, and c) visit node that is referenced to the right, if it exists. The process of reading data, and then visiting the left node followed by the right node is often written in short form as DLR, where D stands for data, L stands for left node, and R stands for right node.
We use this shorthand in order to describe the order in which we’ll do our checking. So, I told you that these three strategies had special names, right? I guess I should probably tell you what they are:
- Preorder (DLR): Read the data of the node, then visit the left subtree/nodes, followed by the right subtree/nodes.
- Inorder (LDR): Visit the left subtree/nodes, then read the data of the node, and finally visit the right subtree/nodes.
- Postorder (LRD): Visit the left subtree/nodes, then visit the left subtree/nodes, and finally read the data of the node.
Okay. All of these definitions might seem like an awful lot of information to take in at once. I think it’ll be a lot easier — and hopefully, a bit clearer — with a drawing! Let’s take a closer look at what preorder, inorder, and postorder traversal look like using an example tree.
In the image below, we’re trying out all three of these techniques on a binary tree that has 12 nodes in total. This is what each of these traversals would look like if we were printing out the name of each node as we visited it:
Interesting! If we look at how these three traversals work, we’ll notice pretty quickly that the whole “DLR” short form actually carries significant weight.
In preorder traversal, for example, we’re reading the data at the node first, then moving on to the left subtree, and then to the right subtree. As such, the nodes that we visit (and as we print out their data), follow that pattern: first we print out the root node’s data, then the data from the left subtree, and then the data from the right subtree.
However, in inorder traversal, we’re following the path all the way down to the leftmost leaf, and then making our way back to the root node, before following the path down to the rightmost leaf. Inorder traversal is particularly cool because we end up with a sorted list of nodes!
Finally, in postorder traversal, we visit the left node reference first, then the right node, and then if none exist, we read the data of the node that we’re currently on. This is why we read the data of node a, followed by node c, before reading the data of node_b_. We end up reading the root node at the very end of the traversal (after visiting all the nodes in the left subtree and the right subtree), which matches the shorthand for postorder traversal: LRD.
Going (even) deeper with recursion!
Okay, so we have three different methods of implementing depth-first search.
That’s cool, I guess.
But…how do we actually go about implementing any of these strategies? Why, by using recursion, of course!
If you’re totally new to recursion, I highly recommend reading one of my old posts on recursion. Just in case you just need a quick refresher: recursion is the process of calling a method from within that very same method — and effectively repeating an action again and again.
You might have already seen how the depth-first strategy could be implemented as a recursive method. If you think about it, it starts to become more and more clear: we’re doing the same thing — reading data, checking a left node reference, and checking a right node reference — again and again, until we get through all of the nodes in the tree. Sure, sometimes we’re doing these three actions in a slightly different order, depending upon which strategy we chose — but still, we’re doing the same three things, in the same order, with each node that we visit.
We can implement this recursively by first considering what each of these nodes might look like in the context of our code. Here’s a little cross-section of a binary search tree’s node to help you visualize:
Each node has three parts — data, a left reference, and a right reference. Immediately off the bat, we can already see one thing pretty clearly: we’re going to have to repeat the action of “reading” these three parts of a node for each node in the tree.
Thus, the amount of time it’s going to take us to traverse through a tree using DFS is directly proportional to the number of nodes in the tree. The time complexity of using breadth-first search on a binary tree is O(n), where n is the number of nodes in the tree.
If we have 5 nodes, it’ll take us O(5), and if we have 50 nodes to visit, it’ll take us O(50) in terms of time.
Okay, so how could we implement one of those node “cross-sections” in code? Well, it might be as simple as an object, and could look like this:
node1 = {
data: 1,
left: referenceToLeftNode,
right: referenceToRightNode
};
That’s not too bad! Shall we take it a step further? Let’s write out a function for the preorder traversal search strategy. I’ll pseudocode it in JavaScript, but hopefully it should be easy to translate from one language to another:
function preorderSearch(node) {
// Check that a node exists.
if (node === null) {
return;
}
// Print the data of the node.
console.log(node.data);
// Pass in a reference to the left child node to preorderSearch.
// Then, pass reference to the right child node to preorderSearch.
preorderSearch(node.left);
preorderSearch(node.right);
}
Alright, that wasn’t as bad as I was expecting either! All we did was transform the DLR shorthand for the preorder traversal into code. This function takes in a node, and checks that the node exists. Then, it reads the data of the node, and does a preorder search of the left node reference, followed by a preorder search of the right node reference.
Whoa! Recursion in action. We literally wrote one function, but we’re calling that exact same function from within itself. Is your mind spinning yet?
Okay, okay, stay with me, because this recursion magic actually sheds light on one more important thing: the time complexity of breadth-first search. We know that the amount of time that a BFS takes corresponds directly to how big a tree is — specifically, how many nodes it has, because that’s how many nodes we need to visit, which will directly impact how much time it will take for us to traverse the whole tree!
But what about the space complexity? Well, because DFS is usually implemented recursively, this ends up with us calling one function from within itself, many times. Let’s look back at our cross-section example tree. If we were implementing preorder search, we would traverse from node 1 to 2, from 2 to 4, and from node 4 to 8. Each time that we visited one of these nodes, we would be invoking the preorderSearch
function from within the first function we called when we passed in the root node.
Why is this important? Well, because of the call stack. You might remember from earlier in the series when we learned that stacks operate according to the last-in, first-out principle. This means that only when the last function finishes running and returns can we start popping functions that are currently taking up space from off of the top of the stack.
This means that our call stack will continue to grow until we reach a leaf node.
Once we reach a leaf node, our
preorderSearch
function willreturn
because we will pass it references to aleft
and aright
node that simply won’t exist!
And then each of the “open” functions in our call stack will start to return and close up, until we get back down to the first function we called to start off with. This is important to understand because it exemplifies the space complexity of depth-first search — namely, that the amount of space we need in terms of memory depends upon the height of our tree, or O(h). The height of the tree will tell us how much memory we’ll need in the deepest recursive function call, which will tell us the worst-case scenario for running a depth-first search algorithm.
When we take a step back, this is actually pretty powerful — we can learn so much about the strengths (and weaknesses!) of an algorithm just by looking at a data structure! And since we already know where trees are used — in git bisect
commands, for example, and in implementing complex structures, like mazes — we can understand how easy or hard it would be to search through them using DFS, with one simple glance.
I don’t know about you, but I’d say that we’re well on our way to becoming algorithm wizards!
Resources
Depth first search seems to come up quite often in coding interviews, and it can be hard to wrap your head around it at first. If DFS still feels confusing, or if you simply want to learn more about how it works and the different search strategies, you can get started with the links below.
- Binary Trees, Professor H. Levent Akin
- Traversals, Nathan Landman, Karleigh Moore, Jimin Khim
- BFS vs DFS for Binary Tree, GeeksforGeeks
- Applications of Depth First Search, GeeksforGeeks
- Binary tree traversal: Preorder, Inorder, Postorder, mycodeschool
I think the tree you show has a mistake when you did the in-order depth search the tree should be like this graph.
Or the order wouldn't be A,B,C,D,E,F,G,H,I,J,K,L.
From the way you graphed it...
The in-order result would be A,B,C,D,E,F,I,J,H,G,K,L.
I think the right order would be A,B,C,D,E,F,H,I,J,G,K,L. As H doesn't have any left child, H would be printed first and then comes I and J because the in-order goes Left Data Right. Correct me if I'm wrong.