Demystifying Depth-First Search
Vaidehi Joshi · Apr 3 '17 · 12 min read
Once you've learned enough about different data structures, you start to think to yourself: right, so…what's the point, again? Why do we have all of these structures in the first place?
When you get down into the trees, it can be very easy to lose any sense of the forest. But it's time for us to zoom out a little bit, because we're at the point where we can finally (finally!) start getting into the super fun stuff. And by super fun, I mean: algorithms!
I started off this series wanting to learn more about all of these algorithms I had always heard about (and occasionally would find myself googling for in the middle of the night before a technical interview, frantically trying to prepare by memorizing terms that the internet told me I ought to know). But, as it turns out, before you can get into the algorithms, you have to know the data structures! And now we do. We talked about the differences between linear and non-linear data structures, and when one type of structure can be more useful than the other. We dove into the differences between graphs and trees, and all of the hidden places that they exist on the internet and inside of our machines.
Now, it's time for the good stuff: making use of our data structures in order to understand what on earth they're good for. And there's no better place to start than the algorithm that was the source of so much confusion for me, for such a long time: depth-first search.
A tiny taste of tree traversal
Before we can really get into the intricacies of depth-first search, we need to answer one important question first: what does it even mean to traverse a tree? We know a little bit about walking and traversing through graphs, but what about trees?
Well, if your memory is better than mine, you'll remember that trees are really just limited versions of graphs, which is to say, trees are graphs with a much more strict set of rules to follow. We already know that there are many different ways to walk a graph: we could start at one node, and end at another, or we could start and end in the same place. We could find a simple path that involves us never repeating the same node or edge twice, or we could find a path that allows us to repeat nodes and edges.
Yet, despite their similarities, trees and graphs are definitely different. It's important for us to understand what exactly we're talking about when we talk about traversing a tree. So let's see what we're dealing with here.
Since trees are a type of graph, tree traversal is, logically enough, a type of graph traversal. Tree traversal is also sometimes referred to as tree search. However, the process of traversing through a tree is a little different than the more broad process of traversing through a graph. When we search through a tree, we're usually doing it to serve the purpose of either checking all the nodes in the tree structure, or updating all the nodes in the structure. Whichever of these two is the case, there's one important thing to note here: we're not going to search through the nodes of a tree more than once. If we're trying to check or update every single node in a tree, we wouldn't want to repeat ourselves by visiting a node more than once!
Therefore, when we traverse through a tree, what we're really trying to do is walk through the tree without ever repeating ourselves.
But it's not just visiting each node only once that counts; order matters, too! It turns out that, when it comes to trees, there are really only two main techniques that we can lean on when it comes to traversing and visiting each node in the tree only once. Ultimately, we have two choices: we can go wide, or we can go deep.
The more common terms to describe these two options are breadth-first search and depth-first search, and they are probably exactly what you'd expect them to be.
In breadth-first search (BFS), we search through all the nodes in the tree by casting a wide net, so to speak. What this means is that we would search through the nodes from one level to the next, and traverse through all the children of a node before moving on to visit the grandchildren nodes (and we'd visit the grandchildren nodes before visiting the great-grandchildren nodes…you get the idea!).
But we won't talk about breadth-first search just yet. Instead, let's turn to the second of the two options: depth-first search (DFS).
In depth-first search, once we start down a path, we don't stop until we get to the end. In other words, we traverse through one branch of a tree until we get to a leaf, and then we work our way back to the trunk of the tree.
In the graph above, we can see that, instead of traversing level by level, we're instead traversing through the tree by visiting all of the children, grandchildren, great-grandchildren (and so on) until we come to the end of a path. Then, and only then, do we make our way back up the levels and start on a new path. We walk the path and visit all of the nodes in red first, and then move on to visit the nodes in orange next.
This tree was so deep, I almost drowned
Of course, nothing is that simple in the world of computer science. Even though we've broken down our tree traversal options into two possible tracks, BFS and DFS, it turns out that we can go even deeper into depth-first search! Who would've guessed.
Once we've narrowed down our tree traversal approach to use depth-first search, we're still only halfway there. Even within the realm of DFS, there are a few different options in terms of which depth-first strategy we want to implement in our tree search!
There are a few different ways that we could search through the children, grandchildren, and great-grandchildren nodes of a tree. And really, it all comes down to the order in which we decide to do things.
You might remember that, in addition to containing some piece of data, a node in a binary search tree can only ever have two references: a reference to the node on its left (which will be smaller in its data), and a reference to the node on its right (which will be larger in its data). We already know that whenever we search through a tree, we're trying to either check or update all the nodes in the structure.
In both of these cases, we need to do three things:
- Read the data of the node that we're checking or updating.
- Check the node to the left of the node (the left reference) that we're on currently.
- Check the node to the right of the node (the right reference) that we're on currently.
The different depth-first strategies all revolve around the order in which we do these three things.
Since there are three things we have to do each time we visit/check a node, there are six possible permutations for the order in which we can do these things, which I've drawn out in the image to the left.
However, of these six permutations, the first three are the most popular, and also the most common. They're so ubiquitous, in fact, that they have special names!
The first of these common DFS strategies goes something like this: a) read the data of the node that we're on, b) visit the node that is referenced to the left, if it exists, and c) visit the node that is referenced to the right, if it exists. The process of reading data, and then visiting the left node followed by the right node, is often written in short form as DLR, where D stands for data, L stands for left node, and R stands for right node.
We use this shorthand in order to describe the order in which we'll do our checking. So, I told you that these three strategies had special names, right? I guess I should probably tell you what they are:
- Preorder (DLR): Read the data of the node, then visit the left subtree/nodes, followed by the right subtree/nodes.
- Inorder (LDR): Visit the left subtree/nodes, then read the data of the node, and finally visit the right subtree/nodes.
- Postorder (LRD): Visit the left subtree/nodes, then visit the right subtree/nodes, and finally read the data of the node.
Okay. All of these definitions might seem like an awful lot of information to take in at once. I think it'll be a lot easier, and hopefully a bit clearer, with a drawing! Let's take a closer look at what preorder, inorder, and postorder traversal look like using an example tree.
In the image below, we're trying out all three of these techniques on a binary tree that has 12 nodes in total. This is what each of these traversals would look like if we were printing out the name of each node as we visited it:
Interesting! If we look at how these three traversals work, we'll notice pretty quickly that the whole "DLR" short form actually carries significant weight.
In preorder traversal, for example, we're reading the data at the node first, then moving on to the left subtree, and then to the right subtree. As such, the nodes that we visit (and as we print out their data) follow that pattern: first we print out the root node's data, then the data from the left subtree, and then the data from the right subtree.
However, in inorder traversal, we're following the path all the way down to the leftmost leaf, and then making our way back to the root node, before following the path down to the rightmost leaf. Inorder traversal is particularly cool because, when the tree is a binary search tree, we end up with a sorted list of nodes!
Finally, in postorder traversal, we visit the left node reference first, then the right node reference, and only then do we read the data of the node that we're currently on. This is why we read the data of node a, followed by node c, before reading the data of node b. We end up reading the root node at the very end of the traversal (after visiting all the nodes in the left subtree and the right subtree), which matches the shorthand for postorder traversal: LRD.
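To make that a/b/c walkthrough concrete, here is a minimal sketch of that little subtree as plain JavaScript objects. The object shape (a data property plus left and right references) is an assumption for illustration:

```javascript
// A minimal three-node subtree: node b is the parent,
// with node a on its left and node c on its right.
const a = { data: "a", left: null, right: null };
const c = { data: "c", left: null, right: null };
const b = { data: "b", left: a, right: c };

// Visiting this subtree in each of the three orders gives:
// Preorder  (DLR): b, a, c  (data first, then left, then right)
// Inorder   (LDR): a, b, c  (left, then data, then right)
// Postorder (LRD): a, c, b  (left, then right, then data last)
```

Notice that the same three nodes are visited every time; only the moment at which we read each node's data changes.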
Going (even) deeper with recursion!
Okay, so we have three different methods of implementing depth-first search.
That's cool, I guess.
But…how do we actually go about implementing any of these strategies? Why, by using recursion, of course!
If you're totally new to recursion, I highly recommend reading one of my old posts on recursion. Just in case you need a quick refresher: recursion is the process of calling a method from within that very same method, and effectively repeating an action again and again.
You might have already seen how the depth-first strategy could be implemented as a recursive method. If you think about it, it starts to become more and more clear: we're doing the same thing, reading data, checking a left node reference, and checking a right node reference, again and again, until we get through all of the nodes in the tree. Sure, sometimes we're doing these three actions in a slightly different order, depending upon which strategy we chose, but still, we're doing the same three things, in the same order, with each node that we visit.
We can implement this recursively by first considering what each of these nodes might look like in the context of our code. Here's a little cross-section of a binary search tree's node to help you visualize:
Each node has three parts: data, a left reference, and a right reference. Immediately off the bat, we can already see one thing pretty clearly: we're going to have to repeat the action of reading these three parts of a node for each node in the tree.
Thus, the amount of time it's going to take us to traverse through a tree using DFS is directly proportional to the number of nodes in the tree. The time complexity of using depth-first search on a binary tree is O(n), where n is the number of nodes in the tree.
If we have 5 nodes, it'll take us O(5), and if we have 50 nodes to visit, it'll take us O(50) in terms of time.
Okay, so how could we implement one of those node cross-sections in code? Well, it might be as simple as an object, and could look like this:
node1 = {
data: 1,
left: referenceToLeftNode,
right: referenceToRightNode
};
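Under that representation, a whole tree is nothing more than node objects whose left and right references point at other node objects. Here's a small sketch of a five-node tree (the node1 through node5 names and their values are just illustrative):

```javascript
// Leaf nodes: their left and right references are simply null.
const node4 = { data: 4, left: null, right: null };
const node5 = { data: 5, left: null, right: null };
const node3 = { data: 3, left: null, right: null };

// An internal node holds references to its child nodes.
const node2 = { data: 2, left: node4, right: node5 };

// The root node ties the whole structure together.
const node1 = { data: 1, left: node2, right: node3 };
```

There's no separate "tree" object at all; once we have a reference to the root node, the chain of left and right references gives us the entire structure.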
That's not too bad! Shall we take it a step further? Let's write out a function for the preorder traversal search strategy. I'll pseudocode it in JavaScript, but hopefully it should be easy to translate from one language to another:
function preorderSearch(node) {
// Check that a node exists.
if (node === null) {
return;
}
// Print the data of the node.
console.log(node.data);
// Pass in a reference to the left child node to preorderSearch.
// Then, pass reference to the right child node to preorderSearch.
preorderSearch(node.left);
preorderSearch(node.right);
}
Alright, that wasn't as bad as I was expecting either! All we did was transform the DLR shorthand for the preorder traversal into code. This function takes in a node, and checks that the node exists. Then, it reads the data of the node, and does a preorder search of the left node reference, followed by a preorder search of the right node reference.
Whoa! Recursion in action. We literally wrote one function, but we're calling that exact same function from within itself. Is your mind spinning yet?
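For what it's worth, the other two strategies fall out of the very same template; the only thing that moves is the console.log line. A sketch of both, following the shape of the preorderSearch function above:

```javascript
// Inorder (LDR): visit the left subtree, then read the data,
// then visit the right subtree.
function inorderSearch(node) {
  if (node === null) {
    return;
  }
  inorderSearch(node.left);
  console.log(node.data);
  inorderSearch(node.right);
}

// Postorder (LRD): visit the left subtree, then the right subtree,
// and read the data last.
function postorderSearch(node) {
  if (node === null) {
    return;
  }
  postorderSearch(node.left);
  postorderSearch(node.right);
  console.log(node.data);
}
```

Same null check, same two recursive calls; the position of the console.log is what makes each strategy D-L-R, L-D-R, or L-R-D.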
Okay, okay, stay with me, because this recursion magic actually sheds light on one more important thing: the time complexity of depth-first search. We know that the amount of time that a DFS takes corresponds directly to how big a tree is, specifically, how many nodes it has, because that's how many nodes we need to visit, which will directly impact how much time it will take for us to traverse the whole tree!
But what about the space complexity? Well, because DFS is usually implemented recursively, this ends up with us calling one function from within itself, many times. Let's look back at our cross-section example tree. If we were implementing preorder search, we would traverse from node 1 to 2, from 2 to 4, and from node 4 to 8. Each time that we visited one of these nodes, we would be invoking the preorderSearch function from within the first function we called when we passed in the root node.
Why is this important? Well, because of the call stack. You might remember from earlier in the series when we learned that stacks operate according to the last-in, first-out principle. This means that only when the last function finishes running and returns can we start popping functions that are currently taking up space from off of the top of the stack.
This means that our call stack will continue to grow until we reach a leaf node.
Once we reach a leaf node, our preorderSearch function will return, because we will pass it references to a left and a right node that simply won't exist!
And then each of the open functions in our call stack will start to return and close up, until we get back down to the first function we called to start off with. This is important to understand because it exemplifies the space complexity of depth-first search, namely, that the amount of space we need in terms of memory depends upon the height of our tree, or O(h). The height of the tree will tell us how much memory we'll need in the deepest recursive function call, which will tell us the worst-case scenario for running a depth-first search algorithm.
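One way to make that O(h) stack growth visible is to manage the stack by hand instead of leaning on the call stack. Here is a sketch of an iterative preorder traversal, offered as an alternative to the recursive version rather than as part of the original walkthrough; the explicit stack only ever holds pending nodes along the current path down the tree, so its size stays proportional to the tree's height:

```javascript
// Preorder (DLR) traversal using an explicit stack instead of recursion.
function preorderIterative(root) {
  const stack = [];
  if (root !== null) {
    stack.push(root);
  }
  while (stack.length > 0) {
    const node = stack.pop();
    console.log(node.data); // D: read the data first.
    // Push the right child before the left one, so that the left child
    // is popped (and therefore visited) first, preserving D-L-R order.
    if (node.right !== null) {
      stack.push(node.right);
    }
    if (node.left !== null) {
      stack.push(node.left);
    }
  }
}
```

The stack here plays exactly the role that the call stack plays in the recursive version, which is why both approaches end up with the same O(h) space behavior.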
When we take a step back, this is actually pretty powerful: we can learn so much about the strengths (and weaknesses!) of an algorithm just by looking at a data structure! And since we already know where trees are used, in git bisect commands, for example, and in implementing complex structures like mazes, we can understand how easy or hard it would be to search through them using DFS, with one simple glance.
I don't know about you, but I'd say that we're well on our way to becoming algorithm wizards!
Resources
Depth-first search seems to come up quite often in coding interviews, and it can be hard to wrap your head around it at first. If DFS still feels confusing, or if you simply want to learn more about how it works and the different search strategies, you can get started with the links below.
- Binary Trees, Professor H. Levent Akin
- Traversals, Nathan Landman, Karleigh Moore, Jimin Khim
- BFS vs DFS for Binary Tree, GeeksforGeeks
- Applications of Depth First Search, GeeksforGeeks
- Binary tree traversal: Preorder, Inorder, Postorder, mycodeschool