Overview of Breadth-First Search and it's Uses

#beginners #computerscience

Breadth-First Search

Introduction

On my path to improving my data structures and algorithms knowledge, I decided to blog about breadth-first search. This algorithm comes up a lot in interviews and it will be important to master.

Breath-first search is an algorithm for searching trees and graphs. Trees and graphs are two different types of data structures. From this article,

"Tree and graph come under the category of non-linear data structure where tree offers a very useful way of representing a relationship between the nodes in a hierarchical structure and graph follows a network model. Tree and graph are differentiated by the fact that a tree structure must be connected and can never have loops while in the graph there are no such restrictions."

The article also has this image that I think clarifies the difference between the two:

As you can see from the diagram above, a tree progresses out from the root node at the top and doesn't loop, where as the graph doesn't necessarily have a set structure and looks more like a web.

Breadth-First Search

Breadth-first search is used to search a tree or graph. The algorithm starts at the tree root or a selected node on the graph. All neighbors of the node at the current depth will be explored before moving onto the next level. For example using the diagram above, if we start at tree root A on level zero, next its children will be explored(B and C). Since all nodes have been explored on level one of the tree, the next level will be explored(D, E, F). Finally, the last level will be explored (G, H).

Implementation

Breadth-first search uses a queue to store visited nodes. A queue is a data structure where elements are removed first-in first-out(FIFO). For example, say we have this queue [], and we add D, E, and F [D, E, F] from the second level. Adding to the queue is known as enqueuing. D will be removed from the queue first since it was added first. This is also known as dequeuing.

Here is some pseudocode from Wikipedia:

NOTE: The words vertex and node are used interchangeably in my step by step walkthrough below.

1 procedure BFS(G, start_v) is
2      let Q be a queue
3      label start_v as discovered
4      Q.enqueue(start_v)
5      while Q is not empty do
6          v := Q.dequeue()
7          if v is the goal then
8              return v
9          for all edges from v to w in G.adjacentEdges(v) do
10             if w is not labeled as discovered then
11                 label w as discovered
12                 w.parent := v
13                 Q.enqueue(w)

The function takes in a graph(G) and a starting vertex or root node(start_v).

First an empty queue is initialized queue = []
The starting node is labeled as discovered since it has already been visited
The starting node is added to the queue queue = [A]
Next we enter a while loop that exits once the queue is empty
In this loop, we dequeue and assign that node to a variable v v = A. Now our queue is empty queue = [].
Next we check if v (node A) is our goal and if it is, return the node.
If it isn't the goal we enter another loop that occurs once for each adjacent edge of the node. An edge is a connection between two nodes. Think of it as the line between one node and the next. In this case, our adjacent edges connect to nodes G.adjacentEdges(v) = [B, C].
Within the loop, each adjacent node is assigned to a variable w. w = B. We check if the node has already been discovered and if it hasn't, we label it as discovered, we set its parent as node v w.parent = A, and we enqueue w queue = [B].
This will be done with all adjacent vertices. In this case, C is the only adjacent node left.
After the loop exits, we will be back at line 6 where we dequeue the next vertex (B) and continue with the cycle until the goal node is found.

Note from the article:

"Nodes can be labelled as discovered by storing them in a set, or by an attribute on each node, depending on the implementation."

Time and Space Complexity

Time Complexity

The time complexity of breadth-first search in a graph is O(|V| + |E|), where V is the total number of nodes and E is the total number of edges of the graph since each node should not be visited more than once. For a tree, the time complexity is O(|V|) where V is the number of nodes.

From this quora answer:

"The edge |E| term should not be forgotten because in some cases (like highly connected graphs) it can be much larger than the number of nodes. On a tree we can just ignore it."

Space Complexity

The space complexity of breadth-first search is O(|V|) since the worst case is holding all of the vertices in the queue at the same time.

Use Cases

Breadth-first search is useful if you want to find the shortest path between one vertex and another in an unweighted graph. An unweighted graph is a graph where the edges have weights attached to them.

From this blog:

"A weight is a numerical value attached to each individual edge. In a weighted graph relationships between nodes have a magnitude and this magnitude is important to the relationship we’re studying. In an unweighted graph the existence of a relationship is the subject of our interest."

BFS can also be used to check if there is a path between two vertices, find the length of the path, or find all of the nodes reachable from a given node.

In Summary

Breadth-first search is an algorithm that every programmer learns at some point in their journey. Let me know about your experiences using it. Thank you for reading and happy searching!