Khushi

Posted on Aug 31

Topological Sort

What is Topological Sorting?

Topological Sorting is a method to organize the vertices of a Directed Acyclic Graph (DAG) in a sequence where, for every directed edge u → v, vertex u precedes vertex v in the sequence.
Topological sorting is the act of sequencing “items with prerequisites”
in an order such that each thing comes only after everything it depends on.

Why only DAGs?
First let's understand ,

1) Why Directed?

Dependencies always exhibit a specific direction: If A → B, it means “A must come before B.” in an undirected graph the relationship can be unclear.
If A and B are connected, does A depend on B or B depend on A? There’s no way to tell. Therefore, directed edges are necessary to clearly indicate which task is dependent on which.

2)Why Acyclic?

A cycle refers to a situation where tasks are mutually dependent on one another in a circular manner.

For instance:
A → B (A precedes B)
B → C (B precedes C)
C → A (C precedes A)
At this point, we're in a stalemate:
A needs to occur before B,
B before C,
C before A.
This presents a paradox → no permissible sequence can be determined.
This is why cycles render topological sorting unachievable.

Real-life example:-

1) Software Development (Compilation Sequence)
During the compilation of extensive software projects, it is crucial for the compiler to adhere to file dependencies.

For instance:

Utility.h needs to be compiled prior to Main.cpp (since Main.cpp includes it).
Math.cpp relies on Math.h (which means Math.h must be compiled first).
Neglecting these dependencies will result in compilation errors.

👉 The compiler uses topological sorting internally to determine the appropriate order for compiling files.

2) Project Management (Task Scheduling)
In project tasks, there are often dependencies, similar to nodes in a graph.

Example Project Flow:
Task A: Collect requirements.
Task B: Design system architecture (depends on A).
Task C: Develop code (depends on B).
Task D: Conduct software testing (depends on C).
Task E: Create documentation (depends on B, but is independent of C/D).
In this case, a suitable sequence could be:
A → B → C → D and B → E (allowing E to proceed in parallel after B).

Approaches for Topological Sorting

There are two primary methods:
1.The In-degree Method (BFS) or Kahn's Algorithm
2.The Depth-First Search (DFS) method

Let’s understand Kahn’s Algorithm in detail in this blog.
Core Concept:
Kahn’s Algorithm is a BFS-based technique that builds a topological order by repeatedly removing nodes with no incoming edges (in-degree = 0).

Intuition:

Nodes with in-degree = 0 represent tasks with no pending prerequisites → they can be executed immediately.
Once such a task is completed, we update the in-degree of its dependent tasks.
If any dependent task’s in-degree becomes 0, it is ready to execute next. This process continues until all nodes are processed.

Steps in Kahn’s Algorithm

Step 1: Compute In-Degree of Each Node

In-degree = number of incoming edges to a node.
For every edge A → B, increment in-degree of B by 1.
Store in-degree values in an array or hash map.

Step 2: Initialize a Queue with All In-degree 0 Nodes

These nodes have no dependencies → they can be done right away.
Use a queue (FIFO) to process them.
If multiple nodes have in-degree 0, any order is valid

Step 3: Process Queue Iteratively

While the queue is not empty:
1.Remove a node u from the queue.
2.Append u to the topological order result list.
3.For each outgoing edge (u → v):

Decrease the in-degree of v by 1 (since u is processed).
If v’s in-degree becomes 0, add it to the queue.

Step 4: Check for Cycles

Following processing, if there are less nodes in the result list than there are nodes in the graph overall, then:
In-degree = 0 was never reached by some nodes.
This indicates that a cycle (circular dependency) exists.Topological sorting is therefore impossible.

Let’s walk through a dry run to better understand how this algorithm works in practice

Let’s say we have 6 tasks (nodes):
0, 1, 2, 3, 4, 5
5 → 2
5 → 0
4 → 0
4 → 1
2 → 3
3 → 1

Step 1: Compute In-degree of Each Node

Count incoming edges for each node:
Node 0: in-degree = 2 (from 5, 4)
Node 1: in-degree = 2 (from 4, 3)
Node 2: in-degree = 1 (from 5)
Node 3: in-degree = 1 (from 2)
Node 4: in-degree = 0
Node 5: in-degree = 0
📌 In-degree array: [2, 2, 1, 1, 0, 0]

Step 2: Initialize Queue with In-degree = 0 nodes

Nodes 4 and 5 have in-degree = 0 → enqueue them.

Step 3: Process Queue Iteratively

Iteration 1
- Dequeue 4 → add to result.
- For edges from 4:
  - 4 → 0: reduce in-degree[0] → 1
  - 4 → 1: reduce in-degree[1] → 1

Iteration 2
- Dequeue 5 → add to result.
- For edges from 5:
- 5 → 2: reduce in-degree[2] → 0 → enqueue 2
- 5 → 0: reduce in-degree[0] → 0 → enqueue 0

Iteration 3
- Dequeue 2 → add to result.
- For edges from 2:
  - 2 → 3: reduce in-degree[3] → 0 → enqueue 3

Iteration 4
- Dequeue 0 → add to result.
- Node 0 has no outgoing edges.

Iteration 5
- Dequeue 3 → add to result.
- For edges from 3:
  - 3 → 1: reduce in-degree[1] → 0 → enqueue 1

Iteration 6
- Dequeue 1 → add to result.
- Node 1 has no outgoing edges.

Step 4:Check for Cycles
All 6 nodes are processed → no cycle.
✅ Valid topological order found.
[4, 5, 2, 0, 3, 1]
Also note that different valid orders are possible depending on queue processing order.

Code snippet

#include <bits/stdc++.h>
using namespace std;

vector<int> topologicalSortKahn(int V, vector<pair<int,int>> &edges) {
    // Step 1: Build graph and compute in-degree
    vector<vector<int>> graph(V);
    vector<int> in_degree(V, 0);

    for (auto &edge : edges) {
        int u = edge.first, v = edge.second;
        graph[u].push_back(v);
        in_degree[v]++;
    }

    // Step 2: Initialize queue with in-degree = 0 nodes
    queue<int> q;
    for (int i = 0; i < V; i++) {
        if (in_degree[i] == 0)
            q.push(i);
    }

    vector<int> topo_order;

    // Step 3: Process queue
    while (!q.empty()) {
        int node = q.front();
        q.pop();
        topo_order.push_back(node);

        // Reduce in-degree of neighbors
        for (int neighbor : graph[node]) {
            in_degree[neighbor]--;
            if (in_degree[neighbor] == 0)
                q.push(neighbor);
        }
    }

    // Step 4: Check for cycle
    if ((int)topo_order.size() != V) {
        cout << "Cycle detected! Topological sort not possible." << endl;
        return {};
    }

    return topo_order;
}

int main() {
    int V = 6;
    vector<pair<int,int>> edges = {
        {5, 2},
        {5, 0},
        {4, 0},
        {4, 1},
        {2, 3},
        {3, 1}
    };

    vector<int> order = topologicalSortKahn(V, edges);

    cout << "Topological Order: ";
    for (int node : order)
        cout << node << " ";
    cout << endl;

    return 0;
}

Let's analyze the time complexity and space complexity
Each node is enqueued/dequeued exactly once.
Each edge is processed exactly once (when decreasing in-degree).
so time complexity is Time Complexity: O(V + E)
We use in-degree array and queue.
so Space Complexity :O(V)

Kahn’s Algorithm (BFS-based) is iterative, so it doesn’t face recursion depth issues that can occur in the DFS approach.

DEV Community

Topological Sort

Top comments (0)