Leo Pfeiffer

Posted on Jul 21, 2022

Topological Sort with Kahn's Algorithm

#algorithms #python

On a recent project, we were working on a spreadsheet-like tool where users could cross-reference different rows in formula, just like in Excel. We ran into the question of how to order the evaluation of the rows such that the dependencies do not cause any conflict.

For example, imagine we have three rows:

Row id	Formula	Rendered values
`row1`	`manual input`	1, 2, 3
`row2`	`row3` * 2	6, 8, 6
`row3`	`row1` + 1	2, 3, 4

row1 can be rendered directly, as its formula does not depend on any other rows. row2 however, references row3. row3 references row1. Thus in order to render row2, we first need to render row3.

It turns out, this is quite a common problem, for example in resolving the import order of different modules in a project.

We can solve these types of problems using topological sorting, which takes a DAG (directed acyclical graph) and orders its nodes in a way that for every edge x -> y, x comes before y in the ordering.

Kahn's algorithm is one algorithm that can establish this topological order.

Kahn's Algorithm: Theory

The idea behind Kahn's algorithm is simple:

Put all nodes that have no dependencies in your order
Remove all occurrences of those nodes from the dependencies of all remaining nodes. If there are no cycles, this will give you new nodes without dependencies.
Repeat until there are no more nodes left.

For example:

Node	Depends on
`a`	`b`, `c`
`b`	`d`
`c`	`d`
`d`	-

Here's the dependency graph (where x -> y means that x depends on y.

Thus:

d has no dependencies. Put it in the order.
Order = [d]
Remove occurrences of d from the other dependencies. d occurs in the dependencies of b, c. Thus, after removing d from those dependencies, neither b nor c have any dependencies.

b and c have no dependencies. Put then in the order.
Order = [d, b, c]
Remove occurrences of b and c from the other dependencies. Both b and c occur in the dependencies of a. Thus, after removing b and c from those dependencies, a has no more dependencies.

a has no dependencies. Put it in the order. Order = [d, b, c, a]

No more nodes left. Done. Final order = [d, b, c, a]

Kahn's Algorithm: Implementation

For our implementation we assume as input a dictionary where the keys are the node names and the values are a list of all nodes that the depend on the key.

For example, if nodes B and C depend on A, then on entry in the dictionary would be A: [B, C].

Translating the theory from the previous section into Python code, a possible implementation looks like this:



def topological_sort(nodes: dict[str, list[str]]) -> list[str]:
    """
    Topological sort for a network of nodes

        nodes = {"A": ["B", "C"], "B": [], "C": ["B"]}
        topological_sort(nodes)
        # ["A", "C", "B"]

    :param nodes: Nodes of the network
    :return: nodes in topological order
    """

    # Calculate the indegree for each node
    indegrees = {k: 0 for k in nodes.keys()}
    for name, dependencies in nodes.items():
        for dependency in dependencies:
            indegrees[dependency] += 1

    # Place all elements with indegree 0 in queue
    queue = [k for k in nodes.keys() if indegrees[k] == 0]

    final_order = []

    # Continue until all nodes have been dealt with
    while len(queue) > 0:

        # node of current iteration is the first one from the queue
        curr = queue.pop(0)
        final_order.append(curr)

        # remove the current node from other dependencies
        for dependency in nodes[curr]:
            indegrees[dependency] -= 1

            if indegrees[dependency] == 0:
                queue.append(dependency)

    # check for circular dependencies
    if len(final_order) != len(nodes):
        raise Exception("Circular dependency found.")

    return final_order

Let's try this with a slightly more evolved network:

The corresponding input looks like this:



nodes = {
    "A": ["B", "C"],
    "B": ["C", "D"],
    "C": ["F"],
    "D": ["F"],
    "E": ["F"],
    "F": []
}

If we run this we get the following order:



order = topological_sort(nodes)
print(order)

# ['A', 'E', 'B', 'C', 'D', 'F']

Take another look at the graph and convince yourself that that's indeed a correct order that avoids dependency clashes!

DEV Community

Topological Sort with Kahn's Algorithm

Kahn's Algorithm: Theory

Kahn's Algorithm: Implementation

Top comments (0)

Read next

Enhancing Observability in Machine Learning with OpenTelemetry: InsightfulAI Update

The ultimate guide to Retrieval-Augmented Generation (RAG)

How I Automated My Workflow by Connecting Python to Google Sheets API

LightningChart Python 1.0