On a recent project, we were working on a spreadsheet-like tool where users could cross-reference different rows in formula, just like in Excel. We ran into the question of how to order the evaluation of the rows such that the dependencies do not cause any conflict.
For example, imagine we have three rows:
Row id | Formula | Rendered values |
---|---|---|
row1 |
manual input |
1, 2, 3 |
row2 |
row3 * 2 |
6, 8, 6 |
row3 |
row1 + 1 |
2, 3, 4 |
row1
can be rendered directly, as its formula does not depend on any other rows. row2
however, references row3
. row3
references row1
. Thus in order to render row2
, we first need to render row3
.
It turns out, this is quite a common problem, for example in resolving the import order of different modules in a project.
We can solve these types of problems using topological sorting, which takes a DAG (directed acyclical graph) and orders its nodes in a way that for every edge x -> y
, x
comes before y
in the ordering.
Kahn's algorithm is one algorithm that can establish this topological order.
Kahn's Algorithm: Theory
The idea behind Kahn's algorithm is simple:
- Put all nodes that have no dependencies in your order
- Remove all occurrences of those nodes from the dependencies of all remaining nodes. If there are no cycles, this will give you new nodes without dependencies.
- Repeat until there are no more nodes left.
For example:
Node | Depends on |
---|---|
a |
b , c
|
b |
d |
c |
d |
d |
- |
Here's the dependency graph (where x
-> y
means that x
depends on y
.
Thus:
d
has no dependencies. Put it in the order.
Order = [d
]Remove occurrences of
d
from the other dependencies.d
occurs in the dependencies ofb
,c
. Thus, after removingd
from those dependencies, neitherb
norc
have any dependencies.
b
andc
have no dependencies. Put then in the order.
Order = [d
,b
,c
]Remove occurrences of
b
andc
from the other dependencies. Bothb
andc
occur in the dependencies ofa
. Thus, after removingb
andc
from those dependencies,a
has no more dependencies.
-
a
has no dependencies. Put it in the order. Order = [d
,b
,c
,a
]
- No more nodes left. Done.
Final order = [
d
,b
,c
,a
]
Kahn's Algorithm: Implementation
For our implementation we assume as input a dictionary where the keys are the node names and the values are a list of all nodes that the depend on the key.
For example, if nodes B
and C
depend on A
, then on entry in the dictionary would be A: [B, C]
.
Translating the theory from the previous section into Python code, a possible implementation looks like this:
def topological_sort(nodes: dict[str, list[str]]) -> list[str]:
"""
Topological sort for a network of nodes
nodes = {"A": ["B", "C"], "B": [], "C": ["B"]}
topological_sort(nodes)
# ["A", "C", "B"]
:param nodes: Nodes of the network
:return: nodes in topological order
"""
# Calculate the indegree for each node
indegrees = {k: 0 for k in nodes.keys()}
for name, dependencies in nodes.items():
for dependency in dependencies:
indegrees[dependency] += 1
# Place all elements with indegree 0 in queue
queue = [k for k in nodes.keys() if indegrees[k] == 0]
final_order = []
# Continue until all nodes have been dealt with
while len(queue) > 0:
# node of current iteration is the first one from the queue
curr = queue.pop(0)
final_order.append(curr)
# remove the current node from other dependencies
for dependency in nodes[curr]:
indegrees[dependency] -= 1
if indegrees[dependency] == 0:
queue.append(dependency)
# check for circular dependencies
if len(final_order) != len(nodes):
raise Exception("Circular dependency found.")
return final_order
Let's try this with a slightly more evolved network:
The corresponding input looks like this:
nodes = {
"A": ["B", "C"],
"B": ["C", "D"],
"C": ["F"],
"D": ["F"],
"E": ["F"],
"F": []
}
If we run this we get the following order:
order = topological_sort(nodes)
print(order)
# ['A', 'E', 'B', 'C', 'D', 'F']
Take another look at the graph and convince yourself that that's indeed a correct order that avoids dependency clashes!
Top comments (0)