DEV Community

SEN LLC
SEN LLC

Posted on

A Topological-Sort CLI in Python — Kahn's Algorithm and a DFS That Names the Cycle

make targets, npm dependencies, task ordering — "do B and C before A" is a dependency graph, and putting it in a valid order is topological sorting. I built a CLI for it. Two implementation hinges: (1) Kahn's algorithm (in-degrees + a queue) with stable, lexicographic output, and (2) when there's a cycle, don't just say "cycle detected" — use DFS to name exactly which nodes form it, as auth -> cache -> db -> auth. The last two entries were Rust and Go, so this one's a Python CLI.

📦 GitHub: https://github.com/sen-ltd/topo-sort-cli

(It's a CLI — no live demo. Run with python3 main.py or Docker.)

Screenshot

Input format

One declaration per line. A: B C means A depends on B and C (B and C come first). # comments and blanks are ignored:

deploy: build test
build: compile
test: compile
compile:
Enter fullscreen mode Exit fullscreen mode

Internally each dependency is stored as a "prerequisite → dependent" edge (A: B → edge B -> A), so a prerequisites-first topological sort is directly a build order.

Hinge 1: Kahn's algorithm with stable output

Topological sort comes in a DFS flavor and a Kahn flavor. Kahn's is intuitive: repeatedly take nodes whose in-degree (number of prerequisites) is zero:

  1. Count every node's in-degree.
  2. Queue the zero-in-degree nodes (no prerequisites → can go first).
  3. Pop one, emit it, decrement its dependents' in-degrees.
  4. Newly-zero nodes join the queue. Repeat.

A naive queue pops in arbitrary order, so the output wobbles run to run. A min-heap of ready nodes always takes the lexicographically smallest, making output deterministic:

import heapq

def topo_sort(g):
    indeg = g.indegrees()
    ready = [n for n, d in indeg.items() if d == 0]
    heapq.heapify(ready)                       # ready set as a min-heap

    order = []
    while ready:
        n = heapq.heappop(ready)               # smallest ready node → stable
        order.append(n)
        for v in sorted(g.adj[n]):
            indeg[v] -= 1
            if indeg[v] == 0:
                heapq.heappush(ready, v)

    if len(order) == len(g.nodes):
        return SortResult(order=order, cycle=None)
    return SortResult(order=None, cycle=_find_cycle(g))  # leftovers = cycle
Enter fullscreen mode Exit fullscreen mode

That final len(order) != len(g.nodes) is the cycle test: if some node's in-degree never hits zero, it's in a cycle. Determinism is pinned by a test:

def test_deterministic_tiebreak():
    # three independent nodes → lexicographic order
    assert order_of("b:\na:\nc:\n") == ["a", "b", "c"]
Enter fullscreen mode Exit fullscreen mode

Hinge 2: name the cycle

Kahn's tells you a cycle exists, not where. Lots of tools stop at error: cycle detected, leaving you to stare at the graph. A second pass — DFS with white/gray/black coloring — extracts one concrete cycle:

  • white = unvisited, gray = on the current recursion stack, black = finished
  • reaching a gray node during DFS = a back-edge = a cycle
  • the recursion stack from that node onward is the cycle
def _find_cycle(g):
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {n: WHITE for n in g.nodes}
    stack = []

    def dfs(u):
        color[u] = GRAY
        stack.append(u)
        for v in sorted(g.adj[u]):
            if color[v] == GRAY:               # back-edge → cycle
                idx = stack.index(v)
                return stack[idx:] + [v]       # stack slice + return node
            if color[v] == WHITE:
                found = dfs(v)
                if found is not None:
                    return found
        stack.pop()
        color[u] = BLACK
        return None

    for n in sorted(g.nodes):
        if color[n] == WHITE:
            found = dfs(n)
            if found is not None:
                return found
    return []
Enter fullscreen mode Exit fullscreen mode

So instead of "there is a cycle" you get:

error: graph has a cycle:
  auth -> cache -> db -> auth
Enter fullscreen mode Exit fullscreen mode

— the actual path, plus exit code 1 so shell scripts can detect it. The user fixes those three edges and moves on.

Cycles hidden behind innocent branches still get found

DFS starts from every node, so a cycle survives even when an unrelated acyclic subgraph is present:

def test_cycle_with_innocent_tail():
    # x -> y is fine; a <-> b is the cycle. Must still be detected.
    res = topo_sort(parse_graph("x: y\ny:\na: b\nb: a\n"))
    assert res.order is None
    assert set(res.cycle) <= {"a", "b"}
    assert res.cycle[0] == res.cycle[-1]   # path returns to its start
Enter fullscreen mode Exit fullscreen mode

Self-loops (a: a), 2-node and 3-node cycles each get their own test too.

CLI

text = open(args.file).read() if args.file else sys.stdin.read()
graph = parse_graph(text)
result = topo_sort(graph)
if result.cycle is not None:
    print("error: graph has a cycle:", file=sys.stderr)
    print("  " + " -> ".join(result.cycle), file=sys.stderr)
    return 1
for node in (reversed(result.order) if args.reverse else result.order):
    print(node)
return 0
Enter fullscreen mode Exit fullscreen mode

--reverse for dependents-first, --numbered for positions, and stdin support so cat deps.txt | toposort works.

Architecture

src/graph.py — parse + Kahn's sort + DFS cycle extraction (stdlib only)
src/cli.py   — argparse, file/stdin, exit codes
main.py      — entry point
Enter fullscreen mode Exit fullscreen mode

Zero dependencies. 21 pytest tests: parsing, ordering validity, deterministic tie-breaking, self-loops, 2- and 3-node cycles, and cycles hidden behind acyclic tails. Docker: python:3.12-alpine build → alpine runtime, non-root.

Run it

python3 main.py examples/build.deps
python3 -m pytest          # 21 tests
docker build -t toposort . && docker run --rm -i toposort < examples/build.deps
Enter fullscreen mode Exit fullscreen mode

Takeaways

  • Kahn's algorithm: emit zero-in-degree nodes; a min-heap makes the output stable.
  • len(order) != node_count is the cycle test — no separate check needed.
  • Don't stop at "cycle detected" — DFS gray-node revisit extracts the actual path (auth -> cache -> db -> auth).
  • DFS from every node finds cycles hidden behind unrelated branches.
  • Split exit codes (0 ok / 1 cycle) so the tool composes in shell.

This is OSS portfolio #268 from SEN LLC (Tokyo). https://sen.ltd/portfolio/

Top comments (0)