Easter egg at the end: how the cover photo relates to the post.
Given a connected graph with weighted edges, the minimum spanning tree (MST) is the cheapest set of edges that keeps all vertices connected. Borůvka gave the first algorithm in 1926. Kruskal, Prim, and others refined it. The best randomized algorithm runs in O(m) expected time (Karger-Klein-Tarjan, 1995). The best deterministic algorithm runs in O(m α(m,n)) (Chazelle, 2000), where α is the inverse Ackermann function.
α(m,n) ≤ 4 for any graph that could physically exist. In practice, the problem is solved. In theory, the question "can we do O(m) deterministically?" has been open for over forty years.
This post covers the three attack vectors from my 2018 thesis at DIKU, updated with connections to research published since then.
Where the bottleneck is
Every MST algorithm uses some form of the disjoint-set (union-find) data structure. You process edges, and for each edge you ask: are these two vertices already connected? If not, merge their components.
With path compression and union by rank, m operations on n elements cost O(m α(m,n)). That α factor is the bottleneck. It comes from the analysis of path compression on a pointer machine.
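The structure in question, in a minimal Python sketch (this is the standard textbook implementation, not code from the thesis):

```python
# Union-find with path compression and union by rank.
# m operations on n elements cost O(m α(m, n)); the α factor comes
# from the amortized analysis of path compression.

class DisjointSet:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        # Path compression: point every visited node directly at the root.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False  # already connected: the edge would close a cycle
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx  # attach the shallower tree under the deeper one
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
        return True
```

Kruskal's algorithm is exactly this loop: sort the edges, call `union` on each, and keep the edges for which it returns `True`.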
On a Word RAM, Gabow and Tarjan (1983) showed you can do O(m + n) — if you know the union tree structure in advance. You precompute lookup tables for small components and answer FIND queries in O(1).
The catch: knowing the union tree means knowing the MST, which is what you're computing.
Two computation models
This matters because the bottleneck lives in the gap between them.
A pointer machine can follow pointers, compare values, and create new nodes. No arithmetic on addresses, no random access. This is how most graph algorithms actually work. On this model, the α factor from path compression cannot be removed.
A Word RAM has O(1) random access and can do arithmetic and bitwise operations on words. This enables table lookups — precomputing answers for small inputs and reading them in constant time.
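As a toy illustration of the table-lookup technique (population count, not Gabow-Tarjan's FIND tables, which are far more involved):

```python
# Table lookup in the word-RAM spirit: precompute answers for all
# small inputs once, then answer each query in O(1) by indexing.
# Here the "small input" is a byte and the query is popcount;
# Gabow-Tarjan do the analogous thing for FIND on small components.

POPCOUNT = [bin(b).count("1") for b in range(256)]  # one-time O(256) precompute

def popcount32(x):
    # Four O(1) table reads instead of a 32-step bit-by-bit loop.
    return (POPCOUNT[x & 0xFF]
            + POPCOUNT[(x >> 8) & 0xFF]
            + POPCOUNT[(x >> 16) & 0xFF]
            + POPCOUNT[(x >> 24) & 0xFF])
```

A pointer machine cannot do this: indexing `POPCOUNT` by a computed value is exactly the random access it lacks.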
The MST gap is a model gap. On a pointer machine, O(m α(m,n)) seems tight. On a Word RAM, O(m + n) is possible if the structure is known. The question is whether you can learn enough structure during computation to exploit the RAM model.
Attack vector 1: Approximate union-find
What if you approximately know the union tree?
Build a predicted tree T' using some fast heuristic (a few Borůvka phases, random sampling, anything that runs in O(m)). Use Gabow-Tarjan lookup tables based on T'. When a FIND query returns an answer that disagrees with the true structure, pay for a correction.
If the prediction is close enough to the true MST, most queries hit the lookup tables and cost O(1). The faulty queries create "phantom cycles" — the algorithm temporarily believes two vertices are connected when they aren't, or vice versa. These phantoms are bounded by the prediction error and can be cleaned up in a linear-time verification pass using King's MST verification algorithm.
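A sketch of the predict-then-correct accounting, with names of my own choosing; the real scheme answers agreeing queries in O(1) through Gabow-Tarjan-style tables, which this sketch does not reproduce:

```python
# Hedged sketch: a union-find that carries a predicted root per element
# and counts "phantoms" -- queries where the prediction was stale.
# The charge for corrections is what must be bounded by prediction error.

class PredictedUnionFind:
    def __init__(self, n, predicted_root):
        self.parent = list(range(n))
        self.predicted_root = predicted_root  # from a cheap heuristic pass
        self.corrections = 0                  # cost charged to prediction error

    def true_find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, x, y):
        rx, ry = self.true_find(x), self.true_find(y)
        if rx != ry:
            self.parent[ry] = rx

    def find(self, x):
        root = self.true_find(x)
        if root != self.predicted_root[x]:
            self.corrections += 1          # a phantom: prediction disagreed
            self.predicted_root[x] = root  # repair the prediction locally
        return root
```

If the total number of corrections is O(n), the amortized cost per operation stays constant; that is the quantity the prediction-quality bound has to control.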
This approach fits a framework that didn't exist when the thesis was written: learning-augmented algorithms (Mitzenmacher and Vassilvitskii, 2020). The formal question is: what prediction quality guarantees O(m + n) total cost?
The thesis proves the disjoint-set operations can be bounded to O(m + n) on the RAM model when allowing a bounded degree of error. The precise bounds on prediction quality needed for this to work are stated but not tight.
Attack vector 2: Density partition
Different algorithms are linear at different densities. Prim with a Fibonacci heap runs in O(m + n log n), which is linear on dense graphs (m ≥ n^(1+ε) for some fixed ε > 0). Kruskal-style algorithms are nearly linear on sparse graphs. Neither covers all graphs.
The density partition classifies vertices by degree:
- HD (high degree): degree > c · log³(n)
- LD (low degree): degree ≤ c · log³(n)
Then by neighbor types: HDN (high-degree neighbors), LDN (low-degree neighbors), MDN (mixed). This classification takes O(m + n) — two passes over the adjacency lists.
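The classification passes might look like this sketch (the constant c = 1 and the handling of isolated vertices are my placeholder choices, not from the thesis):

```python
import math

# Two O(m + n) passes over the adjacency lists:
# pass 1 classifies each vertex by its own degree,
# pass 2 classifies it by the degree class of its neighbors.

def density_partition(adj):
    n = len(adj)
    t = math.log(max(n, 2)) ** 3            # threshold c * log^3(n), with c = 1
    hd = [len(adj[v]) > t for v in range(n)]
    label = []
    for v in range(n):
        kinds = {hd[u] for u in adj[v]}
        if kinds == {True}:
            label.append("HDN")             # all neighbors high degree
        elif kinds == {False}:
            label.append("LDN")             # all neighbors low degree
        else:
            label.append("MDN")             # mixed (isolated vertices land here too)
    return hd, label
```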
For HD-HDN components (dense), run Prim. For LD-LDN components (sparse), run modified Kruskal with the approximate union-find from attack vector 1. The MDN boundary edges form a small graph — O(n / log³(n)) edges — that can be solved recursively.
At each recursion level, total work is O(m + n). The recursion depth is bounded. This is the glue that combines the other two subproblems.
Attack vector 3: Cycle hierarchy
Consider a sparse graph that is almost a tree. Most vertices have degree 1 or 2. The MST must include all edges connecting degree-1 vertices — there is no alternative path. So you can trim those edges and contract the leaves.
The problem: trimming cascades. Removing a leaf can create a new leaf, which creates another, propagating across the graph. Naive trimming is O(n²).
If you know the cycle structure, you know where each cascade stops — at a vertex that participates in a cycle. With this information, trimming runs in O(n).
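The cascade itself can be followed with a work queue, as in this sketch of plain unweighted leaf trimming; the O(n²) danger in the thesis setting comes from the bookkeeping of weighted contraction, which this sketch does not capture:

```python
from collections import deque

# Cascading leaf trim: removing a degree-1 vertex can create a new
# leaf, which must also be trimmed. A work queue touches each vertex
# at most once. In the MST setting the trimmed edges are forced into
# the tree, since a degree-1 vertex has no alternative path.

def trim_leaves(adj):
    deg = [len(nbrs) for nbrs in adj]
    queue = deque(v for v in range(len(adj)) if deg[v] == 1)
    forced = []                        # edges every spanning tree must use
    removed = [False] * len(adj)
    while queue:
        v = queue.popleft()
        if removed[v] or deg[v] != 1:
            continue
        removed[v] = True
        for u in adj[v]:               # exactly one live neighbor remains
            if not removed[u]:
                forced.append((v, u))
                deg[u] -= 1
                if deg[u] == 1:        # the cascade: u just became a leaf
                    queue.append(u)
        deg[v] = 0
    return forced
```

On a triangle with a two-edge tail, the cascade trims the tail and stops at the vertex on the cycle, exactly the stopping point the cycle hierarchy is meant to predict.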
The cycle hierarchy organizes cycles by nesting depth:
- Level 0: triangles
- Level 1: 4-cycles not decomposable into triangles
- Level k: (k+3)-cycles that may contain lower-level cycles
Building the hierarchy seems circular — you need a spanning tree to find cycles, but you're computing the MST. The escape: cycle detection doesn't require edge weights. Any spanning tree (from DFS or BFS, computed in O(m + n) without weight comparisons) reveals the cycle structure. Each non-tree edge closes exactly one fundamental cycle.
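The weight-free step is the standard fundamental-cycle construction (this sketch builds an arbitrary DFS spanning tree and reads off one cycle per non-tree edge; the hierarchy built on top of it is not shown):

```python
# No weight comparisons anywhere: an arbitrary DFS spanning tree
# suffices, and each non-tree edge closes exactly one fundamental cycle.

def fundamental_cycles(n, edges):
    adj = [[] for _ in range(n)]
    for i, (u, v) in enumerate(edges):
        adj[u].append((v, i))
        adj[v].append((u, i))
    parent, depth = [-1] * n, [0] * n
    visited, tree_edge = [False] * n, [False] * len(edges)
    stack = [0]                        # iterative DFS from vertex 0
    visited[0] = True
    while stack:
        u = stack.pop()
        for v, i in adj[u]:
            if not visited[v]:
                visited[v] = True
                parent[v], depth[v] = u, depth[u] + 1
                tree_edge[i] = True
                stack.append(v)
    cycles = []
    for i, (u, v) in enumerate(edges):
        if tree_edge[i]:
            continue
        # Walk both endpoints up to their lowest common ancestor.
        path_u, path_v, a, b = [u], [v], u, v
        while depth[a] > depth[b]:
            a = parent[a]; path_u.append(a)
        while depth[b] > depth[a]:
            b = parent[b]; path_v.append(b)
        while a != b:
            a = parent[a]; path_u.append(a)
            b = parent[b]; path_v.append(b)
        cycles.append(path_u + path_v[-2::-1])
    return cycles
```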
You compute the topological hierarchy weight-free, then apply it to the weighted MST computation. The hierarchy tells trimming where to stop.
This is the hardest subproblem. The construction is clear in outline, but the formal proof that it runs in O(m + n) for arbitrary graphs is incomplete.
How they compose
Subproblem 1                    Subproblem 3
(Approximate Union-Find)        (Cycle Hierarchy)
            \                      /
             v                    v
               Subproblem 2
            (Density Partition)
                     |
                     v
            Full MST Algorithm
                 O(m + n)
Subproblems 1 and 3 are independent. Subproblem 2 combines their results. If all three are solved, they compose into a deterministic O(m + n) MST algorithm on the Word RAM.
On a pointer machine, this approach does not help. The disjoint-set operations without table lookup are bounded by O(m α(m,n)), and that factor cannot be removed without random access.
What's incomplete
Subproblem 1 has a theorem and proof sketch. The formal bounds on prediction quality are not tight. Subproblem 3 has a clear construction but the O(m + n) proof for arbitrary graphs is missing. Subproblem 2 depends on both.
Since 2018, each subproblem has connected to active research:
- Approximate union-find → learning-augmented algorithms
- Density partition → local graph algorithms and graph sparsification
- Cycle hierarchy → dynamic connectivity and top trees
The three subproblems remain open. Whether O(m) MST is achievable deterministically, or whether α(m,n) is fundamental, is one of the oldest open problems in computer science.
The full technical writeup with proofs, definitions, and historical context is on my blog: The MST Problem: Three Subproblems to Linear Time.
The thesis was written at DIKU, University of Copenhagen, supervised by Jyrki Katajainen.
Easter egg time: Bernard Chazelle, the Princeton CS professor whose algorithm is the best known for this problem to date, is the father of Damien Chazelle, the director of Whiplash and La La Land.