The problem
Last month I watched a Numberphile video about covering prime-indexed lattice points with the fewest possible lines. Sounds tame: it's NP-hard in general, but small instances should be tractable, right? The public record for one of the gnarlier configurations sat at 282 hours of solver time. That's nearly twelve days of continuous compute.
I figured I could shave some time off. I ended up shaving roughly three orders of magnitude (282 hours down to 22 minutes), then certifying 20 awkward instances nobody had pinned down before. The interesting part wasn't the final solver; it was figuring out where the previous approaches were bleeding cycles.
If you've ever stared at an ILP solver chewing through a search tree for hours while the optimality gap barely moves, this post is for you. The same mistakes show up in scheduling, set cover, TSP variants, and any constraint program you'll write this year.
Why exact solvers stall
The default story goes: "it's NP-hard, so of course it's slow." That's a cop-out. NP-hardness tells you the problem is hard in general. It does not tell you whether your specific instance is hard, and it absolutely does not excuse a solver exploring 10^9 nodes when 10^4 would do.
When an exact solver crawls, one of these is almost always the culprit:
- Weak LP relaxation — the lower bound is so loose that branch-and-bound can't prune anything useful
- Bad branching order — you're branching on the wrong variables first, growing a fat tree
- No symmetry breaking — the solver keeps re-exploring isomorphic subproblems forever
- A natural formulation instead of a tight one — correct, but ugly to the LP
When I dissected the prior solver, three of these were happening at once. Let's walk through the fixes in the order that mattered most.
Step 1: tighten the formulation
The naive ILP for minimum line cover is one binary per candidate line, one constraint per point requiring it to be covered. It's correct. It's also a disaster, because the LP relaxation gives you fractional solutions where lines are "0.3 chosen" and points are covered "in pieces."
The fix is to add odd-set cuts; the smallest useful case is a triple. Take three points that are pairwise collinear through some candidate line but have no single line through all three. The LP happily covers them for a cost of 1.5 by picking each pair line "half", yet any integer solution needs at least two whole lines. Summing the three covering constraints only proves a bound of 1.5; rounding that up to 2 is a step the LP can't take on its own. You have to hand it the cut.
# Build the set-cover model with strengthening cuts
from itertools import combinations
from pulp import LpProblem, LpVariable, LpMinimize, lpSum, LpInteger

def odd_triples(points, lines):
    # Triples of points that are pairwise collinear via some candidate
    # line, but with no single candidate line through all three
    for p, q, r in combinations(range(len(points)), 3):
        pairwise = all(any(a in s and b in s for s in lines)
                       for a, b in ((p, q), (p, r), (q, r)))
        if pairwise and not any(p in s and q in s and r in s for s in lines):
            yield (p, q, r)

def covering_any(triple, lines):
    # Indices of lines that hit at least one point of the triple
    return [i for i, s in enumerate(lines) if any(p in s for p in triple)]

def build_model(points, lines):
    # lines[i] is the set of point indices covered by line i
    m = LpProblem("line_cover", LpMinimize)
    x = [LpVariable(f"x_{i}", 0, 1, LpInteger) for i in range(len(lines))]
    m += lpSum(x)  # objective: minimize number of chosen lines
    # Standard covering: every point hit by at least one line
    for p in range(len(points)):
        covering = [i for i, s in enumerate(lines) if p in s]
        m += lpSum(x[i] for i in covering) >= 1
    # Strengthening: each triple needs ceil(3/2) = 2 whole lines,
    # even though the LP can cover it fractionally for 1.5
    for triple in odd_triples(points, lines):
        m += lpSum(x[i] for i in covering_any(triple, lines)) >= 2
    return m, x
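Here's a minimal usage sketch on a toy instance, assuming PuLP's bundled CBC solver. The instance is mine, chosen so that exactly one triple cut fires:

points = [0, 1, 2, 3]
lines = [{0, 1}, {1, 2}, {0, 2}, {2, 3}]  # each "line" is a set of point indices

m, x = build_model(points, lines)
m.solve()  # PuLP falls back to its bundled CBC solver by default
chosen = sorted(i for i, var in enumerate(x) if var.value() > 0.5)
print(chosen)  # [0, 3]: lines {0, 1} and {2, 3} cover all four points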
Those triple cuts pushed the LP bound from roughly 0.6 of optimal to roughly 0.95 of optimal on my test instances. Branch-and-bound suddenly had something to prune with.
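If you want to see that gap yourself before trusting the model, relax the integrality, solve the pure LP, and compare against your best integer solution. Here's a rough sketch of how I'd do it with PuLP; mutating each variable's cat is a common PuLP idiom, not an official relax API:

from pulp import LpContinuous, LpInteger, value

def lp_bound(m, x):
    # Temporarily relax integrality, solve the LP, then restore
    for var in x:
        var.cat = LpContinuous
    m.solve()
    bound = value(m.objective)
    for var in x:
        var.cat = LpInteger
    return bound

A ratio of lp_bound(m, x) to your incumbent near 1.0 means branch-and-bound has something to prune with; a ratio near 0.6 means fix the model first.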
Step 2: branch on the right variable
Most solvers branch on the most-fractional variable by default. For covering problems this is almost always wrong. You want to branch on the variable that most constrains the rest of the problem — typically the line that covers the most uniquely-hard points.
def branching_variable(x_values, lines, n_points):
    # A "hard" point is one with few alternative covers
    point_alts = [sum(1 for s in lines if p in s) for p in range(n_points)]
    best, best_score = None, -1.0
    for i, v in enumerate(x_values):
        if 0.01 < v < 0.99:  # only fractional vars are branchable
            # Reward lines that cover points with few escape routes
            score = sum(1.0 / point_alts[p] for p in lines[i])
            if score > best_score:
                best, best_score = i, score
    return best
Same instance, same machine: from 14 hours to 38 minutes. The tree had the same conceptual shape. I was just slicing it from a sharper angle.
Step 3: break the symmetries
This is where the previous solver was hemorrhaging cycles. Many prime-point configurations have geometric symmetries: rotations, reflections, lattice translations. A solver without symmetry awareness will happily explore "pick line A then line B" and "pick line B then line A" as separate branches, and beyond that it re-explores every rotated and reflected copy of each partial cover it has already dismissed.
The cheapest fix is canonical-form pruning: when you reach a partial solution, compute its canonical representative under the symmetry group. If you've already explored an isomorphic partial solution, skip it.
def canonical_form(chosen_lines, sym_group):
    # sym_group is a precomputed list of permutations on line indices,
    # each stored as a mapping g[i] -> index of line i's image
    # Return the lexicographically smallest equivalent tuple
    return min(tuple(sorted(g[i] for i in chosen_lines)) for g in sym_group)

seen = set()

def search(state):
    key = canonical_form(state.chosen, sym_group)
    if key in seen:
        return  # we've already explored an isomorphic subproblem
    seen.add(key)
    # ...continue branching from here
Add this and the runtime drops another order of magnitude. Combined effect on the original 282-hour instance: 22 minutes.
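The snippets above take sym_group as given. One way you might precompute it, assuming each geometric symmetry arrives as a permutation of point indices and the candidate line set is closed under those symmetries (the names here are mine, not the original solver's):

def line_permutations(lines, point_symmetries):
    # lines: list of frozensets of point indices
    # point_symmetries: list of dicts mapping point index -> point index;
    # include the identity map so the group is complete
    index_of = {frozenset(s): i for i, s in enumerate(lines)}
    sym_group = []
    for sigma in point_symmetries:
        # A symmetry sends each line to another candidate line;
        # record the induced permutation on line indices
        g = [index_of[frozenset(sigma[p] for p in s)] for s in lines]
        sym_group.append(g)  # g[i] is the index of line i's image
    return sym_group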
Preventing the next 282-hour run
A few things I'd internalize going forward:
- Check your LP gap before optimizing anything else. If the relaxation bound is way below optimal, no amount of branching cleverness will save you. Fix the model first.
- Profile the search tree, not the inner loop. The bottleneck is rarely your code. Log node counts, prune rates, and incumbent quality over time — that's where the real cost lives.
- Look for symmetry early. If your problem has any group structure, the solver doesn't see it unless you point it out. Even a single canonical-form check can collapse huge swaths of the search.
- Sanity-check with a trivial bound. I had a hand-written greedy lower bound that disagreed with the LP for 12 instances. Every one turned out to be a model bug, not a math bug. (A sketch of such a bound follows this list.)
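For line cover, the trivial bound I'd reach for comes from pairwise non-collinear points: no line can cover two of them, so each one forces its own line, and their count can never exceed the optimum. A minimal reconstruction (the post doesn't show its original version):

def greedy_lower_bound(n_points, lines):
    # Greedily collect points no two of which share a candidate line;
    # any feasible cover needs a distinct line per collected point
    chosen = []
    for p in range(n_points):
        if all(not any(p in s and q in s for s in lines) for q in chosen):
            chosen.append(p)
    return len(chosen)  # valid lower bound on the optimum

If this number ever comes out above the LP bound, one of the two computations is wrong, and per the experience above it's usually the model.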
The meta-lesson: don't treat the solver as a black box. The "NP-hard" label is a starting point, not a verdict. Real instances almost always have structure, and exact solvers reward anyone willing to dig for it.