An Approximate Solution to the Minimum Vertex Cover Problem: The Salvador Algorithm
Frank Vega
Information Physics Institute, 840 W 67th St, Hialeah, FL 33012, USA
vega.frank@gmail.com
Abstract
The Minimum Vertex Cover problem is NP-hard, and the best known polynomial-time approximation ratio is 2 − Θ(1/√log n), matching the Unique Games Conjecture lower bound of 2 − ε for every ε > 0. Without the Unique Games Conjecture, the problem remains hard to approximate within a factor of √2 − ε for any constant ε > 0 unless P = NP, as established by the 2-to-2 games hardness results. We present a polynomial-time algorithm that reduces any input graph to a planar kernel via a weighted Minimum Independent Dominating Set gadget and then applies Baker's PTAS, running in time O(n · m) — O(n) for planar inputs. The output is provably within a factor 1 + ε of the optimum on planar graphs; the default parameter ε = 0.1 yields a 1.1-approximation guarantee on planar inputs. We conjecture that the approximation ratio is universally bounded by 1.4. Crucially, 1.4 = √2 − δ for the explicit positive constant δ = √2 − 1.4 ≈ 0.0142, so a proof of this conjecture would place the algorithm strictly below the √2 inapproximability threshold and imply P = NP directly and unconditionally. We provide an open-source implementation, Salvador v0.0.1.
Introduction
The Minimum Vertex Cover (VC) problem is one of the twenty-one NP-complete problems identified by Karp. Given an undirected graph G = (V, E), a set S ⊆ V is a vertex cover if every edge of G has at least one endpoint in S. The goal is to find a minimum-cardinality vertex cover, whose size is denoted τ(G).
The approximation landscape for VC is unusually well charted. The classical maximal-matching algorithm achieves a 2-approximation in linear time; LP-based refinements by Karakostas reach factor 2 − Θ(1/√log n), which is 2 − o(1) but falls short of any constant below 2. From the hardness side, Dinur and Safra ruled out any ratio below 1.3606 under P ≠ NP; subsequently, Khot, Minzer, and Safra proved the 2-to-2 games conjecture, strengthening this barrier to √2 − ε for any fixed ε > 0 under P ≠ NP; and under the Unique Games Conjecture, no constant ratio below 2 is achievable. A polynomial-time algorithm achieving a constant ratio below √2 would therefore resolve P versus NP.
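For reference, the classical maximal-matching 2-approximation mentioned above fits in a few lines. This is a standard textbook sketch, independent of the Salvador package:

```python
def matching_vertex_cover(edges):
    """Classical 2-approximation for Minimum Vertex Cover.

    Greedily build a maximal matching and take both endpoints of every
    matched edge. Any optimal cover must contain at least one endpoint of
    each matched edge, so the returned cover has size <= 2 * tau(G).
    """
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:  # edge still uncovered => match it
            cover.update((u, v))
    return cover

# Path 0-1-2-3: the optimum is {1, 2}, and the greedy cover has size 4 <= 2 * 2.
edges = [(0, 1), (1, 2), (2, 3)]
cover = matching_vertex_cover(edges)
```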
Against this backdrop, we present an algorithm — implemented as the Salvador Python package (v0.0.1) — that, on every instance we have tested, produces a vertex cover of size strictly below 1.4 times the known optimum. The algorithm, called find_vertex_cover, operates in three phases:
- Preprocessing. Self-loops are discarded; isolated vertices are removed (they appear in no edge and need not be covered).
- Reduction and kernel solve. The core routine solve_vc reduces the cleaned graph G to a weighted MIDS instance on a planar gadget H and invokes Baker's PTAS on H with approximation parameter ε. For non-planar G, a maximal planar subgraph is first extracted, and the planar solve is followed by a greedy repair step that covers any edges excluded during planarisation.
- Linear-time pruning. prune_redundant_vertices makes a single forward pass over the candidate cover and removes every vertex all of whose neighbours are already covered, shrinking the cover without compromising validity.
The dominant cost is the maximal planar subgraph computation for non-planar inputs, giving an overall complexity of O(n · m), where n = |V| and m = |E|. On planar inputs the algorithm runs in O(n).
The main contributions of this paper are:
- A practical vertex-cover algorithm with a provable (1 + ε)-approximation guarantee for planar graphs; the default ε = 0.1 yields a 1.1 guarantee.
- A weighted MIDS gadget that encodes vertex cover as a minimum-weight independent dominating set on a planar graph, enabling direct use of Baker's PTAS.
- An experimental study on thirteen DIMACS benchmark graphs showing empirical ratios in [1.011, 1.138] (mean 1.050).
- A conjecture that the universal approximation ratio is at most 1.4 = √2 − δ for the explicit constant δ = √2 − 1.4 > 0. Since 1.4 < √2, any proof of this conjecture places the algorithm strictly below the inapproximability barrier established by the 2-to-2 games hardness results, directly implying P = NP.
- An open-source implementation, Salvador v0.0.1.
Algorithm Description
The core algorithm is implemented in algorithm.py. The weighted MIDS gadget construction and the maximal planar subgraph extraction are implemented in vc_reduction.py, and Baker's PTAS for weighted planar independent dominating sets is implemented in baker_ptas.py.
1. Main entry point (algorithm.py)
import networkx as nx
from . import vc_reduction
def prune_redundant_vertices(adj, C):
"""
Linear-time single-pass removal of redundant vertices.
For every v in C we check (in O(deg(v)) time) whether all its neighbors
are still in the current cover. If yes, we safely remove it immediately.
Total time across all calls remains O(n + m).
"""
C = set(C)
for v in list(C):
all_neighbors_covered = True
for u in adj.get(v, []):
if u not in C:
all_neighbors_covered = False
break
if all_neighbors_covered:
C.remove(v)
return C
def find_vertex_cover(graph, epsilon: float = 0.1):
G = graph.copy()
G.remove_edges_from(nx.selfloop_edges(G))
G.remove_nodes_from(list(nx.isolates(G)))
if G.number_of_edges() == 0:
return set()
# Minimum Weighted IDS Reduction (always-planar gadget) → Baker's PTAS
cover, _ = vc_reduction.solve_vc(G, epsilon)
# Final pruning on final candidate (still linear)
adj = {v: set(G[v]) for v in G}
cover_prune = prune_redundant_vertices(adj, cover)
return cover_prune
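A quick sanity check of the pruning pass defined above (the routine is repeated here in condensed form so the snippet runs standalone):

```python
def prune_redundant_vertices(adj, C):
    """Condensed copy of the routine above: drop every vertex whose
    neighbours are all still inside the current cover."""
    C = set(C)
    for v in list(C):
        if all(u in C for u in adj.get(v, [])):
            C.remove(v)
    return C

# Triangle: starting from the full vertex set, exactly one vertex is redundant
# (after the first removal, every remaining vertex has an uncovered neighbour).
adj = {'a': {'b', 'c'}, 'b': {'a', 'c'}, 'c': {'a', 'b'}}
pruned = prune_redundant_vertices(adj, {'a', 'b', 'c'})
```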
2. Weighted MIDS gadget and planar solver (vc_reduction.py)
The reduction builds a planar gadget H from a planar input graph G with n vertices and hub penalty P = n + 1:
- For each vertex v: add nodes (v, 0) with weight 1 and (v, 1) with weight 0, connected by a variable edge.
- For each edge (u, v): add hub ('h', u, v) with weight P, adjacent to (u, 0) and (v, 0).
def reduce_vc_to_mids(G: nx.Graph, epsilon: float = 0.1):
"""Build the always-planar weighted MIDS gadget for a PLANAR graph G."""
if not nx.is_planar(G):
raise ValueError("reduce_vc_to_mids requires a planar graph.")
N = G.number_of_nodes()
P = N + 1
H = nx.Graph(); weights = {}
for v in G.nodes():
H.add_edge((v, 0), (v, 1))
weights[(v, 0)] = 1
weights[(v, 1)] = 0
for u, v in G.edges():
hub = ('h', u, v); weights[hub] = P
H.add_edge((u, 0), hub); H.add_edge((v, 0), hub)
assert nx.is_planar(H), "Bug: gadget non-planar for planar G."
return H, weights, P
The hub penalty P = n + 1 enforces that the optimal MIDS never includes any hub: a single hub already costs n + 1 > n, so it is always cheaper to cover edges through the vertex nodes (u, 0) and (v, 0). The key cost identity is MIDS_w(H) = τ(G) (Lemma 1 below).
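On toy inputs the cost identity can be checked by exhaustive search. The sketch below rebuilds the gadget with plain dictionaries (independent of the package code) and brute-forces the minimum-weight independent dominating set of the gadget for a triangle, where τ(K3) = 2:

```python
from itertools import combinations

def gadget(vertices, edges):
    """Build the weighted MIDS gadget (adjacency map + node weights)."""
    P = len(vertices) + 1                      # hub penalty
    adj, w = {}, {}
    def link(a, b):
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    for v in vertices:
        w[(v, 0)], w[(v, 1)] = 1, 0
        link((v, 0), (v, 1))                   # variable edge
    for u, v in edges:
        hub = ('h', u, v)
        w[hub] = P
        link((u, 0), hub)
        link((v, 0), hub)
    return adj, w

def min_weight_ids(adj, w):
    """Brute-force minimum-weight independent dominating set (tiny graphs only)."""
    nodes = list(adj)
    best = float('inf')
    for r in range(len(nodes) + 1):
        for D in combinations(nodes, r):
            S = set(D)
            independent = all(not (adj[a] & S) for a in S)
            dominating = all(v in S or adj[v] & S for v in nodes)
            if independent and dominating:
                best = min(best, sum(w[x] for x in S))
    return best

# Triangle K3: tau = 2, and the gadget's optimum matches.
adj, w = gadget(['a', 'b', 'c'], [('a', 'b'), ('b', 'c'), ('c', 'a')])
best = min_weight_ids(adj, w)
```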
For non-planar graphs, _maximal_planar_subgraph extracts a maximal planar subgraph using a Union-Find spanning forest and a hybrid face-check strategy (Whitney's theorem for face-sharing edges, one check_planarity call for uncertain edges):
def _maximal_planar_subgraph(G: nx.Graph):
"""Extract a maximal planar subgraph of G. Complexity: O(M·N)."""
H = nx.Graph()
H.add_nodes_from(G.nodes())
_par = {v: v for v in G.nodes()}
_rnk = {v: 0 for v in G.nodes()}
def _find(x):
while _par[x] != x:
_par[x] = _par[_par[x]]
x = _par[x]
return x
def _union(x, y):
px, py = _find(x), _find(y)
if px == py:
return
if _rnk[px] < _rnk[py]:
px, py = py, px
_par[py] = px
if _rnk[px] == _rnk[py]:
_rnk[px] += 1
for u, v in nx.minimum_spanning_edges(G, data=False, ignore_nan=True):
H.add_edge(u, v)
_union(u, v)
_, embedding = nx.check_planarity(H)
removed = []
for u, v in G.edges():
if H.has_edge(u, v):
continue
if _find(u) != _find(v):
H.add_edge(u, v)
_union(u, v)
embedding.connect_components(u, v)
else:
shared, a, c = _find_shared_face(embedding, u, v)
if shared:
H.add_edge(u, v)
_insert_edge_into_embedding(embedding, u, v, a, c)
else:
H.add_edge(u, v)
is_p, new_emb = nx.check_planarity(H)
if is_p:
embedding = new_emb
_union(u, v)
else:
H.remove_edge(u, v)
removed.append((u, v))
return H, removed
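The helpers _find_shared_face and _insert_edge_into_embedding are internal to the package and omitted here. As a reference point, the same kind of maximal planar subgraph can be obtained (more slowly) with one planarity test per edge; this naive sketch is useful for cross-checking the optimised routine:

```python
import networkx as nx

def greedy_planar_subgraph(G):
    """Naive maximal planar subgraph: keep an edge only if the graph stays
    planar after adding it. One check_planarity call per edge, hence
    O(m * n) planarity work overall."""
    H = nx.Graph()
    H.add_nodes_from(G.nodes())
    removed = []
    for u, v in G.edges():
        H.add_edge(u, v)
        if not nx.check_planarity(H)[0]:
            H.remove_edge(u, v)        # edge would break planarity
            removed.append((u, v))
    return H, removed

# K5 is non-planar; a maximal planar subgraph keeps 3n - 6 = 9 of its 10 edges.
H, removed = greedy_planar_subgraph(nx.complete_graph(5))
```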
The complete public solver handles both planar and non-planar inputs:
def solve_vc(G: nx.Graph, epsilon: float = 0.1):
"""Approximate Minimum Vertex Cover for ANY graph G."""
if G.number_of_edges() == 0:
return frozenset(), 0.0
if nx.is_planar(G):
cover = _solve_planar(G, G, epsilon)
mids_w = float(len(cover))
return frozenset(cover), mids_w
else:
G_p, removed = _maximal_planar_subgraph(G)
cover = _solve_planar(G_p, G, epsilon)
for u, v in removed:
if u not in cover and v not in cover:
cover.add(u if G.degree(u) >= G.degree(v) else v)
return frozenset(cover), 0.0
3. Baker's PTAS (baker_ptas.py)
For a planar graph, Baker's PTAS with shift parameter k = ⌈1/ε⌉ returns a (1 + 1/k)-approximation for the weighted MIDS. We use ε = 0.1 by default, so k = 10 and the ratio becomes 1.1. The implementation includes BFS layering, tree-decomposition DP with LRU-cached state generation, and bitmask adjacency for efficient subset enumeration.
Key guarantee: w(D_alg) ≤ (1 + 1/k) · MIDS_w(H).
With k = 10 this is factor 1.1 on the gadget, which translates to factor 1.1 on the vertex cover (Lemma 1 below).
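The BFS-layering step can be illustrated in isolation. This is a simplified sketch of the decomposition idea behind Baker's scheme (slab width k, enumerated shifts), not the package's DP:

```python
from collections import deque

def bfs_layers(adj, root):
    """Label every reachable vertex with its BFS distance from root."""
    layer = {root: 0}
    q = deque([root])
    while q:
        v = q.popleft()
        for u in adj[v]:
            if u not in layer:
                layer[u] = layer[v] + 1
                q.append(u)
    return layer

def slabs(layer, k, shift=0):
    """Group vertices into slabs of k consecutive BFS layers (offset by shift).
    In Baker's scheme each slab has bounded treewidth and is solved exactly;
    enumerating the k shifts bounds the loss of recombining slab solutions."""
    groups = {}
    for v, l in layer.items():
        groups.setdefault((l + shift) // k, set()).add(v)
    return [groups[i] for i in sorted(groups)]

# Path 0-1-2-3-4 rooted at 0: layers are the distances 0..4.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
layers = bfs_layers(adj, 0)
```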
Correctness
Theorem 1 (Correctness). Algorithm find_vertex_cover always returns a valid vertex cover of the input graph G.
Proof sketch. Isolated vertices are removed in preprocessing and appear in no edge. For the planar path, Baker's PTAS returns a feasible MIDS D of the gadget H. The variable edge guarantees that at least one of (v, 0), (v, 1) is in D for every v. For any edge (u, v), hub ('h', u, v) must be dominated; since the hub is adjacent only to (u, 0) and (v, 0), at least one of them is in D or the hub itself is, covering the edge after decoding. The repair loop then adds one endpoint of every residual uncovered edge. For the non-planar path, the planar solve covers all edges of G_p, and the repair loop explicitly handles every edge in the removed list. Pruning only removes a vertex v when all neighbours of v lie in C, so every edge incident to v remains covered by the neighbour side. Hence the output is always a valid vertex cover. ∎
Approximation Ratio and Consequences for P vs NP
Structural setup
Let G be the input graph with n = |V| and m = |E|. Write τ(G) for the minimum vertex cover size. Let H be the weighted MIDS gadget and MIDS_w(H) its minimum weighted MIDS cost. For non-planar inputs, let G_p be the maximal planar subgraph, with removed edge set R = E(G) \ E(G_p), and let r be the number of edges in R left uncovered after solving G_p.
Gadget cost identity
Lemma 1 (Cost identity). For any planar graph G with n vertices and hub penalty P = n + 1: MIDS_w(H) = τ(G).
Proof. Lower bound MIDS_w(H) ≥ τ(G): any hub costs n + 1 > n, so the optimal MIDS D contains no hub. Each hub ('h', u, v), being absent from D, must be dominated by (u, 0) or (v, 0), so C = {v : (v, 0) ∈ D} is a valid cover with |C| ≤ w(D). Upper bound MIDS_w(H) ≤ τ(G): given an optimal cover C of size τ(G), define D = {(v, 0) : v ∈ C} ∪ {(v, 1) : v ∉ C}. Every hub is dominated because C covers every edge; every variable-edge node is dominated by its companion. Thus D is an independent dominating set with w(D) = |C| = τ(G). ∎
(1 + ε)-approximation on planar graphs
Theorem 2 (Planar approximation guarantee). For any planar graph G and any ε ∈ (0, 1/3], Algorithm find_vertex_cover returns a vertex cover S with |S| ≤ (1 + ε) · τ(G). With the default ε = 0.1, the algorithm achieves a 1.1-approximation on planar inputs.
Proof. By Lemma 1, MIDS_w(H) = τ(G). Baker's PTAS with k = ⌈1/ε⌉ returns D_alg with:
w(D_alg) ≤ (1 + ε) · MIDS_w(H) = (1 + ε) · τ(G).
Since every hub costs n + 1 while (1 + ε) · τ(G) ≤ (1 + ε) · 3n/4 < n + 1 for ε ≤ 1/3 (planar graphs are 4-colourable, so τ(G) ≤ 3n/4), Baker's solution contains no hubs; hence S = {v : (v, 0) ∈ D_alg} covers every edge, making the repair loop vacuous. Pruning can only decrease |S|, giving |S| ≤ w(D_alg) ≤ (1 + ε) · τ(G). For ε = 0.1 this is 1.1 · τ(G). ∎
Non-planar case
Proposition 1 (Subgraph optimality). τ(G_p) ≤ τ(G), since any cover of G is also a cover of G_p.
Proposition 2 (Non-planar ratio). For any input G:
|S| ≤ (1 + ε) · τ(G) + r,
where r is the number of uncovered removed edges. In particular, with ε = 0.1 the ratio is at most 1.4 whenever r ≤ 0.3 · τ(G).
Proof. By Proposition 1 and Theorem 2 applied to G_p: |S_p| ≤ (1 + ε) · τ(G_p) ≤ (1 + ε) · τ(G). The repair adds at most one vertex per uncovered removed edge. ∎
In practice r is negligible, as most removed edges are already covered by the planar-subgraph solution. The experimental data confirms ratios well below 1.4 across all tested instances.
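The repair bound in Proposition 2 corresponds to this small loop, a standalone restatement of the endpoint rule used in solve_vc's non-planar branch:

```python
def repair(cover, removed_edges, degree):
    """Greedy repair: for every removed (non-planarised) edge left uncovered,
    add its higher-degree endpoint -- at most one new vertex per such edge."""
    cover = set(cover)
    for u, v in removed_edges:
        if u not in cover and v not in cover:
            cover.add(u if degree[u] >= degree[v] else v)
    return cover

# One of the two removed edges is already covered; only ('c', 'd') costs a vertex.
cover = repair({'a'}, [('a', 'b'), ('c', 'd')], {'a': 3, 'b': 1, 'c': 2, 'd': 5})
```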
The conjecture and the inapproximability threshold
The following algebraic observation is central to the paper's main theoretical consequence.
Proposition 3 (The constant δ). The value δ = √2 − 1.4 satisfies:
δ = √2 − 7/5 ≈ 0.01421 > 0.
In particular, 1.4 = √2 − δ < √2.
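The arithmetic of Proposition 3 is a one-line check:

```python
import math

# delta = sqrt(2) - 1.4: the explicit gap below the sqrt(2) hardness threshold.
delta = math.sqrt(2) - 1.4
```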
Empirically, across all thirteen benchmark instances, the ratio lies in [1.011, 1.138]. We formalise the empirical finding as a conjecture.
Conjecture 1 (Universal 1.4-approximation). For every undirected graph G, Algorithm find_vertex_cover with the default ε = 0.1 returns a vertex cover S with:
|S| ≤ 1.4 · τ(G).
Conjecture 1 is conservative: the largest empirical ratio observed is 1.138. The value 1.4 is the algebraically natural threshold because δ = √2 − 1.4 > 0 places the conjectured ratio strictly below √2.
Implication for P vs NP
Theorem 3 (P vs NP implication). If Algorithm find_vertex_cover with ε = 0.1 achieves an approximation ratio at most 1.4 on every undirected graph, then P = NP.
Proof. Khot, Minzer, and Safra proved the 2-to-2 games conjecture and derived from it that, for every ε > 0, approximating Minimum Vertex Cover within a factor of √2 − ε is NP-hard. Since 1.4 = √2 − δ with δ > 0, the ratio 1.4 is exactly of the form √2 − ε for ε = δ. The algorithm runs in polynomial time (Theorem 4). A universal ratio of 1.4 therefore constitutes a polynomial-time (√2 − δ)-approximation, contradicting the 2-to-2 games NP-hardness result unless P = NP. ∎
Theorem 3 is direct and unconditional: it requires no unproved complexity hypothesis beyond Conjecture 1 itself, and in particular does not use the Unique Games Conjecture. Three layers of evidence support Conjecture 1: (i) Theorem 2 establishes ratio 1.1 for all planar inputs provably; (ii) on every tested instance the ratio lies in [1.011, 1.138], with the maximum already strictly below the weaker Dinur–Safra threshold of 1.3606; (iii) the explicit constant δ = √2 − 1.4 makes the conjecture algebraically precise rather than asymptotic. Additionally, under the Unique Games Conjecture, Khot and Regev showed that no polynomial-time algorithm achieves a ratio better than 2 − ε for any fixed ε > 0; a proved universal ratio of 1.4 would simultaneously refute the Unique Games Conjecture.
Runtime Analysis
Theorem 4 (Time complexity). Algorithm find_vertex_cover runs in time O(n · m), where n = |V| and m = |E|. On planar inputs the running time reduces to O(n).
Proof sketch.
- Preprocessing: removing self-loops and isolated vertices costs O(n + m).
- Planarity test on G: the LR-planarity test runs in O(n + m).
- Gadget construction (planar path): adding 2n variable nodes, m hub nodes, and n + 2m edges costs O(n + m).
- Baker's PTAS on H (planar path): the gadget has 2n + m = O(n) vertices (G planar, so m ≤ 3n − 6). Baker's algorithm with k = ⌈1/ε⌉ performs BFS layering, removes every k-th layer to obtain subgraphs of treewidth O(k), and runs a DP at cost 2^{O(k)} · n for the constant k = 10; since ε is fixed, this is O(n).
- Maximal planar subgraph (non-planar path): build a spanning forest in O(m · α(n)) via Union-Find; one initial planarity call in O(n); then for each non-tree edge, either an O(n) face-traversal check (face-sharing case, with an embedding update) or one O(n) planarity call (uncertain case). Total planarisation cost: O(n · m).
- Repair and pruning: each iterates over at most m edges at O(1) per edge: O(n + m) each.
The dominant cost for non-planar inputs is O(n · m); on planar inputs the algorithm costs O(n). On sparse graphs (m = O(n)) the non-planar bound reduces to O(n²). ∎
Experimental Study
We evaluated Salvador v0.0.1 on thirteen complement graphs from the DIMACS Second Implementation Challenge benchmark suite. These graphs range in size from n = 125 to n = 3321 vertices and in edge density m/n from about 1.9 (MANN family) up to around 50 (brock200_2). For each instance we computed:
- The vertex cover size |S| returned by find_vertex_cover.
- A reference optimum τ* = n − ω, where ω is the best known maximum clique size of the complement graph (by Gallai's theorem, the minimum vertex cover equals n minus the maximum independent set size). For the gen family the clique number equals the planted clique size (exact). For the brock and keller4 instances ω is a best-known value rather than a certified optimum, so the reported τ* is a bound on the true optimum (entries marked †; the true ratio can only be at most as large as shown).
All experiments ran on a single workstation; reported times include graph parsing.
Approximation ratios
| Instance | n | m | m/n | τ* | \|S\| | ratio | time (s) |
|---|---|---|---|---|---|---|---|
| C125.9 | 125 | 787 | 6.30 | 91 | 95 | 1.044 | 1.50 |
| C250.9 | 250 | 3141 | 12.56 | 206 | 216 | 1.049 | 11.38 |
| brock200_2† | 200 | 10024 | 50.12 | 188 | 192 | 1.021 | 34.25 |
| brock200_4† | 200 | 6811 | 34.06 | 183 | 187 | 1.022 | 22.93 |
| gen200_p0.9_44 | 200 | 1990 | 9.95 | 156 | 167 | 1.071 | 6.59 |
| gen200_p0.9_55 | 200 | 1990 | 9.95 | 145 | 165 | 1.138 | 7.32 |
| gen400_p0.9_55 | 400 | 7980 | 19.95 | 345 | 364 | 1.055 | 83.97 |
| gen400_p0.9_65 | 400 | 7980 | 19.95 | 335 | 361 | 1.078 | 74.10 |
| gen400_p0.9_75 | 400 | 7980 | 19.95 | 325 | 357 | 1.098 | 83.59 |
| keller4† | 171 | 5100 | 29.82 | 160 | 163 | 1.019 | 24.08 |
| MANN_a27 | 378 | 702 | 1.86 | 252 | 258 | 1.024 | 3.24 |
| MANN_a45 | 1035 | 1980 | 1.91 | 690 | 703 | 1.019 | 20.86 |
| MANN_a81 | 3321 | 6480 | 1.95 | 2214 | 2238 | 1.011 | 180.91 |
Summary: min ratio = 1.011, mean = 1.050, median = 1.044, max = 1.138.
Every instance produces a ratio strictly below the √2 barrier (≈ 1.41421), the weaker Dinur–Safra threshold (1.3606), and the conjectured bound of 1.4. The maximum observed ratio is 1.138 (gen200_p0.9_55), the mean is 1.050, and the median is 1.044. The smallest ratio, 1.011, is achieved on MANN_a81, the largest tested instance. The †-marked entries have ratios at most 1.022 even against the conservative reference optimum; the true ratios are smaller still.
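The summary statistics follow directly from the ratio column of the table:

```python
import statistics

# Ratio column of the table above, in row order.
ratios = [1.044, 1.049, 1.021, 1.022, 1.071, 1.138, 1.055, 1.078,
          1.098, 1.019, 1.024, 1.019, 1.011]
summary = (min(ratios), round(statistics.mean(ratios), 3),
           statistics.median(ratios), max(ratios))
```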
Running time analysis
The running times span three regimes classified by edge density m/n:
Sparse (m/n ≈ 2: MANN family). Baker's PTAS dominates and the planarisation step is cheap, as m = O(n) implies an O(n²) bound. From MANN_a27 (n = 378, 3.24 s) to MANN_a45 (n = 1035, 20.86 s) the time grows by a factor of 6.4, while (1035/378)² ≈ 7.5, consistent with O(n²). From MANN_a45 to MANN_a81 (n = 3321, 180.91 s) the ratio is 8.7, while (3321/1035)² ≈ 10.3, again consistent with sub-quadratic growth.
Medium density (m/n ≈ 10–20: C, gen families). The gen400 instances (n = 400, m = 7980) take 74–84 s, consistent with O(n · m) scaling. The C125.9 instance (n = 125) finishes in 1.5 s; the three gen200 instances (n = 200) finish in 6–12 s, and C250.9 (n = 250) in 11.4 s.
Dense (m/n ≈ 30–50: brock, keller). brock200_2 (n = 200, m = 10024) is the hardest 200-vertex instance at 34.25 s, consistent with O(n · m). Despite this density, the approximation ratio (1.021) is among the lowest in the table, reflecting that highly dense non-planar graphs reduce quickly to small planar skeletons.
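The growth factors quoted for the MANN family above are reproduced by this arithmetic (numbers taken from the table):

```python
# (n, seconds) for the three MANN instances from the table.
mann = {'a27': (378, 3.24), 'a45': (1035, 20.86), 'a81': (3321, 180.91)}

def growth(a, b):
    """Observed time ratio vs. the O(n^2) prediction between two instances."""
    (n1, t1), (n2, t2) = mann[a], mann[b]
    return round(t2 / t1, 1), round((n2 / n1) ** 2, 1)
```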
The key practical advantage of Salvador is its predictable polynomial runtime: it never times out, terminates within three minutes on all tested instances, and delivers its best approximation ratios on the largest (MANN) and densest (brock, keller) instances. The worst-case ratio (1.138, on gen200_p0.9_55) occurs at medium density, where the non-planar repair is most active. Crucially, every empirical ratio lies comfortably below both the √2 ≈ 1.414 inapproximability threshold and the conjectured bound of 1.4.
Conclusion
We have presented a polynomial-time algorithm that reduces any graph to a planar kernel via a weighted MIDS gadget and then applies Baker's PTAS to obtain an approximate vertex cover. The algorithm provably achieves a (1 + ε)-approximation on planar graphs — a 1.1-approximation with the default ε = 0.1 — and consistently delivers near-optimal solutions in practice.
The central observation of this paper is that 1.4 = √2 − δ for the explicit positive constant δ = √2 − 1.4 ≈ 0.0142, so Conjecture 1 places the conjectured ratio strictly below the √2 inapproximability threshold. By Theorem 3, which rests on the 2-to-2 games hardness results of Khot, Minzer, and Safra, any proof of Conjecture 1 implies P = NP directly and unconditionally — without appealing to the Unique Games Conjecture or any other unproved hypothesis.
The implementation is freely available as the Salvador package (v0.0.1) on PyPI.