Frank Vega

Posted on Apr 30 • Edited on May 23

The Salvador Algorithm

#discuss #computerscience #algorithms #python

An Approximate Solution to the Minimum Vertex Cover Problem: The Salvador Algorithm

Frank Vega

Information Physics Institute, 840 W 67th St, Hialeah, FL 33012, USA

vega.frank@gmail.com

Abstract

The Minimum Vertex Cover problem is NP-hard. The best known polynomial-time approximation ratio is $2$ , matching the Unique Games Conjecture lower bound of $2-\varepsilon$ ; without that conjecture, 2-to-2 games hardness gives inapproximability within $\sqrt{2}-\varepsilon$ under $\mathrm{P}\neq\mathrm{NP}$ . We present Salvador v0.0.2, an approximate vertex-cover solver whose current implementation builds a linear-size spanning-forest planar core, reduces it to a weighted Minimum Independent Dominating Set gadget, solves that gadget by a linear weighted independent-dominating-set pass, repairs all non-core edges, and prunes redundant vertices. The algorithm runs in $O(n+m)$ time and always returns a valid vertex cover. On 68 NPBench clique-complement instances, Salvador's empirical ratio ranges from $1.000$ to $1.197$ with mean $1.030$ ; using the NetworkX 2-approximation baseline gives an a posteriori certified upper bound no larger than $1.986$ . We conjecture that the approximation ratio is universally bounded by $1.4$ . Since $1.4=\sqrt{2}-\varepsilon_0$ for $\varepsilon_0=\sqrt{2}-\tfrac{7}{5}\approx0.0142$ , proving the conjecture would place the algorithm strictly below the $\sqrt{2}$ inapproximability threshold.

Introduction

Given an undirected graph $G=(V,E)$ , a set $C\subseteq V$ is a vertex cover if every edge has at least one endpoint in $C$ . The goal is to minimize $|C|$ ; the optimum is denoted $\tau(G)$ .

The current Salvador implementation differs from earlier drafts that used repeated planarity tests and maximal-planar-subgraph extraction. The public routine find_vertex_cover now runs through four linear phases:

Preprocessing. Remove self-loops and isolated vertices.
Linear planar core. Build a spanning forest $G_p$ with Union-Find in one pass over $E$ . The rejected cycle-closing edges are stored for repair.
Weighted MIDS gadget and linear IDS pass. Build a planar weighted MIDS gadget on $G_p$ , solve it by a linear cheap-first maximal independent set pass, decode candidate cover vertices, and repair residual core edges.
Original-edge repair and pruning. Scan all removed edges once, add the higher-degree endpoint if still uncovered, and prune redundant cover vertices in a single adjacency pass.

The main contributions are:

A practical $O(n+m)$ vertex-cover algorithm based on a linear spanning-forest planar core and one repair scan.
A weighted MIDS gadget for the planar core, preserving the original vertex-cover encoding while avoiding repeated planarity augmentation.
A current-code experimental study on 68 NPBench clique-complement instances with empirical ratios in $[1.000,\,1.197]$ and mean $1.030$ .
An a posteriori certified upper-bound column computed from the NetworkX 2-approximation baseline.
An open-source implementation, Salvador v0.0.2.

Current Python Implementation

Main Entry Point (`algorithm.py`)

import itertools
from . import utils

import networkx as nx
from . import vc_reduction


def prune_redundant_vertices(adj, C):
    """
    Linear-time single-pass removal of redundant vertices.
    For every v in C we check whether all its neighbors are still in the
    current cover. If yes, we safely remove it immediately.
    """
    C = set(C)
    for v in list(C):
        all_neighbors_covered = True
        for u in adj.get(v, []):
            if u not in C:
                all_neighbors_covered = False
                break
        if all_neighbors_covered:
            C.remove(v)
    return C


def find_vertex_cover(graph, epsilon: float = 0.1):
    G = graph.copy()
    G.remove_edges_from(nx.selfloop_edges(G))
    G.remove_nodes_from(list(nx.isolates(G)))

    if G.number_of_edges() == 0:
        return set()

    # Linear forest-core MIDS reduction with one repair scan.
    cover, _ = vc_reduction.solve_vc(G, epsilon)

    adj = {v: set(G[v]) for v in G}
    return prune_redundant_vertices(adj, cover)

Linear Planar Core and MIDS Reduction (`vc_reduction.py`)

For a core graph $G_p$ with $N$ vertices, the gadget has a penalty $P=N+1$ . For every vertex $v$ , it adds $(v,0)$ of weight $1$ and $(v,1)$ of weight $0$ ; for every core edge $(u,v)$ , it adds a hub $h_{uv}$ of weight $P$ adjacent to $(u,0)$ and $(v,0)$ .

\text{weighted MIDS cost} = |\text{cover}| + (N+1)\cdot |\text{uncovered core edges}|.

def reduce_vc_to_mids(G: nx.Graph, epsilon: float = 0.1,
                      assume_planar: bool = False):
    if not assume_planar and not nx.is_planar(G):
        raise ValueError("reduce_vc_to_mids requires a planar graph.")

    N = G.number_of_nodes()
    P = N + 1

    H = nx.Graph(); weights = {}
    for v in G.nodes():
        H.add_edge((v, 0), (v, 1))
        weights[(v, 0)] = 1
        weights[(v, 1)] = 0
    for u, v in G.edges():
        hub = ('h', u, v); weights[hub] = P
        H.add_edge((u, 0), hub); H.add_edge((v, 0), hub)

    return H, weights, P


def _spanning_forest_planar_core(G: nx.Graph):
    H = nx.Graph()
    H.add_nodes_from(G.nodes())
    parent = {v: v for v in G.nodes()}
    rank = {v: 0 for v in G.nodes()}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(u, v):
        ru, rv = find(u), find(v)
        if ru == rv:
            return False
        if rank[ru] < rank[rv]:
            ru, rv = rv, ru
        parent[rv] = ru
        if rank[ru] == rank[rv]:
            rank[ru] += 1
        return True

    removed = []
    for u, v in G.edges():
        if union(u, v):
            H.add_edge(u, v)
        else:
            removed.append((u, v))
    return H, removed


def solve_vc(G: nx.Graph, epsilon: float = 0.1):
    if G.number_of_edges() == 0:
        return frozenset(), 0.0

    G_p, removed = _spanning_forest_planar_core(G)
    cover = _solve_planar(G_p, G, epsilon)

    for u, v in removed:
        if u not in cover and v not in cover:
            cover.add(u if G.degree(u) >= G.degree(v) else v)

    return frozenset(cover), float(len(cover))

Linear Weighted IDS Pass (`baker_ptas.py`)

The current default weighted IDS solver is a linear cheap-first maximal independent set pass. A maximal independent set is also an independent dominating set, and the Salvador gadget uses only the weight classes $0$ , $1$ , and $P$ , so bucket ordering avoids sorting.

def greedy_maximal_is_weighted(adj, vertices, weights):
    zeros, ones, heavy = [], [], []
    for v in vertices:
        w = weights.get(v, 1)
        if w == 0:
            zeros.append(v)
        elif w == 1:
            ones.append(v)
        else:
            heavy.append(v)
    vlist = zeros + ones + heavy

    selected = set(); excluded = set()
    for v in vlist:
        if v not in excluded:
            selected.add(v); excluded.add(v)
            excluded.update(adj.get(v, ()))
    return frozenset(selected)


def baker_ptas_ids_weighted(adj, weights=None, epsilon=0.5):
    vertices = set(adj.keys())
    if not vertices:
        return frozenset()
    if len(vertices) == 1:
        return frozenset(vertices)
    if weights is None:
        weights = {}

    return greedy_maximal_is_weighted(adj, vertices, weights)

The older tree-decomposition Baker PTAS code remains in the module after this early return for experimentation and backward comparison, but it is not the default path used by find_vertex_cover.

Correctness

Theorem 1 (Validity). find_vertex_cover always returns a valid vertex cover of the input graph $G$ .

Proof sketch. Preprocessing removes only self-loops and isolated vertices. The spanning forest $G_p$ contains all vertices and a subset of original edges. The MIDS decode plus residual core repair covers every edge in $G_p$ . Every removed edge $(u,v)\in E\setminus E_p$ is then scanned: if neither endpoint is already in the cover, one endpoint is added. Therefore all original edges are covered. Finally, pruning removes a vertex $v$ only when every neighbor of $v$ remains in the cover, so every edge incident to $v$ stays covered by the other endpoint. ?

Approximation Ratio and Consequences for P vs NP

Let $\tau(G)$ be the optimum vertex-cover size. The present implementation has a direct validity guarantee and an empirical approximation profile. For each NPBench clique-complement graph we compute the reference optimum as $\tau^*=n-\omega$ , where $\omega$ is the DIMACS maximum-clique value for the original graph.

The central conjecture remains:

Conjecture 1 (Universal $1.4$ -approximation). For every undirected graph $G$ , find_vertex_cover returns a cover $C$ satisfying

|C| \le \frac{7}{5}\,\tau(G) = (\sqrt{2}-\varepsilon_0)\,\tau(G),\qquad \varepsilon_0=\sqrt{2}-\tfrac{7}{5}>0.

Since $1.4<\sqrt{2}$ , a proof of Conjecture 1 would give a polynomial-time approximation ratio strictly below the 2-to-2 games hardness threshold and would imply $\mathrm{P}=\mathrm{NP}$ .

Across the current NPBench run, the largest displayed Salvador/reference ratio is $1.197$ , below the conjectured $1.4$ threshold and below $\sqrt{2}\approx1.414$ .

Runtime Analysis

Theorem 2 (Time complexity). The current implementation of find_vertex_cover runs in $O(n+m)$ time, where $n=|V|$ and $m=|E|$ .

Proof sketch.

Preprocessing scans vertices and edges: $O(n+m)$ .
_spanning_forest_planar_core performs one Union-Find pass over $E$ : $O(n+m)$ , absorbing the inverse-Ackermann factor into the linear bound.
The gadget is built on a forest, so it has $O(n)$ vertices and edges.
The weighted IDS pass buckets vertices by weight and scans adjacency once through the maximal independent set construction: $O(n)$ on the forest gadget.
Repair scans each removed edge once: $O(m)$ .
Pruning scans adjacency around candidate cover vertices: $O(n+m)$ .

Therefore the total running time is $O(n+m)$ .

Experimental Study

We evaluated Salvador v0.0.2 on the NPBench vertex-cover clique-complement collection. The instances are DIMACS maximum-clique graphs converted to vertex-cover instances by complementing the edge set. A clique of size $\omega$ in the original graph becomes an independent set of size $\omega$ in the complement, so the vertex-cover reference optimum is:

\tau^* = n - \omega.

The logged run is stored as experiments/np_bench.txt and was produced with:

python -m salvador.batch -i ./experiments/ -c -a -l

or equivalently:

batch_vega -i ./experiments/ -c -a -l

The baseline is NetworkX min_weighted_vertex_cover, a Bar-Yehuda--Even local-ratio 2-approximation. We report both empirical ratios against the DIMACS reference optimum and the certified a posteriori Salvador bound:

\frac{|C_S|}{\tau^*} \le \frac{2|C_S|}{|C_N|},

where $C_S$ is Salvador's cover and $C_N$ is NetworkX's 2-approximation cover.

For example, on brock200_2.clq-compliment.txt, NetworkX computes a cover of size $199$ in $15.690$ ms, Salvador computes a cover of size $192$ in $21.458$ ms, and the certified upper bound is:

2\cdot 192/199 = 1.929648.

Rows marked dagger use a best-known clique lower bound, so the displayed vertex-cover optimum is an upper bound on the true optimum.

Instance	Opt VC	NetworkX VC	NetworkX/Opt	Salvador VC	Salvador/Opt	Certified upper bound	NetworkX ms	Salvador ms
`brock200_1`	179	197	1.101	184	1.028	1.868	0.000	21.115
`brock200_2`	188	199	1.059	192	1.021	1.930	15.690	21.458
`brock200_3`	185	197	1.065	190	1.027	1.929	0.998	29.073
`brock200_4`	183	198	1.082	187	1.022	1.889	3.405	37.961
`brock400_1`	373	397	1.064	382	1.024	1.924	0.000	66.091
`brock400_2`	371	396	1.067	381	1.027	1.924	3.232	56.858
`brock400_3`	369	396	1.073	381	1.033	1.924	0.000	69.879
`brock400_4`	367	398	1.084	383	1.044	1.925	15.741	69.259
`brock800_1`	777	798	1.027	785	1.010	1.967	17.816	390.844
`brock800_2`	776	798	1.028	785	1.012	1.967	10.028	306.896
`brock800_3`	775	796	1.027	785	1.013	1.972	11.772	303.142
`brock800_4`	774	798	1.031	784	1.013	1.965	15.647	309.083
`c-fat200-1`	188	198	1.053	188	1.000	1.899	2.913	43.041
`c-fat200-2`	176	198	1.125	176	1.000	1.778	1.978	38.133
`c-fat200-5`	142	199	1.401	142	1.000	1.427	2.356	27.579
`c-fat500-1`	486	498	1.025	486	1.000	1.952	11.976	274.073
`c-fat500-10`	374	499	1.334	374	1.000	1.499	0.000	198.928
`c-fat500-2`	474	498	1.051	474	1.000	1.904	15.514	362.902
`c-fat500-5`	436	499	1.144	436	1.000	1.747	5.511	255.697
`C1000.9`	<= 932 dagger	987	1.059	953	1.023	1.931	0.000	150.017
`C125.9`	91	115	1.264	94	1.033	1.635	0.000	6.151
`C250.9`	206	241	1.170	214	1.039	1.776	0.998	4.281
`C500.9`	<= 443 dagger	489	1.104	457	1.032	1.869	0.000	46.692
`gen200_p0.9_44`	156	192	1.231	167	1.071	1.740	0.000	15.628
`gen200_p0.9_55`	145	187	1.290	165	1.138	1.765	0.000	0.000
`gen400_p0.9_55`	345	388	1.125	359	1.041	1.851	3.732	19.340
`gen400_p0.9_65`	335	387	1.155	361	1.078	1.866	2.246	26.162
`gen400_p0.9_75`	325	385	1.185	356	1.095	1.849	3.169	23.254
`hamming10-2`	512	1023	1.998	512	1.000	1.001	1.073	46.221
`hamming10-4`	984	1023	1.040	996	1.012	1.947	11.849	258.114
`hamming6-2`	32	63	1.969	32	1.000	1.016	0.000	1.006
`hamming6-4`	60	63	1.050	62	1.033	1.968	0.000	2.023
`hamming8-2`	128	255	1.992	128	1.000	1.004	1.006	8.173
`hamming8-4`	240	255	1.062	240	1.000	1.882	2.248	28.899
`johnson16-2-4`	112	119	1.062	112	1.000	1.882	1.128	4.361
`johnson32-2-4`	480	495	1.031	480	1.000	1.939	0.000	49.459
`johnson8-2-4`	24	27	1.125	24	1.000	1.778	0.000	0.000
`johnson8-4-4`	56	69	1.232	56	1.000	1.623	0.999	5.097
`keller4`	160	170	1.062	162	1.012	1.906	1.743	59.447
`keller5`	749	775	1.035	758	1.012	1.956	16.164	183.929
`MANN_a27`	252	270	1.071	253	1.004	1.874	0.000	8.712
`MANN_a45`	690	735	1.065	695	1.007	1.891	0.986	23.164
`MANN_a81`	2221	2268	1.021	2225	1.002	1.962	3.769	79.834
`MANN_a9`	29	36	1.241	29	1.000	1.611	0.000	0.000
`p_hat1000-3`	<= 932 dagger	991	1.063	943	1.012	1.903	19.721	351.907
`p_hat300-1`	292	296	1.014	294	1.007	1.986	0.000	83.977
`p_hat300-2`	275	296	1.076	278	1.011	1.878	0.000	64.064
`p_hat300-3`	264	293	1.110	269	1.019	1.836	2.261	31.998
`p_hat500-1`	491	499	1.016	494	1.006	1.980	16.023	222.256
`p_hat500-2`	464	495	1.067	471	1.015	1.903	0.000	167.353
`p_hat500-3`	<= 450 dagger	489	1.087	459	1.020	1.877	5.095	89.239
`p_hat700-1`	689	699	1.015	693	1.006	1.983	31.294	453.505
`p_hat700-2`	<= 656 dagger	698	1.064	662	1.009	1.897	15.850	321.422
`p_hat700-3`	638	692	1.085	645	1.011	1.864	17.240	173.352
`san200_0.7_1`	170	194	1.141	184	1.082	1.897	0.000	22.830
`san200_0.7_2`	182	190	1.044	188	1.033	1.979	0.000	21.728
`san200_0.9_1`	130	176	1.354	155	1.192	1.761	0.000	11.270
`san200_0.9_2`	140	182	1.300	164	1.171	1.802	0.000	10.641
`san200_0.9_3`	156	190	1.218	171	1.096	1.800	0.000	0.000
`san400_0.5_1`	387	396	1.023	393	1.016	1.985	0.000	100.968
`san400_0.7_1`	360	394	1.094	379	1.053	1.924	0.000	72.765
`san400_0.7_2`	370	393	1.062	385	1.041	1.959	0.000	73.016
`san400_0.7_3`	378	396	1.048	388	1.026	1.960	0.000	75.971
`san400_0.9_1`	300	381	1.270	359	1.197	1.885	1.976	15.149
`sanr200_0.7`	182	197	1.082	186	1.022	1.888	0.000	28.249
`sanr200_0.9`	158	187	1.184	166	1.051	1.775	0.000	14.531
`sanr400_0.5`	387	399	1.031	389	1.005	1.950	0.000	99.775
`sanr400_0.7`	379	396	1.045	385	1.016	1.944	20.387	67.288

Summary over the 68 logged NPBench instances. Salvador's mean Salvador/Opt ratio is $1.030$ , with minimum $1.000$ and maximum $1.197$ . The mean NetworkX baseline ratio is $1.147$ . The mean certified Salvador upper bound $2|C_S|/|C_N|$ is $1.831$ , with maximum $1.986$ .

References

Bar-Yehuda, R., and Even, S. (1985). "A local-ratio theorem for approximating the weighted vertex cover problem." Annals of Discrete Mathematics, 25, 27-46. http://www.cs.technion.ac.il/~reuven/PDF/vc_lr.pdf
Baker, B. S. (1994). "Approximation algorithms for NP-complete problems on planar graphs." Journal of the ACM, 41(1), 153-180.
Mascia, F. (2015). The Maximum Clique Problem -- DIMACS Benchmark Set. https://iridia.ulb.ac.be/~fmascia/maximum_clique/DIMACS-benchmark
Nguyen, T. V., and Bui, T. NP-Complete Benchmark Instances. https://roars.dev/npbench/
NetworkX documentation for min_weighted_vertex_cover: https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.approximation.vertex_cover.min_weighted_vertex_cover.html
Salvador package on PyPI: https://pypi.org/project/salvador/

Conclusion

The current Salvador v0.0.2 implementation is a linear-time approximate vertex-cover solver built around a spanning-forest planar core, a weighted MIDS gadget, a linear IDS pass, one repair scan, and one pruning scan. It always returns a valid vertex cover, runs in $O(n+m)$ time, and on the logged NPBench experiment improves the NetworkX 2-approximation cover size on every instance while producing mean empirical ratio $1.030$ against the DIMACS reference optima.

DEV Community

The Salvador Algorithm

An Approximate Solution to the Minimum Vertex Cover Problem: The Salvador Algorithm

Abstract

Introduction

Current Python Implementation

Main Entry Point (`algorithm.py`)

Linear Planar Core and MIDS Reduction (`vc_reduction.py`)

Linear Weighted IDS Pass (`baker_ptas.py`)

Correctness

Approximation Ratio and Consequences for P vs NP

Runtime Analysis

Experimental Study

References

Conclusion

Top comments (0)

An Approximate Solution to the Minimum Vertex Cover Problem: The Salvador Algorithm

Abstract

Introduction

Current Python Implementation

Main Entry Point (algorithm.py)

Linear Planar Core and MIDS Reduction (vc_reduction.py)

Linear Weighted IDS Pass (baker_ptas.py)

Correctness

Approximation Ratio and Consequences for P vs NP

Runtime Analysis

Experimental Study

References

Conclusion

Main Entry Point (`algorithm.py`)

Linear Planar Core and MIDS Reduction (`vc_reduction.py`)

Linear Weighted IDS Pass (`baker_ptas.py`)