DEV Community

Cover image for The Salvador Algorithm
Frank Vega
Frank Vega

Posted on • Edited on

The Salvador Algorithm

An Approximate Solution to the Minimum Vertex Cover Problem: The Salvador Algorithm

Frank Vega

Information Physics Institute, 840 W 67th St, Hialeah, FL 33012, USA

vega.frank@gmail.com


Abstract

The Minimum Vertex Cover problem is NP-hard. The best known polynomial-time approximation ratio is 22 , matching the Unique Games Conjecture lower bound of 2ε2-\varepsilon ; without that conjecture, 2-to-2 games hardness gives inapproximability within 2ε\sqrt{2}-\varepsilon under PNP\mathrm{P}\neq\mathrm{NP} . We present Salvador v0.0.2, an approximate vertex-cover solver whose current implementation builds a linear-size spanning-forest planar core, reduces it to a weighted Minimum Independent Dominating Set gadget, solves that gadget by a linear weighted independent-dominating-set pass, repairs all non-core edges, and prunes redundant vertices. The algorithm runs in O(n+m)O(n+m) time and always returns a valid vertex cover. On 68 NPBench clique-complement instances, Salvador's empirical ratio ranges from 1.0001.000 to 1.1971.197 with mean 1.0301.030 ; using the NetworkX 2-approximation baseline gives an a posteriori certified upper bound no larger than 1.9861.986 . We conjecture that the approximation ratio is universally bounded by 1.41.4 . Since 1.4=2ε01.4=\sqrt{2}-\varepsilon_0 for ε0=2750.0142\varepsilon_0=\sqrt{2}-\tfrac{7}{5}\approx0.0142 , proving the conjecture would place the algorithm strictly below the 2\sqrt{2} inapproximability threshold.


Introduction

Given an undirected graph G=(V,E)G=(V,E) , a set CVC\subseteq V is a vertex cover if every edge has at least one endpoint in CC . The goal is to minimize C|C| ; the optimum is denoted τ(G)\tau(G) .

The current Salvador implementation differs from earlier drafts that used repeated planarity tests and maximal-planar-subgraph extraction. The public routine find_vertex_cover now runs through four linear phases:

  1. Preprocessing. Remove self-loops and isolated vertices.
  2. Linear planar core. Build a spanning forest GpG_p with Union-Find in one pass over EE . The rejected cycle-closing edges are stored for repair.
  3. Weighted MIDS gadget and linear IDS pass. Build a planar weighted MIDS gadget on GpG_p , solve it by a linear cheap-first maximal independent set pass, decode candidate cover vertices, and repair residual core edges.
  4. Original-edge repair and pruning. Scan all removed edges once, add the higher-degree endpoint if still uncovered, and prune redundant cover vertices in a single adjacency pass.

The main contributions are:

  • A practical O(n+m)O(n+m) vertex-cover algorithm based on a linear spanning-forest planar core and one repair scan.
  • A weighted MIDS gadget for the planar core, preserving the original vertex-cover encoding while avoiding repeated planarity augmentation.
  • A current-code experimental study on 68 NPBench clique-complement instances with empirical ratios in [1.000,1.197][1.000,\,1.197] and mean 1.0301.030 .
  • An a posteriori certified upper-bound column computed from the NetworkX 2-approximation baseline.
  • An open-source implementation, Salvador v0.0.2.

Current Python Implementation

Main Entry Point (algorithm.py)

import itertools
from . import utils

import networkx as nx
from . import vc_reduction


def prune_redundant_vertices(adj, C):
    """
    Linear-time single-pass removal of redundant vertices.
    For every v in C we check whether all its neighbors are still in the
    current cover. If yes, we safely remove it immediately.
    """
    C = set(C)
    for v in list(C):
        all_neighbors_covered = True
        for u in adj.get(v, []):
            if u not in C:
                all_neighbors_covered = False
                break
        if all_neighbors_covered:
            C.remove(v)
    return C


def find_vertex_cover(graph, epsilon: float = 0.1):
    G = graph.copy()
    G.remove_edges_from(nx.selfloop_edges(G))
    G.remove_nodes_from(list(nx.isolates(G)))

    if G.number_of_edges() == 0:
        return set()

    # Linear forest-core MIDS reduction with one repair scan.
    cover, _ = vc_reduction.solve_vc(G, epsilon)

    adj = {v: set(G[v]) for v in G}
    return prune_redundant_vertices(adj, cover)
Enter fullscreen mode Exit fullscreen mode

Linear Planar Core and MIDS Reduction (vc_reduction.py)

For a core graph GpG_p with NN vertices, the gadget has a penalty P=N+1P=N+1 . For every vertex vv , it adds (v,0)(v,0) of weight 11 and (v,1)(v,1) of weight 00 ; for every core edge (u,v)(u,v) , it adds a hub huvh_{uv} of weight PP adjacent to (u,0)(u,0) and (v,0)(v,0) .

weighted MIDS cost=cover+(N+1)uncovered core edges. \text{weighted MIDS cost} = |\text{cover}| + (N+1)\cdot |\text{uncovered core edges}|.

def reduce_vc_to_mids(G: nx.Graph, epsilon: float = 0.1,
                      assume_planar: bool = False):
    if not assume_planar and not nx.is_planar(G):
        raise ValueError("reduce_vc_to_mids requires a planar graph.")

    N = G.number_of_nodes()
    P = N + 1

    H = nx.Graph(); weights = {}
    for v in G.nodes():
        H.add_edge((v, 0), (v, 1))
        weights[(v, 0)] = 1
        weights[(v, 1)] = 0
    for u, v in G.edges():
        hub = ('h', u, v); weights[hub] = P
        H.add_edge((u, 0), hub); H.add_edge((v, 0), hub)

    return H, weights, P


def _spanning_forest_planar_core(G: nx.Graph):
    H = nx.Graph()
    H.add_nodes_from(G.nodes())
    parent = {v: v for v in G.nodes()}
    rank = {v: 0 for v in G.nodes()}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(u, v):
        ru, rv = find(u), find(v)
        if ru == rv:
            return False
        if rank[ru] < rank[rv]:
            ru, rv = rv, ru
        parent[rv] = ru
        if rank[ru] == rank[rv]:
            rank[ru] += 1
        return True

    removed = []
    for u, v in G.edges():
        if union(u, v):
            H.add_edge(u, v)
        else:
            removed.append((u, v))
    return H, removed


def solve_vc(G: nx.Graph, epsilon: float = 0.1):
    if G.number_of_edges() == 0:
        return frozenset(), 0.0

    G_p, removed = _spanning_forest_planar_core(G)
    cover = _solve_planar(G_p, G, epsilon)

    for u, v in removed:
        if u not in cover and v not in cover:
            cover.add(u if G.degree(u) >= G.degree(v) else v)

    return frozenset(cover), float(len(cover))
Enter fullscreen mode Exit fullscreen mode

Linear Weighted IDS Pass (baker_ptas.py)

The current default weighted IDS solver is a linear cheap-first maximal independent set pass. A maximal independent set is also an independent dominating set, and the Salvador gadget uses only the weight classes 00 , 11 , and PP , so bucket ordering avoids sorting.

def greedy_maximal_is_weighted(adj, vertices, weights):
    zeros, ones, heavy = [], [], []
    for v in vertices:
        w = weights.get(v, 1)
        if w == 0:
            zeros.append(v)
        elif w == 1:
            ones.append(v)
        else:
            heavy.append(v)
    vlist = zeros + ones + heavy

    selected = set(); excluded = set()
    for v in vlist:
        if v not in excluded:
            selected.add(v); excluded.add(v)
            excluded.update(adj.get(v, ()))
    return frozenset(selected)


def baker_ptas_ids_weighted(adj, weights=None, epsilon=0.5):
    vertices = set(adj.keys())
    if not vertices:
        return frozenset()
    if len(vertices) == 1:
        return frozenset(vertices)
    if weights is None:
        weights = {}

    return greedy_maximal_is_weighted(adj, vertices, weights)
Enter fullscreen mode Exit fullscreen mode

The older tree-decomposition Baker PTAS code remains in the module after this early return for experimentation and backward comparison, but it is not the default path used by find_vertex_cover.


Correctness

Theorem 1 (Validity). find_vertex_cover always returns a valid vertex cover of the input graph GG .

Proof sketch. Preprocessing removes only self-loops and isolated vertices. The spanning forest GpG_p contains all vertices and a subset of original edges. The MIDS decode plus residual core repair covers every edge in GpG_p . Every removed edge (u,v)EEp(u,v)\in E\setminus E_p is then scanned: if neither endpoint is already in the cover, one endpoint is added. Therefore all original edges are covered. Finally, pruning removes a vertex vv only when every neighbor of vv remains in the cover, so every edge incident to vv stays covered by the other endpoint. ?


Approximation Ratio and Consequences for P vs NP

Let τ(G)\tau(G) be the optimum vertex-cover size. The present implementation has a direct validity guarantee and an empirical approximation profile. For each NPBench clique-complement graph we compute the reference optimum as τ=nω\tau^*=n-\omega , where ω\omega is the DIMACS maximum-clique value for the original graph.

The central conjecture remains:

Conjecture 1 (Universal 1.41.4 -approximation). For every undirected graph GG , find_vertex_cover returns a cover CC satisfying

C75τ(G)=(2ε0)τ(G),ε0=275>0. |C| \le \frac{7}{5}\,\tau(G) = (\sqrt{2}-\varepsilon_0)\,\tau(G),\qquad \varepsilon_0=\sqrt{2}-\tfrac{7}{5}>0.

Since 1.4<21.4<\sqrt{2} , a proof of Conjecture 1 would give a polynomial-time approximation ratio strictly below the 2-to-2 games hardness threshold and would imply P=NP\mathrm{P}=\mathrm{NP} .

Across the current NPBench run, the largest displayed Salvador/reference ratio is 1.1971.197 , below the conjectured 1.41.4 threshold and below 21.414\sqrt{2}\approx1.414 .


Runtime Analysis

Theorem 2 (Time complexity). The current implementation of find_vertex_cover runs in O(n+m)O(n+m) time, where n=Vn=|V| and m=Em=|E| .

Proof sketch.

  • Preprocessing scans vertices and edges: O(n+m)O(n+m) .
  • _spanning_forest_planar_core performs one Union-Find pass over EE : O(n+m)O(n+m) , absorbing the inverse-Ackermann factor into the linear bound.
  • The gadget is built on a forest, so it has O(n)O(n) vertices and edges.
  • The weighted IDS pass buckets vertices by weight and scans adjacency once through the maximal independent set construction: O(n)O(n) on the forest gadget.
  • Repair scans each removed edge once: O(m)O(m) .
  • Pruning scans adjacency around candidate cover vertices: O(n+m)O(n+m) .

Therefore the total running time is O(n+m)O(n+m) .


Experimental Study

We evaluated Salvador v0.0.2 on the NPBench vertex-cover clique-complement collection. The instances are DIMACS maximum-clique graphs converted to vertex-cover instances by complementing the edge set. A clique of size ω\omega in the original graph becomes an independent set of size ω\omega in the complement, so the vertex-cover reference optimum is:

τ=nω. \tau^* = n - \omega.

The logged run is stored as experiments/np_bench.txt and was produced with:

python -m salvador.batch -i ./experiments/ -c -a -l
Enter fullscreen mode Exit fullscreen mode

or equivalently:

batch_vega -i ./experiments/ -c -a -l
Enter fullscreen mode Exit fullscreen mode

The baseline is NetworkX min_weighted_vertex_cover, a Bar-Yehuda--Even local-ratio 2-approximation. We report both empirical ratios against the DIMACS reference optimum and the certified a posteriori Salvador bound:

CSτ2CSCN, \frac{|C_S|}{\tau^*} \le \frac{2|C_S|}{|C_N|},

where CSC_S is Salvador's cover and CNC_N is NetworkX's 2-approximation cover.

For example, on brock200_2.clq-compliment.txt, NetworkX computes a cover of size 199199 in 15.69015.690 ms, Salvador computes a cover of size 192192 in 21.45821.458 ms, and the certified upper bound is:

2192/199=1.929648. 2\cdot 192/199 = 1.929648.

Rows marked dagger use a best-known clique lower bound, so the displayed vertex-cover optimum is an upper bound on the true optimum.

Instance Opt VC NetworkX VC NetworkX/Opt Salvador VC Salvador/Opt Certified upper bound NetworkX ms Salvador ms
brock200_1 179 197 1.101 184 1.028 1.868 0.000 21.115
brock200_2 188 199 1.059 192 1.021 1.930 15.690 21.458
brock200_3 185 197 1.065 190 1.027 1.929 0.998 29.073
brock200_4 183 198 1.082 187 1.022 1.889 3.405 37.961
brock400_1 373 397 1.064 382 1.024 1.924 0.000 66.091
brock400_2 371 396 1.067 381 1.027 1.924 3.232 56.858
brock400_3 369 396 1.073 381 1.033 1.924 0.000 69.879
brock400_4 367 398 1.084 383 1.044 1.925 15.741 69.259
brock800_1 777 798 1.027 785 1.010 1.967 17.816 390.844
brock800_2 776 798 1.028 785 1.012 1.967 10.028 306.896
brock800_3 775 796 1.027 785 1.013 1.972 11.772 303.142
brock800_4 774 798 1.031 784 1.013 1.965 15.647 309.083
c-fat200-1 188 198 1.053 188 1.000 1.899 2.913 43.041
c-fat200-2 176 198 1.125 176 1.000 1.778 1.978 38.133
c-fat200-5 142 199 1.401 142 1.000 1.427 2.356 27.579
c-fat500-1 486 498 1.025 486 1.000 1.952 11.976 274.073
c-fat500-10 374 499 1.334 374 1.000 1.499 0.000 198.928
c-fat500-2 474 498 1.051 474 1.000 1.904 15.514 362.902
c-fat500-5 436 499 1.144 436 1.000 1.747 5.511 255.697
C1000.9 <= 932 dagger 987 1.059 953 1.023 1.931 0.000 150.017
C125.9 91 115 1.264 94 1.033 1.635 0.000 6.151
C250.9 206 241 1.170 214 1.039 1.776 0.998 4.281
C500.9 <= 443 dagger 489 1.104 457 1.032 1.869 0.000 46.692
gen200_p0.9_44 156 192 1.231 167 1.071 1.740 0.000 15.628
gen200_p0.9_55 145 187 1.290 165 1.138 1.765 0.000 0.000
gen400_p0.9_55 345 388 1.125 359 1.041 1.851 3.732 19.340
gen400_p0.9_65 335 387 1.155 361 1.078 1.866 2.246 26.162
gen400_p0.9_75 325 385 1.185 356 1.095 1.849 3.169 23.254
hamming10-2 512 1023 1.998 512 1.000 1.001 1.073 46.221
hamming10-4 984 1023 1.040 996 1.012 1.947 11.849 258.114
hamming6-2 32 63 1.969 32 1.000 1.016 0.000 1.006
hamming6-4 60 63 1.050 62 1.033 1.968 0.000 2.023
hamming8-2 128 255 1.992 128 1.000 1.004 1.006 8.173
hamming8-4 240 255 1.062 240 1.000 1.882 2.248 28.899
johnson16-2-4 112 119 1.062 112 1.000 1.882 1.128 4.361
johnson32-2-4 480 495 1.031 480 1.000 1.939 0.000 49.459
johnson8-2-4 24 27 1.125 24 1.000 1.778 0.000 0.000
johnson8-4-4 56 69 1.232 56 1.000 1.623 0.999 5.097
keller4 160 170 1.062 162 1.012 1.906 1.743 59.447
keller5 749 775 1.035 758 1.012 1.956 16.164 183.929
MANN_a27 252 270 1.071 253 1.004 1.874 0.000 8.712
MANN_a45 690 735 1.065 695 1.007 1.891 0.986 23.164
MANN_a81 2221 2268 1.021 2225 1.002 1.962 3.769 79.834
MANN_a9 29 36 1.241 29 1.000 1.611 0.000 0.000
p_hat1000-3 <= 932 dagger 991 1.063 943 1.012 1.903 19.721 351.907
p_hat300-1 292 296 1.014 294 1.007 1.986 0.000 83.977
p_hat300-2 275 296 1.076 278 1.011 1.878 0.000 64.064
p_hat300-3 264 293 1.110 269 1.019 1.836 2.261 31.998
p_hat500-1 491 499 1.016 494 1.006 1.980 16.023 222.256
p_hat500-2 464 495 1.067 471 1.015 1.903 0.000 167.353
p_hat500-3 <= 450 dagger 489 1.087 459 1.020 1.877 5.095 89.239
p_hat700-1 689 699 1.015 693 1.006 1.983 31.294 453.505
p_hat700-2 <= 656 dagger 698 1.064 662 1.009 1.897 15.850 321.422
p_hat700-3 638 692 1.085 645 1.011 1.864 17.240 173.352
san200_0.7_1 170 194 1.141 184 1.082 1.897 0.000 22.830
san200_0.7_2 182 190 1.044 188 1.033 1.979 0.000 21.728
san200_0.9_1 130 176 1.354 155 1.192 1.761 0.000 11.270
san200_0.9_2 140 182 1.300 164 1.171 1.802 0.000 10.641
san200_0.9_3 156 190 1.218 171 1.096 1.800 0.000 0.000
san400_0.5_1 387 396 1.023 393 1.016 1.985 0.000 100.968
san400_0.7_1 360 394 1.094 379 1.053 1.924 0.000 72.765
san400_0.7_2 370 393 1.062 385 1.041 1.959 0.000 73.016
san400_0.7_3 378 396 1.048 388 1.026 1.960 0.000 75.971
san400_0.9_1 300 381 1.270 359 1.197 1.885 1.976 15.149
sanr200_0.7 182 197 1.082 186 1.022 1.888 0.000 28.249
sanr200_0.9 158 187 1.184 166 1.051 1.775 0.000 14.531
sanr400_0.5 387 399 1.031 389 1.005 1.950 0.000 99.775
sanr400_0.7 379 396 1.045 385 1.016 1.944 20.387 67.288

Summary over the 68 logged NPBench instances. Salvador's mean Salvador/Opt ratio is 1.0301.030 , with minimum 1.0001.000 and maximum 1.1971.197 . The mean NetworkX baseline ratio is 1.1471.147 . The mean certified Salvador upper bound 2CS/CN2|C_S|/|C_N| is 1.8311.831 , with maximum 1.9861.986 .


References


Conclusion

The current Salvador v0.0.2 implementation is a linear-time approximate vertex-cover solver built around a spanning-forest planar core, a weighted MIDS gadget, a linear IDS pass, one repair scan, and one pruning scan. It always returns a valid vertex cover, runs in O(n+m)O(n+m) time, and on the logged NPBench experiment improves the NetworkX 2-approximation cover size on every instance while producing mean empirical ratio 1.0301.030 against the DIMACS reference optima.

Top comments (0)