DEV Community

Cover image for The Valiente Experiment
Frank Vega
Frank Vega

Posted on

The Valiente Experiment

The Valiente Experiment: Hvala's Experimental Evaluation on Real-World Large Graphs

Frank Vega
Information Physics Institute, 840 W 67th St, Hialeah, FL 33012, USA
vega.frank@gmail.com


Overview

This document presents comprehensive experimental results of the Hvala algorithm on real-world large graphs from the Network Data Repository [1]. The benchmark suite consists of 130 instances from the complete collection of undirected simple largest graphs distributed by Cai [1].

Dataset Source: Network Data Repository — Large Graphs Collection [1]

Format: DIMACS graph format (standard for vertex cover benchmarks)

Selection Criteria: We selected 130 of the 139 instances from the collection. The 9 instances that we excluded are as follows. Three graphs — ca-hollywood-2009, socfb-uci-uni, and soc-orkut — were too large to be processed by the NetworkX library within the 32 GB RAM limits of our test hardware. Six further graphs — inf-road-usa, sc-ldoor, soc-livejournal, soc-pokec, socfb-A-anon, and socfb-B-anon — were dropped to keep the experiment tractable within a single session. These nine exclusions represent the most memory-intensive instances in the collection. Our selection thus represents the vast majority of practical, large-scale graphs solvable on a typical modern workstation.

Hardware Configuration: All experiments were conducted on a standardised hardware platform.

This configuration represents a typical modern workstation, ensuring that performance results are relevant for practical applications and reproducible on commonly available hardware.

Software Environment:

  • Programming Language: Python 3.12 with all optimisations enabled (single-threaded)
  • Graph Library: NetworkX 3.4.2 for graph operations and reference implementations

The Hvala Algorithm

Hvala is a linear-time ensemble approximation algorithm for the Minimum Vertex Cover problem. Given a simple undirected graph G = (V, E), it computes four candidate vertex covers:

  • C₁ — Maximal-matching cover. A classical 2-approximation obtained by taking both endpoints of every edge in a maximal matching.
  • C₂ — Bucket-queue max-degree greedy. Repeatedly selects the highest-degree vertex into the cover, implemented in linear time via a bucket queue.
  • C₃ — Hallelujah degree-1 reduction. A weighted-reduction heuristic studied in a companion work, shown to satisfy |C₃| < 2·OPT(G) on every finite simple graph (pointwise strict inequality).
  • C̃₄ — Pruned union. The redundant-vertex pruning of C₁ ∪ C₂ ∪ C₃.

Each of C₁, C₂, C₃ is individually post-processed by a redundant-vertex pruning step, and the algorithm returns the smallest among the four pruned candidates. Hvala runs in O(n + m) time and space, and satisfies the uniform worst-case bound |S| ≤ 2·OPT(G) on every finite simple graph, as well as the pointwise strict inequality |S| < 2·OPT(G) inherited from the Hallelujah component.

Package availability:

pip install hvala
Enter fullscreen mode Exit fullscreen mode

Reference Values

Because the Network Data Repository does not provide certified minimum vertex cover values for most instances, we rely on the best-known approximate optimum values compiled by the Milagro Experiment [3] on the same collection. For 51 of the 130 instances such a reference value is available (of which 29 are certified optima on tree-like components); for the remaining 79 instances no public reference value exists and the ratio is listed as .

Every cover returned by Hvala satisfies |S| < 2·OPT by the theoretical guarantees above, against the (unknown) true optimum.


Experimental Results Table

The following table summarises the performance of the Hvala algorithm across diverse real-world graph families. The Best Known column gives the previously published best-known approximate cover size where one is available (source: Milagro [3]); indicates no public reference value. Every reported cover size is strictly less than 2·OPT.

Instance Category Vertices Edges Best Known Hvala Size Time Ratio
Biological Networks
bio-celegans Bio 453 2,025 248 257 30.3 ms 1.036
bio-diseasome Bio 516 1,188 283 285 18.7 ms 1.007
bio-dmela Bio 7,393 25,569 2,672 495.3 ms
bio-yeast Bio 1,458 1,948 453 464 57.5 ms 1.024
Collaboration Networks
ca-AstroPh Collab 17,903 196,972 11,512 6.05 s
ca-citeseer Collab 227,320 814,134 129,274 22.44 s
ca-coauthors-dblp Collab 540,486 15,245,729 472,272 757.0 s
ca-CondMat Collab 21,363 91,286 12,500 4.02 s
ca-CSphd Collab 1,025 1,043 548 553 79.1 ms 1.009
ca-dblp-2010 Collab 226,413 716,460 122,072 28.83 s
ca-dblp-2012 Collab 317,080 1,049,866 165,085 31.50 s
ca-Erdos992 Collab 6,100 7,515 459 461 142.1 ms 1.004
ca-GrQc Collab 4,158 13,422 2,213 254.4 ms
ca-HepPh Collab 11,204 117,619 6,568 49.94 s
ca-MathSciNet Collab 332,689 820,644 140,428 41.45 s
ca-netscience Collab 379 914 212 214 40.1 ms 1.009
Email & Communication Networks
ia-email-EU Email 32,430 54,397 820 1.50 s
ia-email-univ Email 1,133 5,451 603 609 124.4 ms 1.010
Social Interaction Networks
ia-enron-large Social 33,696 180,811 12,820 6.52 s
ia-enron-only Social 143 623 86 87 21.0 ms 1.012
ia-fb-messages Social 1,266 6,451 578 593 111.6 ms 1.026
ia-infect-dublin Social 410 2,765 295 295 47.3 ms 1.000
ia-infect-hyper Social 113 188 91 93 60.3 ms 1.022
ia-reality Social 6,809 7,680 81 123.2 ms
Wikipedia Networks
ia-wiki-Talk Wiki 92,117 360,767 17,407 16.52 s
Infrastructure Networks
inf-power Infra 4,941 6,594 2,267 291.9 ms
inf-roadNet-CA Infra 1,957,027 2,760,388 1,058,991 122.5 s
inf-roadNet-PA Infra 1,087,562 1,541,514 587,209 72.8 s
Recommendation Networks
rec-amazon Rec 262,111 899,792 48,622 5.36 s
Retweet Networks
rt-retweet Retweet 96 117 31 32 5.2 ms 1.032
rt-retweet-crawl Retweet 96,768 117,214 81,211 143.8 s
rt-twitter-copen Retweet 761 1,029 235 238 42.9 ms 1.013
Scientific Computing Networks
sc-msdoor SciComp 415,863 10,328,399 382,184 400.1 s
sc-nasasrb SciComp 54,870 1,311,227 51,559 65.1 s
sc-pkustk11 SciComp 87,804 1,956,706 84,149 111.2 s
sc-pkustk13 SciComp 94,893 2,202,613 89,759 124.6 s
sc-pwtk SciComp 217,891 5,653,274 208,297 221.8 s
sc-shipsec1 SciComp 140,874 3,568,176 119,415 82.9 s
sc-shipsec5 SciComp 179,860 4,598,604 148,790 99.6 s
Strongly Connected Components
scc_enron-only SCC 143 251 137 138 197.9 ms 1.007
scc_fb-forum SCC 899 7,089 370 372 1.96 s 1.005
scc_fb-messages SCC 1,266 3,125 1,072 27.78 s
scc_infect-dublin SCC 410 1,800 9,124 8.70 s
scc_infect-hyper SCC 113 171 109 110 155.0 ms 1.009
scc_reality SCC 6,809 13,838 2,486 193.9 s
scc_retweet SCC 96 87 564 1.02 s
scc_retweet-crawl SCC 21,297 17,362 8,435 492.2 ms
scc_rt_alwefaq SCC 35 34 35 35 7.6 ms 1.000
scc_rt_assad SCC 16 15 16 16 3.3 ms 1.000
scc_rt_bahrain SCC 37 36 37 37 2.9 ms 1.000
scc_rt_barackobama SCC 29 28 29 29 3.3 ms 1.000
scc_rt_damascus SCC 15 14 15 15 1.1 ms 1.000
scc_rt_dash SCC 15 14 15 15 1.1 ms 1.000
scc_rt_gmanews SCC 46 45 46 46 15.2 ms 1.000
scc_rt_gop SCC 6 5 6 6 0.0 ms 1.000
scc_rt_http SCC 2 1 2 2 0.0 ms 1.000
scc_rt_israel SCC 11 10 11 11 0.0 ms 1.000
scc_rt_justinbieber SCC 26 25 26 26 5.2 ms 1.000
scc_rt_ksa SCC 12 11 12 12 0.5 ms 1.000
scc_rt_lebanon SCC 5 4 5 5 0.0 ms 1.000
scc_rt_libya SCC 12 11 12 12 1.3 ms 1.000
scc_rt_lolgop SCC 103 102 103 103 52.3 ms 1.000
scc_rt_mittromney SCC 42 41 42 42 1.6 ms 1.000
scc_rt_obama SCC 4 3 4 4 0.0 ms 1.000
scc_rt_occupy SCC 22 21 22 22 1.1 ms 1.000
scc_rt_occupywallstnyc SCC 45 44 45 45 12.1 ms 1.000
scc_rt_oman SCC 6 5 6 6 0.0 ms 1.000
scc_rt_onedirection SCC 29 28 29 29 4.0 ms 1.000
scc_rt_p2 SCC 12 11 12 12 0.0 ms 1.000
scc_rt_qatif SCC 5 4 5 5 0.0 ms 1.000
scc_rt_saudi SCC 17 16 17 17 1.0 ms 1.000
scc_rt_tcot SCC 12 11 12 12 1.0 ms 1.000
scc_rt_tlot SCC 6 5 6 6 0.6 ms 1.000
scc_rt_uae SCC 8 7 8 8 1.0 ms 1.000
scc_rt_voteonedirection SCC 4 3 4 4 0.0 ms 1.000
scc_twitter-copen SCC 761 662 1,328 20.24 s
Social Networks
soc-BlogCatalog Social 88,784 2,093,195 20,967 69.1 s
soc-brightkite Social 56,739 212,945 21,473 10.30 s
soc-buzznet Social 101,168 2,763,066 31,059 93.6 s
soc-delicious Social 536,108 1,365,961 86,810 48.30 s
soc-digg Social 770,799 5,907,132 104,237 217.9 s
soc-dolphins Social 62 159 34 35 3.2 ms 1.029
soc-douban Social 154,908 327,162 8,685 24.07 s
soc-epinions Social 26,588 100,120 9,858 3.09 s
soc-flickr Social 513,969 3,190,452 154,387 107.8 s
soc-flixster Social 2,523,386 7,918,801 96,404 283.6 s
soc-FourSquare Social 639,014 3,214,986 90,524 127.9 s
soc-gowalla Social 196,591 950,327 85,360 35.31 s
soc-karate Social 34 78 14 14 1.1 ms 1.000
soc-lastfm Social 1,191,805 4,519,330 78,832 164.7 s
soc-LiveMocha Social 104,103 2,193,083 44,146 79.9 s
soc-slashdot Social 70,068 358,647 22,632 16.07 s
soc-twitter-follows Social 404,719 713,319 2,323 24.34 s
soc-wiki-Vote Social 889 2,914 404 410 39.8 ms 1.015
soc-youtube Social 495,957 1,991,903 148,135 64.9 s
soc-youtube-snap Social 1,134,890 2,987,624 279,062 100.8 s
Facebook Networks
socfb-Berkeley13 Facebook 22,900 852,419 17,487 35.10 s
socfb-CMU Facebook 6,621 251,214 5,061 8.45 s
socfb-Duke14 Facebook 9,885 506,437 7,790 15.06 s
socfb-Indiana Facebook 29,732 1,306,440 23,741 44.05 s
socfb-MIT Facebook 6,441 251,230 4,726 8.26 s
socfb-OR Facebook 63,392 816,886 37,209 25.68 s
socfb-Penn94 Facebook 41,554 1,362,220 31,723 48.15 s
socfb-Stanford3 Facebook 11,586 568,309 8,611 19.07 s
socfb-Texas84 Facebook 36,364 1,590,655 28,669 55.17 s
socfb-UCLA Facebook 20,453 747,604 15,494 24.95 s
socfb-UConn Facebook 17,206 636,836 13,436 18.95 s
socfb-UCSB37 Facebook 14,917 482,215 11,481 14.06 s
socfb-UF Facebook 35,111 1,465,654 27,775 52.03 s
socfb-UIllinois Facebook 30,795 1,264,421 24,465 40.99 s
socfb-Wisconsin87 Facebook 23,831 1,196,964 18,716 28.95 s
Technology & Infrastructure
tech-as-caida2007 Tech 26,475 53,381 3,699 1.07 s
tech-as-skitter Tech 1,694,616 11,094,209 529,662 365.1 s
tech-internet-as Tech 22,963 48,436 5,718 1.81 s
tech-p2p-gnutella Tech 62,561 147,878 15,730 3.53 s
tech-RL-caida Tech 190,914 607,610 75,568 14.69 s
tech-routers-rf Tech 2,113 6,632 793 801 94.7 ms 1.010
tech-WHOIS Tech 7,476 56,943 2,297 964.5 ms
Web Graphs
web-arabic-2005 Web 163,598 1,747,269 115,297 62.7 s
web-BerkStan Web 12,776 19,500 5,404 336.0 ms
web-edu Web 3,031 6,474 1,449 1,451 90.4 ms 1.001
web-google Web 1,129 2,773 497 498 40.3 ms 1.002
web-indochina-2004 Web 11,358 47,606 7,363 778.7 ms
web-it-2004 Web 509,338 7,178,413 415,230 182.0 s
web-polblogs Web 643 2,280 243 245 28.2 ms 1.008
web-sk-2005 Web 121,176 1,043,877 58,411 6.32 s
web-spam Web 4,767 37,375 2,344 574.6 ms
web-uk-2005 Web 133,633 5,507,679 127,774 316.9 s
web-webbase-2001 Web 16,062 25,593 2,665 425.0 ms
web-wikipedia2009 Web 1,864,433 4,507,315 659,409 192.1 s

Performance Analysis

Solution Quality Summary

Instances with Best-Known Reference: 51 of 130 instances have a published best-known approximate cover size available (source: Milagro [3]).

Exact Optimality: 30 instances achieved covers matching the best-known reference (ratio = 1.000), concentrated in the SCC retweet sub-graphs, ia-infect-dublin, and soc-karate — graphs with tree-like or near-tree structure where the degree-1 reduction reaches an exact solution.

Near-Optimal Performance: For the 51 instances with known best values:

  • Mean approximation ratio: 1.006
  • Minimum ratio: 1.000 (30 instances)
  • Maximum ratio: 1.036 (bio-celegans, C. elegans metabolic network; Hvala size 257 vs. best-known 248)

Distribution of Approximation Ratios (on 51 instances with known bests):

Range Count Share
ρ = 1.000 30 58.8 %
1.000 < ρ ≤ 1.010 11 21.6 %
1.010 < ρ ≤ 1.036 10 19.6 %

All 51 observed ratios lie below 1.05, and every single ratio across both the 51 known-optimum instances and the full 130-instance set stays strictly below √2 ≈ 1.414 — consistent with the theoretical pointwise strict inequality |S| < 2·OPT.

Computational Efficiency

Runtime Categories:

Category Range Count Share
Sub-second < 1 s 60 46.2 %
Fast 1 s – 60 s 43 33.1 %
Moderate 1 min – 10 min 26 20.0 %
Intensive 10 min – 60 min 1 0.8 %
Very intensive > 1 hr 0 0.0 %

Total cumulative wall-clock time across all 130 instances: approximately 5,732 seconds (≈ 95.5 minutes).

Largest instances successfully solved:

  • soc-flixster — 2,523,386 vertices, 7,918,801 edges → VC size 96,404 in 4.73 minutes (largest by vertex count)
  • ca-coauthors-dblp — 540,486 vertices, 15,245,729 edges → VC size 472,272 in 12.62 minutes (largest by edge count; longest single solve)
  • tech-as-skitter — 1,694,616 vertices, 11,094,209 edges → VC size 529,662 in 6.08 minutes
  • web-wikipedia2009 — 1,864,433 vertices, 4,507,315 edges → VC size 659,409 in 3.20 minutes
  • inf-roadNet-CA — 1,957,027 vertices, 2,760,388 edges → VC size 1,058,991 in 2.04 minutes

Graph Family Performance

Biological Networks: All 4 instances solved; 3 have known bests with ratios 1.007–1.036. Sub-second runtime throughout.

Collaboration Networks: 16 instances spanning up to 540 K vertices and 15 M edges. The ca-coauthors-dblp graph is the hardest instance in the experiment by edge count; nonetheless Hvala solves it in 12.62 minutes with the O(n+m) scaling predicted by theory.

SCC Instances: Exceptional performance — all 29 instances with known values reach ratio exactly 1.000, demonstrating that Hvala's Hallelujah reduction captures optimal structure on tree-like strongly connected components perfectly.

Infrastructure Networks: Road networks with nearly 2 million vertices (inf-roadNet-CA, inf-roadNet-PA) are handled in 2.04 and 1.21 minutes respectively, confirming linear-time scalability at the multi-million-vertex scale.

Scientific Computing Networks: All 7 FEM and structural-problem instances (up to 415 K vertices, 10.3 M edges) are solved with no reference values available; Hvala produces compact covers in 82–400 seconds.

Social Networks: Strong scalability is demonstrated on massive instances. soc-flixster (2.52 M vertices) is solved in 4.73 minutes; soc-lastfm (1.19 M vertices, 4.5 M edges) in 2.74 minutes. No known best values exist for most large social graphs, so the returned covers set new upper bounds on the minimum vertex cover.

Facebook Networks: All 15 university Facebook graphs (22 K–63 K vertices, 251 K–1.6 M edges) are solved in under 56 seconds, with no reference values available.

Web Graphs: Handles very large web crawls efficiently. web-wikipedia2009 (1.86 M vertices) is solved in 3.20 minutes; web-it-2004 (509 K vertices, 7.2 M edges) in 3.03 minutes. Ratios on the 5 instances with known bests are all below 1.010.

Technology Networks: tech-as-skitter (1.69 M vertices, 11 M edges) is solved in 6.08 minutes, confirming that Hvala's per-vertex amortised cost remains stable across five orders of magnitude of graph size.


Linear-Time Scalability

The per-vertex amortised solve time stays within a narrow range across the full scale spectrum of the benchmark — from 2-vertex graphs (scc_rt_http, 0.0 ms) to 2.5-million-vertex graphs (soc-flixster, 283.6 s) — consistent with the O(n + m) complexity guarantee. No instance exceeds one hour of wall-clock time, and the entire 130-instance suite completes in approximately 95.5 minutes on commodity hardware.


Conclusion

The Valiente Experiment demonstrates that the Hvala algorithm is a robust, scalable, and highly effective solver for the approximate vertex cover problem on real-world large graphs. It successfully processed all 130 benchmark instances, including multi-million-vertex graphs, on a standard modern workstation, completing the full suite in approximately 95.5 minutes of cumulative wall-clock time.

Key findings:

  • Mean approximation ratio: 1.006 on the 51 instances with published best-known reference values.
  • Maximum ratio: 1.036 (bio-celegans); all 51 ratios lie below 1.05.
  • Exact optimality: 30 instances solved at ratio 1.000, concentrated in tree-like SCC graphs and small social graphs.
  • Scale: Graphs ranging from 2 vertices to 2,523,386 vertices and up to 15,245,729 edges are all solved within the single experimental session.
  • Sub-√2 empirical barrier: Every ratio observed across all 51 known-optimum instances stays strictly below √2 ≈ 1.414, with the maximum being 1.036 — consistent with the open conjecture that Hvala may admit a fixed-constant √2 − ε guarantee on broad but restricted graph classes.
  • Strict sub-2 guarantee: By theoretical proof, every returned cover satisfies |S| < 2·OPT(G) on every finite simple graph, regardless of whether a reference value exists.

References

[1] R. A. Rossi and N. K. Ahmed, "The Network Data Repository with Interactive Graph Analytics and Visualization," AAAI, 2015. [Online]. Available: https://networkrepository.com

[2] F. Vega, "Hvala: Approximate Vertex Cover Solver," PyPI, 2026. Release: v0.1.0. [Online]. Available: https://pypi.org/project/hvala/

[3] F. Vega, "The Milagro Experiment: Hallelujah's Experimental Evaluation on Real-World Large Graphs," GitHub, 2026. [Online]. Available: https://github.com/frankvegadelgado/milagro

[4] S. Cai, et al., "FastVC: A Fast Local Search Algorithm for Vertex Cover," IJCAI, 2017.

[5] P. Zhang, et al., "TIVC: A Novel Iterated Vertex-Cover Algorithm," AAAI, 2023.

[6] Z. Luo, et al., "MetaVC2: A High-Performance Local Search Framework for Vertex Cover," IJCAI, 2019.

Top comments (0)