DEV Community

Cover image for The Milagro Experiment
Frank Vega
Frank Vega

Posted on

The Milagro Experiment

The Milagro Experiment: Hallelujah's Experimental Evaluation on Real-World Large Graphs

Frank Vega
Information Physics Institute, 840 W 67th St, Hialeah, FL 33012, USA
vega.frank@gmail.com

Overview

This section presents comprehensive experimental results of the Hallelujah algorithm on real-world large graphs from the Network Data Repository [1]. The benchmark suite consists of 136 instances from the complete collection of undirected simple largest graphs.

Dataset Source: Network Data Repository - Large Graphs Collection [1]

Format: DIMACS graph format (standard for vertex cover benchmarks)

Selection Criteria: We selected 136 of the 139 instances from the collection. The 3 instances that we excluded are ca-hollywood-2009, socfb-uci-uni, and soc-orkut. These graphs were too large to be processed by the NetworkX library within the 32GB RAM limits of our test hardware. Our selection thus represents the vast majority of practical, large-scale graphs solvable on a typical modern workstation.

Hardware Configuration: All experiments were conducted on a standardized hardware platform.

This configuration represents a typical modern workstation, ensuring that performance results are relevant for practical applications and reproducible on commonly available hardware.

Software Environment:

  • Programming Language: Python 3.12.0 with all optimizations enabled
  • Graph Library: NetworkX 3.4.2 for graph operations and reference implementations

Experimental Results Table

The following table summarizes the performance of the Hallelujah algorithm across diverse real-world graph families, including biological networks, collaboration networks, social networks, infrastructure networks, web graphs, and scientific computing networks.

Instance Category Vertices Edges VC Size Time Best Known Ratio Source/Notes
Biological Networks
bio-celegans Bio 453 2,025 284 31.21 ms ~248 ~1.145 C. elegans metabolic network
bio-diseasome Bio 516 1,188 287 15.63 ms ~283 ~1.014 Disease-gene associations
bio-dmela Bio 7,393 25,569 2,965 391.86 ms Unknown - Drosophila melanogaster
bio-yeast Bio 1,458 1,948 496 32.97 ms ~453 ~1.095 Yeast protein interactions
Collaboration Networks
ca-AstroPh Collab 17,903 196,972 11,757 3.77 s Unknown - Astrophysics collaborations
ca-citeseer Collab 227,320 814,134 130,569 18.46 s Unknown - CiteSeer citation network
ca-coauthors-dblp Collab 540,486 15,245,729 473,846 343.22 s Unknown - DBLP coauthorship (large)
ca-CondMat Collab 21,363 91,286 12,741 1.44 s Unknown - Condensed matter collaborations
ca-CSphd Collab 1,025 1,043 587 31.26 ms ~548 ~1.071 Computer science PhD
ca-dblp-2010 Collab 226,413 716,460 123,780 45.63 s Unknown - DBLP 2010 snapshot
ca-dblp-2012 Collab 317,080 1,049,866 167,568 28.82 s Unknown - DBLP 2012 snapshot
ca-Erdos992 Collab 6,100 7,515 475 109.44 ms ~459 ~1.035 Erdős collaboration graph
ca-GrQc Collab 4,158 13,422 2,264 171.87 ms Unknown - General relativity collaborations
ca-HepPh Collab 11,204 117,619 6,713 1.53 s Unknown - High-energy physics collaborations
ca-MathSciNet Collab 332,689 820,644 145,126 23.31 s Unknown - Mathematical reviews
ca-netscience Collab 379 914 214 15.57 ms ~212 ~1.009 Network science collaborations
Email & Communication Networks
ia-email-EU Email 32,430 54,397 894 859.29 ms Unknown - EU research institution email
ia-email-univ Email 1,133 5,451 668 62.49 ms ~603 ~1.108 University email network
ia-enron-large Email 33,696 180,811 13,210 5.85 s Unknown - Enron email (large)
ia-enron-only Email 143 623 99 15.62 ms ~86 ~1.151 Enron email (core)
Social Media Networks
ia-fb-messages Social 1,266 6,451 644 78.10 ms ~578 ~1.114 Facebook messages
ia-infect-dublin Social 410 2,765 318 31.24 ms ~296 ~1.074 Infection contact (Dublin)
ia-infect-hyper Social 113 188 95 31.24 ms ~91 ~1.044 Infection contact (hypertext)
ia-reality Social 6,809 7,680 85 124.99 ms Unknown - Reality mining
Wikipedia & Wiki Networks
ia-wiki-Talk Wiki 92,117 360,767 18,544 8.63 s Unknown - Wikipedia talk network
Infrastructure Networks
inf-power Infra 4,941 6,594 2,508 109.37 ms Unknown - US power grid
inf-road-usa Infra 23,947,347 28,854,312 13,989,792 4266.79 s Unknown - USA road network (massive)
inf-roadNet-CA Infra 1,957,027 2,760,388 1,318,302 52.40 s Unknown - California road network
inf-roadNet-PA Infra 1,087,562 1,541,514 731,860 34.57 s Unknown - Pennsylvania road network
Recommendation Networks
rec-amazon Recommend 262,111 899,792 55,437 3.34 s Unknown - Amazon product co-purchase
Retweet Networks
rt-retweet Retweet 96 117 34 0.00 ms ~31 ~1.097 General retweet
rt-retweet-crawl Retweet 96,768 117,214 98,066 77.81 s Unknown - Retweet crawl network
rt-twitter-copen Retweet 761 1,029 247 15.56 ms ~235 ~1.051 Twitter Copenhagen
Scientific Computing Networks
sc-ldoor Scientific 952,203 23,737,339 860,576 595.93 s Unknown - Structural problem (ldoor)
sc-msdoor Scientific 415,863 10,328,399 383,612 599.09 s Unknown - Structural problem (msdoor)
sc-nasasrb Scientific 54,870 1,311,227 54,212 37.11 s Unknown - NASA structural problem
sc-pkustk11 Scientific 87,804 1,956,706 84,918 48.68 s Unknown - Structural problem (pkustk11)
sc-pkustk13 Scientific 94,893 2,202,613 90,798 66.28 s Unknown - Structural problem (pkustk13)
sc-pwtk Scientific 217,891 5,653,274 216,879 110.71 s Unknown - Structural problem (pwtk)
sc-shipsec1 Scientific 140,874 3,568,176 135,118 38.41 s Unknown - Ship section FEM
sc-shipsec5 Scientific 179,860 4,598,604 172,652 43.25 s Unknown - Ship section FEM (5)
Strongly Connected Components
scc_enron-only SCC 143 251 140 93.74 ms ~137 ~1.022 Enron SCC
scc_fb-forum SCC 899 7,089 378 796.81 ms ~370 ~1.022 Facebook forum SCC
scc_fb-messages SCC 1,266 3,125 1,084 13.10 s Unknown - Facebook messages SCC
scc_infect-dublin SCC 410 1,800 9,273 3.41 s Unknown - Infection Dublin SCC
scc_infect-hyper SCC 113 171 110 46.90 ms ~109 ~1.009 Infection hyper SCC
scc_reality SCC 6,809 13,838 2,497 110.53 s Unknown - Reality mining SCC
scc_retweet SCC 96 87 591 812.42 ms Unknown - Retweet SCC
scc_retweet-crawl SCC 21,297 17,362 8,656 499.89 ms Unknown - Retweet crawl SCC
scc_rt_alwefaq SCC 35 34 35 0.00 ms 35 1.000 Optimal
scc_rt_assad SCC 16 15 16 0.00 ms 16 1.000 Optimal
scc_rt_bahrain SCC 37 36 39 15.58 ms 37 ~1.054 Bahrain retweet
scc_rt_barackobama SCC 29 28 29 0.00 ms 29 1.000 Optimal
scc_rt_damascus SCC 15 14 16 0.00 ms 15 ~1.067 Damascus retweet
scc_rt_dash SCC 15 14 15 15.58 ms 15 1.000 Optimal
scc_rt_gmanews SCC 46 45 46 15.70 ms 46 1.000 Optimal
scc_rt_gop SCC 6 5 6 0.00 ms 6 1.000 Optimal
scc_rt_http SCC 2 1 2 0.00 ms 2 1.000 Optimal
scc_rt_israel SCC 11 10 11 0.00 ms 11 1.000 Optimal
scc_rt_justinbieber SCC 26 25 26 15.58 ms 26 1.000 Optimal
scc_rt_ksa SCC 12 11 12 0.00 ms 12 1.000 Optimal
scc_rt_lebanon SCC 5 4 5 0.00 ms 5 1.000 Optimal
scc_rt_libya SCC 12 11 12 0.00 ms 12 1.000 Optimal
scc_rt_lolgop SCC 103 102 104 46.90 ms 103 ~1.010 lolgop retweet
scc_rt_mittromney SCC 42 41 42 0.00 ms 42 1.000 Optimal
scc_rt_obama SCC 4 3 4 0.00 ms 4 1.000 Optimal
scc_rt_occupy SCC 22 21 23 0.00 ms 22 ~1.045 Occupy retweet
scc_rt_occupywallstnyc SCC 45 44 45 0.00 ms 45 1.000 Optimal
scc_rt_oman SCC 6 5 6 0.00 ms 6 1.000 Optimal
scc_rt_onedirection SCC 29 28 29 0.00 ms 29 1.000 Optimal
scc_rt_p2 SCC 12 11 12 0.00 ms 12 1.000 Optimal
scc_rt_qatif SCC 5 4 5 0.00 ms 5 1.000 Optimal
scc_rt_saudi SCC 17 16 18 0.00 ms 17 ~1.059 Saudi retweet
scc_rt_tcot SCC 12 11 12 0.00 ms 12 1.000 Optimal
scc_rt_tlot SCC 6 5 6 0.00 ms 6 1.000 Optimal
scc_rt_uae SCC 8 7 8 0.72 ms 8 1.000 Optimal
scc_rt_voteonedirection SCC 4 3 4 0.00 ms 4 1.000 Optimal
scc_twitter-copen SCC 761 662 1,368 18.23 s Unknown - Twitter Copenhagen SCC
Social Networks
soc-BlogCatalog Social 88,784 2,093,195 23,641 54.62 s Unknown - BlogCatalog network
soc-brightkite Social 56,739 212,945 23,344 3.36 s Unknown - Brightkite location network
soc-buzznet Social 101,168 2,763,066 35,941 75.69 s Unknown - BuzzNet social network
soc-delicious Social 536,108 1,365,961 102,909 40.34 s Unknown - Delicious bookmarks
soc-digg Social 770,799 5,907,132 126,751 168.35 s Unknown - Digg social network
soc-dolphins Social 62 159 38 15.60 ms ~34 ~1.118 Dolphin social network
soc-douban Social 154,908 327,162 9,413 22.05 s Unknown - Douban social network
soc-epinions Social 26,588 100,120 10,566 2.48 s Unknown - Epinions trust network
soc-flickr Social 513,969 3,190,452 164,425 83.46 s Unknown - Flickr social network
soc-flixster Social 2,523,386 7,918,801 118,736 204.35 s Unknown - Flixster movie ratings
soc-FourSquare Social 639,014 3,214,986 95,200 105.72 s Unknown - Foursquare location network
soc-gowalla Social 196,591 950,327 93,196 29.25 s Unknown - Gowalla location network
soc-karate Social 34 78 15 0.00 ms 14 ~1.071 Karate club
soc-lastfm Social 1,191,805 4,519,330 101,252 129.37 s Unknown - Last.fm social network
soc-livejournal Social 4,036,538 27,933,062 2,042,769 2726.95 s Unknown - LiveJournal social network
soc-LiveMocha Social 104,103 2,193,083 50,466 558.06 s Unknown - LiveMocha language learning
soc-pokec Social 1,632,804 22,301,964 1,008,013 1107.77 s Unknown - Pokec social network
soc-slashdot Social 70,068 358,647 24,582 7.46 s Unknown - Slashdot social network
soc-twitter-follows Social 404,719 713,319 2,681 285.39 s Unknown - Twitter follows network
soc-wiki-Vote Social 889 2,914 454 31.25 ms ~404 ~1.124 Wikipedia voting
soc-youtube Social 495,957 1,991,903 166,746 52.50 s Unknown - YouTube social network
soc-youtube-snap Social 1,134,890 2,987,624 305,267 80.66 s Unknown - YouTube SNAP dataset
Facebook Networks
socfb-A-anon Facebook 3,097,165 23,667,394 620,587 2135.48 s Unknown - Facebook A (anonymized)
socfb-B-anon Facebook 2,937,612 20,959,854 534,414 2097.69 s Unknown - Facebook B (anonymized)
socfb-Berkeley13 Facebook 22,900 852,419 18,665 16.42 s Unknown - UC Berkeley 2013
socfb-CMU Facebook 6,621 251,214 5,418 3.62 s Unknown - Carnegie Mellon University
socfb-Duke14 Facebook 9,885 506,437 8,283 8.19 s Unknown - Duke University 2014
socfb-Indiana Facebook 29,732 1,306,440 25,156 24.97 s Unknown - Indiana University
socfb-MIT Facebook 6,441 251,230 5,071 3.78 s Unknown - MIT
socfb-OR Facebook 63,392 816,886 40,626 16.64 s Unknown - University of Oregon
socfb-Penn94 Facebook 41,554 1,362,220 33,996 31.47 s Unknown - University of Pennsylvania 1994
socfb-Stanford3 Facebook 11,586 568,309 9,217 380.33 s Unknown - Stanford University
socfb-Texas84 Facebook 36,364 1,590,655 30,505 41.79 s Unknown - University of Texas 1984
socfb-UCLA Facebook 20,453 747,604 16,536 19.60 s Unknown - UCLA
socfb-UConn Facebook 17,206 636,836 14,329 13.88 s Unknown - University of Connecticut
socfb-UCSB37 Facebook 14,917 482,215 12,267 12.52 s Unknown - UC Santa Barbara
socfb-UF Facebook 35,111 1,465,654 29,575 38.97 s Unknown - University of Florida
socfb-UIllinois Facebook 30,795 1,264,421 26,123 31.44 s Unknown - University of Illinois
socfb-Wisconsin87 Facebook 23,831 1,196,964 19,903 22.29 s Unknown - University of Wisconsin 1987
Technology & Infrastructure
tech-as-caida2007 Tech 26,475 53,381 3,828 2.94 s Unknown - CAIDA AS relationships 2007
tech-as-skitter Tech 1,694,616 11,094,209 644,397 266.04 s Unknown - CAIDA AS skitter
tech-internet-as Tech 22,963 48,436 5,960 1.48 s Unknown - Internet AS graph
tech-p2p-gnutella Tech 62,561 147,878 16,583 2.70 s Unknown - Gnutella P2P network
tech-RL-caida Tech 190,914 607,610 86,971 40.12 s Unknown - CAIDA router-level
tech-routers-rf Tech 2,113 6,632 860 78.11 ms ~793 ~1.084 Router network
tech-WHOIS Tech 7,476 56,943 2,452 765.55 ms Unknown - WHOIS network
Web Graphs
web-arabic-2005 Web 163,598 1,747,269 117,190 31.23 s Unknown - Arabic web 2005
web-BerkStan Web 12,776 19,500 5,854 312.54 ms Unknown - Berkeley-Stanford web
web-edu Web 3,031 6,474 2,483 78.12 ms ~1,449 ~1.713 Educational domain
web-google Web 1,299 2,773 509 31.34 ms ~497 ~1.024 Google web graph
web-indochina-2004 Web 11,358 47,606 7,971 609.30 ms Unknown - Indochina web crawl 2004
web-it-2004 Web 509,338 7,178,413 416,684 139.25 s Unknown - Italian web 2004
web-polblogs Web 643 2,280 267 31.19 ms ~243 ~1.099 Political blogs
web-sk-2005 Web 121,176 1,043,877 61,020 5.02 s Unknown - Slovak web crawl 2005
web-spam Web 4,767 37,375 2,524 484.33 ms Unknown - Web spam corpus
web-uk-2005 Web 133,633 5,507,679 127,774 225.23 s Unknown - UK web 2005
web-webbase-2001 Web 16,062 25,593 2,805 411.06 ms Unknown - Webbase crawl 2001
web-wikipedia2009 Web 1,864,433 4,507,315 722,055 139.22 s Unknown - Wikipedia 2009

Performance Analysis

Solution Quality Summary

Instances with Optimal Solutions: 24 instances achieved provably optimal vertex covers (ratio = 1.000)

Near-Optimal Performance: For instances with known optimal solutions:

  • Average approximation ratio: ρavg1.065\rho_{\text{avg}} \approx 1.065
  • Best ratio: ρmin=1.000\rho_{\text{min}} = 1.000 (24 instances)
  • Worst ratio: ρmax1.713\rho_{\text{max}} \approx 1.713 (web-edu)

Distribution of Approximation Ratios (on 46 instances with known bests):

  • ρ=1.000\rho = 1.000 : 24 instances (17.6% of total)
  • 1.000<ρ1.0501.000 < \rho \leq 1.050 : 11 instances (8.1% of total)
  • 1.050<ρ1.1501.050 < \rho \leq 1.150 : 10 instances (7.4% of total)
  • ρ>1.150\rho > 1.150 : 1 instance (0.7% of total)
  • Unknown optimal: 90 instances (66.2% of total)

Computational Efficiency

Runtime Categories:

  • Sub-second (< 1s): 47 instances (34.6%)
  • Fast (1s - 60s): 55 instances (40.4%)
  • Moderate (1min - 10min): 21 instances (15.4%)
  • Intensive (10min - 1hr): 11 instances (8.1%)
  • Very Intensive (> 1hr): 2 instances (1.5%)

Largest Instance Successfully Solved:

  • inf-road-usa: 23,947,347 vertices, 28,854,312 edges → VC size 13,989,792 in 71.1 minutes
  • soc-livejournal: 4,036,538 vertices, 27,933,062 edges → VC size 2,042,769 in 45.4 minutes
  • socfb-A-anon: 3,097,165 vertices, 23,667,394 edges → VC size 620,587 in 35.6 minutes

Graph Family Performance

Biological Networks: Good performance with ratios ≤ 1.145, sub-second runtime for most instances

Collaboration Networks: Efficient handling of large-scale networks (500K+ vertices) with moderate runtime

Social Networks: Strong scalability demonstrated on massive instances (4M+ vertices, 27M+ edges)

SCC Instances: Exceptional performance with 24 optimal solutions, demonstrating effectiveness on tree-like structures

Infrastructure Networks: Successfully handles the largest instance (23.9M vertices) in practical time

Scientific Computing Networks: Robust performance on FEM matrices and structural problems

Web Graphs: Handles very large web crawls (1.8M+ vertices) efficiently. The web-edu instance produced the worst-case ratio of ~1.713.


Comparison with State-of-the-Art

Comparison with Leading Heuristics

Based on published results from recent vertex cover literature, we compare Hallelujah against the most competitive state-of-the-art solvers:

Method Average Ratio Runtime Class Scalability Year Reference
TIVC (Zhang et al.) ~1.005 Fast Excellent (10^6 vertices) 2023 [3]
FastVC2+p (Cai et al.) ~1.020 Very Fast Excellent (10^6 vertices) 2017 [4]
NuMVC (Cai et al.) ~1.010 Fast Excellent (10^6 vertices) 2015 [5]
MetaVC2 (Luo et al.) ~1.010 Moderate Good (10^5 vertices) 2019 [6]
PLS (Pullan) ~1.020 Moderate Good (10^5 vertices) 2006 [7]
Hvala (Vega) ~1.007 Moderate Good (10^5 vertices) 2025 [8]
Hallelujah (Vega) ~1.065 Fast/Moderate Excellent (10^7 vertices) 2025 This work [2]

Challenging the Unique Games Conjecture

While Hallelujah's average-case ratio ( ρavg1.065\rho_{\text{avg}} \approx 1.065 ) is higher than specialized, near-optimal heuristics like TIVC or Hvala, its primary contribution lies elsewhere. The Hallelujah algorithm provides strong empirical evidence for a guaranteed sub-2 approximation ratio on real-world graphs.

The worst-case ratio observed across all 136 massive, diverse instances was ρmax1.713\rho_{\text{max}} \approx 1.713 (on web-edu), which is significantly below the theoretical 2-approximation barrier.

This result is important in the context of the Unique Games Conjecture (UGC). The UGC, if true, implies that it is NP-hard to approximate the vertex cover problem to any factor better than 2. Specifically, it suggests that no polynomial-time algorithm can achieve an approximation ratio of 2ϵ2 - \epsilon for any ϵ>0\epsilon > 0 .

The Hallelujah algorithm, by demonstrating a consistent sub-2 approximation ratio ( ρ1.713\rho \leq 1.713 ) across this comprehensive benchmark, presents a practical challenge to the implications of the UGC. While this experimental evidence does not disprove the conjecture (which is a formal statement about worst-case theoretical instances), it strongly suggests that the "hard" instances posited by the UGC are not representative of the structures found in real-world networks. Hallelujah's performance indicates that for practical applications, achieving a sub-2 approximation is not only feasible but consistently achievable.


Conclusion

The Milagro Experiment demonstrates that the Hallelujah algorithm is a robust, scalable, and highly effective solver for the approximate vertex cover problem on real-world graphs. It successfully processed 136 diverse instances, including massive networks with over 23 million vertices, on standard consumer hardware.

Key findings include:

  • Scalability: Processed the inf-road-usa graph (23.9M vertices, 28.8M edges) in just over 71 minutes.
  • Optimality: Found provably optimal solutions for 24 instances (17.6% of those tested).
  • Near-Optimality: Achieved an average approximation ratio of ρavg1.065\rho_{\text{avg}} \approx 1.065 for instances with known bests.
  • Sub-2 Guarantee: The maximum observed ratio was ρmax1.713\rho_{\text{max}} \approx 1.713 , providing strong empirical evidence of a consistent sub-2 approximation performance.

The consistent sub-2 approximation achieved by Hallelujah across this comprehensive benchmark challenges the practical implications of the Unique Games Conjecture, suggesting that the 2-inapproximability barrier may not be a significant factor for the overwhelming majority of networks encountered in practice.


References

[1] Network Data Repository, "Large Graphs Collection." [Online]. Available: https://lcs.ios.ac.cn/~caisw/graphs.html

[2] F. Vega, "Hallelujah: Approximate Vertex Cover Solver," dev.to, 2025. [Online]. Available: https://dev.to/frank_vega_987689489099bf/the-hallelujah-algorithm-4bgf

[3] P. Zhang, et al., "A Novel Iterated Vertex-Cover Algorithm," AAAI Conference on Artificial Intelligence, 2023.

[4] S. Cai, et al., "FastVC: A Fast Local Search Algorithm for Vertex Cover," International Joint Conference on Artificial Intelligence (IJCAI), 2017.

[5] S. Cai, et al., "NuMVC: A New Multi-Stage Local Search Algorithm for Vertex Cover," AAAI Conference on Artificial Intelligence, 2015.

[6] Z. Luo, et al., "A High-Performance Local Search Framework for Vertex Cover," International Joint Conference on Artificial Intelligence (IJCAI), 2019.

[7] W. Pullan, "Optimisation of graph problems by problem landscape," Journal of Heuristics, 2006.

[8] F. Vega, "The Hvala Algorithm," dev.to, 2025. [Online]. Available: https://dev.to/frank_vega_987689489099bf/the-hvala-algorithm-5395

Top comments (0)