Frank Vega

Posted on Apr 24

The Valiente Experiment

#discuss #computerscience #algorithms #performance

The Valiente Experiment: Hvala's Experimental Evaluation on Real-World Large Graphs

Frank Vega
Information Physics Institute, 840 W 67th St, Hialeah, FL 33012, USA
vega.frank@gmail.com

Overview

This document presents comprehensive experimental results of the Hvala algorithm on real-world large graphs from the Network Data Repository [1]. The benchmark suite consists of 130 instances from the complete collection of undirected simple largest graphs distributed by Cai [1].

Dataset Source: Network Data Repository — Large Graphs Collection [1]

Format: DIMACS graph format (standard for vertex cover benchmarks)

Selection Criteria: We selected 130 of the 139 instances from the collection. The 9 instances that we excluded are as follows. Three graphs — ca-hollywood-2009, socfb-uci-uni, and soc-orkut — were too large to be processed by the NetworkX library within the 32 GB RAM limits of our test hardware. Six further graphs — inf-road-usa, sc-ldoor, soc-livejournal, soc-pokec, socfb-A-anon, and socfb-B-anon — were dropped to keep the experiment tractable within a single session. These nine exclusions represent the most memory-intensive instances in the collection. Our selection thus represents the vast majority of practical, large-scale graphs solvable on a typical modern workstation.

Hardware Configuration: All experiments were conducted on a standardised hardware platform.

Hardware: 11th Gen Intel® Core™ i7-1165G7 (2.80 GHz), 32 GB DDR4 RAM
Software: Hvala: Approximate Vertex Cover Solver [2]

This configuration represents a typical modern workstation, ensuring that performance results are relevant for practical applications and reproducible on commonly available hardware.

Software Environment:

Programming Language: Python 3.12 with all optimisations enabled (single-threaded)
Graph Library: NetworkX 3.4.2 for graph operations and reference implementations

The Hvala Algorithm

Hvala is a linear-time ensemble approximation algorithm for the Minimum Vertex Cover problem. Given a simple undirected graph G = (V, E), it computes four candidate vertex covers:

C₁ — Maximal-matching cover. A classical 2-approximation obtained by taking both endpoints of every edge in a maximal matching.
C₂ — Bucket-queue max-degree greedy. Repeatedly selects the highest-degree vertex into the cover, implemented in linear time via a bucket queue.
C₃ — Hallelujah degree-1 reduction. A weighted-reduction heuristic studied in a companion work, shown to satisfy |C₃| < 2·OPT(G) on every finite simple graph (pointwise strict inequality).
C̃₄ — Pruned union. The redundant-vertex pruning of C₁ ∪ C₂ ∪ C₃.

Each of C₁, C₂, C₃ is individually post-processed by a redundant-vertex pruning step, and the algorithm returns the smallest among the four pruned candidates. Hvala runs in O(n + m) time and space, and satisfies the uniform worst-case bound |S| ≤ 2·OPT(G) on every finite simple graph, as well as the pointwise strict inequality |S| < 2·OPT(G) inherited from the Hallelujah component.

Package availability:

pip install hvala

Repository: https://github.com/frankvegadelgado/hvala
PyPI: https://pypi.org/project/hvala/

Reference Values

Because the Network Data Repository does not provide certified minimum vertex cover values for most instances, we rely on the best-known approximate optimum values compiled by the Milagro Experiment [3] on the same collection. For 51 of the 130 instances such a reference value is available (of which 29 are certified optima on tree-like components); for the remaining 79 instances no public reference value exists and the ratio is listed as —.

Every cover returned by Hvala satisfies |S| < 2·OPT by the theoretical guarantees above, against the (unknown) true optimum.

Experimental Results Table

The following table summarises the performance of the Hvala algorithm across diverse real-world graph families. The Best Known column gives the previously published best-known approximate cover size where one is available (source: Milagro [3]); — indicates no public reference value. Every reported cover size is strictly less than 2·OPT.

Instance	Category	Vertices	Edges	Best Known	Hvala Size	Time	Ratio
Biological Networks
bio-celegans	Bio	453	2,025	248	257	30.3 ms	1.036
bio-diseasome	Bio	516	1,188	283	285	18.7 ms	1.007
bio-dmela	Bio	7,393	25,569	—	2,672	495.3 ms	—
bio-yeast	Bio	1,458	1,948	453	464	57.5 ms	1.024
Collaboration Networks
ca-AstroPh	Collab	17,903	196,972	—	11,512	6.05 s	—
ca-citeseer	Collab	227,320	814,134	—	129,274	22.44 s	—
ca-coauthors-dblp	Collab	540,486	15,245,729	—	472,272	757.0 s	—
ca-CondMat	Collab	21,363	91,286	—	12,500	4.02 s	—
ca-CSphd	Collab	1,025	1,043	548	553	79.1 ms	1.009
ca-dblp-2010	Collab	226,413	716,460	—	122,072	28.83 s	—
ca-dblp-2012	Collab	317,080	1,049,866	—	165,085	31.50 s	—
ca-Erdos992	Collab	6,100	7,515	459	461	142.1 ms	1.004
ca-GrQc	Collab	4,158	13,422	—	2,213	254.4 ms	—
ca-HepPh	Collab	11,204	117,619	—	6,568	49.94 s	—
ca-MathSciNet	Collab	332,689	820,644	—	140,428	41.45 s	—
ca-netscience	Collab	379	914	212	214	40.1 ms	1.009
Email & Communication Networks
ia-email-EU	Email	32,430	54,397	—	820	1.50 s	—
ia-email-univ	Email	1,133	5,451	603	609	124.4 ms	1.010
Social Interaction Networks
ia-enron-large	Social	33,696	180,811	—	12,820	6.52 s	—
ia-enron-only	Social	143	623	86	87	21.0 ms	1.012
ia-fb-messages	Social	1,266	6,451	578	593	111.6 ms	1.026
ia-infect-dublin	Social	410	2,765	295	295	47.3 ms	1.000
ia-infect-hyper	Social	113	188	91	93	60.3 ms	1.022
ia-reality	Social	6,809	7,680	—	81	123.2 ms	—
Wikipedia Networks
ia-wiki-Talk	Wiki	92,117	360,767	—	17,407	16.52 s	—
Infrastructure Networks
inf-power	Infra	4,941	6,594	—	2,267	291.9 ms	—
inf-roadNet-CA	Infra	1,957,027	2,760,388	—	1,058,991	122.5 s	—
inf-roadNet-PA	Infra	1,087,562	1,541,514	—	587,209	72.8 s	—
Recommendation Networks
rec-amazon	Rec	262,111	899,792	—	48,622	5.36 s	—
Retweet Networks
rt-retweet	Retweet	96	117	31	32	5.2 ms	1.032
rt-retweet-crawl	Retweet	96,768	117,214	—	81,211	143.8 s	—
rt-twitter-copen	Retweet	761	1,029	235	238	42.9 ms	1.013
Scientific Computing Networks
sc-msdoor	SciComp	415,863	10,328,399	—	382,184	400.1 s	—
sc-nasasrb	SciComp	54,870	1,311,227	—	51,559	65.1 s	—
sc-pkustk11	SciComp	87,804	1,956,706	—	84,149	111.2 s	—
sc-pkustk13	SciComp	94,893	2,202,613	—	89,759	124.6 s	—
sc-pwtk	SciComp	217,891	5,653,274	—	208,297	221.8 s	—
sc-shipsec1	SciComp	140,874	3,568,176	—	119,415	82.9 s	—
sc-shipsec5	SciComp	179,860	4,598,604	—	148,790	99.6 s	—
Strongly Connected Components
scc_enron-only	SCC	143	251	137	138	197.9 ms	1.007
scc_fb-forum	SCC	899	7,089	370	372	1.96 s	1.005
scc_fb-messages	SCC	1,266	3,125	—	1,072	27.78 s	—
scc_infect-dublin	SCC	410	1,800	—	9,124	8.70 s	—
scc_infect-hyper	SCC	113	171	109	110	155.0 ms	1.009
scc_reality	SCC	6,809	13,838	—	2,486	193.9 s	—
scc_retweet	SCC	96	87	—	564	1.02 s	—
scc_retweet-crawl	SCC	21,297	17,362	—	8,435	492.2 ms	—
scc_rt_alwefaq	SCC	35	34	35	35	7.6 ms	1.000
scc_rt_assad	SCC	16	15	16	16	3.3 ms	1.000
scc_rt_bahrain	SCC	37	36	37	37	2.9 ms	1.000
scc_rt_barackobama	SCC	29	28	29	29	3.3 ms	1.000
scc_rt_damascus	SCC	15	14	15	15	1.1 ms	1.000
scc_rt_dash	SCC	15	14	15	15	1.1 ms	1.000
scc_rt_gmanews	SCC	46	45	46	46	15.2 ms	1.000
scc_rt_gop	SCC	6	5	6	6	0.0 ms	1.000
scc_rt_http	SCC	2	1	2	2	0.0 ms	1.000
scc_rt_israel	SCC	11	10	11	11	0.0 ms	1.000
scc_rt_justinbieber	SCC	26	25	26	26	5.2 ms	1.000
scc_rt_ksa	SCC	12	11	12	12	0.5 ms	1.000
scc_rt_lebanon	SCC	5	4	5	5	0.0 ms	1.000
scc_rt_libya	SCC	12	11	12	12	1.3 ms	1.000
scc_rt_lolgop	SCC	103	102	103	103	52.3 ms	1.000
scc_rt_mittromney	SCC	42	41	42	42	1.6 ms	1.000
scc_rt_obama	SCC	4	3	4	4	0.0 ms	1.000
scc_rt_occupy	SCC	22	21	22	22	1.1 ms	1.000
scc_rt_occupywallstnyc	SCC	45	44	45	45	12.1 ms	1.000
scc_rt_oman	SCC	6	5	6	6	0.0 ms	1.000
scc_rt_onedirection	SCC	29	28	29	29	4.0 ms	1.000
scc_rt_p2	SCC	12	11	12	12	0.0 ms	1.000
scc_rt_qatif	SCC	5	4	5	5	0.0 ms	1.000
scc_rt_saudi	SCC	17	16	17	17	1.0 ms	1.000
scc_rt_tcot	SCC	12	11	12	12	1.0 ms	1.000
scc_rt_tlot	SCC	6	5	6	6	0.6 ms	1.000
scc_rt_uae	SCC	8	7	8	8	1.0 ms	1.000
scc_rt_voteonedirection	SCC	4	3	4	4	0.0 ms	1.000
scc_twitter-copen	SCC	761	662	—	1,328	20.24 s	—
Social Networks
soc-BlogCatalog	Social	88,784	2,093,195	—	20,967	69.1 s	—
soc-brightkite	Social	56,739	212,945	—	21,473	10.30 s	—
soc-buzznet	Social	101,168	2,763,066	—	31,059	93.6 s	—
soc-delicious	Social	536,108	1,365,961	—	86,810	48.30 s	—
soc-digg	Social	770,799	5,907,132	—	104,237	217.9 s	—
soc-dolphins	Social	62	159	34	35	3.2 ms	1.029
soc-douban	Social	154,908	327,162	—	8,685	24.07 s	—
soc-epinions	Social	26,588	100,120	—	9,858	3.09 s	—
soc-flickr	Social	513,969	3,190,452	—	154,387	107.8 s	—
soc-flixster	Social	2,523,386	7,918,801	—	96,404	283.6 s	—
soc-FourSquare	Social	639,014	3,214,986	—	90,524	127.9 s	—
soc-gowalla	Social	196,591	950,327	—	85,360	35.31 s	—
soc-karate	Social	34	78	14	14	1.1 ms	1.000
soc-lastfm	Social	1,191,805	4,519,330	—	78,832	164.7 s	—
soc-LiveMocha	Social	104,103	2,193,083	—	44,146	79.9 s	—
soc-slashdot	Social	70,068	358,647	—	22,632	16.07 s	—
soc-twitter-follows	Social	404,719	713,319	—	2,323	24.34 s	—
soc-wiki-Vote	Social	889	2,914	404	410	39.8 ms	1.015
soc-youtube	Social	495,957	1,991,903	—	148,135	64.9 s	—
soc-youtube-snap	Social	1,134,890	2,987,624	—	279,062	100.8 s	—
Facebook Networks
socfb-Berkeley13	Facebook	22,900	852,419	—	17,487	35.10 s	—
socfb-CMU	Facebook	6,621	251,214	—	5,061	8.45 s	—
socfb-Duke14	Facebook	9,885	506,437	—	7,790	15.06 s	—
socfb-Indiana	Facebook	29,732	1,306,440	—	23,741	44.05 s	—
socfb-MIT	Facebook	6,441	251,230	—	4,726	8.26 s	—
socfb-OR	Facebook	63,392	816,886	—	37,209	25.68 s	—
socfb-Penn94	Facebook	41,554	1,362,220	—	31,723	48.15 s	—
socfb-Stanford3	Facebook	11,586	568,309	—	8,611	19.07 s	—
socfb-Texas84	Facebook	36,364	1,590,655	—	28,669	55.17 s	—
socfb-UCLA	Facebook	20,453	747,604	—	15,494	24.95 s	—
socfb-UConn	Facebook	17,206	636,836	—	13,436	18.95 s	—
socfb-UCSB37	Facebook	14,917	482,215	—	11,481	14.06 s	—
socfb-UF	Facebook	35,111	1,465,654	—	27,775	52.03 s	—
socfb-UIllinois	Facebook	30,795	1,264,421	—	24,465	40.99 s	—
socfb-Wisconsin87	Facebook	23,831	1,196,964	—	18,716	28.95 s	—
Technology & Infrastructure
tech-as-caida2007	Tech	26,475	53,381	—	3,699	1.07 s	—
tech-as-skitter	Tech	1,694,616	11,094,209	—	529,662	365.1 s	—
tech-internet-as	Tech	22,963	48,436	—	5,718	1.81 s	—
tech-p2p-gnutella	Tech	62,561	147,878	—	15,730	3.53 s	—
tech-RL-caida	Tech	190,914	607,610	—	75,568	14.69 s	—
tech-routers-rf	Tech	2,113	6,632	793	801	94.7 ms	1.010
tech-WHOIS	Tech	7,476	56,943	—	2,297	964.5 ms	—
Web Graphs
web-arabic-2005	Web	163,598	1,747,269	—	115,297	62.7 s	—
web-BerkStan	Web	12,776	19,500	—	5,404	336.0 ms	—
web-edu	Web	3,031	6,474	1,449	1,451	90.4 ms	1.001
web-google	Web	1,129	2,773	497	498	40.3 ms	1.002
web-indochina-2004	Web	11,358	47,606	—	7,363	778.7 ms	—
web-it-2004	Web	509,338	7,178,413	—	415,230	182.0 s	—
web-polblogs	Web	643	2,280	243	245	28.2 ms	1.008
web-sk-2005	Web	121,176	1,043,877	—	58,411	6.32 s	—
web-spam	Web	4,767	37,375	—	2,344	574.6 ms	—
web-uk-2005	Web	133,633	5,507,679	—	127,774	316.9 s	—
web-webbase-2001	Web	16,062	25,593	—	2,665	425.0 ms	—
web-wikipedia2009	Web	1,864,433	4,507,315	—	659,409	192.1 s	—

Performance Analysis

Solution Quality Summary

Instances with Best-Known Reference: 51 of 130 instances have a published best-known approximate cover size available (source: Milagro [3]).

Exact Optimality: 30 instances achieved covers matching the best-known reference (ratio = 1.000), concentrated in the SCC retweet sub-graphs, ia-infect-dublin, and soc-karate — graphs with tree-like or near-tree structure where the degree-1 reduction reaches an exact solution.

Near-Optimal Performance: For the 51 instances with known best values:

Mean approximation ratio: 1.006
Minimum ratio: 1.000 (30 instances)
Maximum ratio: 1.036 (bio-celegans, C. elegans metabolic network; Hvala size 257 vs. best-known 248)

Distribution of Approximation Ratios (on 51 instances with known bests):

Range	Count	Share
ρ = 1.000	30	58.8 %
1.000 < ρ ≤ 1.010	11	21.6 %
1.010 < ρ ≤ 1.036	10	19.6 %

All 51 observed ratios lie below 1.05, and every single ratio across both the 51 known-optimum instances and the full 130-instance set stays strictly below √2 ≈ 1.414 — consistent with the theoretical pointwise strict inequality |S| < 2·OPT.

Computational Efficiency

Runtime Categories:

Category	Range	Count	Share
Sub-second	< 1 s	60	46.2 %
Fast	1 s – 60 s	43	33.1 %
Moderate	1 min – 10 min	26	20.0 %
Intensive	10 min – 60 min	1	0.8 %
Very intensive	> 1 hr	0	0.0 %

Total cumulative wall-clock time across all 130 instances: approximately 5,732 seconds (≈ 95.5 minutes).

Largest instances successfully solved:

soc-flixster — 2,523,386 vertices, 7,918,801 edges → VC size 96,404 in 4.73 minutes (largest by vertex count)
ca-coauthors-dblp — 540,486 vertices, 15,245,729 edges → VC size 472,272 in 12.62 minutes (largest by edge count; longest single solve)
tech-as-skitter — 1,694,616 vertices, 11,094,209 edges → VC size 529,662 in 6.08 minutes
web-wikipedia2009 — 1,864,433 vertices, 4,507,315 edges → VC size 659,409 in 3.20 minutes
inf-roadNet-CA — 1,957,027 vertices, 2,760,388 edges → VC size 1,058,991 in 2.04 minutes

Graph Family Performance

Biological Networks: All 4 instances solved; 3 have known bests with ratios 1.007–1.036. Sub-second runtime throughout.

Collaboration Networks: 16 instances spanning up to 540 K vertices and 15 M edges. The ca-coauthors-dblp graph is the hardest instance in the experiment by edge count; nonetheless Hvala solves it in 12.62 minutes with the O(n+m) scaling predicted by theory.

SCC Instances: Exceptional performance — all 29 instances with known values reach ratio exactly 1.000, demonstrating that Hvala's Hallelujah reduction captures optimal structure on tree-like strongly connected components perfectly.

Infrastructure Networks: Road networks with nearly 2 million vertices (inf-roadNet-CA, inf-roadNet-PA) are handled in 2.04 and 1.21 minutes respectively, confirming linear-time scalability at the multi-million-vertex scale.

Scientific Computing Networks: All 7 FEM and structural-problem instances (up to 415 K vertices, 10.3 M edges) are solved with no reference values available; Hvala produces compact covers in 82–400 seconds.

Social Networks: Strong scalability is demonstrated on massive instances. soc-flixster (2.52 M vertices) is solved in 4.73 minutes; soc-lastfm (1.19 M vertices, 4.5 M edges) in 2.74 minutes. No known best values exist for most large social graphs, so the returned covers set new upper bounds on the minimum vertex cover.

Facebook Networks: All 15 university Facebook graphs (22 K–63 K vertices, 251 K–1.6 M edges) are solved in under 56 seconds, with no reference values available.

Web Graphs: Handles very large web crawls efficiently. web-wikipedia2009 (1.86 M vertices) is solved in 3.20 minutes; web-it-2004 (509 K vertices, 7.2 M edges) in 3.03 minutes. Ratios on the 5 instances with known bests are all below 1.010.

Technology Networks: tech-as-skitter (1.69 M vertices, 11 M edges) is solved in 6.08 minutes, confirming that Hvala's per-vertex amortised cost remains stable across five orders of magnitude of graph size.

Linear-Time Scalability

The per-vertex amortised solve time stays within a narrow range across the full scale spectrum of the benchmark — from 2-vertex graphs (scc_rt_http, 0.0 ms) to 2.5-million-vertex graphs (soc-flixster, 283.6 s) — consistent with the O(n + m) complexity guarantee. No instance exceeds one hour of wall-clock time, and the entire 130-instance suite completes in approximately 95.5 minutes on commodity hardware.

Conclusion

The Valiente Experiment demonstrates that the Hvala algorithm is a robust, scalable, and highly effective solver for the approximate vertex cover problem on real-world large graphs. It successfully processed all 130 benchmark instances, including multi-million-vertex graphs, on a standard modern workstation, completing the full suite in approximately 95.5 minutes of cumulative wall-clock time.

Key findings:

Mean approximation ratio: 1.006 on the 51 instances with published best-known reference values.
Maximum ratio: 1.036 (bio-celegans); all 51 ratios lie below 1.05.
Exact optimality: 30 instances solved at ratio 1.000, concentrated in tree-like SCC graphs and small social graphs.
Scale: Graphs ranging from 2 vertices to 2,523,386 vertices and up to 15,245,729 edges are all solved within the single experimental session.
Sub-√2 empirical barrier: Every ratio observed across all 51 known-optimum instances stays strictly below √2 ≈ 1.414, with the maximum being 1.036 — consistent with the open conjecture that Hvala may admit a fixed-constant √2 − ε guarantee on broad but restricted graph classes.
Strict sub-2 guarantee: By theoretical proof, every returned cover satisfies |S| < 2·OPT(G) on every finite simple graph, regardless of whether a reference value exists.

References

[1] R. A. Rossi and N. K. Ahmed, "The Network Data Repository with Interactive Graph Analytics and Visualization," AAAI, 2015. [Online]. Available: https://networkrepository.com

[2] F. Vega, "Hvala: Approximate Vertex Cover Solver," PyPI, 2026. Release: v0.1.0. [Online]. Available: https://pypi.org/project/hvala/

[3] F. Vega, "The Milagro Experiment: Hallelujah's Experimental Evaluation on Real-World Large Graphs," GitHub, 2026. [Online]. Available: https://github.com/frankvegadelgado/milagro

[4] S. Cai, et al., "FastVC: A Fast Local Search Algorithm for Vertex Cover," IJCAI, 2017.

[5] P. Zhang, et al., "TIVC: A Novel Iterated Vertex-Cover Algorithm," AAAI, 2023.

[6] Z. Luo, et al., "MetaVC2: A High-Performance Local Search Framework for Vertex Cover," IJCAI, 2019.

DEV Community