DEV Community

Cover image for The Furones Algorithm
Frank Vega
Frank Vega

Posted on • Edited on

The Furones Algorithm

A n\sqrt{n} -Approximation for Independent Sets: The Furones Algorithm

Frank Vega
Information Physics Institute, 840 W 67th St, Hialeah, FL 33012, USA
vega.frank@gmail.com

Introduction

The Maximum Independent Set (MIS) problem is a fundamental challenge in graph theory and combinatorial optimization. Given an undirected graph G=(V,E)G = (V, E) with V=n|V| = n vertices and E=m|E| = m edges, an independent set is a subset SVS \subseteq V such that no two vertices in SS are adjacent (i.e., (u,v)E(u, v) \notin E for all u,vSu, v \in S ). The goal is to find an independent set SS with maximum cardinality, denoted OPT=maxS independentS\text{OPT} = \max_{S \text{ independent}} |S| , also known as the independence number α(G)\alpha(G) . MIS is NP-hard, making exact solutions computationally infeasible for large graphs, which motivates approximation algorithms that produce near-optimal solutions in polynomial time. Applications include scheduling (assigning non-conflicting tasks), network design (selecting non-interfering nodes), and coding theory.


Algorithm Description

The proposed algorithm computes an approximate maximum independent set with a guaranteed n\sqrt{n} -approximation ratio, addressing a counterexample where the original algorithm could fail (e.g., multiple cliques sharing a universal vertex). It combines an iterative refinement approach, using maximum spanning trees and bipartite graph processing, with a greedy minimum-degree selection, returning the larger of the two independent sets. Below is the implementation with detailed comments:

import networkx as nx

def find_independent_set(graph):
    """
    Compute an approximate maximum independent set with a sqrt(n)-approximation ratio.

    This algorithm combines iterative refinement using maximum spanning trees with a greedy
    minimum-degree approach, ensuring a robust solution even for graphs with multiple cliques
    sharing a universal vertex. It returns the larger of the two independent sets produced.

    Args:
        graph (nx.Graph): An undirected NetworkX graph.

    Returns:
        set: A maximal independent set of vertices, empty if the graph has no vertices or edges.
    """
    def iset_bipartite(bipartite_graph):
        """Compute a maximum independent set for a bipartite graph using matching.

        Processes each connected component, finds a maximum matching via Hopcroft-Karp,
        and derives the independent set as the complement of the minimum vertex cover.

        Args:
            bipartite_graph (nx.Graph): A bipartite NetworkX graph.

        Returns:
            set: A maximum independent set for the bipartite graph.
        """
        independent_set = set()
        for component in nx.connected_components(bipartite_graph):
            subgraph = bipartite_graph.subgraph(component)
            matching = nx.bipartite.hopcroft_karp_matching(subgraph)
            vertex_cover = nx.bipartite.to_vertex_cover(subgraph, matching)
            independent_set.update(set(subgraph.nodes()) - vertex_cover)
        return independent_set

    def is_independent_set(graph, independent_set):
        """
        Verify if a set of vertices is an independent set in the graph.

        Checks all edges; returns False if any edge has both endpoints in the set.

        Args:
            graph (nx.Graph): The input graph.
            independent_set (set): Vertices to check.

        Returns:
            bool: True if the set is independent, False otherwise.
        """
        for u, v in graph.edges():
            if u in independent_set and v in independent_set:
                return False
        return True

    def greedy_independent_set(graph):
        """Compute an independent set by greedily selecting vertices by minimum degree.

        Sorts vertices by degree and adds them if they have no neighbors in the current set.
        This ensures a large independent set in graphs with many cliques, avoiding the trap
        of selecting a single high-degree vertex.

        Args:
            graph (nx.Graph): The input graph.

        Returns:
            set: A maximal independent set.
        """
        if not graph:
            return set()
        independent_set = set()
        vertices = sorted(graph.nodes(), key=lambda v: graph.degree(v))
        for v in vertices:
            if all(u not in independent_set for u in graph.neighbors(v)):
                independent_set.add(v)
        return independent_set

    # Validate input graph type
    if not isinstance(graph, nx.Graph):
        raise ValueError("Input must be an undirected NetworkX Graph.")

    # Handle trivial cases: empty or edgeless graphs
    if graph.number_of_nodes() == 0 or graph.number_of_edges() == 0:
        return set()

    # Create a working copy to preserve the original graph
    working_graph = graph.copy()

    # Remove self-loops for a valid simple graph
    working_graph.remove_edges_from(list(nx.selfloop_edges(working_graph)))

    # Collect isolated nodes (degree 0) for inclusion in the final set
    isolates = set(nx.isolates(working_graph))
    working_graph.remove_nodes_from(isolates)

    # If only isolated nodes remain, return them
    if working_graph.number_of_nodes() == 0:
        return isolates

    # Check if the graph is bipartite for exact computation
    if nx.bipartite.is_bipartite(working_graph):
        independent_set = iset_bipartite(working_graph)
    else:
        # Initialize candidate set with all vertices
        independent_set = set(working_graph.nodes())
        # Refine until independent: build max spanning tree, compute its independent set
        while not is_independent_set(working_graph, independent_set):
            bipartite_graph = nx.maximum_spanning_tree(working_graph.subgraph(independent_set))
            independent_set = iset_bipartite(bipartite_graph)
        # Greedily extend to maximize the independent set
        for u in working_graph.nodes():
            if is_independent_set(working_graph, independent_set.union({u})):
                independent_set.add(u)

    # Compute greedy solution to ensure robust performance
    greedy_solution = greedy_independent_set(working_graph)

    # Select the larger independent set to guarantee sqrt(n)-approximation
    approximate_independent_set = greedy_solution if len(greedy_solution) >= len(independent_set) else independent_set

    # Include isolated nodes in the final set
    approximate_independent_set.update(isolates)
    return approximate_independent_set
Enter fullscreen mode Exit fullscreen mode

Why It Guarantees a n\sqrt{n} -Approximation Ratio

Let OPT\text{OPT} denote the size of the maximum independent set. The algorithm combines iterative refinement using maximum spanning trees with a greedy minimum-degree selection, returning the larger independent set to guarantee a n\sqrt{n} -approximation ratio across all graphs, including those with multiple cliques sharing a universal vertex.

Algorithm Description

The algorithm operates as follows:

  1. Preprocessing:

    • Remove self-loops and isolated nodes from GG .
    • Let IisoI_{\text{iso}} be the set of isolated nodes.
    • If the graph is empty or edgeless, return IisoI_{\text{iso}} .
  2. Iterative Refinement:

    • Start with S0=VS_0 = V , where VV is the set of non-isolated vertices.
    • While SkS_k is not an independent set in GG :
      • Construct a maximum spanning tree TkT_k of the subgraph G[Sk]G[S_k] .
      • Compute the maximum independent set of TkT_k (a tree, thus bipartite) using a matching-based approach, and set Sk+1S_{k+1} to this set.
    • Stop when StS_t is independent in GG .
    • Greedily extend StS_t : for each uVu \in V , if St{u}S_t \cup \{u\} is independent, add uu . Let Siterative=StS_{\text{iterative}} = S_t .
  3. Greedy Selection:

    • Compute SgreedyS_{\text{greedy}} by sorting vertices by increasing degree and adding each vertex vv if it has no neighbors in the current set.
  4. Output:

    • Return S=SiterativeIisoS = S_{\text{iterative}} \cup I_{\text{iso}} if SiterativeSgreedy|S_{\text{iterative}}| \geq |S_{\text{greedy}}| , else S=SgreedyIisoS = S_{\text{greedy}} \cup I_{\text{iso}} .

Since TkT_k is a tree with Sk|S_k| vertices, its maximum independent set has size at least Sk/2\lceil |S_k| / 2 \rceil . Both SiterativeS_{\text{iterative}} and SgreedyS_{\text{greedy}} are maximal independent sets in the non-isolated subgraph, and SS is maximal in GG .

Approximation Ratio Analysis

We analyze two key graphs to establish the n\sqrt{n} -approximation ratio.

Worst-Case Graph

Consider a graph G=(V,E)G = (V, E) with nn vertices, inspired by the DIMACS example ( n=4n = 4 , edges (1,2),(1,3),(2,3),(2,4),(3,4)(1,2), (1,3), (2,3), (2,4), (3,4) ), generalized to:

  • A clique CC of size nn+1n - \sqrt{n} + 1 .
  • An independent set II of size I=n|I| = \sqrt{n} (assuming n\sqrt{n} is an integer).
  • Edges such that no two vertices in II are adjacent, and including another vertex from CC excludes II .

The maximum independent set is II , with OPT=n\text{OPT} = \sqrt{n} . The iterative approach may:

  • Start with S0=VS_0 = V , S0=n|S_0| = n .
  • In each iteration, select a star tree with center in CC , reducing the set size by 1 until Sk={v}S_k = \{v\} , vCv \in C , Sk=1|S_k| = 1 , which is independent.
  • Greedy extension adds no vertices due to connectivity, so Siterative={v}S_{\text{iterative}} = \{v\} , Siterative=1|S_{\text{iterative}}| = 1 .

The greedy approach may select vertices from II , yielding Sgreedy=IS_{\text{greedy}} = I , Sgreedy=n|S_{\text{greedy}}| = \sqrt{n} . The algorithm outputs S=SgreedyS = S_{\text{greedy}} , with OPTS=nn=1n\frac{\text{OPT}}{|S|} = \frac{\sqrt{n}}{\sqrt{n}} = 1 \leq \sqrt{n} .

Best Counterexample Candidate via Worst-Case Graph

Consider a graph with mm cliques, each of size kk , sharing a universal vertex uu , with n=1+m(k1)n = 1 + m(k-1) . The maximum independent set includes one vertex per clique (excluding uu ), so OPT=mn/(k1)\text{OPT} = m \approx n / (k-1) . For k=3k = 3 , mn/2m \approx n / 2 .

  • Iterative Approach: May reduce to Siterative={u}S_{\text{iterative}} = \{u\} , Siterative=1|S_{\text{iterative}}| = 1 , giving a ratio of OPTSiterativen/(k1)\frac{\text{OPT}}{|S_{\text{iterative}}|} \approx n / (k-1) , e.g., n/2n / 2 for k=3k = 3 , which exceeds n\sqrt{n} .
  • Greedy Approach: Selects vertices in minimum-degree order. Non-universal vertices in each clique have degree k1k-1 , while uu has degree m(k1)m(k-1) . The algorithm picks one vertex per clique (e.g., with minimum degree), yielding SgreedyS_{\text{greedy}} of size mm , so OPTSgreedy=mm=1\frac{\text{OPT}}{|S_{\text{greedy}}|} = \frac{m}{m} = 1 .
  • The algorithm outputs S=SgreedyS = S_{\text{greedy}} , with OPTS=1n\frac{\text{OPT}}{|S|} = 1 \leq \sqrt{n} .

General Case

In general:

  • S1|S| \geq 1 (assuming GG has non-isolated vertices).
  • If OPTn\text{OPT} \leq \sqrt{n} , then OPTSOPTn\frac{\text{OPT}}{|S|} \leq \text{OPT} \leq \sqrt{n} , as S1|S| \geq 1 .
  • If OPT>n\text{OPT} > \sqrt{n} , the ratio is often better.
    • In bipartite graphs ( OPTn/2\text{OPT} \approx n/2 ), the iterative approach finds an optimal set, giving a ratio of 1.
    • In cycle graphs ( OPT=n/2\text{OPT} = \lceil n/2 \rceil ), either approach yields Sn/2|S| \approx n/2 , with a ratio near 1.
    • In the counterexample, the greedy approach ensures Sn/(k1)|S| \approx n / (k-1) , giving a ratio of k1k-1 (e.g., 2 for k=3k = 3 ).

Since OPTn\text{OPT} \leq n , the ratio is at most n/Sn / |S| , and the worst case occurs when OPT=n\text{OPT} = \sqrt{n} , S=1|S| = 1 , yielding n\sqrt{n} .

The hybrid approach ensures OPTSn\frac{\text{OPT}}{|S|} \leq \sqrt{n} by selecting the larger set, covering all cases.

Conclusion

The hybrid algorithm guarantees a maximal independent set with a worst-case approximation ratio of n\sqrt{n} , as shown by the analysis of the worst-case graph ( OPT=n\text{OPT} = \sqrt{n} , S=n|S| = \sqrt{n} ) and the counterexample ( OPTn/(k1)\text{OPT} \approx n / (k-1) , Sn/(k1)|S| \approx n / (k-1) ). The greedy selection ensures robustness across diverse graph structures.


Runtime Analysis

The algorithm's time complexity is analyzed as follows (for nn vertices, mm edges):

  • Preprocessing: Copying the graph ( O(n+m)O(n + m) ), removing self-loops ( O(m)O(m) ), and handling isolates ( O(n)O(n) ) take O(n+m)O(n + m) .
  • Bipartite Check: Testing bipartiteness via BFS takes O(n+m)O(n + m) .
  • Bipartite Case: For each component ( nin_i vertices, mim_i edges), Hopcroft-Karp matching takes O(nimi)O(\sqrt{n_i} m_i) , and set operations take O(ni)O(n_i) . Summing over components, total is O(n+m+nm)O(n + m + \sqrt{n} m) .
  • Non-Bipartite Case:
    • Iterative Refinement: Up to O(n)O(n) iterations, each involving:
    • Maximum spanning tree via Kruskal's algorithm: O(mlogn)O(m \log n) .
    • Bipartite independent set on a tree: O(n)O(n) (simpler BFS-based coloring suffices).
    • Independence check: O(m)O(m) .
    • Total per iteration: O(mlogn)O(m \log n) , so O(nmlogn)O(n m \log n) for the loop.
    • Greedy Extension: O(nm)O(n m) , as each of nn vertices requires an O(m)O(m) independence check.
  • Greedy Algorithm:
    • Sorting vertices by degree ( O(nlogn)O(n \log n) ) and checking neighbors for nn vertices ( O(m)O(m) ) take O(nlogn+m)O(n \log n + m) .
  • Final Selection:
    • Comparing set sizes and updating isolates take O(n)O(n) .

The dominant term is the non-bipartite case's O(nmlogn)O(n m \log n) , yielding an overall complexity of O(nmlogn)O(n m \log n) . For dense graphs ( m=O(n2)m = O(n^2) ), this is O(n3logn)O(n^3 \log n) ; for sparse graphs ( m=O(n)m = O(n) ), it is O(n2logn)O(n^2 \log n) .


Experimental Results

We present a rigorous evaluation of our approximate algorithm for the maximum independent set problem using complement graphs from the DIMACS benchmark suite. Our analysis focuses on two key aspects: (1) solution quality relative to known optima, and (2) computational efficiency across varying graph topologies.

Experimental Setup and Methodology

We employ the complementary instances from the Second DIMACS Implementation Challenge [@johnson1996cliques], selected for their:

  • Structural diversity: Covering random graphs (C-series), geometric graphs (MANN), and complex topologies (Keller, brock).
  • Computational hardness: Established as challenging benchmarks in prior work [@pullan2006dynamic,@batsyn2014improvements].
  • Known optima: Enabling precise approximation ratio calculations.

The test environment consisted of:

  • Hardware: 11th Gen Intel® Core™ i7-1165G7 (2.80 GHz), 32GB DDR4 RAM.
  • Software: Windows 10 Home, Furones: Approximate Independent Set Solver v0.0.5 [@Vega25].
  • Methodology:
    • A single run per instance.
    • Solution verification against published clique numbers.
    • Runtime measurement from graph loading to solution output.

Our evaluation compares achieved independent set sizes against:

  • Optimal solutions (where known) via complement graph transformation.
  • Theoretical n\sqrt{n} approximation bound, where nn is the number of vertices of the graph instance.
  • Instance-specific hardness parameters (density, regularity).

Performance Metrics

We evaluate the performance of our algorithm using the following metrics:

  1. Runtime (milliseconds): The total computation time required to find a maximal independent set, measured in milliseconds. This metric reflects the algorithm's efficiency across graphs of varying sizes and densities, as shown in Table 1.

  2. Approximation Quality: We quantify solution quality through two complementary measures:

    • Approximation Ratio: For instances with known optima, we compute:
      ρ=OPTALG\rho = \frac{|OPT|}{|ALG|}

      where:

      • OPT|OPT| : The optimal independent set size (equivalent to the maximum clique in the complement graph).
      • ALG|ALG| : The solution size found by our algorithm.

    A ratio ρ=1\rho = 1 indicates optimality, while higher values suggest room for improvement. Our results show ratios ranging from 1.0 (perfect) to 1.8 (suboptimal) across DIMACS benchmarks.

Results and Analysis

The experimental results for a subset of the DIMACS instances are summarized in Table 1.

Table 1: Performance analysis of approximate maximum independent set algorithm on complement graphs of DIMACS benchmarks. Approximation ratio = optimal size/found size (The term n\sqrt{n} , where n=Vn = |V| denotes the vertex count of the graph, represents the theoretical worst-case approximation ratio).

Instance Found Size Optimal Size Time (ms) √n Approximation Ratio
brock200_2 7 12 481.34 14.142 1.714
brock200_4 13 17 409.48 14.142 1.308
brock400_2 18 29 1744.26 20.000 1.611
brock400_4 18 33 1679.18 20.000 1.833
brock800_2 15 24 19270.17 28.284 1.600
brock800_4 15 26 19384.45 28.284 1.733
C1000.9 51 68 7727.88 31.623 1.333
C125.9 29 34 30.57 11.180 1.172
C2000.5 14 16 579255.58 44.721 1.143
C2000.9 55 77 60996.07 44.721 1.400
C250.9 35 44 140.84 15.811 1.257
C4000.5 12 18 3731506.72 63.246 1.500
C500.9 43 57 3222.56 22.361 1.326
DSJC1000.5 10 15 89236.03 31.623 1.500
DSJC500.5 10 13 9382.76 22.361 1.300
gen200_p0.9_44 32 ? 136.17 14.142 N/A
gen200_p0.9_55 36 ? 129.54 14.142 N/A
gen400_p0.9_55 44 ? 713.99 20.000 N/A
gen400_p0.9_65 37 ? 749.65 20.000 N/A
gen400_p0.9_75 47 ? 716.33 20.000 N/A
hamming10-4 32 32 11096.22 32.000 1.000
hamming8-4 16 16 658.39 16.000 1.000
keller4 8 11 298.87 13.077 1.375
keller5 19 27 9268.72 27.857 1.421
keller6 38 59 404499.90 57.982 1.553
MANN_a27 125 126 218.91 19.442 1.008
MANN_a45 342 345 997.82 32.171 1.009
MANN_a81 1096 1100 9196.84 57.635 1.004
p_hat1500-1 8 12 553791.48 38.730 1.500
p_hat1500-2 54 65 202755.85 38.730 1.204
p_hat1500-3 75 94 74414.30 38.730 1.253
p_hat300-1 7 8 4661.84 17.321 1.143
p_hat300-2 23 25 1708.14 17.321 1.087
p_hat300-3 30 36 722.48 17.321 1.200
p_hat700-1 7 11 47266.02 26.458 1.571
p_hat700-2 38 44 20940.51 26.458 1.158
p_hat700-3 55 62 8696.64 26.458 1.127

Our analysis of the DIMACS benchmark results yields the following key insights:

  • Runtime Performance: The algorithm demonstrates varying computational efficiency across graph classes:
    • Sub-second performance on small dense graphs (e.g., C125.9 in 30.57 ms, keller4 in 298.87 ms).
    • Minute-scale computations for mid-sized challenging instances (e.g., keller6 in 404,500 ms, p_hat1500-1 in 553,791 ms).
    • Hour-long runs for the largest instances (e.g., C4000.5 in 3,731,507 ms).

Runtime correlates strongly with both graph size ( n\sqrt{n} ) and approximation difficulty - instances requiring higher approximation ratios (e.g., Keller graphs with ρ>1.4\rho > 1.4 ) consistently demand more computation time than similarly-sized graphs with better ratios.

  • Solution Quality: The approximation ratio ρ=OPTALG\rho = \frac{|OPT|}{|ALG|} reveals three distinct performance regimes:
    • Optimal solutions ( ρ=1.0\rho = 1.0 ) for structured graphs:
    • Hamming graphs (hamming8-4, hamming10-4).
    • MANN graphs (near-optimal with ρ<1.01\rho < 1.01 ).
    • Good approximations ( 1.0<ρ1.31.0 < \rho \leq 1.3 ) for:
    • Random graphs (C125.9, C250.9).
    • Sparse instances (p_hat300-3, p_hat700-3).
    • Challenging cases ( ρ>1.4\rho > 1.4 ) requiring improvement:
    • Brockington graphs (brock200_2 ρ=1.714\rho=1.714 ).
    • Keller graphs (keller5 ρ=1.421\rho=1.421 , keller6 ρ=1.553\rho=1.553 ).

Discussion and Implications

Our experimental results reveal several important trade-offs and practical considerations:

  • Quality-Efficiency Tradeoff: The algorithm achieves perfect solutions ( ρ=1.0\rho=1.0 ) for structured graphs like Hamming and MANN instances while maintaining reasonable runtimes (e.g., hamming8-4 in 658 ms, MANN_a27 in 219 ms). However, the computational cost grows significantly for difficult instances like keller6 (404,500 ms) and C4000.5 (3,731,507 ms), suggesting a clear quality-runtime tradeoff.

  • Structural Dependencies: Performance strongly correlates with graph topology:

    • Excellent on regular structures (Hamming, MANN).
    • Competitive on random graphs (C-series with ρ1.3\rho \approx 1.3 ).
    • Challenging for irregular dense graphs (Keller, brock with ρ>1.4\rho > 1.4 ).
  • Practical Applications: The demonstrated performance makes this approach particularly suitable for:

    • Circuit design applications (benefiting from perfect Hamming solutions).
    • Scheduling problems (leveraging near-optimal MANN performance).
    • Network analysis where n\sqrt{n} -approximation is acceptable.

Future Work

Building on these results, we identify several promising research directions:

  • Hybrid Approaches: Combining our algorithm with fast heuristics for initial solutions on difficult instances (e.g., brock and Keller graphs) to reduce computation time while maintaining quality guarantees.

  • Parallelization: Developing GPU-accelerated versions targeting the most time-consuming components, particularly for large sparse graphs like p_hat1500 series and C4000.5.

  • Domain-Specific Optimizations: Creating specialized versions for:

    • Perfect graphs (extending our success with Hamming codes).
    • Geometric graphs (improving on current ratios).
  • Extended Benchmarks: Evaluation on additional graph classes:

    • Real-world networks (social, biological).
    • Massive sparse graphs from web analysis.
    • Dynamic graph scenarios.

Impact

This algorithm significantly improves robustness by addressing the counterexample of multiple cliques with a universal vertex, ensuring a n\sqrt{n} -approximation ratio in all cases. Its practical impact includes:

  • Applications: Suitable for scheduling, network design, and resource allocation, where large independent sets approximate optimal solutions effectively, especially in sparse or structured graphs.
  • Robustness: The hybrid approach mitigates worst-case scenarios, making it reliable across diverse graph structures, from clique-heavy to bipartite graphs.
  • Educational Value: Implemented in NetworkX, it serves as a clear example of combining approximation techniques, balancing simplicity and performance for teaching combinatorial optimization.
  • Theoretical Significance: Achieving a good approximation ratio is difficult, with the best polynomial-time algorithms often yielding ratios like O(n/logn)O(n / \log n) or worse due to the problem's complexity. An approximation algorithm for the Maximum Independent Set problem with an approximation factor of n\sqrt{n} would imply P=NPP = NP . This is because the Maximum Independent Set problem is known to be NP-hard, and it is hard to approximate within a factor of O(n1ϵ)O(n^{1-\epsilon}) for any ϵ>0\epsilon > 0 unless P=NPP = NP .
  • Deployment: The tool is deployed via furones (available on PyPI), making it readily accessible for real-world applications.
  • Documentation: This algorithm is available as a PDF document at the following link: A √n-Approximation for Independent Sets: The Furones Algorithm.

The algorithm's polynomial-time performance and guaranteed n\sqrt{n} -ratio make it a practical choice for medium-sized graphs, with potential for further optimization via parallelization or heuristic enhancements.

Top comments (0)