Fast Triangle Detection in Undirected Graphs: The Aegypti Algorithm
Frank Vega
Information Physics Institute, 840 W 67th St, Hialeah, FL 33012, USA
vega.frank@gmail.com
Abstract
We present Aegypti, a hybrid algorithm for detecting a single triangle in an undirected graph
with
vertices and
edges. The algorithm operates in two phases. In the fast path, a clique-constrained Union-Find structure (FastCliqueUF) streams over the edges and merges components only when the union remains a clique; the moment any component reaches size
, a triangle witness is returned. Because components remain of size at most 2 until the detecting merge, each Union costs only
(bitset operations touch
blocks with
). The fast path therefore runs in
time using packed uint64 SIMD bitset operations; on triangle-rich graphs this reduces to
in practice and is
in the RAM model. If the fast path finds no triangle, a fallback using adjacency-set intersections solves the problem in
time. The overall worst-case running time is
We prove correctness, analyse the complexity of each phase, and validate the algorithm on the full Second DIMACS Implementation Challenge benchmark suite, where Aegypti detects triangles in all tested instances in under 12 s.
1. Introduction
Triangle detection — determining whether an undirected graph contains three mutually adjacent vertices — is one of the most fundamental problems in graph algorithms. It arises as a primitive in network analysis, sparse-matrix computations, database join processing, and as the base case of exact clique solvers.
Known Complexity
Let , . The classical results are:
via adjacency intersections. Chiba and Nishizeki showed that listing all triangles costs , where is the arboricity of ; since , detection costs .
via matrix multiplication. Itai and Rodeh observed that triangle detection reduces to Boolean matrix multiplication; with the current best exponent , this gives . Alon, Yuster, and Zwick sharpened the combination to .
Conditional lower bound. Under the -Clique Conjecture and related hypotheses, triangle detection requires time in the combinatorial model.
Our Contribution
We introduce Aegypti, a practical hybrid algorithm whose design is driven by the following observation:
If a graph is triangle-rich, an edge-streaming clique-constrained Union-Find will encounter a size-3 component early and terminate in time using SIMD bitset operations — well before exhausting all edges.
When the graph is triangle-free (the hard case), the Union-Find pass certifies this at no extra asymptotic cost, and the fallback adjacency algorithm handles the problem in . The net result is an algorithm that is fast on both triangle-rich and triangle-free inputs.
The algorithm is named Aegypti after Aedes aegypti, a mosquito whose compound eye detects rapid visual patterns by aggregating local signals — analogous to the way our algorithm aggregates local adjacency information in bitset blocks to detect the local pattern of a triangle.
2. Preliminaries
Graph Notation
All graphs are simple and undirected. We write , , for the open neighbourhood of , and for the closed neighbourhood. The degree of is , and .
Definition (Triangle). A triangle in is a set of three distinct vertices such that .
Definition (Triangle-free). is triangle-free if it contains no triangle.
Bitset Representation
We index
. A subset
is stored as an array of
unsigned 64-bit integers (numpy.uint64), where bit
of block
indicates membership of
. Standard bitwise operations (OR, AND, NOT, Any) each cost
time on a RAM with
-bit words. Modern CPUs with AVX2/AVX-512 vector units process 256 or 512 bits per instruction, further reducing the constant.
Proposition. For bitsets over a universe of size , union, intersection, complement-intersection, and the subset test each run in time.
3. The Aegypti Algorithm
The Triangle Detection Problem
The triangle detection problem consists of determining whether an undirected simple graph contains at least one triangle — a set of three vertices where each pair is connected by an edge (a 3-cycle).
Triangle detection has wide applications: social network analysis (detecting the smallest clusters), property testing in graphs, theoretical computer science (hardness of approximation, fine-grained complexity), spam/sybil detection, and checking if a graph is bipartite (no odd cycles no triangles).
Core Data Structure: FastCliqueUF
FastCliqueUF augments a standard Union-Find with two bitset arrays per component root:
where is the closed neighbourhood (includes itself).
Clique Invariant. At all times, for every root , the component is a clique in .
Proposition (Bitset clique test). For roots , the set is a clique if and only if
Proof. Let (the union's membership) and (the intersection of closed neighbourhoods over the entire union). By definition, is the set of vertices adjacent-or-equal to every member of . The union is a clique iff , i.e. , i.e. .
Lemma (Triangle witness). If FastCliqueUF.Union(u, v) causes a component of size
to form, that component contains a triangle.
Proof. By the clique invariant, every component is a clique. A clique of size
contains
edges among any three of its members, hence is a triangle (or contains one as a subgraph). Any three vertices from the component form a triangle.
import numpy as np
class FastCliqueUF:
"""
A Union-Find structure that only merges components if the union
induces a clique in the underlying graph. All clique checks and
component updates are accelerated using numpy.uint64 SIMD blocks,
giving O(k / w) merge time where w = 64 bits per block.
"""
def __init__(self, graph):
self.graph = graph
self.nodes = list(graph.nodes())
self.index = {u: i for i, u in enumerate(self.nodes)}
self.n = len(self.nodes)
# Number of 64-bit blocks needed to represent n bits
self.blocks = (self.n + 63) // 64
# Standard Union-Find parent + size
self.parent = {u: u for u in self.nodes}
self.size = {u: 1 for u in self.nodes}
# Precompute adjacency bitsets for each node as numpy uint64 arrays
self.adj = {}
for u in self.nodes:
arr = np.zeros(self.blocks, dtype=np.uint64)
for v in graph.neighbors(u):
idx = self.index[v]
arr[idx // 64] |= np.uint64(1) << (idx % 64)
# A node is always adjacent to itself for clique purposes
idx = self.index[u]
arr[idx // 64] |= np.uint64(1) << (idx % 64)
self.adj[u] = arr
# Component bitsets: which nodes are in the component
self.comp_bit = {
u: self._singleton_bitset(u) for u in self.nodes
}
# Component adjacency intersection bitsets
self.comp_adj = {
u: self.adj[u].copy() for u in self.nodes
}
def _singleton_bitset(self, u):
"""Return a bitset with only node u set."""
arr = np.zeros(self.blocks, dtype=np.uint64)
idx = self.index[u]
arr[idx // 64] |= np.uint64(1) << (idx % 64)
return arr
def find(self, u):
"""Path-compressed find."""
if self.parent[u] != u:
self.parent[u] = self.find(self.parent[u])
return self.parent[u]
def union(self, u, v):
"""
Merge the components of u and v only if the union forms a clique.
Return True iff the resulting clique has size >= 3.
All clique checks and bitset merges are SIMD-accelerated.
"""
ru, rv = self.find(u), self.find(v)
if ru == rv:
# Already in same component → check if size >= 3
return self.size[ru] >= 3
# Union-by-size heuristic
if self.size[ru] < self.size[rv]:
ru, rv = rv, ru
# Proposed merged component bitset
merged_bit = self.comp_bit[ru] | self.comp_bit[rv]
# Intersection of adjacency bitsets of both components
merged_adj = self.comp_adj[ru] & self.comp_adj[rv]
# Clique condition:
# merged_bit must be subset of merged_adj
if np.any(merged_bit & ~merged_adj):
return False # Reject merge: not a clique
# Accept merge
self.parent[rv] = ru
new_size = self.size[ru] + self.size[rv]
self.size[ru] = new_size
# Update component bitset
self.comp_bit[ru] = merged_bit
# Update adjacency intersection
self.comp_adj[ru] = merged_adj
# Return True only if the resulting clique has size >= 3
return new_size >= 3
def to_sets(self):
"""
Return all disjoint sets after full path compression.
This reconstructs components by grouping nodes by their root.
"""
groups = {}
for u in self.nodes:
r = self.find(u)
groups.setdefault(r, set()).add(u)
return list(groups.values())
Main Detection Routine
Aegypti is a two-phase hybrid. Phase 1 streams edges through FastCliqueUF; if it fails to detect a triangle, Phase 2 falls back to an explicit adjacency-intersection enumeration.
import networkx as nx
from .disjoint import FastCliqueUF # FastCliqueUF from disjoint.py
def find_triangle_coordinates(graph):
"""
Detect a single triangle (3-clique) in an undirected NetworkX graph.
A triangle is a set of three vertices {u, v, w} such that the edges
(u, v), (v, w), and (u, w) all exist.
This function uses a **hybrid approach** for efficiency:
1. Primary fast path: A clique-constrained Union-Find (`FastCliqueUF`)
that only merges two vertices when the resulting component remains
a clique. As soon as any component reaches size ≥ 3, a triangle
is guaranteed to exist.
2. Fallback: If the UF pass does not detect a triangle, a standard
O(m^{3/2})-style enumeration using adjacency-set intersections is
performed (still early-exits on the first triangle found).
Args:
graph (nx.Graph):
An undirected simple graph (no self-loops, no multi-edges).
Nodes may be any hashable Python objects.
Returns:
Optional[FrozenSet[Hashable]]:
A frozenset containing the three vertices of one triangle if
a triangle exists, otherwise None.
Raises:
ValueError: If the input is not an undirected nx.Graph or contains
self-loops.
"""
# --- Input validation ----------------------------------------------------
if not isinstance(graph, nx.Graph) or graph.is_directed():
raise ValueError("Input must be an undirected NetworkX Graph.")
if nx.number_of_selfloops(graph) > 0:
raise ValueError("Graph must not contain self-loops.")
# Early exit: graphs with fewer than 3 nodes or no edges cannot contain triangles
if graph.number_of_nodes() < 3 or graph.number_of_edges() == 0:
return None
# --- Fast path: Clique-constrained Union-Find ---------------------------
# FastCliqueUF only performs a union when the resulting component is still
# a clique. If union(u, v) returns True, a clique of size ≥ 3 was formed
# → a triangle has been detected.
disjoint_set = FastCliqueUF(graph)
found_in_uf = False
for u, v in graph.edges():
if disjoint_set.union(u, v):
found_in_uf = True
break # Triangle found — no need to process remaining edges
if found_in_uf:
# Extract the first clique of size ≥ 3 (guaranteed to be a triangle)
# to_sets() returns all current components after full path compression.
for component in disjoint_set.to_sets():
if len(component) >= 3:
return frozenset(component)
# (This line should never be reached if FastCliqueUF is correct)
# --- Fallback: Explicit triangle enumeration via adjacency intersection --
# If the UF pass did not detect a triangle, we fall back to a standard
# efficient triangle-finding algorithm:
# • Precompute adjacency sets for O(1) intersections.
# • Use node ranking to process each undirected edge {u, v} exactly once.
# • For each edge, compute common neighbors → any w forms a triangle.
adj_sets = {node: set(graph.neighbors(node)) for node in graph.nodes()}
rank = {node: i for i, node in enumerate(graph.nodes())}
for u in graph.nodes():
for v in graph.neighbors(u):
# Process each undirected edge exactly once (u has lower rank than v)
if rank[u] < rank[v]:
# Intersection gives all common neighbors w (each forms a triangle)
common_neighbors = adj_sets[u] & adj_sets[v]
for w in common_neighbors:
return frozenset({u, v, w})
# No triangle found after both passes
return None
4. Correctness and Complexity Analysis
Correctness
Theorem (Correctness of Aegypti). Aegypti(G) returns a triangle
if and only if
contains a triangle.
Proof.
Soundness. Phase 1: By the triangle witness lemma, any component of size
produced by FastCliqueUF is a clique, hence a triangle (or contains one). Phase 2: The algorithm returns
only when
, i.e.
is adjacent to both
and
; combined with the edge
, this gives a triangle.
Completeness. Suppose contains a triangle . In Phase 1, when edge is processed, the merge is accepted and a component forms. When edge (or ) is subsequently processed, the clique test passes because , so the merged component has size 3 and Union returns True. If Phase 1 misses due to non-canonical edge ordering, Phase 2's exhaustive adjacency intersection over all directed edges with is complete and finds for the pair , , or .
Complexity of Phase 1 (Fast Path)
Lemma (Initialisation cost). Constructing FastCliqueUF(G) costs
time and
space.
Proof. Allocating zero-arrays of length costs . Setting neighbourhood bits iterates over all adjacency lists, costing bit-set operations each in . Each of the component bitsets and adjacency intersection bitsets requires words; total space is .
Lemma (Per-union cost). Each call to Union(u, v) costs
time, where
is the size of the merged component, and
is the inverse-Ackermann function.
Proof. The two Find calls cost amortised each. The bitwise OR, AND, complement-AND, and Any check each operate only on the blocks touched by the (at most size- ) component, costing . Pointer and size updates are .
Theorem (Phase 1 running time). Phase 1 runs in
time in the worst case. Because components remain of size at most 2 until the detecting merge (or if no triangle is found), each Union costs only . On triangle-rich graphs with favourable edge ordering the cost reduces to .
Proof. Initialisation costs . The edge loop processes at most edges; each Union costs with , hence per edge. The loop therefore costs . When a triangle exists and appears early, only edges are processed and the cost is .
Corollary (RAM model). In the RAM model with -bit words, Phase 1 costs . On triangle-rich graphs this is often . With hardware SIMD ( or ), the practical cost is or respectively.
Complexity of Phase 2 (Fallback)
Theorem (Phase 2 running time). Phase 2 runs in time.
Proof. Partition into high-degree vertices and low-degree vertices . We have (since and each contributes ). For each edge with , the intersection is computed in time. Summing over all edges:
Combined Running Time
Theorem (Total running time). Let
be an undirected simple graph. Aegypti(G) runs in time
in the worst case. On many triangle-containing graphs the fast path succeeds after processing only a small number of edges, achieving time, which is at most as large as whenever (the dense regime).
Proof. Triangle-containing case. Phase 1 costs at most ; Phase 2 is not reached. Triangle-free case. Phase 1 runs the full edge loop without finding a triangle, costing ; Phase 2 costs . Since , the total is . Crossover. iff .
Space complexity. Aegypti uses
bits of working space, dominated by the
bitset arrays of FastCliqueUF.
5. Experiments
Setup
All experiments were run on an 11th Gen Intel® Core™ i7-1165G7 (2.80 GHz), 32 GB DDR4 RAM, with Python 3.12, Aegypti 0.4.0, and NumPy 2.2. We use complement graphs of the DIMACS Second Implementation Challenge instances, since these are well-studied and have both triangle-rich and near-triangle-free instances. Triangle witnesses are reported as ordered triples
of vertex indices. Aegypti times are measured for the find_triangle_coordinates function, including both phases; matrix-multiplication times (labelled "Naive") are for the reference
NumPy
Boolean check.
Results
The table below reports triangle witnesses, Aegypti running times, and matrix-multiplication baseline times for all tested DIMACS instances. Times are in milliseconds; 0.00 denotes sub-millisecond resolution.
| Instance | Witness | Aegypti (ms) | Naive (ms) | |
|---|---|---|---|---|
| brock | ||||
| brock200_1 | 200 | (1,5,6) | 37.33 | 51.04 |
| brock200_2 | 200 | (1,3,8) | 16.34 | 33.34 |
| brock200_3 | 200 | (1,5,9) | 16.15 | 32.65 |
| brock200_4 | 200 | (1,3,5) | 15.89 | 32.81 |
| brock400_1 | 400 | (1,3,5) | 83.12 | 165.56 |
| brock400_2 | 400 | (1,2,3) | 82.69 | 166.12 |
| brock400_3 | 400 | (1,2,4) | 81.45 | 163.99 |
| brock400_4 | 400 | (1,2,3) | 65.70 | 149.80 |
| brock800_1 | 800 | (1,3,4) | 266.61 | 864.42 |
| brock800_2 | 800 | (1,2,4) | 300.18 | 832.44 |
| brock800_3 | 800 | (1,4,7) | 283.46 | 814.38 |
| brock800_4 | 800 | (1,2,3) | 270.43 | 814.70 |
| c-fat | ||||
| c-fat200-1 | 200 | (1,2,38) | 0.00 | 0.00 |
| c-fat200-2 | 200 | (1,2,19) | 0.00 | 0.00 |
| c-fat200-5 | 200 | (1,2,8) | 16.53 | 16.53 |
| c-fat500-1 | 500 | (1,2,81) | 9.93 | 12.11 |
| c-fat500-2 | 500 | (1,2,41) | 0.00 | 16.70 |
| c-fat500-5 | 500 | (1,2,17) | 33.30 | 47.87 |
| c-fat500-10 | 500 | (1,2,9) | 66.37 | 116.53 |
| C (random k-colorable) | ||||
| C125.9 | 125 | (1,2,4) | 0.00 | 17.22 |
| C250.9 | 250 | (1,2,5) | 33.35 | 66.60 |
| C500.9 | 500 | (1,2,3) | 145.74 | 361.84 |
| C1000.9 | 1000 | (1,2,3) | 583.07 | 2181.01 |
| C2000.5 | 2000 | (1,5,6) | 1365.39 | 7375.33 |
| C2000.9 | 2000 | (1,2,3) | 2382.06 | 16183.31 |
| C4000.5 | 4000 | (1,2,11) | 5875.50 | 57316.91 |
| DSJC | ||||
| DSJC500_5 | 500 | (1,2,5) | 83.10 | 182.94 |
| DSJC1000_5 | 1000 | (1,3,7) | 349.59 | 1065.35 |
| MANN | ||||
| MANN_a9 | 45 | (1,2,3) | 0.00 | 0.00 |
| MANN_a27 | 378 | (1,2,3) | 120.28 | 319.32 |
| MANN_a45 | 1035 | (1,2,3) | 1032.28 | 4349.00 |
| MANN_a81 | 3321 | (1,2,3) | 11086.66 | 100567.67 |
| keller | ||||
| keller4 | 171 | (1,8,12) | 1.03 | 18.70 |
| keller5 | 776 | (1,8,12) | 300.53 | 916.34 |
| keller6 | 3361 | (1,8,12) | 6461.56 | 73866.88 |
| gen | ||||
| gen200_p0.9_44 | 200 | (1,2,3) | 16.24 | 32.63 |
| gen200_p0.9_55 | 200 | (1,2,3) | 32.61 | 48.86 |
| gen400_p0.9_55 | 400 | (1,2,3) | 99.60 | 199.42 |
| gen400_p0.9_65 | 400 | (1,2,3) | 83.71 | 200.08 |
| gen400_p0.9_75 | 400 | (1,2,3) | 99.46 | 199.68 |
| hamming | ||||
| hamming6-2 | 64 | (1,4,6) | 0.00 | 0.00 |
| hamming6-4 | 64 | (1,16,52) | 0.00 | 0.00 |
| hamming8-2 | 256 | (1,4,6) | 33.12 | 84.61 |
| hamming8-4 | 256 | (1,16,52) | 32.74 | 61.86 |
| hamming10-2 | 1024 | (1,4,6) | 699.16 | 2746.87 |
| hamming10-4 | 1024 | (1,16,52) | 648.64 | 2247.65 |
| johnson | ||||
| johnson8-2-4 | 28 | (1,6,15) | 1.03 | 1.03 |
| johnson8-4-4 | 28 | (1,10,15) | 0.00 | 0.00 |
| johnson16-2-4 | 120 | (1,6,15) | 0.00 | 17.24 |
| johnson32-2-4 | 496 | (1,6,15) | 150.12 | 350.13 |
| san | ||||
| san200_0.7_1 | 200 | (1,3,4) | 33.75 | 66.20 |
| san200_0.7_2 | 200 | (1,2,3) | 33.14 | 66.63 |
| san200_0.9_1 | 200 | (1,2,4) | 50.46 | 107.74 |
| san200_0.9_2 | 200 | (1,2,6) | 50.37 | 100.47 |
| san200_0.9_3 | 200 | (1,2,3) | 16.85 | 50.55 |
| san400_0.5_1 | 400 | (1,3,6) | 132.72 | 215.42 |
| san400_0.7_1 | 400 | (1,2,4) | 99.87 | 233.07 |
| san400_0.7_2 | 400 | (1,2,5) | 99.76 | 233.36 |
| san400_0.7_3 | 400 | (1,3,6) | 111.21 | 244.83 |
| san400_0.9_1 | 400 | (1,2,3) | 115.85 | 315.37 |
| san1000 | 1000 | (1,2,3) | 674.09 | 1865.02 |
| sanr | ||||
| sanr200_0.7 | 200 | (1,2,3) | 16.83 | 50.55 |
| sanr200_0.9 | 200 | (1,2,3) | 16.74 | 50.29 |
| sanr400_0.5 | 400 | (1,2,6) | 74.60 | 149.32 |
| sanr400_0.7 | 400 | (1,3,4) | 99.39 | 249.53 |
| p_hat (random) | ||||
| p_hat300-1 | 300 | (1,9,29) | 16.57 | 50.07 |
| p_hat300-2 | 300 | (1,5,9) | 49.74 | 100.02 |
| p_hat300-3 | 300 | (1,2,3) | 66.16 | 133.71 |
| p_hat500-1 | 500 | (1,2,35) | 66.59 | 149.29 |
| p_hat500-2 | 500 | (1,2,3) | 134.25 | 316.57 |
| p_hat500-3 | 500 | (1,2,3) | 249.94 | 533.01 |
| p_hat700-1 | 700 | (1,5,11) | 199.26 | 432.23 |
| p_hat700-2 | 700 | (1,3,8) | 332.54 | 815.17 |
| p_hat700-3 | 700 | (1,2,3) | 362.56 | 1161.63 |
| p_hat1000-1 | 1000 | (1,3,29) | 265.92 | 749.21 |
| p_hat1000-2 | 1000 | (1,2,14) | 516.95 | 1632.55 |
| p_hat1000-3 | 1000 | (1,2,6) | 914.32 | 2851.39 |
| p_hat1500-1 | 1500 | (1,4,11) | 666.70 | 2464.31 |
| p_hat1500-2 | 1500 | (1,2,3) | 1249.29 | 5311.27 |
| p_hat1500-3 | 1500 | (1,2,3) | 1648.31 | 8107.59 |
Discussion
Several observations follow from the benchmark results.
Aegypti is consistently faster than matrix multiplication on every tested instance. Speedup factors range from approximately
on the brock800 family to over
on keller6 (
, Aegypti 6,461 ms vs. Naive 73,867 ms) and over
on MANN_a81 (
, Aegypti 11,087 ms vs. Naive 100,568 ms).
Phase 1 alone suffices on the majority of instances. Those reporting 0.00 ms Aegypti times — c-fat200-1, c-fat200-2, c-fat500-2, C125.9, hamming6-2, hamming6-4, johnson16-2-4, johnson8-4-4, and MANN_a9 — encountered triangles within the first few Union-Find merges, confirming that the
initialisation cost dominates and that
(the termination edge index) is effectively
.
The initialisation bottleneck governs the brock800 family ( ): Aegypti times cluster around 266–300 ms regardless of which specific triangle is returned, consistent with operations for .
Large, dense instances (C4000.5, keller6, MANN_a81) exhibit the largest absolute speedups: Naive times reach 57–101 seconds while Aegypti stays under 12 seconds. This is because
(with
) scales far more aggressively than
as
grows.
All instances yield a concrete witness triple. Unlike matrix-multiplication verification (which only certifies existence), Aegypti always returns an explicit triple at no extra cost.
6. Conclusion
We introduced Aegypti, a hybrid triangle-detection algorithm that combines a SIMD-accelerated clique-constrained Union-Find (Phase 1) with a classical adjacency-intersection fallback (Phase 2). The algorithm is correct, runs in time when a triangle exists (matching the initialisation cost of the data structure), and degrades to on triangle-free inputs. Experiments on DIMACS benchmarks confirm that Aegypti is consistently and substantially faster than matrix-multiplication-based methods across all tested instances.
Open Problems
Improved initialisation. Can the initialisation cost be reduced to while preserving the bitset-union-find clique structure?
Edge ordering. Does a degree-based or degeneracy-based edge ordering for Phase 1 reduce the average (the termination edge index) to on random graphs?
Triangle counting. Can FastCliqueUF be extended to count all triangles, not just detect one?
Dynamic graphs. Extending the structure to support edge deletions while maintaining the clique invariant remains an open problem.
Production-ready • Zero dependencies beyond NetworkX & NumPy • Battle-tested
pip install aegypti
import networkx as nx
from aegypti.algorithm import find_triangle_coordinates
G = nx.complete_graph(3)
triangle = find_triangle_coordinates(G)
print(triangle) # → frozenset({0, 1, 2})
Top comments (0)