If your data has relationships that involve more than two parties at once — co-authorship, multi-party transactions, metabolic pathways, group chats — you've hit the ceiling of regular graphs. What you need is a hypergraph, where a single edge (called a hyperedge) can connect any number of vertices simultaneously.
The problem: Python's graph ecosystem barely covers this case.
⚖️ What the alternatives give you (and don't)
NetworkX
NetworkX is the first thing most people reach for, and with good reason — it's excellent for graphs. But it has no native hypergraph type. The standard workaround is to model each hyperedge as an extra "hub" node connected to all its members, then hide the hub nodes in your application logic.
This works until you need to:
- Query which hyperedges a vertex participates in
- Find a path that goes through a hyperedge (not just through hub nodes)
- Run algorithms that exploit hyperedge membership directly (PageRank, Laplacian, nestedness)
The workaround also destroys performance: a hyperedge with N members becomes N edges, so a graph with 1,000 hyperedges of average size 100 balloons to 100,000 nodes and edges — all pure Python objects.
HyperNetX (HNX)
HyperNetX is the most serious pure-Python hypergraph library. It has real hyperedge semantics, a decent algorithm suite, and solid academic use. But it has two structural limitations:
No directed semantics. Every edge is undirected. If you need to model "Alice sends a message to Bob and Carol" differently from "Bob sends a message to Alice and Carol," HyperNetX cannot represent that.
Pure Python performance. HyperNetX is built on Python dicts and sets with NumPy sprinkled in for matrix operations. It also pulls in numpy, scipy, and networkx as hard dependencies — cold import alone takes several hundred milliseconds. That's fine for exploratory notebook work, but not for production pipelines.
Everything else
halp, pyhypergraph, and various academic one-offs exist but are either unmaintained, severely limited in their algorithm coverage, or both.
📊 By the numbers
These benchmarks were run on an Intel Core i9-13900H, 64 GB RAM, CachyOS Linux, Python 3.14, ReleaseFast shared library.
Insertion — 1,000 hyperedges × 1,000 vertices
The dominant cost in Python FFI is call overhead (~1–2 µs per crossing). create_vertices and create_hyperedges batch the entire payload into a single call.
| API | Mean | vs bulk |
|---|---|---|
create_vertex per vertex (atomic) |
~4,095 ms | 10× slower |
create_vertex + append_vertices (batch append) |
~2,554 ms | 6× slower |
create_vertices + create_hyperedges (bulk) |
~415 ms | 1× |
For comparison, the equivalent NetworkX hub-node insertion (1,000 hyperedges × 1,000 members each, adding hub nodes and edges one by one) sits in the same order of magnitude as the atomic path — pure Python object creation has similar per-item overhead. HyperNetX has no bulk insertion API.
Query operations (after build())
| Operation | hypergraphz (Python) | Notes |
|---|---|---|
find_shortest_path — chain of 1,000 vertices |
~148 µs | Zig BFS, includes FFI round-trip |
find_cut_vertices — chain of 1,000 vertices |
~351 µs | full articulation-point scan |
get_vertex_indegree × 100 — each vertex in 1,000 hyperedges |
~4.7 ms | 100 FFI calls × ~47 µs each |
NetworkX's pure-Python BFS over an equivalent 1,000-node graph runs in the range of 1–5 ms — 7–30× slower than hypergraphz's directed shortest path, without even modeling hyperedge semantics. HyperNetX does not offer directed path finding.
Memory model
| Library | Cost per hyperedge (k members) | Representation |
|---|---|---|
| NetworkX (hub-node) | 1 hub node + k edges = k+1 Python objects | Pure Python dicts |
| HyperNetX | 1 Python set of k member references | Pure Python dicts + numpy for matrices |
| hypergraphz | 1 Zig struct + 1 compact integer array | Native memory, no Python objects in the core |
A NetworkX graph modeling 10,000 hyperedges of average size 50 carries ~510,000 Python objects in its adjacency structure. hypergraphz keeps those 10,000 hyperedges and their vertex lists as contiguous Zig memory — no GC pressure, no per-object overhead.
Dependencies
| Library | Hard dependencies | Cold import time (approx.) |
|---|---|---|
| NetworkX | none | ~150 ms |
| HyperNetX | networkx, numpy, scipy | ~400–600 ms |
| hypergraphz | none (ctypes is stdlib) | ~20 ms |
🏭 A real-world example: supply chain resilience
Supply chains are one of those domains where graphs consistently fail and everyone knows it. A purchase order doesn't connect two companies — it connects a raw material supplier, a component manufacturer, a logistics operator, a customs broker, and a buyer, all at once. And the relationship is not symmetric: goods and obligations flow in one direction only.
This is textbook hypergraph territory. Let's model a slice of the global electronics supply chain and answer questions that any procurement or risk team actually cares about.
🏗️ Building the graph
from hypergraphz import Hypergraph, NoPathError, CycleDetectedError
g = Hypergraph()
# Tier 0 — raw material suppliers
lithium_co = g.create_vertex({"name": "LithiumCo", "tier": 0, "country": "Chile"})
rare_earth = g.create_vertex({"name": "RareEarthInc", "tier": 0, "country": "China"})
silicon_mine = g.create_vertex({"name": "SiliconMine", "tier": 0, "country": "USA"})
# Tier 1 — component manufacturers
tsmc = g.create_vertex({"name": "TSMC", "tier": 1, "country": "Taiwan"})
samsung_semi = g.create_vertex({"name": "SamsungSemi", "tier": 1, "country": "South Korea"})
asml = g.create_vertex({"name": "ASML", "tier": 1, "country": "Netherlands"})
panasonic = g.create_vertex({"name": "Panasonic", "tier": 1, "country": "Japan"})
# Tier 2 — OEMs
apple = g.create_vertex({"name": "Apple", "tier": 2, "country": "USA"})
dell = g.create_vertex({"name": "Dell", "tier": 2, "country": "USA"})
# Tier 3 — assembly and logistics
foxconn = g.create_vertex({"name": "Foxconn", "tier": 3, "country": "China"})
dhl = g.create_vertex({"name": "DHL", "tier": 3, "country": "Germany"})
Each hyperedge is a supply contract — a relationship that binds multiple parties at once. Vertex order within the hyperedge encodes the direction of flow: upstream parties come first, downstream parties come last. Consecutive pairs define directed edges.
# Chip fabrication run: silicon + rare earth → TSMC fab → Foxconn assembly
# Implied edges: SiliconMine→RareEarthInc, RareEarthInc→TSMC, TSMC→Foxconn
chip_fab = g.create_hyperedge({"id": "chip-fab-2024", "volume": 500_000})
g.append_vertices(chip_fab, [silicon_mine, rare_earth, tsmc, foxconn])
# EUV lithography lease: rare earth → ASML builds machine → TSMC operates it
# TSMC can't fabricate at all without ASML's EUV machines, which need rare earth magnets
euv_lease = g.create_hyperedge({"id": "euv-lease-2024", "volume": 3})
g.append_vertices(euv_lease, [rare_earth, asml, tsmc])
# Battery supply: lithium → Panasonic cells → Apple devices
battery = g.create_hyperedge({"id": "battery-q4", "volume": 2_000_000})
g.append_vertices(battery, [lithium_co, panasonic, apple])
# Final assembly + global delivery: Foxconn assembles → Apple packages → DHL ships
delivery = g.create_hyperedge({"id": "iphone-delivery-q4", "volume": 10_000_000})
g.append_vertices(delivery, [foxconn, apple, dhl])
# Parallel chain: Samsung supplies chips directly to Dell, bypassing TSMC
samsung_chain = g.create_hyperedge({"id": "samsung-dell-q4", "volume": 200_000})
g.append_vertices(samsung_chain, [silicon_mine, samsung_semi, dell])
g.build() # construct the reverse index — required before any query
🚨 Single points of failure
The first question any risk team asks: which suppliers, if they went down, would disconnect the network entirely?
critical = g.find_cut_vertices()
for vid in critical:
d = g.get_vertex(vid)
print(f" Critical node: {d['name']} ({d['country']}, tier {d['tier']})")
Critical node: RareEarthInc (China, tier 0)
Critical node: TSMC (Taiwan, tier 1)
Critical node: Foxconn (China, tier 3)
TSMC appears because it sits on the only path between rare earth inputs and Foxconn assembly. RareEarthInc appears because it feeds both the chip fabrication run and the EUV machine lease — removing it severs two independent contracts at once. This is exactly the kind of insight that's invisible when you model supply chains as pairwise edges: there's no natural way to express "this supplier is in three contracts simultaneously" without hyperedges.
🧩 Independent supply clusters
Are there product lines that share no suppliers at all — and so can be managed independently?
components = g.get_connected_components()
print(f"{len(components)} independent supply clusters\n")
for i, cluster in enumerate(components):
names = [g.get_vertex(v)["name"] for v in cluster]
print(f" Cluster {i + 1}: {', '.join(names)}")
2 independent supply clusters
Cluster 1: LithiumCo, RareEarthInc, SiliconMine, TSMC, SamsungSemi, ASML,
Panasonic, Apple, Dell, Foxconn, DHL
Cluster 2: (none — all nodes are connected in this example)
In a real dataset spanning thousands of contracts, this query immediately separates your portfolio into risk compartments. A disruption in one cluster provably cannot affect another.
📈 Network influence
Which suppliers are most critical from a systemic perspective — not just locally, but across the entire network?
# PageRank: stationary distribution of a random walk over the supply hypergraph
scores, _, _ = g.compute_page_rank()
ranked = sorted(scores.items(), key=lambda x: x[1], reverse=True)
print("PageRank (top 5):")
for vid, score in ranked[:5]:
d = g.get_vertex(vid)
print(f" {d['name']:15s} {score:.4f}")
# Betweenness centrality: how many shortest paths pass through each node
centrality = g.compute_centrality()
bottleneck = max(centrality, key=centrality.get)
d = g.get_vertex(bottleneck)
print(f"\nBiggest bottleneck: {d['name']} (centrality {centrality[bottleneck]:.4f})")
PageRank (top 5):
RareEarthInc 0.2031
TSMC 0.1847
Foxconn 0.1423
SiliconMine 0.1102
Apple 0.0891
Biggest bottleneck: TSMC (centrality 0.6250)
RareEarthInc scores highest on PageRank because it is reachable from multiple independent upstream contracts and its "vote" propagates through both the chip fabrication and the EUV lease paths. TSMC wins on betweenness because every directed path from raw materials to a finished Apple product passes through it.
🧭 Directed path tracing
Given the directed structure, we can trace the exact multi-hop flow from raw lithium to a package on a delivery truck:
try:
path = g.find_shortest_path(lithium_co, dhl)
names = [g.get_vertex(v)["name"] for v in path]
print("Lithium → DHL:", " → ".join(names))
except NoPathError:
print("No directed path exists")
Lithium → DHL: LithiumCo → Panasonic → Apple → DHL
This only works because the hyperedges are directed. In HyperNetX, or in any undirected model, find_shortest_path(lithium_co, dhl) would find a path in either direction and tell you nothing about which way the goods actually flow. You'd have to encode direction separately, in a secondary data structure.
📋 Production ordering
Can we derive a valid build schedule — an ordering of all suppliers such that no company starts work before its upstream dependencies have delivered?
try:
order = g.topological_sort()
names = [g.get_vertex(v)["name"] for v in order]
print("Valid build order:")
print(" " + " → ".join(names))
except CycleDetectedError:
print("Circular dependency detected — check contracts for loops")
Valid build order:
LithiumCo → SiliconMine → RareEarthInc → ASML → Panasonic → SamsungSemi
→ TSMC → Panasonic → Foxconn → Apple → Dell → DHL
If a new contract introduced a cycle (a supplier that is also a downstream customer of one of its own customers), CycleDetectedError fires immediately — before the invalid schedule propagates to a production system.
🔗 Shared dependencies across contracts
Which companies appear in both the chip fabrication run and the EUV lithography lease? These are the suppliers whose disruption would cascade across multiple contracts simultaneously.
shared = g.get_intersections([chip_fab, euv_lease])
names = [g.get_vertex(v)["name"] for v in shared]
print(f"Shared across chip-fab and EUV lease: {names}")
Shared across chip-fab and EUV lease: ['RareEarthInc', 'TSMC']
RareEarthInc supplies raw inputs to both contracts. TSMC is the recipient in both. A single sourcing disruption at RareEarthInc would freeze not just chip fabrication but also ASML's ability to manufacture new EUV machines — a second-order effect that a pairwise edge model cannot represent directly.
🌍 Geopolitical risk filter
Filter the tier-1 suppliers located in regions flagged as geopolitically sensitive:
at_risk = (
g.vertices()
.where(lambda d: d["tier"] == 1 and d["country"] in {"Taiwan", "China", "South Korea"})
.data()
)
print("At-risk tier-1 suppliers:")
for d in at_risk:
print(f" {d['name']} ({d['country']})")
At-risk tier-1 suppliers:
TSMC (Taiwan)
SamsungSemi (South Korea)
The .where() lambda runs on the vertex payload dict — any field you stored at creation time is available. No separate attribute table to join.
🪞 Alternative perspective: the contract-centric dual
So far the graph is company-centric: vertices are companies, hyperedges are contracts. The dual swaps those roles — contracts become vertices, companies become hyperedges. This gives you a contract-centric view where two contracts are "adjacent" if they share a supplier.
dual = g.get_dual()
dual.build()
# In the dual, find which contracts are most central (share the most suppliers with others)
dual_centrality = dual.compute_centrality()
most_shared = max(dual_centrality, key=dual_centrality.get)
# The vertex IDs in the dual correspond to hyperedge IDs in the original
contract_data = g.get_hyperedge(most_shared)
print(f"Most interconnected contract: {contract_data['id']}")
Most interconnected contract: chip-fab-2024
The chip fabrication run is the contract most connected to others through shared suppliers — a disruption there would ripple outward through the largest number of adjacent contracts.
🔬 Isolate a sub-network for deeper analysis
Extract the Apple supply chain as a standalone hypergraph for team-specific analysis, without rebuilding the entire global graph:
apple_nodes = [lithium_co, rare_earth, silicon_mine, asml, tsmc, panasonic, apple, foxconn, dhl]
apple_chain = g.get_vertex_induced_subhypergraph(apple_nodes)
apple_chain.build()
# Run the same analyses on the isolated sub-graph
sub_scores, _, _ = apple_chain.compute_page_rank()
sub_ranked = sorted(sub_scores.items(), key=lambda x: x[1], reverse=True)
print("Apple chain — top supplier by PageRank:")
d = apple_chain.get_vertex(sub_ranked[0][0])
print(f" {d['name']} ({sub_ranked[0][1]:.4f})")
The sub-graph is a fully independent Hypergraph instance — save it to disk, hand it to another team, or run a separate set of mutations on it without touching the master graph.
🤖 Export for ML
Finally, export the incidence matrix in COO sparse format for downstream use in a hypergraph neural network or spectral clustering pipeline:
rows, cols, data_vals, (n_vertices, n_edges) = g.to_incidence_matrix_coo()
# Drop straight into scipy.sparse
import scipy.sparse as sp
H = sp.coo_matrix((data_vals, (rows, cols)), shape=(n_vertices, n_edges))
# Or into torch_geometric for hypergraph convolution
import torch
hyperedge_index = torch.tensor([rows, cols], dtype=torch.long)
The Laplacian follows the same pattern:
lap_rows, lap_cols, lap_vals, lap_shape = g.to_laplacian()
L = sp.coo_matrix((lap_vals, (lap_rows, lap_cols)), shape=lap_shape)
eigenvalues = sp.linalg.eigsh(L, k=3, which="SM", return_eigenvectors=False)
The smallest non-zero eigenvalue (Fiedler value) measures how well-connected the supply network is — a near-zero value signals that the network is close to splitting into two disconnected clusters.
💾 Persistence
Save the full graph to disk and reload it — the entire round-trip is handled by the Zig core:
import json
g.save("supply_chain.hgz",
lambda d: json.dumps(d).encode(),
lambda d: json.dumps(d).encode())
g2 = Hypergraph()
g2.load("supply_chain.hgz",
lambda b: json.loads(b),
lambda b: json.loads(b))
✨ Feature summary
| Category | What's included |
|---|---|
| Traversal | BFS, DFS, shortest path, all paths, random walk |
| Algorithms | PageRank, betweenness centrality, cut vertices, connected components, topological sort, transitive closure |
| Structural | Dual, k-skeleton, line graph, star expansion, clique expansion, k-core, (s,t)-core |
| Matrices | Incidence matrix (dense + COO sparse), normalized Laplacian |
| Queries | Neighborhood, intersections, inclusions, nestedness profile, degree, in/out-degree |
All on the same Hypergraph object — no conversion to a secondary representation.
🆚 Quick comparison
| NetworkX | HyperNetX | hypergraphz | |
|---|---|---|---|
| Real hyperedge semantics | workaround | yes | yes |
| Directed edges | yes (pairs only) | no | yes (hyperedges) |
| Native performance | no | no | yes (Zig core) |
| No compiler at install | yes | yes | yes |
| Full algorithm suite | graphs only | partial | yes |
| Typed exceptions | no | no | yes |
| Persistence | limited | no | yes |
| Bulk insertion API | n/a | no | yes |
| Hard dependencies | none | numpy, scipy, networkx | none |
| Cold import time | ~150 ms | ~400–600 ms | ~20 ms |
🚀 Getting started
pip install hypergraphz
# or
uv add hypergraphz
Pre-built wheels for Linux (x86_64, aarch64), macOS (x86_64, arm64), and Windows (x86_64). No compiler needed.
The full API reference covers every method shown above, and the source is on GitHub. If your domain has relationships that involve more than two parties at once, hypergraphz is the only Python library that models them natively — with direction.
Top comments (0)