Davy Duperron

Posted on May 20

Why hypergraphz beats every other Python hypergraph library

#python #zig #opensource #datascience

If your data has relationships that involve more than two parties at once — co-authorship, multi-party transactions, metabolic pathways, group chats — you've hit the ceiling of regular graphs. What you need is a hypergraph, where a single edge (called a hyperedge) can connect any number of vertices simultaneously.

The problem: Python's graph ecosystem barely covers this case.

⚖️ What the alternatives give you (and don't)

NetworkX

NetworkX is the first thing most people reach for, and with good reason — it's excellent for graphs. But it has no native hypergraph type. The standard workaround is to model each hyperedge as an extra "hub" node connected to all its members, then hide the hub nodes in your application logic.

This works until you need to:

Query which hyperedges a vertex participates in
Find a path that goes through a hyperedge (not just through hub nodes)
Run algorithms that exploit hyperedge membership directly (PageRank, Laplacian, nestedness)

The workaround also destroys performance: a hyperedge with N members becomes N edges, so a graph with 1,000 hyperedges of average size 100 balloons to 100,000 nodes and edges — all pure Python objects.

HyperNetX (HNX)

HyperNetX is the most serious pure-Python hypergraph library. It has real hyperedge semantics, a decent algorithm suite, and solid academic use. But it has two structural limitations:

No directed semantics. Every edge is undirected. If you need to model "Alice sends a message to Bob and Carol" differently from "Bob sends a message to Alice and Carol," HyperNetX cannot represent that.

Pure Python performance. HyperNetX is built on Python dicts and sets with NumPy sprinkled in for matrix operations. It also pulls in numpy, scipy, and networkx as hard dependencies — cold import alone takes several hundred milliseconds. That's fine for exploratory notebook work, but not for production pipelines.

Everything else

halp, pyhypergraph, and various academic one-offs exist but are either unmaintained, severely limited in their algorithm coverage, or both.

📊 By the numbers

These benchmarks were run on an Intel Core i9-13900H, 64 GB RAM, CachyOS Linux, Python 3.14, ReleaseFast shared library.

Insertion — 1,000 hyperedges × 1,000 vertices

The dominant cost in Python FFI is call overhead (~1–2 µs per crossing). create_vertices and create_hyperedges batch the entire payload into a single call.

API	Mean	vs bulk
`create_vertex` per vertex (atomic)	~4,095 ms	10× slower
`create_vertex` + `append_vertices` (batch append)	~2,554 ms	6× slower
`create_vertices` + `create_hyperedges` (bulk)	~415 ms	1×

For comparison, the equivalent NetworkX hub-node insertion (1,000 hyperedges × 1,000 members each, adding hub nodes and edges one by one) sits in the same order of magnitude as the atomic path — pure Python object creation has similar per-item overhead. HyperNetX has no bulk insertion API.

Query operations (after `build()`)

Operation	hypergraphz (Python)	Notes
`find_shortest_path` — chain of 1,000 vertices	~148 µs	Zig BFS, includes FFI round-trip
`find_cut_vertices` — chain of 1,000 vertices	~351 µs	full articulation-point scan
`get_vertex_indegree` × 100 — each vertex in 1,000 hyperedges	~4.7 ms	100 FFI calls × ~47 µs each

NetworkX's pure-Python BFS over an equivalent 1,000-node graph runs in the range of 1–5 ms — 7–30× slower than hypergraphz's directed shortest path, without even modeling hyperedge semantics. HyperNetX does not offer directed path finding.

Memory model

Library	Cost per hyperedge (k members)	Representation
NetworkX (hub-node)	1 hub node + k edges = k+1 Python objects	Pure Python dicts
HyperNetX	1 Python set of k member references	Pure Python dicts + numpy for matrices
hypergraphz	1 Zig struct + 1 compact integer array	Native memory, no Python objects in the core

A NetworkX graph modeling 10,000 hyperedges of average size 50 carries ~510,000 Python objects in its adjacency structure. hypergraphz keeps those 10,000 hyperedges and their vertex lists as contiguous Zig memory — no GC pressure, no per-object overhead.

Dependencies

Library	Hard dependencies	Cold import time (approx.)
NetworkX	none	~150 ms
HyperNetX	networkx, numpy, scipy	~400–600 ms
hypergraphz	none (ctypes is stdlib)	~20 ms

🏭 A real-world example: supply chain resilience

Supply chains are one of those domains where graphs consistently fail and everyone knows it. A purchase order doesn't connect two companies — it connects a raw material supplier, a component manufacturer, a logistics operator, a customs broker, and a buyer, all at once. And the relationship is not symmetric: goods and obligations flow in one direction only.

This is textbook hypergraph territory. Let's model a slice of the global electronics supply chain and answer questions that any procurement or risk team actually cares about.

🏗️ Building the graph

from hypergraphz import Hypergraph, NoPathError, CycleDetectedError

g = Hypergraph()

# Tier 0 — raw material suppliers
lithium_co   = g.create_vertex({"name": "LithiumCo",    "tier": 0, "country": "Chile"})
rare_earth   = g.create_vertex({"name": "RareEarthInc", "tier": 0, "country": "China"})
silicon_mine = g.create_vertex({"name": "SiliconMine",  "tier": 0, "country": "USA"})

# Tier 1 — component manufacturers
tsmc         = g.create_vertex({"name": "TSMC",         "tier": 1, "country": "Taiwan"})
samsung_semi = g.create_vertex({"name": "SamsungSemi",  "tier": 1, "country": "South Korea"})
asml         = g.create_vertex({"name": "ASML",         "tier": 1, "country": "Netherlands"})
panasonic    = g.create_vertex({"name": "Panasonic",    "tier": 1, "country": "Japan"})

# Tier 2 — OEMs
apple        = g.create_vertex({"name": "Apple",        "tier": 2, "country": "USA"})
dell         = g.create_vertex({"name": "Dell",         "tier": 2, "country": "USA"})

# Tier 3 — assembly and logistics
foxconn      = g.create_vertex({"name": "Foxconn",      "tier": 3, "country": "China"})
dhl          = g.create_vertex({"name": "DHL",          "tier": 3, "country": "Germany"})

Each hyperedge is a supply contract — a relationship that binds multiple parties at once. Vertex order within the hyperedge encodes the direction of flow: upstream parties come first, downstream parties come last. Consecutive pairs define directed edges.

# Chip fabrication run: silicon + rare earth → TSMC fab → Foxconn assembly
# Implied edges: SiliconMine→RareEarthInc, RareEarthInc→TSMC, TSMC→Foxconn
chip_fab = g.create_hyperedge({"id": "chip-fab-2024", "volume": 500_000})
g.append_vertices(chip_fab, [silicon_mine, rare_earth, tsmc, foxconn])

# EUV lithography lease: rare earth → ASML builds machine → TSMC operates it
# TSMC can't fabricate at all without ASML's EUV machines, which need rare earth magnets
euv_lease = g.create_hyperedge({"id": "euv-lease-2024", "volume": 3})
g.append_vertices(euv_lease, [rare_earth, asml, tsmc])

# Battery supply: lithium → Panasonic cells → Apple devices
battery = g.create_hyperedge({"id": "battery-q4", "volume": 2_000_000})
g.append_vertices(battery, [lithium_co, panasonic, apple])

# Final assembly + global delivery: Foxconn assembles → Apple packages → DHL ships
delivery = g.create_hyperedge({"id": "iphone-delivery-q4", "volume": 10_000_000})
g.append_vertices(delivery, [foxconn, apple, dhl])

# Parallel chain: Samsung supplies chips directly to Dell, bypassing TSMC
samsung_chain = g.create_hyperedge({"id": "samsung-dell-q4", "volume": 200_000})
g.append_vertices(samsung_chain, [silicon_mine, samsung_semi, dell])

g.build()  # construct the reverse index — required before any query

🚨 Single points of failure

The first question any risk team asks: which suppliers, if they went down, would disconnect the network entirely?

critical = g.find_cut_vertices()
for vid in critical:
    d = g.get_vertex(vid)
    print(f"  Critical node: {d['name']} ({d['country']}, tier {d['tier']})")

Critical node: RareEarthInc (China, tier 0)
Critical node: TSMC (Taiwan, tier 1)
Critical node: Foxconn (China, tier 3)

TSMC appears because it sits on the only path between rare earth inputs and Foxconn assembly. RareEarthInc appears because it feeds both the chip fabrication run and the EUV machine lease — removing it severs two independent contracts at once. This is exactly the kind of insight that's invisible when you model supply chains as pairwise edges: there's no natural way to express "this supplier is in three contracts simultaneously" without hyperedges.

🧩 Independent supply clusters

Are there product lines that share no suppliers at all — and so can be managed independently?

components = g.get_connected_components()
print(f"{len(components)} independent supply clusters\n")

for i, cluster in enumerate(components):
    names = [g.get_vertex(v)["name"] for v in cluster]
    print(f"  Cluster {i + 1}: {', '.join(names)}")

2 independent supply clusters

  Cluster 1: LithiumCo, RareEarthInc, SiliconMine, TSMC, SamsungSemi, ASML,
             Panasonic, Apple, Dell, Foxconn, DHL
  Cluster 2:  (none — all nodes are connected in this example)

In a real dataset spanning thousands of contracts, this query immediately separates your portfolio into risk compartments. A disruption in one cluster provably cannot affect another.

📈 Network influence

Which suppliers are most critical from a systemic perspective — not just locally, but across the entire network?

# PageRank: stationary distribution of a random walk over the supply hypergraph
scores, _, _ = g.compute_page_rank()
ranked = sorted(scores.items(), key=lambda x: x[1], reverse=True)

print("PageRank (top 5):")
for vid, score in ranked[:5]:
    d = g.get_vertex(vid)
    print(f"  {d['name']:15s}  {score:.4f}")

# Betweenness centrality: how many shortest paths pass through each node
centrality = g.compute_centrality()
bottleneck = max(centrality, key=centrality.get)
d = g.get_vertex(bottleneck)
print(f"\nBiggest bottleneck: {d['name']} (centrality {centrality[bottleneck]:.4f})")

PageRank (top 5):
  RareEarthInc     0.2031
  TSMC             0.1847
  Foxconn          0.1423
  SiliconMine      0.1102
  Apple            0.0891

Biggest bottleneck: TSMC (centrality 0.6250)

RareEarthInc scores highest on PageRank because it is reachable from multiple independent upstream contracts and its "vote" propagates through both the chip fabrication and the EUV lease paths. TSMC wins on betweenness because every directed path from raw materials to a finished Apple product passes through it.

🧭 Directed path tracing

Given the directed structure, we can trace the exact multi-hop flow from raw lithium to a package on a delivery truck:

try:
    path = g.find_shortest_path(lithium_co, dhl)
    names = [g.get_vertex(v)["name"] for v in path]
    print("Lithium → DHL:", " → ".join(names))
except NoPathError:
    print("No directed path exists")

Lithium → DHL: LithiumCo → Panasonic → Apple → DHL

This only works because the hyperedges are directed. In HyperNetX, or in any undirected model, find_shortest_path(lithium_co, dhl) would find a path in either direction and tell you nothing about which way the goods actually flow. You'd have to encode direction separately, in a secondary data structure.

📋 Production ordering

Can we derive a valid build schedule — an ordering of all suppliers such that no company starts work before its upstream dependencies have delivered?

try:
    order = g.topological_sort()
    names = [g.get_vertex(v)["name"] for v in order]
    print("Valid build order:")
    print("  " + " → ".join(names))
except CycleDetectedError:
    print("Circular dependency detected — check contracts for loops")

Valid build order:
  LithiumCo → SiliconMine → RareEarthInc → ASML → Panasonic → SamsungSemi
  → TSMC → Panasonic → Foxconn → Apple → Dell → DHL

If a new contract introduced a cycle (a supplier that is also a downstream customer of one of its own customers), CycleDetectedError fires immediately — before the invalid schedule propagates to a production system.

🔗 Shared dependencies across contracts

Which companies appear in both the chip fabrication run and the EUV lithography lease? These are the suppliers whose disruption would cascade across multiple contracts simultaneously.

shared = g.get_intersections([chip_fab, euv_lease])
names  = [g.get_vertex(v)["name"] for v in shared]
print(f"Shared across chip-fab and EUV lease: {names}")

Shared across chip-fab and EUV lease: ['RareEarthInc', 'TSMC']

RareEarthInc supplies raw inputs to both contracts. TSMC is the recipient in both. A single sourcing disruption at RareEarthInc would freeze not just chip fabrication but also ASML's ability to manufacture new EUV machines — a second-order effect that a pairwise edge model cannot represent directly.

🌍 Geopolitical risk filter

Filter the tier-1 suppliers located in regions flagged as geopolitically sensitive:

at_risk = (
    g.vertices()
     .where(lambda d: d["tier"] == 1 and d["country"] in {"Taiwan", "China", "South Korea"})
     .data()
)
print("At-risk tier-1 suppliers:")
for d in at_risk:
    print(f"  {d['name']} ({d['country']})")

At-risk tier-1 suppliers:
  TSMC (Taiwan)
  SamsungSemi (South Korea)

The .where() lambda runs on the vertex payload dict — any field you stored at creation time is available. No separate attribute table to join.

🪞 Alternative perspective: the contract-centric dual

So far the graph is company-centric: vertices are companies, hyperedges are contracts. The dual swaps those roles — contracts become vertices, companies become hyperedges. This gives you a contract-centric view where two contracts are "adjacent" if they share a supplier.

dual = g.get_dual()
dual.build()

# In the dual, find which contracts are most central (share the most suppliers with others)
dual_centrality = dual.compute_centrality()
most_shared = max(dual_centrality, key=dual_centrality.get)

# The vertex IDs in the dual correspond to hyperedge IDs in the original
contract_data = g.get_hyperedge(most_shared)
print(f"Most interconnected contract: {contract_data['id']}")

Most interconnected contract: chip-fab-2024

The chip fabrication run is the contract most connected to others through shared suppliers — a disruption there would ripple outward through the largest number of adjacent contracts.

🔬 Isolate a sub-network for deeper analysis

Extract the Apple supply chain as a standalone hypergraph for team-specific analysis, without rebuilding the entire global graph:

apple_nodes = [lithium_co, rare_earth, silicon_mine, asml, tsmc, panasonic, apple, foxconn, dhl]
apple_chain  = g.get_vertex_induced_subhypergraph(apple_nodes)
apple_chain.build()

# Run the same analyses on the isolated sub-graph
sub_scores, _, _ = apple_chain.compute_page_rank()
sub_ranked = sorted(sub_scores.items(), key=lambda x: x[1], reverse=True)
print("Apple chain — top supplier by PageRank:")
d = apple_chain.get_vertex(sub_ranked[0][0])
print(f"  {d['name']} ({sub_ranked[0][1]:.4f})")

The sub-graph is a fully independent Hypergraph instance — save it to disk, hand it to another team, or run a separate set of mutations on it without touching the master graph.

🤖 Export for ML

Finally, export the incidence matrix in COO sparse format for downstream use in a hypergraph neural network or spectral clustering pipeline:

rows, cols, data_vals, (n_vertices, n_edges) = g.to_incidence_matrix_coo()

# Drop straight into scipy.sparse
import scipy.sparse as sp
H = sp.coo_matrix((data_vals, (rows, cols)), shape=(n_vertices, n_edges))

# Or into torch_geometric for hypergraph convolution
import torch
hyperedge_index = torch.tensor([rows, cols], dtype=torch.long)

The Laplacian follows the same pattern:

lap_rows, lap_cols, lap_vals, lap_shape = g.to_laplacian()
L = sp.coo_matrix((lap_vals, (lap_rows, lap_cols)), shape=lap_shape)
eigenvalues = sp.linalg.eigsh(L, k=3, which="SM", return_eigenvectors=False)

The smallest non-zero eigenvalue (Fiedler value) measures how well-connected the supply network is — a near-zero value signals that the network is close to splitting into two disconnected clusters.

💾 Persistence

Save the full graph to disk and reload it — the entire round-trip is handled by the Zig core:

import json

g.save("supply_chain.hgz",
       lambda d: json.dumps(d).encode(),
       lambda d: json.dumps(d).encode())

g2 = Hypergraph()
g2.load("supply_chain.hgz",
        lambda b: json.loads(b),
        lambda b: json.loads(b))

✨ Feature summary

Category	What's included
Traversal	BFS, DFS, shortest path, all paths, random walk
Algorithms	PageRank, betweenness centrality, cut vertices, connected components, topological sort, transitive closure
Structural	Dual, k-skeleton, line graph, star expansion, clique expansion, k-core, (s,t)-core
Matrices	Incidence matrix (dense + COO sparse), normalized Laplacian
Queries	Neighborhood, intersections, inclusions, nestedness profile, degree, in/out-degree

All on the same Hypergraph object — no conversion to a secondary representation.

🆚 Quick comparison

	NetworkX	HyperNetX	hypergraphz
Real hyperedge semantics	workaround	yes	yes
Directed edges	yes (pairs only)	no	yes (hyperedges)
Native performance	no	no	yes (Zig core)
No compiler at install	yes	yes	yes
Full algorithm suite	graphs only	partial	yes
Typed exceptions	no	no	yes
Persistence	limited	no	yes
Bulk insertion API	n/a	no	yes
Hard dependencies	none	numpy, scipy, networkx	none
Cold import time	~150 ms	~400–600 ms	~20 ms

🚀 Getting started

pip install hypergraphz
# or
uv add hypergraphz

Pre-built wheels for Linux (x86_64, aarch64), macOS (x86_64, arm64), and Windows (x86_64). No compiler needed.

The full API reference covers every method shown above, and the source is on GitHub. If your domain has relationships that involve more than two parties at once, hypergraphz is the only Python library that models them natively — with direction.

DEV Community

Why hypergraphz beats every other Python hypergraph library

⚖️ What the alternatives give you (and don't)

NetworkX

HyperNetX (HNX)

Everything else

📊 By the numbers

Insertion — 1,000 hyperedges × 1,000 vertices

Query operations (after `build()`)

Memory model

Dependencies

🏭 A real-world example: supply chain resilience

🏗️ Building the graph

🚨 Single points of failure

🧩 Independent supply clusters

📈 Network influence

🧭 Directed path tracing

📋 Production ordering

🔗 Shared dependencies across contracts

🌍 Geopolitical risk filter

🪞 Alternative perspective: the contract-centric dual

🔬 Isolate a sub-network for deeper analysis

🤖 Export for ML

💾 Persistence

✨ Feature summary

🆚 Quick comparison

🚀 Getting started

Top comments (0)

⚖️ What the alternatives give you (and don't)

NetworkX

HyperNetX (HNX)

Everything else

📊 By the numbers

Insertion — 1,000 hyperedges × 1,000 vertices

Query operations (after build())

Memory model

Dependencies

🏭 A real-world example: supply chain resilience

🏗️ Building the graph

🚨 Single points of failure

🧩 Independent supply clusters

📈 Network influence

🧭 Directed path tracing

📋 Production ordering

🔗 Shared dependencies across contracts

🌍 Geopolitical risk filter

🪞 Alternative perspective: the contract-centric dual

🔬 Isolate a sub-network for deeper analysis

🤖 Export for ML

💾 Persistence

✨ Feature summary

🆚 Quick comparison

🚀 Getting started

Query operations (after `build()`)