DEV Community

Davy Duperron
Davy Duperron

Posted on

Why hypergraphz beats every other Python hypergraph library

If your data has relationships that involve more than two parties at once — co-authorship, multi-party transactions, metabolic pathways, group chats — you've hit the ceiling of regular graphs. What you need is a hypergraph, where a single edge (called a hyperedge) can connect any number of vertices simultaneously.

The problem: Python's graph ecosystem barely covers this case.


⚖️ What the alternatives give you (and don't)

NetworkX

NetworkX is the first thing most people reach for, and with good reason — it's excellent for graphs. But it has no native hypergraph type. The standard workaround is to model each hyperedge as an extra "hub" node connected to all its members, then hide the hub nodes in your application logic.

This works until you need to:

  • Query which hyperedges a vertex participates in
  • Find a path that goes through a hyperedge (not just through hub nodes)
  • Run algorithms that exploit hyperedge membership directly (PageRank, Laplacian, nestedness)

The workaround also destroys performance: a hyperedge with N members becomes N edges, so a graph with 1,000 hyperedges of average size 100 balloons to 100,000 nodes and edges — all pure Python objects.

HyperNetX (HNX)

HyperNetX is the most serious pure-Python hypergraph library. It has real hyperedge semantics, a decent algorithm suite, and solid academic use. But it has two structural limitations:

No directed semantics. Every edge is undirected. If you need to model "Alice sends a message to Bob and Carol" differently from "Bob sends a message to Alice and Carol," HyperNetX cannot represent that.

Pure Python performance. HyperNetX is built on Python dicts and sets with NumPy sprinkled in for matrix operations. It also pulls in numpy, scipy, and networkx as hard dependencies — cold import alone takes several hundred milliseconds. That's fine for exploratory notebook work, but not for production pipelines.

Everything else

halp, pyhypergraph, and various academic one-offs exist but are either unmaintained, severely limited in their algorithm coverage, or both.


📊 By the numbers

These benchmarks were run on an Intel Core i9-13900H, 64 GB RAM, CachyOS Linux, Python 3.14, ReleaseFast shared library.

Insertion — 1,000 hyperedges × 1,000 vertices

The dominant cost in Python FFI is call overhead (~1–2 µs per crossing). create_vertices and create_hyperedges batch the entire payload into a single call.

API Mean vs bulk
create_vertex per vertex (atomic) ~4,095 ms 10× slower
create_vertex + append_vertices (batch append) ~2,554 ms 6× slower
create_vertices + create_hyperedges (bulk) ~415 ms

For comparison, the equivalent NetworkX hub-node insertion (1,000 hyperedges × 1,000 members each, adding hub nodes and edges one by one) sits in the same order of magnitude as the atomic path — pure Python object creation has similar per-item overhead. HyperNetX has no bulk insertion API.

Query operations (after build())

Operation hypergraphz (Python) Notes
find_shortest_path — chain of 1,000 vertices ~148 µs Zig BFS, includes FFI round-trip
find_cut_vertices — chain of 1,000 vertices ~351 µs full articulation-point scan
get_vertex_indegree × 100 — each vertex in 1,000 hyperedges ~4.7 ms 100 FFI calls × ~47 µs each

NetworkX's pure-Python BFS over an equivalent 1,000-node graph runs in the range of 1–5 ms — 7–30× slower than hypergraphz's directed shortest path, without even modeling hyperedge semantics. HyperNetX does not offer directed path finding.

Memory model

Library Cost per hyperedge (k members) Representation
NetworkX (hub-node) 1 hub node + k edges = k+1 Python objects Pure Python dicts
HyperNetX 1 Python set of k member references Pure Python dicts + numpy for matrices
hypergraphz 1 Zig struct + 1 compact integer array Native memory, no Python objects in the core

A NetworkX graph modeling 10,000 hyperedges of average size 50 carries ~510,000 Python objects in its adjacency structure. hypergraphz keeps those 10,000 hyperedges and their vertex lists as contiguous Zig memory — no GC pressure, no per-object overhead.

Dependencies

Library Hard dependencies Cold import time (approx.)
NetworkX none ~150 ms
HyperNetX networkx, numpy, scipy ~400–600 ms
hypergraphz none (ctypes is stdlib) ~20 ms

🏭 A real-world example: supply chain resilience

Supply chains are one of those domains where graphs consistently fail and everyone knows it. A purchase order doesn't connect two companies — it connects a raw material supplier, a component manufacturer, a logistics operator, a customs broker, and a buyer, all at once. And the relationship is not symmetric: goods and obligations flow in one direction only.

This is textbook hypergraph territory. Let's model a slice of the global electronics supply chain and answer questions that any procurement or risk team actually cares about.

🏗️ Building the graph

from hypergraphz import Hypergraph, NoPathError, CycleDetectedError

g = Hypergraph()

# Tier 0 — raw material suppliers
lithium_co   = g.create_vertex({"name": "LithiumCo",    "tier": 0, "country": "Chile"})
rare_earth   = g.create_vertex({"name": "RareEarthInc", "tier": 0, "country": "China"})
silicon_mine = g.create_vertex({"name": "SiliconMine",  "tier": 0, "country": "USA"})

# Tier 1 — component manufacturers
tsmc         = g.create_vertex({"name": "TSMC",         "tier": 1, "country": "Taiwan"})
samsung_semi = g.create_vertex({"name": "SamsungSemi",  "tier": 1, "country": "South Korea"})
asml         = g.create_vertex({"name": "ASML",         "tier": 1, "country": "Netherlands"})
panasonic    = g.create_vertex({"name": "Panasonic",    "tier": 1, "country": "Japan"})

# Tier 2 — OEMs
apple        = g.create_vertex({"name": "Apple",        "tier": 2, "country": "USA"})
dell         = g.create_vertex({"name": "Dell",         "tier": 2, "country": "USA"})

# Tier 3 — assembly and logistics
foxconn      = g.create_vertex({"name": "Foxconn",      "tier": 3, "country": "China"})
dhl          = g.create_vertex({"name": "DHL",          "tier": 3, "country": "Germany"})
Enter fullscreen mode Exit fullscreen mode

Each hyperedge is a supply contract — a relationship that binds multiple parties at once. Vertex order within the hyperedge encodes the direction of flow: upstream parties come first, downstream parties come last. Consecutive pairs define directed edges.

# Chip fabrication run: silicon + rare earth → TSMC fab → Foxconn assembly
# Implied edges: SiliconMine→RareEarthInc, RareEarthInc→TSMC, TSMC→Foxconn
chip_fab = g.create_hyperedge({"id": "chip-fab-2024", "volume": 500_000})
g.append_vertices(chip_fab, [silicon_mine, rare_earth, tsmc, foxconn])

# EUV lithography lease: rare earth → ASML builds machine → TSMC operates it
# TSMC can't fabricate at all without ASML's EUV machines, which need rare earth magnets
euv_lease = g.create_hyperedge({"id": "euv-lease-2024", "volume": 3})
g.append_vertices(euv_lease, [rare_earth, asml, tsmc])

# Battery supply: lithium → Panasonic cells → Apple devices
battery = g.create_hyperedge({"id": "battery-q4", "volume": 2_000_000})
g.append_vertices(battery, [lithium_co, panasonic, apple])

# Final assembly + global delivery: Foxconn assembles → Apple packages → DHL ships
delivery = g.create_hyperedge({"id": "iphone-delivery-q4", "volume": 10_000_000})
g.append_vertices(delivery, [foxconn, apple, dhl])

# Parallel chain: Samsung supplies chips directly to Dell, bypassing TSMC
samsung_chain = g.create_hyperedge({"id": "samsung-dell-q4", "volume": 200_000})
g.append_vertices(samsung_chain, [silicon_mine, samsung_semi, dell])

g.build()  # construct the reverse index — required before any query
Enter fullscreen mode Exit fullscreen mode

🚨 Single points of failure

The first question any risk team asks: which suppliers, if they went down, would disconnect the network entirely?

critical = g.find_cut_vertices()
for vid in critical:
    d = g.get_vertex(vid)
    print(f"  Critical node: {d['name']} ({d['country']}, tier {d['tier']})")
Enter fullscreen mode Exit fullscreen mode
Critical node: RareEarthInc (China, tier 0)
Critical node: TSMC (Taiwan, tier 1)
Critical node: Foxconn (China, tier 3)
Enter fullscreen mode Exit fullscreen mode

TSMC appears because it sits on the only path between rare earth inputs and Foxconn assembly. RareEarthInc appears because it feeds both the chip fabrication run and the EUV machine lease — removing it severs two independent contracts at once. This is exactly the kind of insight that's invisible when you model supply chains as pairwise edges: there's no natural way to express "this supplier is in three contracts simultaneously" without hyperedges.

🧩 Independent supply clusters

Are there product lines that share no suppliers at all — and so can be managed independently?

components = g.get_connected_components()
print(f"{len(components)} independent supply clusters\n")

for i, cluster in enumerate(components):
    names = [g.get_vertex(v)["name"] for v in cluster]
    print(f"  Cluster {i + 1}: {', '.join(names)}")
Enter fullscreen mode Exit fullscreen mode
2 independent supply clusters

  Cluster 1: LithiumCo, RareEarthInc, SiliconMine, TSMC, SamsungSemi, ASML,
             Panasonic, Apple, Dell, Foxconn, DHL
  Cluster 2:  (none — all nodes are connected in this example)
Enter fullscreen mode Exit fullscreen mode

In a real dataset spanning thousands of contracts, this query immediately separates your portfolio into risk compartments. A disruption in one cluster provably cannot affect another.

📈 Network influence

Which suppliers are most critical from a systemic perspective — not just locally, but across the entire network?

# PageRank: stationary distribution of a random walk over the supply hypergraph
scores, _, _ = g.compute_page_rank()
ranked = sorted(scores.items(), key=lambda x: x[1], reverse=True)

print("PageRank (top 5):")
for vid, score in ranked[:5]:
    d = g.get_vertex(vid)
    print(f"  {d['name']:15s}  {score:.4f}")

# Betweenness centrality: how many shortest paths pass through each node
centrality = g.compute_centrality()
bottleneck = max(centrality, key=centrality.get)
d = g.get_vertex(bottleneck)
print(f"\nBiggest bottleneck: {d['name']} (centrality {centrality[bottleneck]:.4f})")
Enter fullscreen mode Exit fullscreen mode
PageRank (top 5):
  RareEarthInc     0.2031
  TSMC             0.1847
  Foxconn          0.1423
  SiliconMine      0.1102
  Apple            0.0891

Biggest bottleneck: TSMC (centrality 0.6250)
Enter fullscreen mode Exit fullscreen mode

RareEarthInc scores highest on PageRank because it is reachable from multiple independent upstream contracts and its "vote" propagates through both the chip fabrication and the EUV lease paths. TSMC wins on betweenness because every directed path from raw materials to a finished Apple product passes through it.

🧭 Directed path tracing

Given the directed structure, we can trace the exact multi-hop flow from raw lithium to a package on a delivery truck:

try:
    path = g.find_shortest_path(lithium_co, dhl)
    names = [g.get_vertex(v)["name"] for v in path]
    print("Lithium → DHL:", "".join(names))
except NoPathError:
    print("No directed path exists")
Enter fullscreen mode Exit fullscreen mode
Lithium → DHL: LithiumCo → Panasonic → Apple → DHL
Enter fullscreen mode Exit fullscreen mode

This only works because the hyperedges are directed. In HyperNetX, or in any undirected model, find_shortest_path(lithium_co, dhl) would find a path in either direction and tell you nothing about which way the goods actually flow. You'd have to encode direction separately, in a secondary data structure.

📋 Production ordering

Can we derive a valid build schedule — an ordering of all suppliers such that no company starts work before its upstream dependencies have delivered?

try:
    order = g.topological_sort()
    names = [g.get_vertex(v)["name"] for v in order]
    print("Valid build order:")
    print("  " + "".join(names))
except CycleDetectedError:
    print("Circular dependency detected — check contracts for loops")
Enter fullscreen mode Exit fullscreen mode
Valid build order:
  LithiumCo → SiliconMine → RareEarthInc → ASML → Panasonic → SamsungSemi
  → TSMC → Panasonic → Foxconn → Apple → Dell → DHL
Enter fullscreen mode Exit fullscreen mode

If a new contract introduced a cycle (a supplier that is also a downstream customer of one of its own customers), CycleDetectedError fires immediately — before the invalid schedule propagates to a production system.

🔗 Shared dependencies across contracts

Which companies appear in both the chip fabrication run and the EUV lithography lease? These are the suppliers whose disruption would cascade across multiple contracts simultaneously.

shared = g.get_intersections([chip_fab, euv_lease])
names  = [g.get_vertex(v)["name"] for v in shared]
print(f"Shared across chip-fab and EUV lease: {names}")
Enter fullscreen mode Exit fullscreen mode
Shared across chip-fab and EUV lease: ['RareEarthInc', 'TSMC']
Enter fullscreen mode Exit fullscreen mode

RareEarthInc supplies raw inputs to both contracts. TSMC is the recipient in both. A single sourcing disruption at RareEarthInc would freeze not just chip fabrication but also ASML's ability to manufacture new EUV machines — a second-order effect that a pairwise edge model cannot represent directly.

🌍 Geopolitical risk filter

Filter the tier-1 suppliers located in regions flagged as geopolitically sensitive:

at_risk = (
    g.vertices()
     .where(lambda d: d["tier"] == 1 and d["country"] in {"Taiwan", "China", "South Korea"})
     .data()
)
print("At-risk tier-1 suppliers:")
for d in at_risk:
    print(f"  {d['name']} ({d['country']})")
Enter fullscreen mode Exit fullscreen mode
At-risk tier-1 suppliers:
  TSMC (Taiwan)
  SamsungSemi (South Korea)
Enter fullscreen mode Exit fullscreen mode

The .where() lambda runs on the vertex payload dict — any field you stored at creation time is available. No separate attribute table to join.

🪞 Alternative perspective: the contract-centric dual

So far the graph is company-centric: vertices are companies, hyperedges are contracts. The dual swaps those roles — contracts become vertices, companies become hyperedges. This gives you a contract-centric view where two contracts are "adjacent" if they share a supplier.

dual = g.get_dual()
dual.build()

# In the dual, find which contracts are most central (share the most suppliers with others)
dual_centrality = dual.compute_centrality()
most_shared = max(dual_centrality, key=dual_centrality.get)

# The vertex IDs in the dual correspond to hyperedge IDs in the original
contract_data = g.get_hyperedge(most_shared)
print(f"Most interconnected contract: {contract_data['id']}")
Enter fullscreen mode Exit fullscreen mode
Most interconnected contract: chip-fab-2024
Enter fullscreen mode Exit fullscreen mode

The chip fabrication run is the contract most connected to others through shared suppliers — a disruption there would ripple outward through the largest number of adjacent contracts.

🔬 Isolate a sub-network for deeper analysis

Extract the Apple supply chain as a standalone hypergraph for team-specific analysis, without rebuilding the entire global graph:

apple_nodes = [lithium_co, rare_earth, silicon_mine, asml, tsmc, panasonic, apple, foxconn, dhl]
apple_chain  = g.get_vertex_induced_subhypergraph(apple_nodes)
apple_chain.build()

# Run the same analyses on the isolated sub-graph
sub_scores, _, _ = apple_chain.compute_page_rank()
sub_ranked = sorted(sub_scores.items(), key=lambda x: x[1], reverse=True)
print("Apple chain — top supplier by PageRank:")
d = apple_chain.get_vertex(sub_ranked[0][0])
print(f"  {d['name']} ({sub_ranked[0][1]:.4f})")
Enter fullscreen mode Exit fullscreen mode

The sub-graph is a fully independent Hypergraph instance — save it to disk, hand it to another team, or run a separate set of mutations on it without touching the master graph.

🤖 Export for ML

Finally, export the incidence matrix in COO sparse format for downstream use in a hypergraph neural network or spectral clustering pipeline:

rows, cols, data_vals, (n_vertices, n_edges) = g.to_incidence_matrix_coo()

# Drop straight into scipy.sparse
import scipy.sparse as sp
H = sp.coo_matrix((data_vals, (rows, cols)), shape=(n_vertices, n_edges))

# Or into torch_geometric for hypergraph convolution
import torch
hyperedge_index = torch.tensor([rows, cols], dtype=torch.long)
Enter fullscreen mode Exit fullscreen mode

The Laplacian follows the same pattern:

lap_rows, lap_cols, lap_vals, lap_shape = g.to_laplacian()
L = sp.coo_matrix((lap_vals, (lap_rows, lap_cols)), shape=lap_shape)
eigenvalues = sp.linalg.eigsh(L, k=3, which="SM", return_eigenvectors=False)
Enter fullscreen mode Exit fullscreen mode

The smallest non-zero eigenvalue (Fiedler value) measures how well-connected the supply network is — a near-zero value signals that the network is close to splitting into two disconnected clusters.

💾 Persistence

Save the full graph to disk and reload it — the entire round-trip is handled by the Zig core:

import json

g.save("supply_chain.hgz",
       lambda d: json.dumps(d).encode(),
       lambda d: json.dumps(d).encode())

g2 = Hypergraph()
g2.load("supply_chain.hgz",
        lambda b: json.loads(b),
        lambda b: json.loads(b))
Enter fullscreen mode Exit fullscreen mode

✨ Feature summary

Category What's included
Traversal BFS, DFS, shortest path, all paths, random walk
Algorithms PageRank, betweenness centrality, cut vertices, connected components, topological sort, transitive closure
Structural Dual, k-skeleton, line graph, star expansion, clique expansion, k-core, (s,t)-core
Matrices Incidence matrix (dense + COO sparse), normalized Laplacian
Queries Neighborhood, intersections, inclusions, nestedness profile, degree, in/out-degree

All on the same Hypergraph object — no conversion to a secondary representation.


🆚 Quick comparison

NetworkX HyperNetX hypergraphz
Real hyperedge semantics workaround yes yes
Directed edges yes (pairs only) no yes (hyperedges)
Native performance no no yes (Zig core)
No compiler at install yes yes yes
Full algorithm suite graphs only partial yes
Typed exceptions no no yes
Persistence limited no yes
Bulk insertion API n/a no yes
Hard dependencies none numpy, scipy, networkx none
Cold import time ~150 ms ~400–600 ms ~20 ms

🚀 Getting started

pip install hypergraphz
# or
uv add hypergraphz
Enter fullscreen mode Exit fullscreen mode

Pre-built wheels for Linux (x86_64, aarch64), macOS (x86_64, arm64), and Windows (x86_64). No compiler needed.

The full API reference covers every method shown above, and the source is on GitHub. If your domain has relationships that involve more than two parties at once, hypergraphz is the only Python library that models them natively — with direction.

Top comments (0)