ZigRazor
CXXGraph: The Header-Only C++ Graph Library You Should Know About

I recently spent a week studying the CXXGraph codebase top-to-bottom. What I found is a library that makes consistently good architectural decisions — and a few that are worth talking about openly. This post is for the C++ developer who either needs a graph library today, or who wants to learn how a production-quality header-only library is structured.


Why another graph library?

The usual answer to "I need graphs in C++" is Boost.Graph. If you have never opened the Boost.Graph documentation, congratulations — you have lived a fuller life than I have. The abstraction model is powerful, correct, and almost completely impenetrable to anyone who did not write it. Concepts, property maps, bundled properties, vertex and edge descriptors: by the time you understand the machinery you could have written the algorithm yourself.

CXXGraph takes the opposite bet: developer experience first, zero external dependencies, modern C++17 throughout. You drop a single #include "CXXGraph/CXXGraph.hpp" into your project and you are immediately writing graph code rather than template metaprogramming. The library is genuinely header-only — there is no build step, no CMake target to link against (though it ships CMakeLists.txt for the test and benchmark suite). For embedding in an existing project, that matters enormously.


How the core model is laid out

The ownership model is the first thing worth understanding, because it shapes every API decision downstream.

Every node and edge is heap-allocated and owned by a std::shared_ptr. The graph itself holds an std::unordered_set<shared<const Edge<T>>, edgeHash<T>> and does not own nodes directly — nodes are derived from the edge set at query time by traversing the cached adjacency lists. Isolated nodes (nodes added explicitly with addNode()) live in a separate isolatedNodesSet.

CXXGraph::Node<int> nodeA("A", 42);
CXXGraph::Node<int> nodeB("B", 17);

auto edge = std::make_shared<CXXGraph::DirectedWeightedEdge<int>>(
    "e1", nodeA, nodeB, 3.5
);

CXXGraph::T_EdgeSet<int> edgeSet;
edgeSet.insert(edge);
CXXGraph::Graph<int> g(edgeSet);

The Node<T> constructor takes a user-supplied string userId and hashes it via std::hash<std::string> to produce a stable numeric id_t. This means node equality is based on hash equality, not pointer equality — an important distinction if you ever construct nodes independently and wonder why operator== says they are the same node. The hashing is documented but easy to miss.
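This identity model is easy to demonstrate with a stand-in. `MiniNode` below is a hypothetical miniature that mirrors the described behavior (identity derived from `std::hash` of the user id, payload ignored); it is not the library's actual class.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical stand-in mirroring the described behavior: identity is the
// hash of the user-supplied id, not the address and not the payload.
struct MiniNode {
  std::string userId;
  int data;
  std::size_t id() const { return std::hash<std::string>{}(userId); }
  bool operator==(const MiniNode& other) const { return id() == other.id(); }
};
```

Two independently constructed nodes with the same `userId` compare equal even when their payloads differ, which is exactly the surprise the hashing convention can cause.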

The edge hierarchy is clean:

Edge<T>
├── DirectedEdge<T>
│   └── DirectedWeightedEdge<T>   (also inherits Weighted)
└── UndirectedEdge<T>
    └── UndirectedWeightedEdge<T> (also inherits Weighted)

Weighted is a standalone non-template mixin carrying a single double weight. The isDirected() and isWeighted() virtual methods return std::optional<bool>; for the base Edge<T> they return std::nullopt, which is treated as undirected throughout the codebase. That nullopt-means-undirected convention is a small footgun, but it is at least consistent.
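The convention boils down to how you collapse the optional. A minimal sketch of the idea (`treatAsDirected` is my name, not a library function):

```cpp
#include <cassert>
#include <optional>

// Sketch of the nullopt-means-undirected convention: a base edge that
// cannot answer the question is treated as not directed.
bool treatAsDirected(std::optional<bool> isDirected) {
  return isDirected.value_or(false);  // nullopt collapses to "undirected"
}
```

Callers who instead test the optional itself (`if (!isDirected)`) conflate "unknown" with "false", which is the shape of the bug discussed in the Dijkstra section below.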


Adjacency lists: the caching strategy

The design choice that most affects performance is the caching layer. Rather than computing adjacency lists on every algorithm call, the graph maintains two pre-built lists:

shared<AdjacencyList<T>> cachedAdjListOut;
shared<AdjacencyList<T>> cachedAdjListIn;

where AdjacencyList<T> expands to:

std::unordered_map<
    shared<const Node<T>>,
    std::vector<std::pair<shared<const Node<T>>, shared<const Edge<T>>>>,
    nodeHash<T>
>

These caches are rebuilt eagerly on construction via invalidateCache(). Every addEdge() and removeEdge() mutates the caches incrementally rather than invalidating them entirely — a good trade-off given that most graphs are built once and queried many times.

The algorithms access cachedAdjListOut directly as a member via this->cachedAdjListOut, bypassing the getter. This is a conscious performance choice: the getter getAdjListOut() reconstructs a fresh list from scratch every call, while the cached member is the live structure. The difference matters significantly on large graphs.

One architectural observation: Graph<T> is designed for single-threaded use. The ThreadSafeGraph<T> subclass wraps every public method with either a std::shared_lock (reads) or std::unique_lock (writes) over a std::shared_mutex. The design is correct but conservative — every algorithm call acquires a full shared lock for its duration, meaning concurrent reads serialize on any write even if the write touches a different partition of the graph.


Algorithm implementations: a tour worth taking

Each algorithm lives in its own header file under include/CXXGraph/Graph/Algorithm/. This is the right call. It keeps the main Graph_decl.h readable, it makes it trivially easy to audit a single algorithm, and it allows a future contributor to add an algorithm without touching anything that already works.

Dijkstra has three variants

The standard dijkstra() uses a std::priority_queue<pair<double, Node>> with std::greater (a min-heap). The dijkstra_deterministic() variant adds a tiebreaker using accumulated node-id sums, ensuring the same path is returned regardless of hash-map iteration order. dijkstra_deterministic2() goes further, using full lexicographic path comparison for complete determinism even when two paths have equal distance and equal id-sum. Each is progressively more expensive in both time and space. The right choice for testing is _deterministic2; the right choice for production is the plain variant.
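For readers who have not seen the min-heap idiom the plain variant is built on, here is a generic sketch over an integer-indexed adjacency list. This is textbook Dijkstra, not the library's code:

```cpp
#include <cassert>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// Generic sketch of the core of the plain variant: a min-heap of
// (distance, vertex) pairs obtained via std::greater.
std::vector<double> dijkstraSketch(
    const std::vector<std::vector<std::pair<int, double>>>& adj, int src) {
  const double inf = std::numeric_limits<double>::infinity();
  std::vector<double> dist(adj.size(), inf);
  using Item = std::pair<double, int>;
  std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;
  dist[src] = 0.0;
  pq.push({0.0, src});
  while (!pq.empty()) {
    auto [d, u] = pq.top();
    pq.pop();
    if (d > dist[u]) continue;  // stale heap entry, skip
    for (auto [v, w] : adj[u]) {
      if (dist[u] + w < dist[v]) {
        dist[v] = dist[u] + w;
        pq.push({dist[v], v});
      }
    }
  }
  return dist;
}
```

The deterministic variants change only the heap ordering: the pair grows extra tiebreak fields so equal-distance entries pop in a stable order.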

One subtle bug worth noting: in dijkstra_deterministic(), the condition for handling undirected weighted edges reads:

} else if (!elem.second->isDirected() == false) {

This is logically equivalent to elem.second->isDirected() == true, which is the directed branch — the undirected branch is dead code. The same pattern appears in dijkstra_deterministic2(). The standard dijkstra() and criticalpath_deterministic() use the correct elem.second->isDirected() == false. This does not affect correctness for graphs that are either all-directed or all-undirected (the common case), but it is a latent bug in mixed graphs.

Tarjan does five things

The tarjan() implementation accepts a bitmask:

unsigned int typeMask = TARJAN_FIND_SCC | TARJAN_FIND_CUTV;
auto result = graph.tarjan(typeMask);

The five flags are TARJAN_FIND_SCC, TARJAN_FIND_CUTV, TARJAN_FIND_BRIDGE, TARJAN_FIND_VBCC, and TARJAN_FIND_EBCC. The directed-graph flags and undirected-graph flags are validated at the entry point — requesting SCC on an undirected graph returns an error immediately. The implementation uses a single DFS that computes all requested results in one pass, which is efficient, but the shared-stack semantics mean you cannot mix directed and undirected flags even if you wanted to.

The SCC stack (sccNodeStack) uses std::stack<Node<T>> — note that it stores copies, not pointers. For large nodes this could be a hidden memory cost, but for the common int or string data type it is fine.

Bellman-Ford has early stopping

The implementation adds a currentDist snapshot per iteration and checks whether any distance changed. If nothing changed, it exits early. This is a minor but meaningful optimization for sparse graphs with many disconnected components. The parallel variant (bellmanford_parallel()) uses std::atomic<double> per vertex with a compare-and-swap loop — correct under concurrent writes, though the CAS hot-loop on contested vertices has the usual contention characteristics.
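The early-stopping idea is simple enough to sketch in a few lines. This is a generic edge-list Bellman-Ford, not the library's implementation: run at most V-1 relaxation rounds, but bail out as soon as a full pass changes nothing.

```cpp
#include <cassert>
#include <limits>
#include <tuple>
#include <vector>

// Sketch of Bellman-Ford with early stopping: if a whole relaxation pass
// leaves every distance unchanged, the remaining rounds are no-ops.
std::vector<double> bellmanFordSketch(
    int n, const std::vector<std::tuple<int, int, double>>& edges, int src) {
  const double inf = std::numeric_limits<double>::infinity();
  std::vector<double> dist(n, inf);
  dist[src] = 0.0;
  for (int round = 0; round < n - 1; ++round) {
    bool changed = false;
    for (auto [u, v, w] : edges) {
      if (dist[u] != inf && dist[u] + w < dist[v]) {
        dist[v] = dist[u] + w;
        changed = true;
      }
    }
    if (!changed) break;  // converged early
  }
  return dist;
}
```

On a sparse graph that converges in two or three passes, the early exit skips nearly all of the nominal O(V·E) work.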

The parallel layer is backend-agnostic

The Parallel.hpp utility is a clean example of how to write portable parallel code. The priority order is: OpenMP if -DCXXGRAPH_WITH_OPENMP is set, then PSTL (std::execution::par_unseq) if __cpp_lib_parallel_algorithm >= 201603L, then sequential fallback. Apple Clang on macOS without TBB does not define that feature macro — it falls back to sequential rather than issuing a compile error. This is the correct behaviour and it is explicitly documented in the code.

Parallel::parallel_for(std::size_t{0}, V, [&](std::size_t src) {
    // Floyd-Warshall inner loop — parallel across source rows
});

The parallel_sort wrapper deliberately uses sequential std::sort in the OpenMP path because OpenMP does not provide a sort primitive. The comment explains this honestly rather than silently degrading.
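The dispatch pattern itself is worth internalizing if you write portable parallel code. A stripped-down sketch (function name hypothetical; the OpenMP branch compiles only when both macros are defined, otherwise the plain loop runs):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of compile-time backend dispatch: prefer OpenMP when the build
// flag and compiler support are both present, else fall back to a plain
// sequential loop rather than failing the build.
template <typename F>
void parallelForSketch(std::size_t begin, std::size_t end, F&& body) {
#if defined(CXXGRAPH_WITH_OPENMP) && defined(_OPENMP)
#pragma omp parallel for
  for (long long i = (long long)begin; i < (long long)end; ++i)
    body((std::size_t)i);
#else
  for (std::size_t i = begin; i < end; ++i) body(i);  // sequential fallback
#endif
}
```

The key property is that the caller's code is identical on every platform; only the backend changes, and the worst case is slower, never broken.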


Hypergraph: the newest addition

Hypergraph<T> was added in early 2026. A hyperedge connects an arbitrary number of nodes rather than exactly two:

CXXGraph::Node<int> a("a", 1), b("b", 2), c("c", 3);
auto he = std::make_shared<CXXGraph::HyperEdge<int>>(
    "h1",
    std::vector{
        std::make_shared<const CXXGraph::Node<int>>(a),
        std::make_shared<const CXXGraph::Node<int>>(b),
        std::make_shared<const CXXGraph::Node<int>>(c)
    }
);

CXXGraph::Hypergraph<int> hg;
hg.addHyperEdge(he);

The implementation supports BFS and DFS (with the hypergraph expansion rule: from a node, visit all incident hyperedges, then enqueue all unvisited members of those hyperedges), union-find connected components, the dual hypergraph H*, the 2-section projection (ordinary graph adjacency, one edge per pair of nodes that co-appear in any hyperedge), and the |V|×|E| binary incidence matrix.
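The expansion rule generalizes ordinary BFS in one spot. A minimal sketch over hyperedges represented as plain index sets (not the library's HyperEdge type):

```cpp
#include <cassert>
#include <queue>
#include <set>
#include <vector>

// Sketch of the hypergraph BFS expansion rule: from a node, find every
// incident hyperedge, then enqueue all unvisited members of those edges.
std::set<int> hyperBfsSketch(
    const std::vector<std::vector<int>>& hyperedges, int start) {
  std::set<int> visited{start};
  std::queue<int> frontier;
  frontier.push(start);
  while (!frontier.empty()) {
    int u = frontier.front();
    frontier.pop();
    for (const auto& he : hyperedges) {
      bool incident = false;
      for (int member : he)
        if (member == u) { incident = true; break; }
      if (!incident) continue;
      for (int member : he)
        if (visited.insert(member).second) frontier.push(member);
    }
  }
  return visited;
}
```

In an ordinary graph every "hyperedge" has exactly two members, and this collapses to textbook BFS; the 2-section projection mentioned above is precisely the graph on which this traversal and plain BFS agree.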

The removeNode() implementation is worth reading — it rebuilds affected hyperedges by allocating new HyperEdge objects without the removed node, which correctly handles the immutability of the edge after construction.


Partitioning: for distributed graph processing

The partitioning layer implements vertex-cut strategies for distributing edges across P partitions, the kind of primitive you need for distributed graph processing systems. Five algorithms are provided:

  • HDRF (High-Degree Replicated First) — the streaming vertex-cut baseline from the 2015 Petroni et al. paper
  • EdgeBalancedVertexCut — assign each edge to the least-loaded partition
  • GreedyVertexCut — intersection-first, then union, then global minimum
  • EBV — the offline algorithm from the 2020 arXiv paper
  • WeightBalancedLibra — weight-aware placement from the 2020 paper

All strategies share a PartitionStrategy<T> interface with a single performStep(edge, state) method. Thread safety is handled at the record level with std::mutex-per-node locking and exponential-backoff deadlock avoidance. The CoordinatedPartitionState is shared across worker threads with fine-grained per-vector mutex protection.
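The simplest of the five strategies fits in a few lines. This is a sketch of the EdgeBalancedVertexCut idea only (function name mine, not the library's API): stream edges and place each one on the currently least-loaded partition, ignoring vertex locality entirely. HDRF and the greedy variant add degree and replication awareness on top of this skeleton.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of edge-balanced streaming placement: each incoming edge goes to
// whichever partition currently holds the fewest edges.
std::vector<int> edgeBalancedSketch(std::size_t numEdges, int partitions) {
  std::vector<int> load(partitions, 0);
  std::vector<int> assignment;
  assignment.reserve(numEdges);
  for (std::size_t i = 0; i < numEdges; ++i) {
    int p = (int)(std::min_element(load.begin(), load.end()) - load.begin());
    ++load[p];
    assignment.push_back(p);
  }
  return assignment;
}
```

Pure load balancing guarantees partition sizes within one edge of each other, at the cost of potentially replicating every vertex everywhere; that replication factor is exactly what HDRF exists to reduce.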


Things I would do differently

Studying a codebase this carefully means forming opinions about it.

The getNodeSet() cost is invisible. The method constructs a fresh std::unordered_set by iterating both adjacency lists on every call. Every algorithm that calls getNodeSet() in a loop — and several do — pays this repeatedly. A lazy-cached version would help, though invalidating it correctly on mutation adds complexity.

Error reporting is string-based. Every result struct has a std::string errorMessage. For a library used in hot paths this is fine; for a library used in embedded systems it is less ideal. A strongly-typed error enum would make caller code more robust and would catch unreachable error cases at compile time.
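For concreteness, here is the shape such an enum might take. Everything here is hypothetical (names and error cases mine), just to illustrate the compile-time exhaustiveness you gain over free-form strings:

```cpp
#include <cassert>
#include <string_view>

// Hypothetical typed error model: the switch is exhaustive, so adding a
// new enumerator triggers a compiler warning at every unhandled site.
enum class GraphError { None, NodeNotFound, NegativeCycle, InvalidInput };

constexpr std::string_view toMessage(GraphError e) {
  switch (e) {
    case GraphError::None:          return "";
    case GraphError::NodeNotFound:  return "node not found in the graph";
    case GraphError::NegativeCycle: return "negative cycle detected";
    case GraphError::InvalidInput:  return "invalid input";
  }
  return "unknown error";
}
```

Callers can still get a human-readable string when they want one, but switching on the enum is branch-cheap and impossible to typo.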

The dijkstra_deterministic dead-code bug (described above) should be caught by static analysis. The CI already runs clang-tidy, but this condition evidently slips through. A targeted test with a mixed directed/undirected weighted graph would expose it immediately.

Graph<T>::getNodeSet() returns by value. For small graphs this is fine. For graphs with millions of nodes, callers who only need to iterate should prefer the private nodeSet() (non-const, returns mutable shared pointers) or a future range-based view.


When to use it

Use CXXGraph when:

  • You need graph algorithms and do not want Boost as a dependency.
  • You are prototyping and want readable code you can step through in a debugger without descending into Boost's property map machinery.
  • You need graph partitioning for a distributed system and want a reference implementation of HDRF or EBV.
  • You are building a research tool and need Tarjan's algorithm to return bridges, cut vertices, SCCs, VBCCs, and EBCCs — possibly in one pass.
  • You need a hypergraph and do not want to write one yourself.

Do not use CXXGraph when:

  • You need performance at the absolute limit of the hardware. The shared_ptr everywhere model has a real overhead. For adjacency-list-heavy algorithms on billion-edge graphs, a CSR (Compressed Sparse Row) format with raw arrays will outperform this by an order of magnitude.
  • You need a graph database or persistent graph storage. This is an in-memory algorithms library.
  • Your application is real-time and you cannot tolerate the occasional shared_ptr atomic refcount update on a critical path.

Getting started in five minutes

git clone https://github.com/ZigRazor/CXXGraph.git
# Add include/ to your include path, then:
#include "CXXGraph/CXXGraph.hpp"
#include <iostream>

int main() {
    CXXGraph::Node<int> a("a", 0), b("b", 1), c("c", 2);

    CXXGraph::T_EdgeSet<int> edges;
    edges.insert(std::make_shared<CXXGraph::DirectedWeightedEdge<int>>(
        "ab", a, b, 4.0));
    edges.insert(std::make_shared<CXXGraph::DirectedWeightedEdge<int>>(
        "bc", b, c, 2.0));
    edges.insert(std::make_shared<CXXGraph::DirectedWeightedEdge<int>>(
        "ac", a, c, 9.0));

    CXXGraph::Graph<int> g(edges);
    auto result = g.dijkstra(a, c);

    if (result.success) {
        std::cout << "Shortest distance: " << result.result << "\n";
        for (auto& node : result.path)
            std::cout << node << " -> ";
    }
}
// Output: Shortest distance: 6  (a -> b -> c)

That is all the ceremony required.


Closing thoughts

CXXGraph is one of those libraries that makes the right call over and over on the small decisions: one algorithm per file, strongly typed result structs, consistent use of std::optional for nullable returns, std::shared_ptr ownership semantics throughout, a parallel layer that degrades gracefully on every platform rather than issuing cryptic link errors. The codebase is readable. You can open any algorithm file and understand it in under ten minutes.

The library is not without rough edges — the dead-code bug in deterministic Dijkstra, the invisible cost of getNodeSet(), the string-based error model. But these are the kinds of issues that a motivated contributor can fix in an afternoon, and the maintainer is actively receptive to pull requests.

If you write C++ and work with graphs — social networks, dependency resolution, route planning, circuit analysis, compiler data-flow — this library deserves a serious look before you reach for Boost.


The full source is at github.com/ZigRazor/CXXGraph. The project is licensed under MPL v2.0.
