How We Made a 128-Arm Pattern Match Run 10× Faster
The 128-branch dispatch problem, and the compile-time optimization system that solved it.
Pattern matching is coming to C++ — eventually. C++26 might bring it. But if you're writing C++17 today and need to match on variants, enums, or arbitrary values with guards, destructuring, and combinators, you're on your own.
I've been building Patternia, a C++17 header-only pattern matching library, for the past year. The API is simple enough:
```cpp
#include <ptn/patternia.hpp>
using namespace ptn;

auto result = match(subject) | on(
  val<Status::Ready>    >> []{ return "ready"; },
  val<Status::Pending>  >> []{ return "pending"; },
  is<Error>(ds(status)) >> [](int s){ return "error: " + std::to_string(s); },
  _ >> []{ return "unknown"; }
);
```
Clean. But the API isn't the hard part. The hard part is making it fast.
The Problem
When you write a pattern match with 128 cases (not unusual for protocol parsers or command dispatchers), every on() call constructs those 128 case objects — guards, destructuring patterns, combinators — every single time the match is evaluated.
The naive approach: 10.79 nanoseconds per call.
A hand-written switch with 128 branches: 1.09 nanoseconds per call.
That's a 10× gap. If your library is slower than the code it replaces, nobody will use it.
The Solution: Automatic Dispatch Plan Selection
The core insight is that at compile time, the library knows everything about the patterns — which ones are compile-time constants, which ones match variant types, which ones have guards. So why not let the compiler pick the optimal dispatch strategy?
Patternia does this with a six-tier dispatch plan hierarchy, selected at compile time via dispatch_plan_selector:
1. static_literal_dense — O(1) array lookup for val<V> compile-time values
2. literal_runtime_dense — O(1) for runtime lit() values
3. literal_linear — Direct dispatch for integral/enum subjects
4. variant_simple — Precomputed case-index table by variant alt
5. variant_alt_bucketed — Binary search over alt-bucketed cases
6. sequential — Linear fallback
The selector analyzes the pattern set and picks the best strategy at compile time — no runtime overhead for the decision:
```cpp
template <typename analysis_t>
struct dispatch_plan_selector {
  static constexpr dispatch_plan_kind kind =
      analysis_t::can_use_static_literal_dispatch
          ? dispatch_plan_kind::static_literal_dense
          : (analysis_t::can_use_runtime_literal_dense_dispatch
                 ? dispatch_plan_kind::literal_runtime_dense
                 : (analysis_t::can_use_simple_literal_dispatch
                        ? dispatch_plan_kind::literal_linear
                        : (analysis_t::can_use_simple_variant_dispatch
                               ? dispatch_plan_kind::variant_simple
                               : (analysis_t::can_use_variant_alt_dispatch
                                      ? dispatch_plan_kind::variant_alt_bucketed
                                      : dispatch_plan_kind::sequential))));
};
```
Each tier applies density heuristics to decide whether to engage. For example, static_literal_dense only activates when the value span is at most 4× the number of literal cases — ensuring the lookup table isn't too sparse. Tuning parameters are baked in:
```cpp
inline constexpr std::size_t k_static_literal_dense_dispatch_max_cases   = 256;
inline constexpr std::size_t k_static_literal_dense_dispatch_max_span    = 512;
inline constexpr std::size_t k_static_literal_dense_dispatch_max_density = 4;

inline constexpr std::size_t k_runtime_literal_dense_dispatch_max_cases   = 256;
inline constexpr std::size_t k_runtime_literal_dense_dispatch_max_span    = 512;
inline constexpr std::size_t k_runtime_literal_dense_dispatch_max_density = 8;
```
The [[no_unique_address]] Bug
Here's where it got weird. The optimization system checks whether a pattern type is "cacheable" — empty types can be skipped, reducing the case construction overhead. It used std::is_empty for this check.
One pattern type, static_literal_pattern, holds a comparator:
```cpp
template <auto V, typename Cmp = std::equal_to<>>
struct static_literal_pattern : base::pattern_base<static_literal_pattern<V, Cmp>> {
  [[no_unique_address]] Cmp cmp{};
  // ...
};
```
std::equal_to<> is technically an empty type. We annotated it with [[no_unique_address]]. But GCC applied the attribute inconsistently — sometimes the member was optimized away, sometimes not. When GCC left it in, std::is_empty returned false for the pattern type, and the cache path was silently skipped.
The fix: replace std::is_empty with a sizeof(T) <= 1 heuristic, which behaves consistently across GCC, Clang, and MSVC.
Finding this bug required reading the generated assembly for a dozen compiler versions. The lesson: [[no_unique_address]] is a hint, not a guarantee.
Results
With automatic dispatch plan selection, the 128-branch match drops from 10.79 ns to 1.13 ns — within 4% of a hand-written switch (1.09 ns):
| Approach | Latency |
|---|---|
| Raw on() (naive) | 10.79 ns |
| PTN_ON macro (manual cache) | 1.16 ns |
| Auto-cache (this work) | 1.13 ns |
| Hand-written switch | 1.09 ns |
The key win: automatic optimization means users get the fast path without knowing dispatch plans exist. They just write match(subject) | on(...) and the compile-time analyzer does the rest.
Real-World Benchmarks
Beyond the 128-branch microbenchmark, Patternia holds its own in realistic scenarios:
| Scenario | Patternia | Baseline | Result |
|---|---|---|---|
| Command parser | 1.790 ns | Switch (2.258 ns) | +26% faster |
| Protocol router | 1.628 ns | IfElse (1.866 ns) | +14% faster |
| Variant mixed | 1.997 ns | std::visit (1.991 ns) | Tied |
How It Compares
Patternia isn't the only C++ pattern matching library. But it brings a few things the others don't:
| Feature | Patternia | matchit.cpp | Mach7 |
|---|---|---|---|
| Pipeline syntax | ✅ | ❌ | ❌ |
| Compile-time dispatch optimization | ✅ | Limited | ❌ |
| Structural bindings | ✅ | ❌ | ❌ |
| Guard expressions | ✅ | ❌ | ❌ |
| Pattern combinators (any/all/neg) | ✅ | ❌ | ❌ |
| C++ standard | C++17 | C++17 | C++14 |
| Header-only | ✅ | ✅ | ✅ |
The automatic dispatch optimization is the differentiator. Other libraries can match — Patternia matches fast.
Try It
The library is a single header include away:
```shell
# CMake FetchContent, vcpkg, or just copy the headers
git clone https://github.com/sentomk/patternia
```

```cpp
#include <ptn/patternia.hpp>
using namespace ptn;

// Your first pattern match, automatically optimized
auto result = match(value) | on(
  val<1> >> []{ return "one"; },
  val<2> >> []{ return "two"; },
  is<std::string>(ds(s)) >> [](auto& s){ return s; },
  _ >> []{ return "other"; }
);
```
The compile-time optimizer picks the right dispatch strategy. You don't have to think about it.
Patternia on GitHub — 179 stars, 438 commits, C++17 header-only.