Alex Towell

Originally published at metafunctor.com

Accumux: Compositional Online Statistical Reductions in C++

Accumux is a framework for combining statistical accumulators using algebraic composition. The idea is simple: accumulators form a monoid under composition, so you can combine them with +, process data in a single pass, and extract all results.

The Problem

Computing multiple statistics over large datasets usually means multiple passes over the data, hand-rolled code combining different algorithms, or numerical instability from naive implementations. Accumux solves this with compositional accumulators.

Quick Example

#include <vector>

#include "accumux/accumulators/kbn_sum.hpp"
#include "accumux/accumulators/welford.hpp"
#include "accumux/core/composition.hpp"

using namespace accumux;

// Compose accumulators with +
auto stats = kbn_sum<double>() + welford_accumulator<double>();

// Single pass through data
std::vector<double> data = {1.0, 2.0, 3.0, 4.0, 5.0};
for (const auto& value : data) {
    stats += value;
}

// Extract all results
auto sum = stats.get_first().eval();           // 15.0
auto mean = stats.get_second().mean();         // 3.0
auto variance = stats.get_second().sample_variance();  // 2.5

Numerically Stable Algorithms

Accumux uses proven algorithms that maintain accuracy even with ill-conditioned data.

Kahan-Babushka-Neumaier Summation

Naive floating-point summation drops low-order bits whenever values of very different magnitude are added:

// Naive sum fails on this
std::vector<double> values = {1.0, 1e100, 1.0, -1e100};
// Naive: 0.0 (wrong!)
// KBN:   2.0 (correct!)

auto summer = kbn_sum<double>();
for (auto v : values) summer += v;
std::cout << summer.eval();  // 2.0
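To make the difference concrete, here is a minimal standalone sketch of Neumaier's variant of compensated summation next to a naive loop. The names `neumaier_sum`, `naive_sum`, and `compensated_sum` are hypothetical and chosen for illustration; this is not the accumux `kbn_sum` source, only the general technique it is named after.

```cpp
#include <cmath>
#include <vector>

// Hypothetical stand-in for illustration; not the accumux kbn_sum source.
struct neumaier_sum {
    double sum = 0.0;           // running sum
    double compensation = 0.0;  // low-order bits lost by the running sum so far

    neumaier_sum& operator+=(double x) {
        const double t = sum + x;
        if (std::abs(sum) >= std::abs(x)) {
            compensation += (sum - t) + x;  // low-order bits of x were lost
        } else {
            compensation += (x - t) + sum;  // low-order bits of sum were lost
        }
        sum = t;
        return *this;
    }

    // The correction is applied once, when the result is read.
    double eval() const { return sum + compensation; }
};

// Plain left-to-right summation, for comparison.
double naive_sum(const std::vector<double>& xs) {
    double s = 0.0;
    for (double x : xs) s += x;
    return s;
}

double compensated_sum(const std::vector<double>& xs) {
    neumaier_sum acc;
    for (double x : xs) acc += x;
    return acc.eval();
}
```

On the input `{1.0, 1e100, 1.0, -1e100}` the naive loop returns 0.0, because each 1.0 is absorbed by 1e100, while the compensated version carries both lost units in `compensation` and returns 2.0. Aggressive floating-point flags such as `-ffast-math` can optimize the correction away, so the sketch assumes default IEEE semantics.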

Welford's Online Algorithm

Computes mean and variance in a single pass without catastrophic cancellation:

auto welford = welford_accumulator<double>();
for (auto v : data) welford += v;

welford.count();           // Number of samples
welford.mean();            // Running mean
welford.sample_variance(); // Unbiased variance
welford.sample_std_dev();  // Standard deviation
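For readers who want to see what happens per sample, the update rule can be sketched in a few lines. The struct `welford_sketch` below is a hypothetical stand-in that mirrors the method names shown above; it is not the library's implementation, and returning 0 for fewer than two samples is a choice made here for simplicity.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical stand-in for illustration; not the accumux welford_accumulator source.
struct welford_sketch {
    std::size_t n = 0;   // number of samples seen
    double mean_ = 0.0;  // running mean
    double m2 = 0.0;     // sum of squared deviations from the running mean

    welford_sketch& operator+=(double x) {
        ++n;
        const double delta = x - mean_;           // deviation from the old mean
        mean_ += delta / static_cast<double>(n);  // move the mean toward x
        m2 += delta * (x - mean_);                // old deviation times new deviation
        return *this;
    }

    std::size_t count() const { return n; }
    double mean() const { return mean_; }

    // Unbiased (n - 1) estimator; this sketch returns 0 for fewer than two samples.
    double sample_variance() const {
        return n > 1 ? m2 / static_cast<double>(n - 1) : 0.0;
    }
    double sample_std_dev() const { return std::sqrt(sample_variance()); }
};
```

Because `m2` accumulates products of deviations rather than the difference of two large sums (sum of squares minus squared sum), the subtraction that causes cancellation in the textbook formula never happens.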

Min/Max Tracking

auto minmax = minmax_accumulator<double>();
for (auto v : data) minmax += v;

minmax.min();  // Minimum value
minmax.max();  // Maximum value

Algebraic Composition

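The key insight is that accumulators form a monoid under composition: composing two accumulators yields another accumulator, so compositions can be chained to any length.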

// Compose arbitrarily many accumulators
auto financial = kbn_sum<double>() +
                 welford_accumulator<double>() +
                 minmax_accumulator<double>();

std::vector<double> returns = {0.05, -0.02, 0.03, 0.01, -0.01, 0.04};
for (auto ret : returns) {
    financial += ret;  // All three update simultaneously
}

// Extract nested results. C++ parses a + b + c as (a + b) + c, so this
// assumes the composition nests to the left: ((sum, welford), minmax)
auto total = financial.get_first().get_first().eval();
auto mean = financial.get_first().get_second().mean();
auto volatility = financial.get_first().get_second().sample_std_dev();
auto worst = financial.get_second().min();
auto best = financial.get_second().max();
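One way such a composition could work under the hood is a pair type that forwards every value to both members, built by an overloaded `operator+`. The sketch below is an assumption about the general shape, not the accumux source: the namespace `sketch`, the `composed` template, and the toy accumulators `sum_acc`, `count_acc`, and `max_acc` are all hypothetical.

```cpp
#include <limits>
#include <utility>

namespace sketch {  // hypothetical namespace; operator+ is found via ADL for these types only

struct sum_acc {
    double total = 0.0;
    sum_acc& operator+=(double x) { total += x; return *this; }
    double eval() const { return total; }
};

struct count_acc {
    long n = 0;
    count_acc& operator+=(double) { ++n; return *this; }
    long count() const { return n; }
};

struct max_acc {
    double best = -std::numeric_limits<double>::infinity();
    max_acc& operator+=(double x) { if (x > best) best = x; return *this; }
    double max() const { return best; }
};

// A pair of accumulators that forwards every value to both members.
template <typename A, typename B>
class composed {
public:
    composed(A a, B b) : first_(std::move(a)), second_(std::move(b)) {}
    composed& operator+=(double x) { first_ += x; second_ += x; return *this; }
    const A& get_first() const { return first_; }
    const B& get_second() const { return second_; }
private:
    A first_;
    B second_;
};

// Composing two accumulators yields a new accumulator, so the result composes again.
template <typename A, typename B>
composed<A, B> operator+(A a, B b) {
    return composed<A, B>(std::move(a), std::move(b));
}

}  // namespace sketch
```

In this sketch `a + b + c` has type `composed<composed<A, B>, C>`, because `+` associates to the left, so the first two results sit one level deeper than the third. A library could flatten that nesting behind a different accessor; the sketch does not attempt to.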

Mathematical Foundation

Monoid Structure

Each accumulator type A forms a monoid. The identity is the empty accumulator with no observations. The binary operation merges two accumulators (combining their observations).

auto a = welford_accumulator<double>();
auto b = welford_accumulator<double>();

// Process different data
for (auto v : data1) a += v;
for (auto v : data2) b += v;

// Merge results
auto combined = a + b;  // Equivalent to one accumulator processing data1 followed by data2
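For Welford state, merging is more than adding fields together: the combined sum of squared deviations needs a correction term for the gap between the two means. A minimal sketch of the standard pairwise update (due to Chan, Golub, and LeVeque) follows. The names `welford_state`, `observe`, `merge`, and `sample_variance` are hypothetical free functions chosen for illustration and are not taken from the library.

```cpp
#include <cstddef>

// Hypothetical plain-data Welford state; not the accumux type.
struct welford_state {
    std::size_t n = 0;  // number of samples
    double mean = 0.0;  // mean of the samples
    double m2 = 0.0;    // sum of squared deviations from the mean
};

// Single-sample update (Welford's recurrence).
inline void observe(welford_state& s, double x) {
    ++s.n;
    const double delta = x - s.mean;
    s.mean += delta / static_cast<double>(s.n);
    s.m2 += delta * (x - s.mean);
}

// Monoid operation: the result describes the union of both sample sets.
inline welford_state merge(const welford_state& a, const welford_state& b) {
    if (a.n == 0) return b;  // the empty state is the identity
    if (b.n == 0) return a;
    const double na = static_cast<double>(a.n);
    const double nb = static_cast<double>(b.n);
    const double n = na + nb;
    const double delta = b.mean - a.mean;
    welford_state out;
    out.n = a.n + b.n;
    out.mean = a.mean + delta * nb / n;
    out.m2 = a.m2 + b.m2 + delta * delta * na * nb / n;  // correction for differing means
    return out;
}

inline double sample_variance(const welford_state& s) {
    return s.n > 1 ? s.m2 / static_cast<double>(s.n - 1) : 0.0;
}
```

Accumulating `{1, 2, 3}` and `{4, 5}` separately and merging gives mean 3 and sample variance 2.5, the same as a single pass over all five values.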

Homomorphism Property

The composition operation preserves structure:

(a + b).process(x) = a.process(x) + b.process(x)

Combined with the merge operation above, this enables parallel processing: split the data, accumulate each chunk independently, then merge the results.
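The split-accumulate-merge pattern can be sketched as a small driver. Everything here is hypothetical: `mean_acc` is a toy mergeable accumulator and `chunked_reduce` is an illustrative helper, neither taken from accumux. The chunks run sequentially to keep the sketch free of threading dependencies; since the per-chunk loops share no state, each one could be handed to its own thread or task.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical mergeable accumulator: tracks count and sum so the mean can be recovered.
struct mean_acc {
    std::size_t n = 0;
    double sum = 0.0;
    mean_acc& operator+=(double x) { ++n; sum += x; return *this; }
    double mean() const { return n ? sum / static_cast<double>(n) : 0.0; }
};

// Monoid operation: the result describes the union of both inputs' observations.
inline mean_acc merge(const mean_acc& a, const mean_acc& b) {
    mean_acc out;
    out.n = a.n + b.n;
    out.sum = a.sum + b.sum;
    return out;
}

// Split `data` into up to `chunks` contiguous pieces, reduce each one
// independently, then fold the partial results together with merge().
inline mean_acc chunked_reduce(const std::vector<double>& data, std::size_t chunks) {
    chunks = std::max<std::size_t>(chunks, 1);
    const std::size_t step = (data.size() + chunks - 1) / chunks;  // ceiling division
    mean_acc total;  // identity element: no observations
    for (std::size_t begin = 0; begin < data.size(); begin += step) {
        const std::size_t end = std::min(begin + step, data.size());
        mean_acc partial;  // independent of every other chunk
        for (std::size_t i = begin; i < end; ++i) partial += data[i];
        total = merge(total, partial);
    }
    return total;
}
```

Correctness rests on the monoid laws: starting from the identity and merging partial results in order gives the same state as one pass over the whole input, regardless of where the chunk boundaries fall.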

Type Safety with C++20 Concepts

Invalid compositions fail at compile time:

// Compile error: can't add incompatible accumulators
auto invalid = kbn_sum<double>() + kbn_sum<int>();  // Type mismatch!

// OK: compatible types compose
auto valid = kbn_sum<double>() + welford_accumulator<double>();

Use Cases

Financial analysis (track returns, volatility, drawdowns in one pass), scientific computing (online statistics for streaming sensor data), machine learning (feature statistics during data preprocessing), and monitoring systems (real-time metrics aggregation).

Performance

O(1) space per accumulator (constant memory regardless of data size). O(n) time for n data points (single pass). Zero allocations during accumulation. Header-only: no linking, no dependencies.

Installation

Header-only, just include:

#include "accumux/accumulators/kbn_sum.hpp"
#include "accumux/accumulators/welford.hpp"
#include "accumux/core/composition.hpp"

Or with CMake:

add_subdirectory(accumux)
target_link_libraries(your_target PRIVATE accumux::accumux)

Resources
