Summing 50,000 emission line items in the wrong order changes your total

#javascript #webdev #programming #carbon

Floating-point addition isn't associative. For a corporate inventory with tens of thousands of rows, naive summation drifts — and the number you disclose depends on row order. Here's why, and the fix.

Here's a result that should bother anyone building carbon software. Take a corporate emissions inventory: tens of thousands of line items, each a number in tonnes CO₂e. Sum it. Now sort the same rows differently and sum again. The totals can disagree. Not by much (in the constructed example further down, the gap shows up around the seventh decimal place), but they disagree, and nothing in your code changed except the order.

If you've never seen this, open a console:

0.1 + 0.2 === 0.3   // false

That's the same bug, scaled up to a reporting deliverable.

Why order changes the answer

IEEE 754 doubles store 52 bits of mantissa explicitly, plus an implicit leading bit, for 53 bits of precision. That's about 15–17 significant decimal digits: generous, until you add numbers of very different magnitudes.

When you add a small number to a large running total, the small number gets shifted right to line up the exponents before the addition happens. Bits that fall off the end of the mantissa are gone. Add a 0.0001 tCO₂e line to a running total of 80000.0 and the result has to land on one of the doubles near 80,000, which are spaced about 1.5e-11 apart. The small value is rounded to that spacing, so only its first seven or so significant digits survive. Widen the gap far enough (a ratio beyond roughly 2^53, about 9e15) and the small value is swallowed completely.
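The rounding described above can be checked directly in a console. This is a minimal sketch that uses only built-in Number constants; the variable name roundTrip is just for illustration.

```javascript
// Number.EPSILON is the gap between 1 and the next double: 2^-52.
console.log(Number.EPSILON === 2 ** -52); // true

// Past 2^53 the gap between adjacent doubles exceeds 1, so adding 1 is lost entirely.
console.log(2 ** 53 + 1 === 2 ** 53); // true

// Near 80000 the gap is 2^-36 (about 1.5e-11), so 0.0001 only survives approximately.
const roundTrip = (80000 + 0.0001) - 80000;
console.log(roundTrip === 0.0001); // false
console.log(Math.abs(roundTrip - 0.0001) < 2 ** -36); // true: off by less than one gap
```

The last two lines are the inventory case in miniature: the small line item is not lost, but what comes back out of the total is not quite what went in.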

Float addition, as a result, isn't associative. (a + b) + c is not guaranteed to equal a + (b + c). Sum your rows largest-first and every small value is rounded against a big accumulator. Sum smallest-first and the small values accumulate among themselves at full precision before they meet the large ones. Same data, different total.
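The smallest reproduction of non-associativity needs only three numbers. A minimal sketch (the names left and right are illustrative):

```javascript
// Same three values, grouped two different ways.
const left = (0.1 + 0.2) + 0.3;
const right = 0.1 + (0.2 + 0.3);

console.log(left);           // 0.6000000000000001
console.log(right);          // 0.6
console.log(left === right); // false
```

A for-loop over an array is just a long chain of left-grouped additions, so reordering the array regroups the whole chain.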

Here's the effect, deliberately constructed to be visible:

const big = 80000;
const smalls = Array(50000).fill(0.0001);

// small values first, then the big one
let a = 0;
for (const x of [...smalls, big]) a += x;

// big value first, then the smalls
let b = 0;
for (const x of [big, ...smalls]) b += x;

console.log(a); // ≈ 80005: the smalls sum to ≈ 5 on their own before meeting the big value
console.log(b); // ≈ 80005.0000002: every small add was rounded against 80000
console.log(a === b); // false

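The 50,000 small lines should contribute exactly 5.0. Summed among themselves first, they very nearly do. Added one at a time to an accumulator that already holds 80000, each one is rounded to the coarser spacing of doubles at that magnitude, and 50,000 of those roundings pile up into a visible error.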

Why this matters for an inventory specifically

In most software, drift that far down the decimals is noise. In greenhouse gas accounting it's a problem for one specific reason: your numbers get re-run by someone else.

An assurance provider performing limited or reasonable assurance will rebuild your total from the underlying activity data. If their sum and your disclosed sum disagree — even slightly, even for a reason as defensible as floating-point order — that's a discrepancy someone has to explain. Reproducibility isn't a nice-to-have in this context; non-reproducibility is an audit finding.

And inventories are exactly the shape that triggers the problem: a handful of very large line items (grid electricity for a head office, a corporate vehicle fleet) sitting in the same sum as thousands of tiny ones (individual business trips, small fugitive losses). Large magnitude spread, high row count — the two conditions that make naive summation drift.

The fix: compensated summation

Kahan summation (also called compensated summation) keeps a second variable that tracks the low-order bits naive addition throws away, and feeds them back in on the next iteration.

function kahanSum(values) {
  let sum = 0;
  let c = 0; // running compensation for lost low-order bits
  for (const value of values) {
    const y = value - c;      // bring back what we lost last time
    const t = sum + y;        // the lossy addition
    c = (t - sum) - y;        // recover exactly what got dropped
    sum = t;
  }
  return sum;
}

The trick is the third line of the loop body. (t - sum) recovers the part of y that actually made it into the sum; subtracting y leaves the rounding error of that addition, which is the negative of whatever was lost. Subtracting c from the next value therefore puts the lost part back. The compensation c is, in effect, the bits that fell off the mantissa, saved for next time.

Run the earlier example through kahanSum and both orderings return the same total, accurate to the precision you'd expect from the inputs.
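To check that claim rather than take it on trust, here is a self-contained sketch that runs both orderings through a naive loop and through kahanSum. naiveSum, smallFirst and bigFirst are names introduced here for illustration; the data is the same constructed example as before.

```javascript
// Compensated (Kahan) summation, as defined in the article.
function kahanSum(values) {
  let sum = 0;
  let c = 0; // running compensation for lost low-order bits
  for (const value of values) {
    const y = value - c;
    const t = sum + y;
    c = (t - sum) - y;
    sum = t;
  }
  return sum;
}

// Plain left-to-right accumulation, for comparison.
function naiveSum(values) {
  let sum = 0;
  for (const value of values) sum += value;
  return sum;
}

const big = 80000;
const smalls = Array(50000).fill(0.0001);
const smallFirst = [...smalls, big];
const bigFirst = [big, ...smalls];

// Naive totals depend on the ordering.
console.log(naiveSum(smallFirst), naiveSum(bigFirst));
console.log(naiveSum(smallFirst) === naiveSum(bigFirst)); // false

// Compensated totals agree with each other and sit next to the true 80005.
console.log(kahanSum(smallFirst), kahanSum(bigFirst));
console.log(Math.abs(kahanSum(smallFirst) - kahanSum(bigFirst)) < 1e-9); // true
```

The final comparison uses a tolerance rather than === on purpose: the guarantee compensated summation gives is a tight error bound, and a tolerance states that directly.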

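It's not magic. It can't manufacture precision the inputs never had, and it doesn't strictly guarantee bit-identical totals for every conceivable ordering. What it does is keep the accumulated error within a couple of roundings of the final total, rather than letting it grow with the row count, and in practice that makes the total stable across orderings. For an inventory total that someone else is going to re-derive, that stability is the property you actually want.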

When this is overkill

Most of the time you don't need this. The drift only becomes material when row count and magnitude spread are both high:

A few dozen rows of similar magnitude: don't bother. The error is far below any rounding you'd disclose at.
Thousands of rows, similar magnitude: borderline. Usually fine, but cheap to be safe.
Thousands of rows with a wide magnitude spread (large facility totals summed alongside tiny per-trip lines): use compensated summation. This is the inventory case.

A rough mental threshold: if your largest line item is several orders of magnitude bigger than your smallest, and you have enough small lines that their collective contribution matters, naive summation can lose a material slice of the small-line total. That's most real corporate inventories.
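Rather than guessing whether a given dataset crosses that threshold, you can measure it. The sketch below compares a naive total against a compensated one and reports the difference; summationDrift is a hypothetical helper name, and the two sample arrays are made-up stand-ins for real inventory rows.

```javascript
// Compensated (Kahan) summation, as defined in the article.
function kahanSum(values) {
  let sum = 0;
  let c = 0;
  for (const value of values) {
    const y = value - c;
    const t = sum + y;
    c = (t - sum) - y;
    sum = t;
  }
  return sum;
}

// Hypothetical helper: how far does naive summation land from the compensated total?
function summationDrift(values) {
  let naive = 0;
  for (const value of values) naive += value;
  const compensated = kahanSum(values);
  return { naive, compensated, drift: naive - compensated };
}

// A couple of dozen rows of similar magnitude: nothing to fix.
const similarRows = Array(24).fill(12.5);
console.log(summationDrift(similarRows).drift); // 0: every partial sum is exactly representable

// One large row followed by many tiny ones: measurable drift.
const inventoryLike = [80000, ...Array(50000).fill(0.0001)];
console.log(summationDrift(inventoryLike).drift); // on the order of 1e-7
```

If the reported drift is far below the precision you disclose at, naive summation is fine for that dataset. If it isn't, or you can't be sure it will stay that way as data grows, switch.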

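If you want to avoid the question entirely, you can sum in fixed-point: work in integer grams or integer kg CO₂e instead of fractional tonnes, and convert only at display time. Integer addition is exact (in JavaScript, as long as the running total stays within Number.MAX_SAFE_INTEGER, 2^53 - 1, or you switch to BigInt), so order genuinely stops mattering. The cost is carrying integers around and committing to a resolution up front. Compensated summation is the less invasive change if you're already float-based.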
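A minimal sketch of the fixed-point route, assuming gram resolution is fine enough for your data (anything finer than a gram is rounded away at conversion). GRAMS_PER_TONNE, tonnesToGrams and sumAsGrams are illustrative names, not part of any library.

```javascript
const GRAMS_PER_TONNE = 1e6;

// Convert once, at the boundary. Math.round absorbs the tiny float error
// in the multiplication; the assumption is that inputs are whole grams.
function tonnesToGrams(tonnes) {
  return Math.round(tonnes * GRAMS_PER_TONNE);
}

// Sum integer grams. Integer addition is exact while totals stay safe integers.
function sumAsGrams(tonnesValues) {
  let totalGrams = 0;
  for (const tonnes of tonnesValues) {
    totalGrams += tonnesToGrams(tonnes);
    if (!Number.isSafeInteger(totalGrams)) {
      throw new RangeError("total exceeds Number.MAX_SAFE_INTEGER; use BigInt");
    }
  }
  return totalGrams;
}

const bigRow = 80000;
const smallRows = Array(50000).fill(0.0001);

const gramsSmallFirst = sumAsGrams([...smallRows, bigRow]);
const gramsBigFirst = sumAsGrams([bigRow, ...smallRows]);

console.log(gramsSmallFirst === gramsBigFirst);  // true
console.log(gramsBigFirst / GRAMS_PER_TONNE);    // 80005, converted only for display
```

The design choice here is where the rounding happens: once per row at conversion, where it is bounded and documented, instead of once per addition inside the sum.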

The uncomfortable part

Most carbon calculators never hit this, and that's exactly the problem. They're tested against toy datasets — a dozen rows, a worked example from a guidance document — where naive summation is indistinguishable from compensated. The bug doesn't exist at demo scale. It surfaces at real inventory scale, in production, in the one number a company actually discloses, and it surfaces as an unexplained discrepancy in someone else's spreadsheet.

If you're building anything that sums emissions data for disclosure, this is worth eight lines of defensive code. I build carbon calculators where the methodology and provenance are the whole point, and order-independent summation is one of those things that costs almost nothing to get right and is genuinely awkward to explain after the fact.

Sort your rows however you like. The total shouldn't care.