Kyle Pena

Posted on Oct 28, 2024 • Edited on Nov 2, 2024

Derivation of Welford's Algorithm

#statistics #algorithms #datascience

This will be somewhat out of context if you're coming here first. It's really a footnote in a much longer series of blog posts on summarizing data distributions in computing environments where storage is at a premium.

In one of those posts, I wanted to explain how to derive the recurrence relation for the second central moment used in Welford's Algorithm, and found the explanations I came across a little lacking (at least for me).

I found this helpful post to be a great starting point, but its algebra skipped over so many steps that I couldn't follow it.

So I worked it out in greater detail. I'm hoping this is helpful to other mere mortals who, like myself, couldn't quite connect the dots.

Notation:

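y represents a new observation, and $y_1, \ldots, y_{N-1}$ are the observations seen before it (so $y_N = y$).
μ is the mean of the first N-1 observations (the mean before incorporating y).
μ' is the updated mean (the mean after incorporating y).
$M_2$ is the second central moment. Well, not quite: it is the sum of squared deviations from the mean, and technically you'd have to divide by N to get the central moment. But we'll call it the central moment here. $M_2^\prime$ is its updated value after incorporating y.
N is the number of observations including y.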

First, some work on the mean update formula:

\mu^\prime = \mu + \frac{y - \mu}{N}

N\mu^\prime = N\mu + y - \mu

(N-1)\mu^\prime + \mu^\prime = (N-1)\mu + y

(N-1)\mu^\prime - (N-1)\mu = y - \mu^\prime

(N-1)\mu - (N-1)\mu^\prime = \mu^\prime - y
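As a sanity check on this rearrangement, the sketch below applies the mean update to a small, made-up data set using exact rational arithmetic (Python's `fractions.Fraction`) and asserts the last identity at every step. The helper name `update_mean` and the data values are illustrative assumptions, not part of the original derivation.

```python
from fractions import Fraction

def update_mean(mu, y, n):
    """Hypothetical helper: mean after adding y, where n counts y itself."""
    return mu + (y - mu) / n

# Illustrative sample data; exact fractions avoid rounding noise.
data = [Fraction(v) for v in (4, 7, 13, 16, 2)]

mu = data[0]  # mean of the first observation alone
for n, y in enumerate(data[1:], start=2):
    mu_new = update_mean(mu, y, n)
    # (N-1)mu - (N-1)mu' should equal mu' - y exactly
    assert (n - 1) * mu - (n - 1) * mu_new == mu_new - y
    mu = mu_new

print(mu)  # prints 42/5, the same as sum(data) / len(data)
```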

Now, the main derivation:

M_2^\prime - M_2 = \sum_1^N{(y_i - \mu^\prime)^2} - \sum_1^{N-1}{(y_i - \mu)^2}

= (y - \mu^\prime)^2 + \sum_1^{N-1}{(y_i - \mu^\prime)^2 - (y_i - \mu)^2}

= (y - \mu^\prime)^2 + \sum_1^{N-1}{(y_i^2 - 2y_i\mu^\prime +\mu^{\prime 2}) - (y_i^2 - 2y_i\mu + \mu^2)}

= (y - \mu^\prime)^2 + \sum_1^{N-1}{-2y_i\mu^\prime + 2y_i\mu + (\mu^{\prime 2} - \mu^2)}

= (y - \mu^\prime)^2 + \sum_1^{N-1}{-2y_i(\mu^\prime - \mu) + (\mu^\prime - \mu)(\mu^\prime + \mu)}

= (y - \mu^\prime)^2 + \sum_1^{N-1}{(\mu^\prime - \mu)(-2y_i + \mu^\prime + \mu)}

= (y - \mu^\prime)^2 + (\mu - \mu^\prime) \sum_1^{N-1}{(2y_i - \mu^\prime - \mu)}

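Since μ is the mean of the first N-1 observations, $\sum_1^{N-1}{y_i} = (N-1)\mu$, while $\mu^\prime$ and $\mu$ are constants summed N-1 times:

= (y - \mu^\prime)^2 + (\mu - \mu^\prime) [ 2(N-1)\mu - (N-1)\mu^\prime - (N-1)\mu ]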

= (y - \mu^\prime)^2 + (\mu - \mu^\prime) [ (N-1)\mu - (N-1)\mu^\prime ]

Substitute the rearranged mean update from earlier, $(N-1)\mu - (N-1)\mu^\prime = \mu^\prime - y$:

= (y - \mu^\prime)^2 + (\mu - \mu^\prime) (\mu^\prime - y)

= (y - \mu^\prime)^2 + (\mu^\prime - \mu) (y - \mu^\prime)

= (y - \mu^\prime)(y - \mu^\prime + \mu^\prime - \mu)

= (y - \mu^\prime)(y - \mu)
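The whole derivation can be spot-checked numerically. The sketch below computes $M_2$ directly from its definition (the sum of squared deviations from the mean) before and after appending a new value, then compares the difference with $(y - \mu^\prime)(y - \mu)$. The helper `m2_direct` and the sample values are hypothetical; `Fraction` keeps the comparison exact.

```python
from fractions import Fraction

def m2_direct(values):
    """Sum of squared deviations from the mean (M_2 before dividing by N)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

old = [Fraction(v) for v in (3, 9, 1, 6)]  # the first N-1 observations
y = Fraction(10)                            # the new observation
new = old + [y]

mu = sum(old) / len(old)      # mean before incorporating y
mu_new = sum(new) / len(new)  # mean after incorporating y

lhs = m2_direct(new) - m2_direct(old)  # M_2' - M_2, computed by brute force
rhs = (y - mu_new) * (y - mu)          # the closed form from the derivation
print(lhs, rhs)  # prints 441/20 441/20
```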

Since $(y - \mu^\prime)(y - \mu)$ equals the difference $M_2^\prime - M_2$, the complete recurrence relation for the second central moment is:

M_2^\prime = M_2 + (y - \mu^\prime)(y - \mu)
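Put together, the mean update and this recurrence give a single-pass update that only has to store three numbers: the count, the running mean, and $M_2$. Below is a minimal Python sketch, assuming the state is kept as a plain tuple; the function name `welford_update` is hypothetical.

```python
def welford_update(count, mean, m2, y):
    """Fold one observation y into the running (count, mean, M_2) state."""
    count += 1                          # N now includes y
    new_mean = mean + (y - mean) / count
    m2 += (y - new_mean) * (y - mean)   # M_2' = M_2 + (y - mu')(y - mu)
    return count, new_mean, m2

state = (0, 0.0, 0.0)  # empty stream: no observations yet
for y in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    state = welford_update(*state, y)

count, mean, m2 = state
print(count, mean, m2)  # m2 should be close to 32, up to float rounding
```

Starting from a zero state works because the first update sets the mean to y and adds $(y - y)(y - 0) = 0$ to $M_2$. Note that the old mean must be used in the second factor, which is why the sketch keeps `mean` and `new_mean` separate.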

Dividing $M_2^\prime$ by N-1 then gives the corrected sample variance after incorporating y:

\sigma^{\prime 2} = \frac{1}{N-1} [ M_2 + (y - \mu^\prime)(y - \mu) ]
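Because the corrected sample variance is just $M_2^\prime / (N-1)$, it can be read off the running state at any point once at least two observations have arrived. The sketch below (the generator name `running_sample_variance` and the data are hypothetical) yields the variance after each new value and checks it against Python's `statistics.variance`, which also divides by N-1, on the same prefix.

```python
import statistics

def running_sample_variance(stream):
    """Yield the corrected sample variance after each observation (N >= 2)."""
    count, mean, m2 = 0, 0.0, 0.0
    for y in stream:
        count += 1
        new_mean = mean + (y - mean) / count
        m2 += (y - new_mean) * (y - mean)  # M_2 recurrence
        mean = new_mean
        if count >= 2:
            yield m2 / (count - 1)         # sigma'^2 = M_2' / (N - 1)

data = [1.5, 2.5, 2.0, 8.0, 3.5, 4.0]
variances = list(running_sample_variance(data))
for n, var in enumerate(variances, start=2):
    # statistics.variance recomputes the N-1 corrected variance from scratch
    assert abs(var - statistics.variance(data[:n])) < 1e-9
print(variances[-1])
```

No value is produced for the first observation, since N-1 would be zero there.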
