DEV Community

Kyle Pena
Kyle Pena

Posted on • Edited on

Derivation of Welford's Algorithm

This will be somewhat out of context if you're coming here first. It's really a footnote in a much longer series of blog posts on summarizing data distributions in computing environments where storage is at a premium.

In that blog post, I wanted to explain how to derive Welford's Algorithm for the recurrence relation for the second central moment, and found the explanations I could find a little lacking (at least for me).

I found this helpful post to be a great starting point, but the algebra part of the post skipped over so many steps that I couldn't follow it.

So I worked it out in greater detail. I'm hoping this is helpful to other mere mortals who, like myself, couldn't quite connect the dots.

Notation:

  • y represents a new observation.
  • μ' is the updated mean (the mean after incorporating y).
  • M2M_2 is the second central moment - well, not quite. Technically you'd have to divide by N to get the central moment. But we'll call it the central moment here.
  • N is the number of observations including y.

First, some work on the mean update formula:

μ=μ+yμN\mu^\prime = \mu + \frac{y - \mu}{N}
Nμ=Nμ+yμN\mu^\prime = N\mu + y - \mu
(N1)μ+μ=(N1)μ+y(N-1)\mu^\prime + \mu^\prime = (N-1)\mu + y
(N1)μ(N1)μ=yμ(N-1)\mu^\prime - (N-1)\mu = y - \mu^\prime
(N1)μ(N1)μ=μy(N-1)\mu - (N-1)\mu^\prime = \mu^\prime - y

Now, the main derivation:

M2M2=1N(yiμ)21N1(yiμ)2M_2^\prime - M_2 = \sum_1^N{(y_i - \mu^\prime)^2} - \sum_1^{N-1}{(y_i - \mu)^2}
=(yμ)2+1N1(yiμ)2(yiμ)2= (y - \mu^\prime)^2 + \sum_1^{N-1}{(y_i - \mu^\prime)^2 - (y_i - \mu)^2}
=(yμ)2+1N1(yi22yiμ+μ2)(yi22yiμ+μ2)= (y - \mu^\prime)^2 + \sum_1^{N-1}{(y_i^2 - 2y_i\mu^\prime +\mu^{\prime 2}) - (y_i^2 - 2y_i\mu + \mu^2)}
=(yμ)2+1N12yiμ+2yiμ+(μ2μ2))= (y - \mu^\prime)^2 + \sum_1^{N-1}{-2y_i\mu^\prime + 2y_i\mu + (\mu^{\prime 2} - \mu^2))}
=(yμ)2+1N12yi(μμ)+(μμ)(μ+μ)= (y - \mu^\prime)^2 + \sum_1^{N-1}{-2y_i(\mu^\prime - \mu) + (\mu^\prime - \mu)(\mu^\prime + \mu)}
=(yμ)2+1N1(μμ)(2yi+μ+μ)= (y - \mu^\prime)^2 + \sum_1^{N-1}{(\mu^\prime - \mu)(-2y_i + \mu^\prime + \mu)}
=(yμ)2+(μμ)1N1(2yiμμ)= (y - \mu^\prime)^2 + (\mu - \mu^\prime) \sum_1^{N-1}{(2y_i - \mu^\prime - \mu)}
=(yμ)2+(μμ)[2(N1)μ(N1)μ(N1)μ]= (y - \mu^\prime)^2 + (\mu - \mu^\prime) [ 2(N-1)\mu - (N-1)\mu^\prime - (N-1)\mu ]
=(yμ)2+(μμ)[(N1)μ(N1)μ]= (y - \mu^\prime)^2 + (\mu - \mu^\prime) [ (N-1)\mu - (N-1)\mu^\prime ]

Substitute the mean update term we worked out earlier.

=(yμ)2+(μμ)(μy)= (y - \mu^\prime)^2 + (\mu - \mu^\prime) (\mu^\prime - y)
=(yμ)2+(μμ)(yμ)= (y - \mu^\prime)^2 + (\mu^\prime - \mu) (y - \mu^\prime)
=(yμ)(yμ+μμ)= (y - \mu^\prime)(y - \mu^\prime + \mu^\prime - \mu)
=(yμ)(yμ)= (y - \mu^\prime)(y - \mu)

This equals the difference M2M2M_2^\prime - M_2 , and so the complete recurrence relation for the second central moments is:

M2=M2+(yμ)(yμ)M_2^\prime = M_2 + (y - \mu^\prime)(y - \mu)

And thus the recurrence relation for the corrected sample variance based on the second central moment is:

σ2=1N1[M2+(yμ)(yμ)]\sigma^{\prime 2} = \frac{1}{N-1} [ M_2 + (y - \mu^\prime)(y - \mu) ]

Do your career a big favor. Join DEV. (The website you're on right now)

It takes one minute, it's free, and is worth it for your career.

Get started

Community matters

Top comments (0)

AWS Security LIVE!

Tune in for AWS Security LIVE!

Join AWS Security LIVE! for expert insights and actionable tips to protect your organization and keep security teams prepared.

Learn More

👋 Kindness is contagious

Dive into an ocean of knowledge with this thought-provoking post, revered deeply within the supportive DEV Community. Developers of all levels are welcome to join and enhance our collective intelligence.

Saying a simple "thank you" can brighten someone's day. Share your gratitude in the comments below!

On DEV, sharing ideas eases our path and fortifies our community connections. Found this helpful? Sending a quick thanks to the author can be profoundly valued.

Okay