This will be somewhat out of context if you're coming here first. It's really a footnote in a much longer series of blog posts on summarizing data distributions in computing environments where storage is at a premium.
In that blog post, I wanted to explain how to derive Welford's Algorithm for the recurrence relation for the second central moment, and found the explanations I could find a little lacking (at least for me).
I found this helpful post to be a great starting point, but the algebra part of the post skipped over so many steps that I couldn't follow it.
So I worked it out in greater detail. I'm hoping this is helpful to other mere mortals who, like myself, couldn't quite connect the dots.
Notation:
y represents a new observation.
μ' is the updated mean (the mean after incorporating y).
M 2 M_2 M 2
is the second central moment .
N is the number of observations including y .
First, some work on the mean update formula:
μ ′ = μ + y − μ N \mu^\prime = \mu + \frac{y - \mu}{N} μ ′ = μ + N y − μ
N μ ′ = N μ + y − μ N\mu^\prime = N\mu + y - \mu N μ ′ = N μ + y − μ
( N − 1 ) μ ′ + μ ′ = ( N − 1 ) μ + y (N-1)\mu^\prime + \mu^\prime = (N-1)\mu + y ( N − 1 ) μ ′ + μ ′ = ( N − 1 ) μ + y
( N − 1 ) μ ′ − ( N − 1 ) μ = y − μ ′ (N-1)\mu^\prime - (N-1)\mu = y - \mu^\prime ( N − 1 ) μ ′ − ( N − 1 ) μ = y − μ ′
( N − 1 ) μ − ( N − 1 ) μ ′ = μ ′ − y (N-1)\mu - (N-1)\mu^\prime = \mu^\prime - y ( N − 1 ) μ − ( N − 1 ) μ ′ = μ ′ − y
Now, the main derivation:
M 2 ′ − M 2 = ∑ 1 N ( y i − μ ′ ) 2 − ∑ 1 N − 1 ( y i − μ ) 2 M_2^\prime - M_2 = \sum_1^N{(y_i - \mu^\prime)^2} - \sum_1^{N-1}{(y_i - \mu)^2} M 2 ′ − M 2 = 1 ∑ N ( y i − μ ′ ) 2 − 1 ∑ N − 1 ( y i − μ ) 2
= ( y − μ ′ ) 2 + ∑ 1 N − 1 ( y i − μ ′ ) 2 − ( y i − μ ) 2 = (y - \mu^\prime)^2 + \sum_1^{N-1}{(y_i - \mu^\prime)^2 - (y_i - \mu)^2} = ( y − μ ′ ) 2 + 1 ∑ N − 1 ( y i − μ ′ ) 2 − ( y i − μ ) 2
= ( y − μ ′ ) 2 + ∑ 1 N − 1 ( y i 2 − 2 y i μ ′ + μ ′ 2 ) − ( y i 2 − 2 y i μ + μ 2 ) = (y - \mu^\prime)^2 + \sum_1^{N-1}{(y_i^2 - 2y_i\mu^\prime +\mu^{\prime 2}) - (y_i^2 - 2y_i\mu + \mu^2)} = ( y − μ ′ ) 2 + 1 ∑ N − 1 ( y i 2 − 2 y i μ ′ + μ ′2 ) − ( y i 2 − 2 y i μ + μ 2 )
= ( y − μ ′ ) 2 + ∑ 1 N − 1 − 2 y i μ ′ + 2 y i μ + ( μ ′ 2 − μ 2 ) ) = (y - \mu^\prime)^2 + \sum_1^{N-1}{-2y_i\mu^\prime + 2y_i\mu + (\mu^{\prime 2} - \mu^2))} = ( y − μ ′ ) 2 + 1 ∑ N − 1 − 2 y i μ ′ + 2 y i μ + ( μ ′2 − μ 2 ))
= ( y − μ ′ ) 2 + ∑ 1 N − 1 − 2 y i ( μ ′ − μ ) + ( μ ′ − μ ) ( μ ′ + μ ) = (y - \mu^\prime)^2 + \sum_1^{N-1}{-2y_i(\mu^\prime - \mu) + (\mu^\prime - \mu)(\mu^\prime + \mu)} = ( y − μ ′ ) 2 + 1 ∑ N − 1 − 2 y i ( μ ′ − μ ) + ( μ ′ − μ ) ( μ ′ + μ )
= ( y − μ ′ ) 2 + ∑ 1 N − 1 ( μ ′ − μ ) ( − 2 y i + μ ′ + μ ) = (y - \mu^\prime)^2 + \sum_1^{N-1}{(\mu^\prime - \mu)(-2y_i + \mu^\prime + \mu)} = ( y − μ ′ ) 2 + 1 ∑ N − 1 ( μ ′ − μ ) ( − 2 y i + μ ′ + μ )
= ( y − μ ′ ) 2 + ( μ − μ ′ ) ∑ 1 N − 1 ( 2 y i − μ ′ − μ ) = (y - \mu^\prime)^2 + (\mu - \mu^\prime) \sum_1^{N-1}{(2y_i - \mu^\prime - \mu)} = ( y − μ ′ ) 2 + ( μ − μ ′ ) 1 ∑ N − 1 ( 2 y i − μ ′ − μ )
= ( y − μ ′ ) 2 + ( μ − μ ′ ) [ 2 ( N − 1 ) μ − ( N − 1 ) μ ′ − ( N − 1 ) μ ] = (y - \mu^\prime)^2 + (\mu - \mu^\prime) [ 2(N-1)\mu - (N-1)\mu^\prime - (N-1)\mu ] = ( y − μ ′ ) 2 + ( μ − μ ′ ) [ 2 ( N − 1 ) μ − ( N − 1 ) μ ′ − ( N − 1 ) μ ]
= ( y − μ ′ ) 2 + ( μ − μ ′ ) [ ( N − 1 ) μ − ( N − 1 ) μ ′ ] = (y - \mu^\prime)^2 + (\mu - \mu^\prime) [ (N-1)\mu - (N-1)\mu^\prime ] = ( y − μ ′ ) 2 + ( μ − μ ′ ) [( N − 1 ) μ − ( N − 1 ) μ ′ ]
Substitute the mean update term we worked out earlier.
= ( y − μ ′ ) 2 + ( μ − μ ′ ) ( μ ′ − y ) = (y - \mu^\prime)^2 + (\mu - \mu^\prime) (\mu^\prime - y) = ( y − μ ′ ) 2 + ( μ − μ ′ ) ( μ ′ − y )
= ( y − μ ′ ) 2 + ( μ ′ − μ ) ( y − μ ′ ) = (y - \mu^\prime)^2 + (\mu^\prime - \mu) (y - \mu^\prime) = ( y − μ ′ ) 2 + ( μ ′ − μ ) ( y − μ ′ )
= ( y − μ ′ ) ( y − μ ′ + μ ′ − μ ) = (y - \mu^\prime)(y - \mu^\prime + \mu^\prime - \mu) = ( y − μ ′ ) ( y − μ ′ + μ ′ − μ )
= ( y − μ ′ ) ( y − μ ) = (y - \mu^\prime)(y - \mu) = ( y − μ ′ ) ( y − μ )
This equals the difference
M 2 ′ − M 2 M_2^\prime - M_2 M 2 ′ − M 2
, and so the complete recurrence relation for the second central moments is:
M 2 ′ = M 2 + ( y − μ ′ ) ( y − μ ) M_2^\prime = M_2 + (y - \mu^\prime)(y - \mu) M 2 ′ = M 2 + ( y − μ ′ ) ( y − μ )
And thus the recurrence relation for the corrected sample variance based on the second central moment is:
σ ′ 2 = 1 N − 1 [ M 2 + ( y − μ ′ ) ( y − μ ) ] \sigma^{\prime 2} = \frac{1}{N-1} [ M_2 + (y - \mu^\prime)(y - \mu) ] σ ′2 = N − 1 1 [ M 2 + ( y − μ ′ ) ( y − μ )]
Top comments (0)