DEV Community

Bob Lied
Bob Lied

Posted on

PWC 211 #2 Geared toward the average rather than the exceptional

Task 2 of the Perl Weekly Challenge is a problem that benefits from analysis before opening the editor. The statement sounds easy:

You are given an array of integers. Write a script to find
out if the given array can be split into two separate arrays
whose average are the same.

My mind immediately starts thinking about combinations of n things taken k at a time, the efficiency of calculating sums and averages. The factorials involved in that are going to blow up for anything bigger than the small lists given as examples, and there's potential nastiness in comparing floating point values, which these averages are likely to be.

"Mr. Miller, are we ever going to use algebra in real life?"

So let's start with some math. We have a list of length N. Let's call the sum of the numbers in that list S. Now we want to partition the list into two subsets; let's say they have sizes Na and Nb, and the sum of each subset is Sa and Sb. Clearly N = Na + Nb and S = Sa + Sb.

The condition we're aiming for is that both subsets have the same average: Sa/Na = Sb/Nb. Starting with that and substituting for Sb,

Sa/Na = Sb/Nb
Sa/Na = (S - Sa)/Nb
Sa × Nb = (S - Sa) × Na = S×Na - Sa×Na
Sa×Na + Sa×Nb = S×Na
Sa×(Na + Nb) = S × Na
Sa × N = S × Na
Sa/Na = S/N

So, if there is any solution that makes Sa/Na = Sb/Nb true, then at least one of the solutions must be that each of the subsets has the same average as the overall average. We could similarly reach Sb/Nb = S/N if we start by substituting (S-Sb) for Sa.

So what does this buy us? First of all, it gives us an easy way to test for a solution: if any subset has an average that's the same as S/N, we can stop. We don't even have to check the average of the complementary set. If the first subset has sum Sa, count Na, and average S/N, then the other partition must also have average S/N:

Sa/Na = S/N
(S-Sb)/(N-Nb) = S/N
N×(S-Sb) = S×(N-Nb)
S×N - Sb×N = S×N - S×Nb
Sb×N = S×Nb
Sb/Nb = S/N

Furthermore, instead of having to find every possible subset, which would be 2N possibilities, we only have to look at subsets up to size floor(N/2), which is a huge reduction in operation count.

There's another helpful thing that results from this. The average number that we're testing for is likely to be a floating point value. We don't really want to test for equality of floating point numbers; the road to hell is paved with floating point comparisons.

Knowing that we are looking for S/N = Sa/Na, it must be true that Sa = (S×Na)/N; and since Sa must be an integer (since all the terms in the sum are integers), it must be true that (S×Na) % N = 0. We can quickly check for any size Na whether that is even possible by looking at S×Na%N, and we can quickly calculate Sa without iterating over the array by evaluating S×Na/N.

Now we can start thinking about code.


[*] The title comes from the 1972 album "Thick as a Brick", by Jethro Tull. The album came with a fake newspaper attached, containing humorous fake articles that the music referenced. One of the articles was about education reform that would be "geared toward the average rather than the exceptional" so that it would "relieve the crushing burden of individual aspiration.”

Top comments (0)