Saurabh Chaturvedi

Another Way to Find Max Partitions

Numbers are always fun!


You’re organizing a hackathon and decide to give away free cloud storage as prizes to the winners. For the prize fund, you’ve got 1024 GB of cloud space. You’ll hand out these gigabytes with the condition that a higher place in the hackathon gets a strictly larger amount of space. Since you want to make as many participants happy as possible, you want to find the maximum number of places for which you’ll be awarding prizes. For example, if you had just 8 GB available, you could award a total of 3 positions: the winner gets 5 GB, the runner-up gets 2 GB and the person who came third gets 1 GB (another split is possible, namely 4, 3 and 1 GB, but the number of positions is still 3 for 8 GB).

So how do you solve this? Note that (as demonstrated in the above example) there are multiple distributions possible for a given number of positions (let’s call this number p). Indeed, this boils down to representing a number as a sum of distinct smaller numbers in such a way that there are as many of these numbers as possible. For 8 gigs, we could’ve chosen the form 8 = 7 + 1 or 8 = 5 + 3, but these wouldn’t have been optimal, since 8 can be expressed as a sum of more than just a couple of numbers, as in 8 = 5 + 2 + 1. A mathematical concept which adds convenience to solving this problem is that of partitions. To quote Wikipedia, a partition of a positive integer n, also called an integer partition, is a way of writing n as a sum of positive integers. So in our case, we simply want to find a partition of 1024 into distinct parts that has as many parts as possible. Let’s call this a max partition for the sake of the conversation.
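
Just to make the goal concrete before we get clever, here’s a tiny brute-force sketch (my own illustration, not part of the solution) that tries every set of distinct numbers for a small n:

from itertools import combinations

def max_partition_bruteforce(n):
    # Try subsets of {1, ..., n} of increasing size and keep the largest one that sums to n.
    # Exponential time, so this is only for tiny n.
    best = []
    for size in range(1, n + 1):
        for combo in combinations(range(1, n + 1), size):
            if sum(combo) == n:
                best = list(combo)
                break  # any combo of this size will do; try a bigger size next
    return best

print(max_partition_bruteforce(8))  # [1, 2, 5]
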

In computer science, this problem falls into a certain class of problems whose solutions use greedy algorithms — procedures that make the locally optimal choice at each stage of the solution with the hope of finding a global optimum.
The “greedy” approach to solving our example is as follows: doesn’t it seem natural to start with 1 as the first summand? All that remains then is to express 7 as a max partition and add 1 to it. But expressing 7 as a max partition now has a constraint: we cannot use 1. So we use 2 and move on to represent 8 - (1 + 2) = 5 as a max partition. Again, for that, we can’t use 1 or 2. Neither can we use 3 or 4, because then the remainder would be 2 or 1, which we’ve already used. Thus, we represent 5 as just itself and we’re done: our max partition is 8 = 1 + 2 + 5. It can easily be shown that this terminating case arises exactly when the next number we would pop out (here 3) is at least half the remaining number (here 5). I leave that part for you to figure out.

So, to put this strategy more formally: consider that we initially have two numbers n = 8 and l = 1. If n ≤ 2l, we simply represent n as itself; otherwise we pop out l and then solve the subproblem of representing n - l as a max partition such that each number in the partition is at least l + 1. The value of l for this subproblem is 1 greater than that of the original problem. So for our example (n, l) of representing 8 as a max partition, we first pop out 1 and then solve the subproblem (n - l, l + 1), i.e. (7, 2). For this subproblem, we pop out 2 and then solve the subproblem (7 - 2, 2 + 1), that is (5, 3). Now, since 5 ≤ 2 × 3, we just pop out 5 and we’re done. We then sum up the popped-out numbers to get 8 = 1 + 2 + 5.
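
To make the recurrence concrete, here’s a direct recursive transcription of this (n, l) formulation; just a sketch to mirror the recurrence, before the iterative version below:

def max_partition_recursive(n, l=1):
    # Base case: n <= 2*l means n can't be split further without reusing a number.
    if n <= 2 * l:
        return [n]
    # Otherwise pop out l and solve the subproblem (n - l, l + 1).
    return [l] + max_partition_recursive(n - l, l + 1)

print(max_partition_recursive(8))  # [1, 2, 5]
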
Since we’ve articulated the strategy more formally now, it’s easy to come up with a working program to solve our problem. Here’s a straightforward implementation in Python 3:

def max_partition(n, l=1):
    # Greedily pop out the smallest allowed summand l until n <= 2*l.
    partition = []
    while n > 2*l:
        partition.append(l)  # pop out l...
        n = n - l            # ...and solve the subproblem for n - l
        l += 1               # with the minimum allowed summand raised by 1
    partition.append(n)      # base case: represent what's left as itself
    return partition

print(max_partition(int(input())))

But wait: I’ve got another approach, perhaps a better one. I noticed that numbers which can be represented as the sum of the first n natural numbers are special. Aren’t they already in max-partition form when they show off their identity?! For example, isn’t 6 already in max-partition form when written as 6 = 1 + 2 + 3? Isn’t 10 already in max-partition form when written as 10 = 1 + 2 + 3 + 4? Let’s call such numbers “senior numbers” for the sake of this conversation (in standard terminology, these are the triangular numbers). This insight forms the basis for my algorithm.

Here’s how we proceed: if the number n whose max partition we have to find is already a senior number, we simply represent it in its n = 1 + 2 + 3 + … + k form. If it’s not a senior number, we still find a k which is just large enough to make the sum s = 1 + 2 + 3 + … + k greater than n (by “just large enough” I mean k makes 1 + 2 + 3 + … + k greater than n, while 1 + 2 + 3 + … + (k-1) is still less than n). Since k is just large enough for s to be greater than n, s - n is going to be less than k. So s - n is going to be a number among 1, 2, 3, …, k-1. What if we “pluck out” s - n from the sum of the first k natural numbers? That would give us s - (s - n), which is nothing but n!

By the way, the last character of the above paragraph isn’t for a factorial 🙂. Let’s visualize the idea we’ve learned so far. Usually, Ferrers diagrams are used to visualize partitions, but for our purposes, I found my custom visualization more convenient: in the below tree, the top node is the number whose max partition we wish to evaluate. The leaves are the numbers in the max-partition representation, that, of course, sum up to give the number on the top node. For a senior number, everything’s good:

[Tree diagram: a senior number at the root, with the leaves 1, 2, 3, …, k as its max partition]

For a number that’s not senior, we cut the appropriate branch so that the number on the leaf which was connected to the top node via that branch isn’t added. Consider the case of 9:

[Tree diagram: 9 at the root with leaves 1, 2, 3 and 4; the branch to the leaf 1 is cut]

And below is the tree for 8. Notice that we’re simply finding the number k. For 8 and 9, it’s 4. Since the sum of the first 4 natural numbers is 10, we first draw the tree for 10 and then replace 10 in the top node with the number we want (here it’s 8). We then cut off the branch connecting the top node to the number 10 - 8 = 2. For 9, that number was 10 - 9 = 1.

[Tree diagram: 8 at the root with leaves 1, 2, 3 and 4; the branch to the leaf 2 is cut]

The max partition then consists simply of the remaining leaves. I hope you understand my algorithm by now.

One subtlety that remains to be uncovered is how to find the “just large enough” k. But it’s a pretty straightforward calculation. The sum of the first k natural numbers is n = k * (k+1) / 2. Solving this equation for a positive k gives k = (√(1 + 8*n) - 1) / 2. Since k is going to be fractional when n isn’t a senior number, we take its ceiling. That makes k just large enough.
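
As a quick check of that formula (a little sanity test of my own), for n = 8 we should get k = 4, since 1 + 2 + 3 + 4 = 10 ≥ 8 while 1 + 2 + 3 = 6 < 8:

import math

n = 8
k = math.ceil((math.sqrt(1 + 8 * n) - 1) / 2)
print(k, k * (k + 1) // 2)  # prints: 4 10
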

I think by now I’ve articulated this algorithm clearly. We can thus proceed to programming the solution. Here’s another straightforward implementation in Python 3:

import math

def optimal_summands(n):
    # Smallest k such that 1 + 2 + ... + k >= n.
    k = ((1 + 8*n)**0.5 - 1) / 2
    k = math.ceil(k)
    summands = list(range(1, k+1))
    the_sum = int(k * (k+1) / 2)
    if the_sum - n > 0:  # If n is not senior, pluck out the excess (the_sum - n).
        del summands[the_sum-n-1]
    return summands

print(optimal_summands(int(input())))

If we do some analysis of both the algorithms, we discover that both run in time proportional to the number of summands k, which is about √(2n), i.e. O(√n) in the input value (and linear in the size of the output). However, the invisible constant hidden in that bound is perhaps much smaller for optimal_summands() than for max_partition().
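
As a small aside, and assuming both functions defined above are available in the same scope, here’s a quick sanity check (my own, not from the original benchmarks) that the two approaches agree on the number of summands and that the parts are distinct and sum to n:

def check_agreement(limit=1000):
    # Both functions should return distinct parts summing to n, with equal part counts.
    for n in range(1, limit):
        a, b = max_partition(n), optimal_summands(n)
        assert sum(a) == sum(b) == n
        assert len(a) == len(b) == len(set(a)) == len(set(b))
    print("both algorithms agree up to", limit - 1)

check_agreement()
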

I did some simple checks in Python to see which method is quicker and the latter one turned out to execute more than thrice as fast as the former. I used Python’s timeit module to time both the algorithms, and here’s an instance of one of my checks on the Python interpreter:

>>> from timeit import timeit
>>>
>>> timeit(setup='from different_summands import optimal_summands', stmt='optimal_summands(10000)', number=100000)
0.8999944160023006
>>>
>>> timeit(setup='from greedy_different_summands import max_partition', stmt='max_partition(10000)', number=100000)
3.161972836998757
>>>

I’ve often observed that knowing some mathematical facts allows one to develop a better algorithm, or at least develop an algorithm faster and with more intuition. Mathematical insights can often dramatically improve the runtime of one’s programs. Math and computer science, especially the study of algorithms, are great friends!

[Comic omitted. Source: xkcd]

If you know of other factors which make the latter program run faster, please let me know in the comments. In the end, perhaps greed isn’t always good, but math is. 😀


By the way, you can split those 1024 gigabytes as:
1024 = 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + 12 + 13 + 14 + 15 + 16 + 17 + 18 + 19 + 20 + 21 + 22 + 23 + 24 + 25 + 26 + 27 + 28 + 29 + 30 + 31 + 32 + 33 + 34 + 35 + 36 + 37 + 38 + 39 + 40 + 41 + 42 + 43 + 44 + 45
(Notice the missing number? Hint: it’s the sum of the first 45 natural numbers minus 1024.)
Of course, then you’ve got to have at least 44 participants in your hackathon! 🙂
P.S.: Should I write an arXiv paper for this?

Oldest comments (7)

Christophe Riolo

Awesome thinking! Thanks for the algorithm!
I'm on my phone, so it's hard to time things; I'll let you try it, but here are a couple of ideas that might make the code a bit faster:

from math import ceil  # no dynamic lookup of ceil

def optimal_summands(n):
    k = ceil(((1 + 8*n)**0.5 - 1) / 2)  # one assignment fewer
    the_sum = k * (k+1) // 2  # no casting, // already gives an int
    diff = the_sum - n  # the value to pluck out; outside the list comprehension so it's computed only once
    return [i for i in range(1, k+1) if i != diff]  # less code, more Pythonic. If n is senior, diff is 0 so nothing is filtered out.
Saurabh Chaturvedi

Amazing! Thanks for making the code more Pythonic! But won't the if in the list comprehension be evaluated k times? In my version, it was computed just once, but then the del made O(k) shifts, so the two seem roughly balanced anyway. I don't know precisely why, but when I benchmarked your code against mine, my code still ran in about a quarter of the time taken by yours.

Christophe Riolo

Yes indeed, I just got my hands on a computer and tested everything out; you were faster than me :)
I get a very slight increase in performance with this version:

from math import ceil  # no dynamic lookup of ceil

def summands_opt(n):
    k = ceil(((1 + 8*n)**0.5 - 1) / 2)  # one assignment fewer
    the_sum = k * (k+1) // 2  # no casting, // already gives an int
    summands = list(range(1, k+1))
    del summands[the_sum-n-1]
    return summands

The del statement does not fail when there is nothing to delete.

Saurabh Chaturvedi

Awesome! Now this is some production-quality code! Thanks for adding readability to the code as well.

Christophe Riolo

Don't listen to this tired programmer who did too much OCaml lately: the condition is necessary because of how Python interprets negative indices.
I tried other versions, thinking that using more iterators should be faster, but surprisingly not. Here are all the versions and timings I tried:

import math
from math import ceil # no dynamic lookup of ceil
from itertools import chain

def summands(n):
    k = ((1 + 8*n)**0.5 - 1) / 2
    k = math.ceil(k)
    summands = list(range(1, k+1))
    the_sum = int(k * (k+1) / 2)
    if the_sum - n > 0:  # If n is not senior.
        del summands[the_sum-n-1]
    return summands

def summands_opt(n):
    k = ceil(((1 + 8*n)**0.5 - 1) / 2)
    the_sum = k * (k+1) // 2
    summands = list(range(1, k+1))
    if the_sum - n > 0:  # If n is not senior.
        del summands[the_sum-n-1]
    return summands

def summands_iter(n):
    k = ceil(((1 + 8*n)**0.5 - 1) / 2)
    the_sum = k * (k+1) // 2
    pluck = max(0, the_sum - n)
    return list(chain(
        range(1, pluck),
        range(pluck+1, k+1)))

def summands_list(n):
    k = ceil(((1 + 8*n)**0.5 - 1) / 2)
    the_sum = k * (k+1) // 2
    pluck = max(0, the_sum - n)
    return list(range(1, pluck)) + list(range(pluck+1, k+1))

>>> from timeit import timeit
>>> timeit(setup='from partition import summands', stmt='summands(10000)', number=100000)
0.17959438299976682
>>> timeit(setup='from partition import summands_opt', stmt='summands_opt(10000)', number=100000)
0.15117240200015658
>>> timeit(setup='from partition import summands_iter', stmt='summands_iter(10000)', number=100000)
0.2742825210002593
>>> timeit(setup='from partition import summands_list', stmt='summands_list(10000)', number=100000)
0.23539557200001582

The last two surprised me a lot: how can the chaining of iterators be slower than concatenation of lists? But unsurprisingly the definition of a custom generator to chain was even slower…

def summands_gen(n):
    k = ceil(((1 + 8*n)**0.5 - 1) / 2)
    the_sum = k * (k+1) // 2
    pluck = max(0, the_sum - n)
    def gen():
        yield from range(1, pluck)
        yield from range(pluck+1, k+1)
    return list(gen())
Christophe Riolo

However, once the cast to list is removed, summands_iter is the only version that does not run out of memory for very large numbers:

def summands_iter(n):
    k = ceil(((1 + 8*n)**0.5 - 1) / 2)
    the_sum = k * (k+1) // 2
    pluck = max(0, the_sum - n)
    return chain(
        range(1, pluck),
        range(pluck+1, k+1))

>>> summands_opt(2**60)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/christophe/python/partition.py", line 17, in summands_opt
    summands = list(range(1, k+1))
MemoryError
>>> for i in summands_iter(2**60): print(i)
# still printing !
Saurabh Chaturvedi

Whoa! That was some deep analysis!