Marisa You

Adding is hard (I mean it)

In my last post, I analyzed the difference in runtime between an O(N) algorithm and an O(log N) algorithm for finding the nth number in the Fibonacci Sequence. When I plotted the runtime of both algorithms against a range of values of n, I ended up with the following graph:

[Figure: runtime vs. n for the O(N) algorithm (red points) and the O(log N) algorithm (blue points)]

The red points on this graph resulted from the O(N) algorithm, and the blue points resulted from the O(log N) algorithm. As expected, the O(N) algorithm was significantly slower than the O(log N) algorithm, especially as the value of n got larger.

However, the shape of the O(N) graph was unexpected. As can be seen on the graph, the O(N) curve is not actually linear; it bends upward as n grows and, at a glance, looks closer to exponential, so I decided to investigate and try to come up with an explanation for it.

One of the first things I noticed was that the value of the Fibonacci numbers got really big really quickly.

puts simple_fib(10)
# => 55

puts simple_fib(100)
# => 354224848179261915075

puts simple_fib(1000)
# => 43466557686937456435688527675040625802564660517371780402481729089536555417949051890403879840079255169295922593080322634775209689623239873322471161642996440906533187938298969649928516003704476137795166849228875

Since my graph goes up to n = 220,000 on the x-axis, most of the outputs from simple_fib had to have been pretty large.
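To make that concrete, we can count decimal digits directly (this assumes simple_fib from my last post is loaded; the second count is what Binet's formula predicts, so treat it as the expected value):

puts simple_fib(1000).to_s.length
# => 209

puts simple_fib(220_000).to_s.length
# => 45977 (roughly 46,000 digits)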

This leads me into a discussion of how numbers are represented in Ruby. Many personal machines these days are 64-bit, meaning that a general-purpose CPU register is 64 bits wide. The largest number that could theoretically be stored in one of these registers is 2^64 - 1, or 18,446,744,073,709,551,615, since a register with every bit set to 1 holds 2^0 + 2^1 + 2^2 + ... + 2^63.

[Figure: a 64-bit register with every bit set to 1]
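That arithmetic is easy to sanity-check in Ruby itself (which, conveniently, has no trouble with numbers this size, for reasons we're about to see):

puts (0..63).sum { |i| 2**i }
# => 18446744073709551615

puts 2**64 - 1
# => 18446744073709551615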

In practice, the largest number that fits in a CPU register is often smaller, because many language runtimes reserve some of the bits for metadata. This is exactly what Ruby does when representing numbers.

Ruby actually has two internal representations for integers. Prior to the release of Ruby 2.4 in 2016, numbers were exposed as either Fixnum or Bignum; later versions of Ruby present both as the single Integer class, although the two representations still exist under the hood. A Fixnum is an integer small enough to fit in a single 64-bit register alongside its tag bits, while a Bignum is anything larger.

In Ruby, the largest positive integer that can be represented using a single register, and therefore the largest Fixnum, is 2^62 - 1, or 4,611,686,018,427,387,903 (which is much smaller than the 1000th Fibonacci number, as shown above). Bit 63 is reserved for the sign flag, with 0 indicating a positive number and 1 indicating a negative number. Bit 0 is reserved for the Fixnum flag: a flag of 1 marks a Fixnum, while a flag of 0 marks a pointer to a heap-allocated object such as a Bignum.

[Figure: Fixnum bit layout, with bit 63 as the sign flag and bit 0 as the Fixnum flag]
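You can poke at this boundary from Ruby. On Ruby 2.4 and later both sides of it report Integer, but on Ruby 2.3 or older the class names expose the switch (the commented outputs are what each version should print):

puts 2**62 - 1
# => 4611686018427387903 (the largest Fixnum)

puts (2**62 - 1).class
# => Fixnum on Ruby 2.3, Integer on Ruby 2.4+

puts (2**62).class
# => Bignum on Ruby 2.3, Integer on Ruby 2.4+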

While the way a Fixnum is stored in a register is fairly straightforward (since the integer backing it fits in a single register), the way a Bignum is stored is slightly more complicated. Ruby stores a Bignum as an array of 32-bit chunks, with the first chunk holding the 32 least-significant bits, the second chunk the next 32 bits, and so on. (This article explains this in much greater detail.)

[Figure: a Bignum stored as an array of 32-bit chunks, least-significant chunk first]
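As a rough model of that layout, here is a plain-Ruby sketch (an illustration, not the VM's actual C code) that splits a number into base-2^32 chunks, least-significant chunk first:

# Split n into 32-bit chunks, least-significant chunk first.
# This mimics the Bignum layout; the real work happens in C.
def to_chunks(n)
  chunks = []
  while n > 0
    chunks << (n & 0xFFFFFFFF) # keep the lowest 32 bits
    n >>= 32                   # move on to the next chunk
  end
  chunks
end

puts to_chunks(2**64 + 5).inspect
# => [5, 0, 1]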

So what does this mean in terms of performance? If you haven't read my last post (or if you need a refresher), this was my simple_fib algorithm:

def simple_fib(n)
  if n == 0
    return 0
  end
  x = 0
  y = 1
  i = 1
  while i < n
    temp = x + y
    x = y
    y = temp
    i += 1
  end
  return y
end

Essentially, each number is found by adding the previous two numbers in the Fibonacci sequence. When n is small and the sum of two numbers is a Fixnum, which fits within one register, Ruby simply has to add the value in the second register to the value in the first. However, when n is larger and the sum exceeds the largest Fixnum, Ruby has more work to do.

It might be helpful to think of Fixnum addition vs. Bignum addition as analogous to single-digit vs. multi-digit addition. When adding single-digit numbers that produce a single-digit sum, the addition is straightforward, but when the sum is multi-digit, the process can require a carry and become more complicated:

[Figure: column addition of 4 + 5 versus 48 + 52, showing the carry]

For the single-digit addition, only one addition operation (4 + 5) had to be done. For the multi-digit addition, the sum of the least-significant digits required one addition operation (8 + 2), but because that sum produced a carry, the next column required two addition operations (4 + 5 + 1). In total, the single-digit sum took just 1 addition operation, while the two-digit sum took 3! Similarly, when Ruby adds Fixnum integers (and the sum also fits in one 64-bit register), essentially a single add instruction is needed. But when adding Bignum integers, Ruby must keep track of any carries from one 32-bit chunk to the next, and each carry means extra addition operations.
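Here's what that carry bookkeeping looks like as a sketch, reusing the to_chunks helper from above to add two numbers chunk by chunk, schoolbook-style (again, an illustration rather than the VM's code):

# Add two chunk arrays in base 2^32, propagating carries
# from each 32-bit "column" to the next.
def add_chunks(a, b)
  result = []
  carry = 0
  [a.length, b.length].max.times do |i|
    sum = (a[i] || 0) + (b[i] || 0) + carry
    result << (sum & 0xFFFFFFFF) # keep the low 32 bits
    carry = sum >> 32            # carry any overflow
  end
  result << carry if carry > 0
  result
end

puts add_chunks(to_chunks(2**32 - 1), to_chunks(1)).inspect
# => [0, 1] (the carry rippled into a second chunk)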

Therefore, as the numbers being added get larger, the number of operations needed to add them and store the sum grows, and so does the time Ruby spends on each addition. In fact, the function that ends up performing the "addition" operation on Bignums in the Ruby VM is about 45 lines long, as seen here. And this is the key to the graph's shape: the value of the nth Fibonacci number grows exponentially with n, so its number of digits (and thus its number of 32-bit chunks) grows linearly with n. Each addition in the loop therefore takes time proportional to n, and the loop runs n times, so the total runtime of simple_fib grows quadratically rather than linearly. A quadratic curve is exactly the kind of upward bend that can pass for exponential at a glance on the time vs. n graph.
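A quick-and-dirty way to watch the per-addition cost grow is to time additions at increasing sizes with Ruby's Benchmark module. The absolute numbers will vary by machine, but the upward trend should be clear:

require 'benchmark'

# Time 1000 additions of two n-bit numbers as n grows.
[1_000, 10_000, 100_000, 1_000_000].each do |bits|
  a = 2**bits - 1
  b = 2**bits - 1
  time = Benchmark.realtime { 1_000.times { a + b } }
  puts "#{bits} bits: #{(time * 1000).round(3)} ms"
end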

The takeaway from this study is that coming up with efficient algorithms is an important part of writing performant applications, but it is certainly not the only factor. After all, algorithms are mathematical abstractions of problems that eventually need to run on physical hardware. As an engineer, knowing how an algorithm actually executes on the hardware is just as important as being able to come up with efficient algorithms in the first place! And while high-level programming languages such as Ruby abstract these implementation details away so that we don't have to think about them all the time, we should not be afraid to dive into the details when we need to.

NOTE: This study was done on my new MacBook Pro 2020, which uses an i7 processor (a 64-bit Intel CPU). On a platform with a different processor (especially one with a different register size, like the 32-bit ARMv7 processor in a Raspberry Pi 2), things may look different :-) Who knew things could get this complicated for adding numbers?!
