Hey all! So, yesterday I posted a practice solution for finding the 200,000th number in the Fibonacci sequence as quickly and efficiently as possible. That original post can be found here. I asked if anybody could offer some tips for improvement, and was greeted by a very good, very thorough answer from @rrampage detailing the use of generators rather than an iteration solution like I had. His counter-point was as follows:
Another interesting approach using iteration is making the function into a generator which can be used as part of for loops, list comprehensions, etc
def gen_fib():
    a,b = 1,1
    yield a
    yield b
    while True:
        a,b = b,a+b
        yield b
g = gen_fib()
# Generate the first 200,000 Fibonacci numbers
fibs = [next(g) for _ in range(200000)] 
As Python does not handle a lot of recursion depth very well, this seemed like a good solution in situations where you would need to leverage the CPU more in a low-RAM environment, something Generators are typically quite good at. But, after testing this solution which was simply an example and not meant for performance, then custom-rigging a solution that I thought would be ideally easy-on memory, I got some interesting results.
Here is the code I originally used that leverages storage in variables in the while-loop iteration:
def fibonacci(n):
    iterator = 1
    first_fib_num = 0
    second_fib_num = 1
    while iterator < n:
        added_to = first_fib_num + second_fib_num
        first_fib_num = second_fib_num
        second_fib_num = added_to
        iterator+=1
    return(added_to)
print(fibonacci(200000))
I then ran this script against mprof for memory profiling over time. Below is the plot graph of its usage through its run:
Next, I used the suggested example of the same problem with generators which, again, was not really intended for performance. Its code is above, but I'll put it here as well:
def gen_fib():
    a,b = 1,1
    yield a
    yield b
    while True:
        a,b = b,a+b
        yield b
g = gen_fib()
# Generate the first 200,000 Fibonacci numbers
fibs = [next(g) for _ in range(200000)] 
and below is the corresponding graph of usage over time:
WHAT?!? I was expecting something completely different! At the peak of this chart, right around 14.5 seconds, the generator function is using almost 2GB of memory! 2 GIGABYTES. As in more than 1000x more memory than the iterator function. AND IT TOOK LONGER TO DO IT by almost two seconds.
Please know also, that I am not criticising the solution provided by @rrampage because it was just an example of using a generator to accomplish this task. But a generator is supposed to rely less on memory. Don't believe me, take a look at this sentence from an article on freecodecamp.org:
The main advantage of a generator over a list is that it takes much less memory.
Sure, maybe to store. But apparently not to use, which is kinda the whole point in my eyes really.
The final solution I had was to write my own generator based on a generator I found in the Python documentation and make it as efficient as possible for this purpose. The code is as follows:
def fibon(n):
    a = b = 1
    for i in range(n):
        yield a
        a, b = b, a + b
for x in fibon(200000):
    if x < 3:
        pass
print(x)
And the results of that are represented as the graph below:
So, in conclusion, I used a language-specific feature that is pretty buzzy in Python because it is supposed to accomplish this kind of work better than a solution that is more "off-the-top-of-your-head". Building this generator, thinking through it from a memory and performance standpoint, writing it, and profiling it took me about 30 minutes. My initial solution using a while loop and some variables took me about 5 minutes to come up with. Luckily, the buzzy solution was almost as fast and almost as light on memory, meaning that it would have been almost worth using if it didn't take 6x longer to implement.
So, the moral of the story is that Occam's Razor certainly applies in programming. As developers, we are charged with making things better in a timely and efficient fashion. Our job is to create solutions. Not to think about solutions, but discover, quantify, invent, build, deploy, and support solutions to problems. So much of our field is either doing what has yet to be done, or taking something proved possible in an academic environment, and making it useable by a non-technical majority. This ultimately means that we need to trust our instincts, trust our training, and do the best we can to go with the obvious solution unless that solution proves inadequate in practice.
Thank you so much to the folks that responded to my original post on this topic: @ben, @jess, @rrampage, and everybody else who handles I can't recall right now. Without your discussion and comments, this kind of experiment would never happen.
And for those of you who think this post was about me trashing somebody else's solution, please know that it was not. My own generator solution was not really any better despite spending a ton more time on it than an example submission would get, and testing/re-testing it to cut it down to be only a little bit worse than the 5-minute solution. My original generator solution used over 2GB of RAM. So there is no judgment from me. Generators have a lot of use cases where they shine. But this, despite my initial belief to the contrary, is just not one of them.
 




 
    
Top comments (5)
The generator code takes more memory because we are creating a list out of it in:
fibs = [next(g) for _ in range(200000)]. The memory consumption is because of the list, not the generator itself.Also, the generator example makes a list of all the 200000 fibonacci numbers whereas the iterative one just returns the 200,000th fibonacci. We can test the time for just the 200000th Fibonacci using another generator function called
skipwhich simply skips the first n-1 resultsThe generator one will be slightly slower than the iterative one but it has more versatility e.g without changing the function itself, you can return (say) the odd fibonaccis only or skip every 10 numbers or
zipwith other iterators. Another important point is that they are lazy i.e the results are not calculated till the generator is finally called.On my machine, the iterative one takes 825ms while the generator takes 840ms, not that much of a performance hit.
Thank you for taking the time to profile the code. TIL about
mprof:)Thanks for the clarification. I see the list comp in the original solution that I admittedly kinda scanned the first time 🙈🙈. I’ll run your solution through the test suite later today. I’ll also re-run the tests on my macbook pro which doesn’t have several VMs and a small docker environment running on it like my server does. Admittedly, the times my server got seemed totally insane even for numbers that big. But I decided that most developers here would understand that run times for the same code are relative to the system’s load at run time and wouldn’t really care about the actual figures, but the differences.
Either way, thank you so much for clarifying and for suggesting a generator in the first place. It has caused me to take another look at generators and dig a bit deeper into the Python docs surrounding them. I have a few applications in development right now that I think could benefit from being a bit leaner on memory. Seeing as I obstinately refuse to spin up DO droplets that are more than 5$, making as much use of 1GB RAM as possible is kind of a thing for me 😁. Couple that with my unreasonable love of data-structure comprehensions (they love my back just as much. I don’t care how many times they block my number. DONT TELL ME NOT TO LOVE YOU COMPREHENSIONS.....ahem) and I think generators would help. Plus I don’t access lists by index all that often seeing as I deal a lot with web automation which doesn’t allow me to get or set data in a predictable order.
There is a lot to love about comprehensions. The nice thing is that there are generator comprehensions also which are just a minor syntactic difference from list comprehensions.
List comprehensions are very convenient but as you mentioned in your post, they have a memory cost and many times we do not need all results at once. Generator comprehensions are a nice addition for this purpose.
Awesome! I'm halfway through the comments and have enjoyed the discussion so far!