loading...

Cache Me If You Can

krhancoc profile image Kenneth R Hancock ・4 min read

(X-Post from my Medium Blog -- Here)

Hello World! I'd like to take a sentence or two of your time to firstly say that this is my first blog post ever. I hope through this blog I can explore ideas and concepts I enjoy. Hopefully allowing me to grow as a Computer Scientist and programmer, and from my mistakes (and triumphs) will allow others to grow as well. My target audience is anyone that would find this useful. I'd like to briefly touch an introduction to caching before showing the code. But you can find my example project here: https://github.com/krhancoc/Blog-Example-1-Caching.

Cache!

Caching is an optimization technique that stores previously retrieved results into a location you desire (Could be local memory, a file, an SQL database, etc). These results could be anything from an entire page, to outputs from a series of API calls. I like to see caching (and similarly Memoization) as something that reduces Time Complexity at the cost of Space Complexity. Caching is incredibly useful when you have an action that's CPU intensive and the results won't be changing often. I'm going to use an example of returning a fibonacci number recursively (although really I would just use the closed form cause constant time is the bomb). But doing it recursively is great for us because of the following:

  1. The Fibonacci sequence without any use of Caching/Memoization takes an incredibly long time, and is very easy to implement recursively.

  2. The Fibonacci sequence contains many of the same “sub-problems” when done recursively. Ex: F(5) = F(4) + F(3) = F(3) + F(2) + F(2) + F(1). You can see here that F(2) is called twice, and without much thought you can imagine asking for F(800) would result in thousands of the same recursive calls.

Note there is a problem when using the recursive method – the recursion limit of python. Because of how fast this function grows, you'll hit the recursion limit quite quickly, now you can increase this (although not recommended). A proper thing that can be done is “warming” the cache, by asking for increasingly larger fibonacci numbers on server startup. So that way when a patron of your fibonacci site comes along, their request will most likely be already in the cache. There are obviously tonnes of ways to deal with this issue but I wanted to touch a little bit on it.

I was inspired to do this little project when I looked here, and knew I could apply the same technique to create a caching decorator for Django for a function I was using that sent thousands of requests to an API (which in turn put large amounts of stress on the server I was running the application on).

Here is the decorator:

import functools

from django.core.cache import caches
from django.core.cache.backends.base import InvalidCacheBackendError


def cache(cache_alias='default'):
    class cacheobject(object):
        def __init__(self, func):
            self.func = func
            try:
                self.cache = caches[cache_alias]
            except InvalidCacheBackendError:
                self.cache = caches['default']
        def __call__(self, value):
            response = self.cache.get(value)
            if response is None:
                response = self.func(value)
                self.cache.set(value, response, None)
                return response
            else:
                return response

        def __repr__(self):
            '''Return the function's docstring.'''
            return self.func.__doc__

        def __get__(self, obj, objtype):
            '''Support instance methods.'''
            return functools.partial(self.__call__, obj)

    return cacheobject

You'll also need to setup some sort of caching system for your Django project which can be found in the documentation (which is really quite good). For this project it looked like this:

CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.locmem.LocMemCache',
        'LOCATION': 'local',
    },
    'sql': {
        'BACKEND': 'django.core.cache.backends.db.DatabaseCache',
        'location': 'table',

    },
    'file': {
        'BACKEND':   'django.core.cache.backends.filebased.FileBasedCache',
        'LOCATION': 'file_cache'
    }

}
CACHE_MIDDLEWARE_SECONDS = 300
CACHE_MIDDLEWARE_KEY_PREFIX = ''
CACHE_MIDDLEWARE_ALIAS = 'default'

Now from here you can add it to any function you'd like, and change your cache parameter to the alias of the cache you would like to store it, depending on how fast you need cache to be, and also the data you are storing. More permanent data should be stashed in an SQL cache (slower option also potential bandwidth usage) while less permanent data should be stored in system memory, which is very fast), with file caching being in the middle of the two.

For our fibonacci example it would be like:

@cache('default')
def fibb(num):
    if num in (0,1):
        return num
    else:
        return fibb(num -1) + fibb(num - 2)

From here I created a simple Django template to have a user ask for a fibonacci number.

NOW BEHOLD THE POWER OF MEMOIZATION AND CACHING! What was once a function that took minutes to even find the 30th fibonacci number, is now essentially instant (thanks to a nice reduction to linear time complexity). From here, as you warm the cache with larger and larger numbers, you'll be able to find even the thousandth fibonacci number, and rather than taking – 1.071509e+301 – separate function calls, it can be brought down to thousands, or even one function call (depending on what is cached).

F(1000) = 4346655768693745643568852767504062580256466051737178040
24817290895365554179490518904038798400792551692959225930803226347
75209689623239873322471161642996440906533187938298969649928516003
704476137795166849228875

I realize that this is still very general but what I hope for this to be is a template that others may use in their Django projects to reduce the load that certain functions may have on their servers CPU or bandwidth usage. If you get the chance, try pulling and testing the function with and without the caching decorator, and try using different caches for speed comparisons.

Smashing

Cheers folks. Just in case here is the link again to the git repo:

https://github.com/krhancoc/Blog-Example-1-Caching.

If you feel there is more to add don't hesitate to comment or contact me, an open discussion benefits everyone in terms of learning.

Thank you,
Ryan.

Discussion

pic
Editor guide
Collapse
patricktingen profile image
Patrick Tingen

Caching can give tremendous results, but also tremendous headaches. As you may know, Phil Karlton said:

"There are only two hard things in Computer Science: cache invalidation and naming things."

Which is true. I have created a database exploration tool that relies heavily on caching things like user preferences and database scheme details. But there were times when I really considered ripping all caching out just because of all the strange errors. When not implemented right, you can get errors that are hard to reproduce. But when done properly..... your program really flies

Collapse
krhancoc profile image
Kenneth R Hancock Author

Yes! I think its very important to really analyze your calls before thinking of a caching solution. In terms of a functional first approach (every input will always get the same output) I think caching is amazing, when you don't have to deal with side effect issues. Its also good to make sure your cache is getting hit a good percentage of the time. If the cache misses 99% of the time, then a cache is probably not going to help.