Caching. Useful stuff. If you're not familiar with it, it's a way to keep data around in memory (or on disk) for fast retrieval. Think of querying a database for some information. Rather than do that every time an application asks for data, we can do it once and keep the result in a cache. Subsequent calls for the data will return the copy from cache instead of making a database query. In theory this improves the performance of your application.
Let's build a simple cache for use in Python programs.
Cache API
I'll start by creating a new module called simplecache
, and defining a class Cache
in it. I'll not implement anything yet, I just want to define the API my cache will use.
class Cache:
""" A simple caching class that works with an in-memory or file based
cache. """
def __init__(self, filename=None):
""" Construct a new in-memory or file based cache."""
pass
def update(self, key, item, ttl=60):
""" Add or update an item in the cache using the supplied key. Optional
ttl specifies how many seconds the item will live in the cache for. """
pass
def get(self, key):
""" Get an item from the cache using the specified key. """
pass
def remove(self, key):
""" Remove an item from the cache using the specified key. """
pass
def purge(self, all=False):
""" Remove expired items from the cache, or all items if flag is set. """
pass
def close(self):
""" Close the underlying connection used by the cache. """
pass
So far so good. We can tell the cache to create a new in-memory or file based cache via the __init__
method. We can add items to the cache using update
- which will overwrite the item if it already exists. We can get an item using a key get
. Finally we can remove items by key remove
, or empty the cache of expired items using purge
(which optionally allows purging all items).
Where to cache the data?
So where is this class going to cache data? Sqlite ships with the Python standard library, and is ideal for this sort of thing. In fact one of the suggested use cases for Sqlite is caching. It allows us to create an in-memory or file based SQL database, which is both our use cases covered. Let's design a SQL table that can hold our cached data.
CREATE TABLE 'Cache' (
'Key' TEXT NOT NULL UNIQUE,
'Item' BLOB NOT NULL,
'CreatedOn' TEXT NOT NULL,
'TimeToLive' TEXT NOT NULL,
PRIMARY KEY("Key"))
);
To break this down, we have a table called Cache
that has four fields. The Key
is a string, and will act as a unique primary key on the table. Next up our Item
field is a blob of binary data - I'm thinking here that we will serialise objects added to the cache before saving them in the database. The last two fields are used to determine the lifetime of the item in the cache - CreatedOn
is the timestamp of when the item was added, and TimeToLive
is the length of time we need to hang on to the item for.
Constructing the cache
Let's start by importing the Sqlite library into our module.
import sqlite3
Then we need to turn our attention to the __init__
method. We have two scenarios to support: one where we are given a filename, and one where we are not.
def __init__(self, filename=None):
""" Construct a new in-memory or file based cache."""
if filename is None:
self._connection = sqlite3.connect(":memory:")
else:
self._connection = sqlite3.connect(filename)
self._create_schema()
We can open a connection and keep it around for the lifetime of the class. Therefore we set the self._connection
property to hold the connection instance, before calling an internal method _create_schema
(more on that in a little while).
Bootstrapping the schema
If we are using an in-memory database, then our schema will not exist yet. However for file based databases that may not be the case. This may be an existing file that has already been setup with our schema. Let's see the code that handles this process.
def _create_schema(self):
table_name = "Cache"
cursor = self._connection.cursor()
result = cursor.execute(
"SELECT name FROM sqlite_master WHERE type = 'table' and name = ?",
(table_name,))
cache_exists = result.fetchone() is not None
if cache_exists:
return
sql = """
CREATE TABLE 'Cache' (
'Key' TEXT NOT NULL UNIQUE,
'Item' BLOB NOT NULL,
'CreatedOn' TEXT NOT NULL,
'TimeToLive' TEXT NOT NULL,
PRIMARY KEY('Key'))
"""
cursor.execute(sql)
cursor.close()
First we define our table name and open a cursor to perform some database operations. Next we check for the existence of our table in the master table. If it's there we exit the method, otherwise we execute a CREATE TABLE...
statement to build our table in the database.
We can now instantiate our Cache
class with either an in-memory or file based database to cache objects in.
Adding to the cache
Let us now turn our attention to adding items to the cache. Recall our Update
method from our API will be responsible for this. If a key already exists in the cache, we are going to replace its entry with whatever is supplied.
def update(self, key, item, ttl=60):
""" Add or update an item in the cache using the supplied key. Optional
ttl specifies how many seconds the item will live in the cache for. """
sql = "SELECT Key FROM 'Cache' WHERE Key = ?"
cursor = self._connection.cursor()
result = cursor.execute(sql, (key,))
row = result.fetchone()
if row is not None:
sql = "DELETE FROM 'Cache' WHERE Key = ?"
cursor.execute(sql, (key,))
connection.commit()
sql = "INSERT INTO 'Cache' values(?, ?, datetime(), ?)"
pickled_item = pickle.dumps(item)
cursor.execute(sql, (key, pickled_item, ttl))
self._connection.commit()
cursor.close()
First we get the in-memory database connection. We then test for the existence of the the supplied key
value in the cache. If it exists, it is removed from the cache. Next we pickle the supplied item
value, turning it into a binary blob. We then insert the key
, picked item
, and ttl
into the cache.
Getting data from the cache
Our behaviour for getting an item from the cache is mostly straight forward. We look for an entry with the specified key
and unpickle the associated item
. The complication in this process arises when the ttl
value has expired. That is, the date of creation plus the ttl
are less than the current time. If this is the case, then the item has expired and should be removed from the cache.
There is a philosophical issue with this method. There is a school of thought that says methods should either return a value (read) or perform a mutation of data (write). Here we are potentially doing both. We are deliberately introducing a side effect (deleting an expired item). I think it is OK in this case, but other programmers might argue otherwise.
def get(self, key):
""" Get an item from the cache using the specified key. """
sql = "SELECT Item, CreatedOn, TimeToLive FROM 'Cache' WHERE Key = ?"
cursor = self._connection.cursor()
result = cursor.execute(sql, (key,))
row = result.fetchone()
if row is None:
return
item = pickle.loads(row[0])
expiry_date = datetime.datetime.fromisoformat(row[1]) + datetime.timedelta(seconds=int(row[2]))
now = datetime.datetime.now()
if expiry_date <= now:
sql = "DELETE FROM 'Cache' WHERE Key = ?"
cursor.execute(sql, (key,))
self._connection.commit()
item = None
cursor.close()
return item
Removing and purging items in the cache
Removing an item is simple - in fact we have already done it (twice) in our other two methods. That's an ideal candidate for refactoring (which we will look at later on). For now, we will implement the method directly.
def remove(self, key):
""" Remove an item from the cache using the specified key. """
sql = "DELETE FROM 'Cache' WHERE Key = ?"
cursor = self._connection.cursor()
cursor.execute(sql, (key,))
self._connection.commit()
cursor.close()
Purging items is a little more complex. We have two scenarios to support - purging all items and purging only expired items. Let's see how we can achieve this.
def purge(self, all=False):
""" Remove expired items from the cache, or all items if flag is set. """
cursor = self._connection.cursor()
if all:
sql = "DELETE FROM 'Cache'"
cursor.execute(sql)
self._connection.commit()
else:
sql = "SELECT Key, CreatedOn, TimeToLive from 'Cache'"
for row in cursor.execute(sql):
expiry_date = datetime.datetime.fromisoformat(row[1]) + datetime.timedelta(seconds=int(row[2]))
now = datetime.datetime.now()
if expiry_date <= now:
sql = "DELETE FROM 'Cache' WHERE Key = ?"
cursor.execute(sql, (key,))
self._connection.commit()
cursor.close()
Deleting all is simple enough. We simply run SQL to delete everything in our Cache
table. For the expired only items we need to loop through each row, compute the expiry date, and determine if it should be deleted. Again, this latter piece of code has been repeated from one of our other methods (get
in this case). Another candidate for refactoring.
Refactoring
We have a working cache implementation that satisfies our original API specification. There is however some repeated code, which we can factor out into their own methods. Lets start with the delete logic that is present in get
, update
, remove
, and purge
methods. These instances can all be replaced with a call to the following new method.
def _remove_item(self, key, cursor):
sql = "DELETE FROM 'Cache' WHERE Key = ?"
cursor.execute(sql, (key,))
cursor.connection.commit()
We can see this has a big impact on our code. Four other methods are now calling the one common _remove_item
method. Next let's take a look at the expiry date checking code.
def _item_has_expired(self, created, ttl):
expiry_date = datetime.datetime.fromisoformat(created) + datetime.timedelta(seconds=int(ttl))
now = datetime.datetime.now()
return expiry_date <= now
def get(self, key):
""" Get an item from the cache using the specified key. """
sql = "SELECT Item, CreatedOn, TimeToLive FROM 'Cache' WHERE Key = ?"
cursor = self._connection.cursor()
result = cursor.execute(sql, (key,))
row = result.fetchone()
if row is None:
return
item = pickle.loads(row[0])
if self._item_has_expired(row[1], row[2]):
self._remove_item(key, cursor)
item = None
cursor.close()
return item
def purge(self, all=False):
""" Remove expired items from the cache, or all items if flag is set. """
cursor = self._connection.cursor()
with self.__class__.lock:
if all:
sql = "DELETE FROM 'Cache'"
cursor.execute(sql)
self._connection.commit()
else:
sql = "SELECT Key, CreatedOn, TimeToLive from 'Cache'"
for row in cursor.execute(sql):
if self._item_has_expired(row[1], row[2]):
self._remove_item(row[0], cursor)
cursor.close()
Great. That's two more places we have reduced code repetition in.
Thread safety with locks
We are almost done. For this to be a robust class, we need to ensure we are thread safe. Caches can often surface as singleton instances in an application, so thread safety is important. We will achieve this by using locks around the destructive cache operations. This is how our whole class looks with locking added. Note the with
blocks around the add and delete operations. These ensure the lock is released even if something goes wrong.
#! /usr/bin/env python3
import datetime
import pickle
import sqlite3
import threading
class Cache:
""" A simple caching class that works with an in-memory or file based
cache. """
_lock = threading.Lock()
def __init__(self, filename=None):
""" Construct a new in-memory or file based cache."""
if filename is None:
self._connection = sqlite3.connect(":memory:")
else:
self._connection = sqlite3.connect(filename)
self._create_schema()
def _create_schema(self):
table_name = "Cache"
cursor = self._connection.cursor()
result = cursor.execute(
"SELECT name FROM sqlite_master WHERE type = 'table' and name = ?",
(table_name,))
cache_exists = result.fetchone() is not None
if cache_exists:
return
sql = """
CREATE TABLE 'Cache' (
'Key' TEXT NOT NULL UNIQUE,
'Item' BLOB NOT NULL,
'CreatedOn' TEXT NOT NULL,
'TimeToLive' TEXT NOT NULL,
PRIMARY KEY('Key'))
"""
cursor.execute(sql)
cursor.close()
def update(self, key, item, ttl=60):
""" Add or update an item in the cache using the supplied key. Optional
ttl specifies how many seconds the item will live in the cache for. """
sql = "SELECT Key FROM 'Cache' WHERE Key = ?"
cursor = self._connection.cursor()
result = cursor.execute(sql, (key,))
row = result.fetchone()
with self.__class__._lock:
if row is not None:
self._remove_item(key, cursor)
sql = "INSERT INTO 'Cache' values(?, ?, datetime(), ?)"
pickled_item = pickle.dumps(item)
cursor.execute(sql, (key, pickled_item, ttl))
self._connection.commit()
cursor.close()
def _remove_item(self, key, cursor):
sql = "DELETE FROM 'Cache' WHERE Key = ?"
cursor.execute(sql, (key,))
cursor.connection.commit()
def get(self, key):
""" Get an item from the cache using the specified key. """
sql = "SELECT Item, CreatedOn, TimeToLive FROM 'Cache' WHERE Key = ?"
cursor = self._connection.cursor()
result = cursor.execute(sql, (key,))
row = result.fetchone()
if row is None:
return
item = pickle.loads(row[0])
if self._item_has_expired(row[1], row[2]):
with self.__class__._lock:
self._remove_item(key, cursor)
item = None
cursor.close()
return item
def _item_has_expired(self, created, ttl):
expiry_date = datetime.datetime.fromisoformat(created) + datetime.timedelta(seconds=int(ttl))
now = datetime.datetime.now()
return expiry_date <= now
def remove(self, key):
""" Remove an item from the cache using the specified key. """
cursor = self._connection.cursor()
with self.__class__._lock:
self._remove_item(key, cursor)
cursor.close()
def purge(self, all=False):
""" Remove expired items from the cache, or all items if flag is set. """
cursor = self._connection.cursor()
with self.__class__._lock:
if all:
sql = "DELETE FROM 'Cache'"
cursor.execute(sql)
self._connection.commit()
else:
sql = "SELECT Key, CreatedOn, TimeToLive from 'Cache'"
for row in cursor.execute(sql):
if self._item_has_expired(row[1], row[2]):
with self.__class__._lock:
self._remove_item(row[0], cursor)
cursor.close()
def close(self):
""" Close the underlying connection used by the cache. """
self._connection.close()
Testing the cache
Time to test our cache. We can do this be spinning up an interactive session as follows.
python -i simplecache.py
Now we can new up an in-memory cache and test our methods.
>>> c = Cache()
>>> c.update("key", "some value")
>>> c.update("key2", [1, 2, 3], 300)
>>> c.get("key")
'some vlaue'
>>> c.remove("key")
>>> c.purge()
>>> c.get("key2")
[1, 2, 3]
>>> c.purge(True)
>>> c.get("key2")
>>> c.close()
>>>
Exercises for the reader
Write a suite of unit tests for the
Cache
class. How easy is it to test? Do you need to make any changes to accommodate testing?Make the time to live a sliding window instead of a fixed time. That is, whenever an item is retrieved from the cache, its time to live value starts over.
Add a method to write the contents of the cache out to the screen.
Top comments (0)