I Built a Cache Engine from Scratch in Python — and O(1) LFU Eviction Is Sneakier Than LRU

#python #datastructures #backend #programming

When you call cache.get("user:123") and the cache is full, something has to go. That decision — which key gets thrown out to make room — is the whole game in caching. Get it right and your hit rate stays high. Get it wrong and you're constantly evicting the one thing you were about to ask for again.

I'd used functools.lru_cache and Redis for years without really knowing what happens in that moment. So I built CacheLite: an in-memory cache engine in pure Python, no dependencies, where I had to write every eviction policy by hand. LRU, LFU, FIFO, Random — plus TTL expiry, a reader-writer lock, snapshots, an HTTP API and a CLI. The repo is here: https://github.com/hajirufai/cachelite and there's a live walkthrough at https://hajirufai.github.io/cachelite/.

This post is the part I found most interesting: how eviction actually works, and why O(1) LFU is sneakier than O(1) LRU.

The thing nobody tells you about LRU

Everybody knows LRU means "evict the least recently used key." The interview answer is "use an OrderedDict." That's fine, but it hides the actual data structure trick, so I didn't let myself use it.

To do LRU in O(1) you need two things at once:

Instant lookup by key (so a hash map / dict)
Instant reordering when a key is touched (so a doubly-linked list)

The dict maps a key to its node. The linked list keeps everything in recency order, most-recent at the front. When you access a key, you unlink its node and stick it back at the front. When you need to evict, you grab the node at the very back. Both are pointer swaps — no shifting, no scanning.

The detail that makes the code clean is sentinel nodes. Instead of a real head and tail that you have to null-check, you keep two dummy nodes that always exist:

class LRUPolicy(EvictionPolicy):
    def __init__(self) -> None:
        self._head = _Node("\x00head")   # sentinel
        self._tail = _Node("\x00tail")   # sentinel
        self._head.next = self._tail
        self._tail.prev = self._head
        self._nodes: dict[str, _Node] = {}

    def _remove_node(self, node):
        node.prev.next = node.next
        node.next.prev = node.prev
        node.prev = node.next = None

    def _add_to_head(self, node):
        node.prev = self._head
        node.next = self._head.next
        self._head.next.prev = node
        self._head.next = node

Because head and tail always sit on the ends, inserting into an empty list and inserting into a full one are the same four lines. There's no "if this is the first node" branch. The first time I wrote this without sentinels I had three special cases and a bug in each. With sentinels it just works.

Access and eviction fall out of those two helpers:

def on_access(self, key):
    node = self._nodes.get(key)
    if node is None:
        return
    self._remove_node(node)
    self._add_to_head(node)   # touched → move to front

def evict(self):
    victim = self._tail.prev          # the node before tail = oldest
    if victim is self._head:
        raise CacheEmptyError("nothing to evict")
    self._remove_node(victim)
    del self._nodes[victim.key]
    return victim.key

I gave _Node a __slots__ so each node only carries key, prev, next and skips the per-instance __dict__. With a few hundred thousand nodes that adds up.

LFU is where it gets weird

LFU — least frequently used — sounds like a tiny variation on LRU. It is not. The naive version is O(n): to find the least-used key you scan every counter. To do it in O(1) you need three structures pulling together, and this is the part I actually had to draw on paper.

key -> frequency. How many times each key has been hit.
frequency -> ordered set of keys at that frequency. A bucket per frequency level.
min_freq. The lowest frequency currently in use, tracked as you go so eviction never scans.

The buckets are the trick. Every key with frequency 1 lives in bucket 1, every key with frequency 2 in bucket 2, and so on. To evict, you go straight to the bucket at min_freq and drop a key from it.

But there's a tie to break. If five keys all sit at frequency 1, which one goes? You want the one that's been sitting there longest — basically LRU within the frequency level. So each bucket is an OrderedDict, and you pop from the front:

def evict(self):
    if not self._key_freq:
        raise CacheEmptyError("nothing to evict")
    bucket = self._freq_keys[self._min_freq]
    key, _ = bucket.popitem(last=False)   # oldest key at lowest freq
    del self._key_freq[key]
    if not bucket:
        del self._freq_keys[self._min_freq]
        self._min_freq = min(self._freq_keys) if self._freq_keys else 0
    return key

The genuinely fiddly bit is keeping min_freq honest on access. When you touch a key, its frequency goes up by one, so it has to move from bucket f to bucket f+1:

def on_access(self, key):
    freq = self._key_freq.get(key)
    if freq is None:
        return
    bucket = self._freq_keys[freq]
    del bucket[key]
    if not bucket:                       # bucket emptied out
        del self._freq_keys[freq]
        if self._min_freq == freq:       # we just removed the min bucket
            self._min_freq = freq + 1
    new_freq = freq + 1
    self._key_freq[key] = new_freq
    self._freq_keys.setdefault(new_freq, OrderedDict())[key] = None

Here's the insight that makes the min_freq update O(1): when you promote a key out of the minimum bucket and that bucket becomes empty, the new minimum can only be freq + 1. It can't jump to some arbitrary value, because the key you just moved landed there. So you never have to search for the new minimum on the hot path. On a fresh insert, frequency is 1, so min_freq resets to 1 — also free. I only fall back to an actual min() scan in on_remove, which is the rare path.

That asymmetry — insert and access are O(1), only deletion might scan — is the kind of thing you don't appreciate until you've written it wrong once.

FIFO and Random, the honest ones

After LFU these two felt like a holiday.

FIFO evicts in insertion order, ignoring access entirely. A deque would do it, except keys can be deleted before they reach the front, so I tombstone: mark a key dead and skip it when it surfaces during eviction. That keeps inserts and evicts O(1) amortized without rebuilding the queue every time something gets removed.

Random eviction has one interesting wrinkle: how do you delete a random element from a collection in O(1)? You can't index into a dict. The trick is a parallel array of keys plus a key -> index map. To remove a key, you swap it with the last element of the array, pop the end (O(1)), and fix up the moved key's index. random.choice over the array gives you a uniform victim. Same swap-remove pattern game engines use for entity pools.

Expiry: lazy plus active, the way Redis does it

TTL turned out to need two mechanisms, not one, and Redis settled on the same combo for good reasons.

Lazy expiry checks the clock when you read a key. If it's past its expiry, you treat it as a miss and delete it on the spot. Cheap, but a key that's never read again sits in memory forever taking up space.

Active expiry is a background thread that periodically samples keys and clears the dead ones. That reclaims memory from keys nobody is touching. I used time.monotonic() for all of this rather than time.time() — monotonic never jumps backward when the system clock gets adjusted, so a daylight-saving change can't accidentally resurrect or kill your cache entries.

One small Python gotcha bit me here: signal handlers for graceful shutdown only work in the main thread. The background sweeper raised ValueError: signal only works in main thread until I wrapped the registration in a try/except and let non-main contexts skip it.

Making it thread-safe without a single global lock

A threading.Lock around everything would work and would also be slow, because reads — the common case for a cache — would block each other for no reason. Reads don't conflict with reads.

So I wrote a reader-writer lock: any number of readers can hold it at once, but a writer gets exclusive access. Then, to stop one hot key from serializing the whole cache, I added striped locking — hash the key, mod it by the number of stripes, and lock only that stripe. Two writes to different keys usually land on different stripes and never wait on each other. It's the same idea behind Java's old ConcurrentHashMap. The concurrency stress test fires a hundred threads at the cache at once and checks nothing corrupts.

What I'd reach for this in real life

Honestly, for most apps Redis is still the answer. But there's a real gap CacheLite fills: an in-process cache where you want a TTL and a specific eviction policy and stats and the ability to snapshot warm data to disk — without standing up a separate server. functools.lru_cache can't do any of those. I also now actually understand what my cache config is doing when I tune maxmemory-policy on a real Redis box, which is worth something on its own.

The whole thing is about 3,700 lines of Python, zero dependencies, 173 passing tests including the concurrency stress test, with CI on 3.11 and 3.12.

Code: https://github.com/hajirufai/cachelite
Live walkthrough: https://hajirufai.github.io/cachelite/

If you've only ever used a cache as a black box, try writing the LFU eviction path by hand. It's a couple of hours and you'll never look at a maxmemory-policy setting the same way again.