DEV Community

Alberto Daniel Badia
Semantic Invalidation That Doesn't Suck

If you've worked on a web app for any length of time, you know the deal with caching. You add a cache, everything's fast, and then someone updates something and users see old data. Prices, inventory, whatever. TTL helps but you're always trading freshness for load.

The typical fix is manual invalidation. Update a product, invalidate the cache key. Fine for one endpoint. Less fine when that product has reviews, and reviews have comments, and the product belongs to a store, and the store belongs to an organization. Now you're tracking relationships and invalidating keys everywhere. It gets messy.

I built ZooCache to handle this differently. It's a Python caching library with a Rust core built around semantic invalidation: you invalidate based on what changed, not just on how much time has passed.

How It Works

You register dependencies when you cache something, even if you don't know them upfront:

from zoocache import cacheable, invalidate, add_deps, configure

configure()

@cacheable()
def get_product(pid):
    add_deps([f"product:{pid}"])
    return db.get_product(pid)

@cacheable()
def get_reviews(pid):
    add_deps([f"product:{pid}:reviews"])
    return db.get_reviews(pid)

@cacheable()
def get_store_products(sid):
    add_deps([f"store:{sid}:products"])
    return db.get_store_products(sid)

@cacheable()
def get_org_stores(oid):
    add_deps([f"org:{oid}:stores"])
    return db.get_org_stores(oid)

These tags form a hierarchy: a tag like org:1:stores:2:products:42 is a path in a PrefixTrie, one node per colon-separated segment.

When you update something, invalidate the relevant tag:

def update_product(pid, data):
    db.update_product(pid, data)
    invalidate(f"product:{pid}")

def update_store(sid, data):
    db.update_store(sid, data)
    invalidate(f"store:{sid}")

def update_org(oid, data):
    db.update_org(oid, data)
    invalidate(f"org:{oid}")

Invalidating org:1 clears everything below it. Product, reviews, store products, all gone. You don't have to remember which functions cached what.

The invalidation itself is O(D), where D is the tag depth; it doesn't matter how many items are cached.

Distributed Systems

If you're running multiple instances, ZooCache uses Hybrid Logical Clocks (HLC) for consistency. Each invalidation gets a timestamp that accounts for clock drift. If invalidation B happens after invalidation A, B's timestamp is guaranteed to be higher, even if the clocks are wrong.
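For context, a Hybrid Logical Clock is only a few lines of logic. This is a generic Python sketch of the idea; ZooCache's version lives in its Rust core, so none of these names are its API:

```python
# Minimal Hybrid Logical Clock sketch: wall-clock milliseconds plus a
# logical counter that breaks ties and survives clock stalls or skew.
import time

class HLC:
    def __init__(self):
        self.wall = 0       # highest physical time seen so far (ms)
        self.logical = 0    # tie-breaking counter

    def now(self):
        pt = int(time.time() * 1000)
        if pt > self.wall:
            self.wall, self.logical = pt, 0
        else:
            self.logical += 1   # clock stalled or went backwards: count up
        return (self.wall, self.logical)

    def observe(self, remote):
        # Merge a timestamp received from another node.
        pt = int(time.time() * 1000)
        m = max(pt, self.wall, remote[0])
        if m == self.wall == remote[0]:
            self.logical = max(self.logical, remote[1]) + 1
        elif m == self.wall:
            self.logical += 1
        elif m == remote[0]:
            self.logical = remote[1] + 1
        else:
            self.logical = 0
        self.wall = m
        return (self.wall, self.logical)

clock = HLC()
a = clock.now()
b = clock.now()
assert b > a   # later events always compare higher, even if wall time froze
```

Timestamps compare as plain tuples, which is what gives the "B after A implies B's timestamp is higher" guarantee even when physical clocks disagree.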

There's also passive resync. Every cached entry stores version info, and when a node reads data from another node, it compares those versions. If the other node's versions are newer, it catches up automatically. Reads keep things consistent without extra coordination.
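A hedged sketch of what read-driven resync can look like. The Node class and version layout below are assumptions for illustration, not ZooCache's wire format:

```python
# Sketch: passive resync. Every read compares the entry's version against
# the highest version this node has seen for that tag, so nodes converge
# without any dedicated synchronization protocol.

class Node:
    def __init__(self):
        self.tag_versions = {}   # tag -> highest invalidation version seen

    def read(self, tag, entry):
        """entry is (value, version), as stored by whichever node cached it."""
        value, version = entry
        known = self.tag_versions.get(tag, 0)
        if version < known:
            return None                       # stale: treat as a cache miss
        if version > known:
            self.tag_versions[tag] = version  # the peer knew more: catch up
        return value

node = Node()
node.tag_versions["product:42"] = 3
assert node.read("product:42", ("old", 2)) is None    # stale entry rejected
assert node.read("product:42", ("new", 5)) == "new"   # newer entry accepted
assert node.tag_versions["product:42"] == 5           # and the node caught up
```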

The Thundering Herd Thing

When a cache entry disappears and 100 requests hit your database at once, that's a problem. ZooCache handles this with a SingleFlight pattern. The first request does the work, the other 99 wait, then everyone gets the same result. Database sees one query instead of 100.
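The SingleFlight pattern itself is easy to sketch in plain Python with threads. This illustrates the pattern, not ZooCache's internals, and error handling is omitted:

```python
# Sketch: single-flight. Concurrent callers for the same key share one
# execution; the first caller does the work, the rest wait for its result.
import threading
import time

class SingleFlight:
    def __init__(self):
        self._lock = threading.Lock()
        self._calls = {}   # key -> (Event to wait on, dict holding the result)

    def do(self, key, fn):
        with self._lock:
            if key in self._calls:            # a flight is already in progress
                event, holder, leader = *self._calls[key], False
            else:
                event, holder, leader = threading.Event(), {}, True
                self._calls[key] = (event, holder)
        if leader:
            try:
                holder["result"] = fn()       # only the first caller does the work
            finally:
                with self._lock:
                    del self._calls[key]
                event.set()                   # wake everyone waiting on this key
        else:
            event.wait()
        return holder["result"]

calls = []
def slow_query():
    calls.append(1)      # count how many times the "database" is hit
    time.sleep(0.2)      # keep the flight open so the other callers pile up
    return "rows"

sf = SingleFlight()
results = []
threads = [threading.Thread(target=lambda: results.append(sf.do("product:42", slow_query)))
           for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# all 10 callers got "rows"; slow_query ran once
```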

Other Stuff

  • Storage: in-memory, LMDB, or Redis
  • Distributed invalidation bus: Redis-based, storage-agnostic
  • Integrations: FastAPI, Django, Litestar
  • Serialization: MsgPack with LZ4 compression
  • Observability: logs, Prometheus, OpenTelemetry
  • CLI for monitoring if you want it

(Screenshot: the TUI monitoring tool)

Performance

Rust core, Python bindings. Benchmarks are on the docs site if you want numbers.

Try It

uv add zoocache

GitHub | Docs

If you try it and have thoughts, let me know. Issues, suggestions, whatever. Always happy to hear how things could work better.
