When building high-performance applications, choosing the right cache eviction policy can make or break your system's performance. While both TinyLRU and TinyLFU aim to keep your hot data cached, they take fundamentally different approaches. Understanding when to use each can save you from performance disasters and wasted CPU cycles.
The Fundamentals
What is LRU?
LRU (Least Recently Used) is one of the most intuitive cache eviction policies. The principle is simple: when the cache is full and you need to make room for new data, evict the item that was accessed longest ago.
Think of it like organizing books on a small shelf. Every time you read a book, you put it at the front. When the shelf fills up, you remove the book from the back that you haven't touched in the longest time.
LRU Cache Operations:
┌─────────────────────────────┐
│ Get(key)                    │
│   └─ Move to front          │
│                             │
│ Set(key, value)             │
│   ├─ Add to front           │
│   └─ Evict oldest if full   │
└─────────────────────────────┘
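To make the mechanics concrete, here is a minimal, non-thread-safe LRU sketch in Go built on the standard library's container/list. The lruCache type and its method names are illustrative, not taken from any particular library.

```go
package main

import (
	"container/list"
	"fmt"
)

// entry is what each list element stores, so eviction can find the key again.
type entry struct {
	key   string
	value string
}

// lruCache is a minimal, non-thread-safe LRU sketch (illustrative only).
type lruCache struct {
	capacity int
	order    *list.List               // front = most recently used, back = least
	items    map[string]*list.Element // key -> element in order
}

func newLRUCache(capacity int) *lruCache {
	return &lruCache{
		capacity: capacity,
		order:    list.New(),
		items:    make(map[string]*list.Element),
	}
}

// Get returns the cached value and refreshes the key's recency.
func (c *lruCache) Get(key string) (string, bool) {
	el, ok := c.items[key]
	if !ok {
		return "", false
	}
	c.order.MoveToFront(el) // touching an item makes it most recently used
	return el.Value.(*entry).value, true
}

// Set inserts or updates a key, evicting the least recently used entry when full.
func (c *lruCache) Set(key, value string) {
	if el, ok := c.items[key]; ok {
		el.Value.(*entry).value = value
		c.order.MoveToFront(el)
		return
	}
	if c.order.Len() >= c.capacity {
		oldest := c.order.Back() // the item untouched for the longest time
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
	c.items[key] = c.order.PushFront(&entry{key: key, value: value})
}

func main() {
	c := newLRUCache(3)
	c.Set("C", "3")
	c.Set("B", "2")
	c.Set("A", "1") // order (MRU -> LRU): A, B, C
	c.Get("B")      // order: B, A, C
	c.Set("D", "4") // cache full: evicts C, the least recently used
	_, ok := c.Get("C")
	fmt.Println("C still cached?", ok) // false
}
```

Both operations are O(1): the map gives direct access to a node, and the list keeps recency order without any scanning.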
What is TinyLFU?
TinyLFU (Tiny Least Frequently Used) takes a different approach: it keeps least-frequently-used decisions practical by backing them with a compact frequency sketch. Instead of just tracking when items were accessed, it tracks how often they've been accessed. Before admitting a new item, TinyLFU asks: "Is this item more valuable than what I'd have to evict?"
TinyLFU Cache Operations:
┌──────────────────────────────┐
│ Get(key)                     │
│   ├─ Increment frequency     │
│   └─ Return if cached        │
│                              │
│ Set(key, value)              │
│   ├─ Compare frequencies     │
│   ├─ Admit if worthy         │
│   └─ Reject if not valuable  │
└──────────────────────────────┘
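The admission check itself is nearly a one-liner once a frequency estimate is available. A minimal sketch, assuming an estimate function (for example, backed by the Count-Min Sketch described later); the names are hypothetical, not a real library's API:

```go
// admit is the heart of TinyLFU's gate: a candidate only enters the cache if
// it appears at least as valuable as the entry it would displace.
// estimate(key) is assumed to return an approximate access count.
func admit(estimate func(key string) uint64, candidate, victim string) bool {
	return estimate(candidate) > estimate(victim)
}
```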
The Critical Difference: Recency vs Frequency
The fundamental distinction between these algorithms is what they optimize for:
LRU optimizes for: RECENCY (when was it last used?)
TinyLFU optimizes for: FREQUENCY (how often is it used?)
This difference seems subtle but has massive implications for real-world systems.
Visual Comparison
LRU Behavior
Timeline of Cache Operations:
Initial:       [A] ↔ [B] ↔ [C]
                ↑           ↑
               MRU         LRU
Access B:      [B] ↔ [A] ↔ [C]
Add D (full):  [D] ↔ [B] ↔ [A]
                ↑           ↑
               MRU     Evicted → C
Result: C removed (oldest), regardless of access count
TinyLFU Behavior
Cache State:
Item A: frequency = 500
Item B: frequency = 300
Item C: frequency = 100   ← LFU victim

New Item D arrives (frequency = 5)
          ↓
Compare: D(5) vs C(100)
          ↓
Decision: REJECT ✗
          ↓
Result: D never enters cache
        C stays (more valuable)

New Item E arrives (frequency = 150)
          ↓
Compare: E(150) vs C(100)
          ↓
Decision: ADMIT ✓
          ↓
Result: E enters cache
        C evicted
The Real-World Problem: JWT Authentication
Let's examine a production scenario that perfectly illustrates why the choice matters. This example is drawn from real JWT authentication systems handling 100,000 requests per second.
The Setup
You're running an auth gateway. Every incoming request carries a JWT. Verifying that JWT costs 200 microseconds. At 100,000 requests per second, that's 20 full CPU cores doing nothing but crypto.
So you add caching. Problem solved... until it isn't.
Your Traffic Pattern
Hot Tokens (70% of traffic)
├─ Long-lived sessions
├─ Service accounts: 20 req/sec each
├─ Browser sessions: 50 req/min each
└─ Thousands of hits per 20-min lifetime

Cold Tokens (25% of traffic)
├─ One-off API calls
├─ Batch jobs with unique tokens
└─ Each token: 1-2 hits, never again

Noise Tokens (5% of traffic)
├─ Scanners
├─ Malformed requests
├─ Expired tokens from broken clients
└─ Pure garbage
The LRU Failure Mode
Time: 3:00 AM
Event: Batch job spawns 1000 workers
Each worker: unique JWT, single request
LRU Decision Tree:
┌─────────────────────────────┐
│ New token arrives           │
│ Cache is full               │
└──────────────┬──────────────┘
               │
               ▼
      ┌─────────────────┐
      │ Admit token     │ ← Always admits
      │ Evict oldest    │ ← Evicts hot tokens
      └─────────────────┘
               │
               ▼
┌─────────────────────────────┐
│ Cache now full of 1000      │
│ tokens that will NEVER      │
│ be seen again               │
└─────────────────────────────┘
Impact:
├─ Service account tokens evicted
├─ Active browser sessions evicted
├─ Hit rate: 92% → 61%
└─ CPU usage doubles
As the author of the TinyLFU JWT authentication analysis notes, this scenario transforms a stable caching system into an unpredictable performance disaster. Hot tokens that should remain cached for their entire 20-minute lifetime get evicted by cold tokens that will never be seen again.
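The pollution is easy to reproduce with the toy lruCache sketch from earlier; the numbers below are illustrative, not measurements from a production gateway.

```go
// simulateBatchBurst reuses the lruCache sketch above: a 1,000-entry cache
// holding hot tokens is hit by 1,000 single-use batch tokens.
func simulateBatchBurst() {
	c := newLRUCache(1000)
	for i := 0; i < 1000; i++ {
		c.Set(fmt.Sprintf("hot-token-%d", i), "verified claims") // long-lived, reused constantly
	}
	for i := 0; i < 1000; i++ {
		c.Set(fmt.Sprintf("batch-token-%d", i), "verified claims") // used once, never seen again
	}

	hits := 0
	for i := 0; i < 1000; i++ {
		if _, ok := c.Get(fmt.Sprintf("hot-token-%d", i)); ok {
			hits++
		}
	}
	fmt.Printf("hot tokens still cached: %d / 1000\n", hits) // prints 0: every hot token was evicted
}
```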
The TinyLFU Solution
Same scenario: 1000 batch job tokens arrive
TinyLFU Decision for Each Token:
┌─────────────────────────────┐
│ New batch token             │
│ Frequency: 1                │
└──────────────┬──────────────┘
               │
               ▼
      ┌─────────────────┐
      │ Compare with    │
      │ LFU victim      │
      │ (freq: 500)     │
      └────────┬────────┘
               │
               ▼
      ┌─────────────────┐
      │ 1 < 500         │
      │ REJECT ✗        │
      └─────────────────┘
               │
               ▼
┌─────────────────────────────┐
│ Token never enters cache    │
│ Hot tokens remain cached    │
│ Hit rate stays: 92%         │
└─────────────────────────────┘
TinyLFU's admission gate prevents cache pollution. Batch tokens never achieve sufficient frequency to compete with legitimate hot tokens.
Data Structure Comparison
LRU Implementation
Components:
┌─────────────────────────────────┐
│ HashMap: key → node             │
│ (O(1) lookups)                  │
└─────────────────────────────────┘
                │
                ▼
┌─────────────────────────────────┐
│ Doubly Linked List              │
│ (maintains access order)        │
│                                 │
│ [Head] ↔ [Node] ↔ [Tail]        │
│  (MRU)            (LRU)         │
└─────────────────────────────────┘
Operations:
Get: HashMap lookup + move to head = O(1)
Set: HashMap insert + add to head = O(1)
(evict tail if full)
Memory: ~100 bytes/entry overhead
TinyLFU Implementation
Components:
┌─────────────────────────────────┐
│ Main Cache (LRU)                │
│ 99% of capacity                 │
└─────────────────────────────────┘
                │
                ▼
┌─────────────────────────────────┐
│ Admission Window (tiny LRU)     │
│ 1% of capacity                  │
│ (for cold-start protection)     │
└─────────────────────────────────┘
                │
                ▼
┌─────────────────────────────────┐
│ Count-Min Sketch                │
│ (frequency tracking)            │
│ 10x cache size                  │
│ ~4 bits per counter             │
└─────────────────────────────────┘
Operations:
Get: Increment sketch + cache lookup = O(1)
Set: Compare frequencies + conditional admit = O(1)
Memory: ~500 KB for 100k cache with 1M counters
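A rough Go sketch of how those pieces might be wired together; the type and field names are placeholders, and the frequency estimator is whatever structure provides approximate counts (the Count-Min Sketch covered in the next section).

```go
// frequencyEstimator is anything that can approximate access counts, such as
// the Count-Min Sketch described in the next section.
type frequencyEstimator interface {
	Increment(key string)
	Estimate(key string) uint64
}

// tinyLFUCache sketches the composition (illustrative only): a small LRU
// admission window, a large LRU main segment, and a shared frequency sketch.
type tinyLFUCache struct {
	window *lruCache          // ~1% of capacity: lets brand-new keys build up frequency
	main   *lruCache          // ~99% of capacity: protected by the admission gate
	freq   frequencyEstimator // approximate counts consulted on admission decisions
}

// Get records the access in the sketch and checks both segments.
func (c *tinyLFUCache) Get(key string) (string, bool) {
	c.freq.Increment(key)
	if v, ok := c.window.Get(key); ok {
		return v, true
	}
	return c.main.Get(key)
}
```

A Set method would then compare the candidate's estimated frequency against the main segment's eviction victim, as in the admit sketch earlier, and drop the write when the candidate loses.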
Count-Min Sketch: The Magic Behind TinyLFU
The Count-Min Sketch is the secret sauce that makes TinyLFU practical. It's a probabilistic data structure that tracks frequencies for millions of items using only a tiny amount of memory.
The Problem It Solves
Imagine you need to track access frequencies for every JWT token you've ever seen:
Naive Approach:
HashMap: token → counter

Problems:
├─ Need to store every unique token
├─ 1 million tokens × 64 bytes = 64 MB just for keys
├─ Plus counter storage
└─ Total: ~100 MB+ for frequency tracking alone
Count-Min Sketch solves this with a clever tradeoff: instead of exact counts, it provides approximate counts with strong guarantees.
How Count-Min Sketch Works
Think of it as a 2D grid of counters with multiple hash functions:
Count-Min Sketch Structure:
Hash1 → [ 3][ 1][ 5][ 2][ 8][ 0][ 4]...
Hash2 → [ 2][ 4][ 1][ 7][ 3][ 1][ 2]...
Hash3 → [ 1][ 2][ 6][ 3][ 1][ 4][ 5]...
Hash4 → [ 4][ 1][ 3][ 5][ 2][ 1][ 8]...
              ↑
        Counters (4 bits each)

Dimensions:
├─ Width (w): Number of counters per row
├─ Depth (d): Number of hash functions
└─ Total memory: w × d × 4 bits
Operations Visualized
Incrementing a Token:
Token: "jwt_user_123"
        ↓
Apply 4 hash functions:
├─ Hash1(token) = 42  → Increment row1[42]
├─ Hash2(token) = 157 → Increment row2[157]
├─ Hash3(token) = 891 → Increment row3[891]
└─ Hash4(token) = 23  → Increment row4[23]

Before:
Row1: ...[ 3][ 5]...  (position 42)
Row2: ...[10][ 2]...  (position 157)
Row3: ...[ 7][ 1]...  (position 891)
Row4: ...[ 2][ 8]...  (position 23)

After:
Row1: ...[ 4][ 5]...  ← Incremented
Row2: ...[11][ 2]...  ← Incremented
Row3: ...[ 8][ 1]...  ← Incremented
Row4: ...[ 3][ 8]...  ← Incremented
Time: O(1) - Just 4 increments
Querying Frequency:
Token: "jwt_user_123"
        ↓
Apply same 4 hash functions:
├─ Hash1(token) = 42  → Read row1[42]  = 4
├─ Hash2(token) = 157 → Read row2[157] = 11
├─ Hash3(token) = 891 → Read row3[891] = 8
└─ Hash4(token) = 23  → Read row4[23]  = 3

Estimated frequency: min(4, 11, 8, 3) = 3

Why minimum?
├─ Counters can be inflated by collisions
└─ Taking the minimum gives the most conservative estimate
Time: O(1) - Just 4 lookups + min operation
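The increment and query paths above translate almost directly into code. Here is a compact Go sketch; it uses uint8 counters instead of packed 4-bit ones to stay readable, and seeding FNV-1a with the row number is just one simple way to get four different hash functions.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// cmSketch is a simplified Count-Min Sketch: depth rows of width counters.
type cmSketch struct {
	width uint64
	depth int
	rows  [][]uint8
}

func newCMSketch(width uint64, depth int) *cmSketch {
	rows := make([][]uint8, depth)
	for i := range rows {
		rows[i] = make([]uint8, width)
	}
	return &cmSketch{width: width, depth: depth, rows: rows}
}

// index picks the counter for a key in a given row.
func (s *cmSketch) index(key string, row int) uint64 {
	h := fnv.New64a()
	h.Write([]byte{byte(row)}) // per-row seed, so each row behaves like a different hash
	h.Write([]byte(key))
	return h.Sum64() % s.width
}

// Increment bumps one counter per row, saturating instead of overflowing.
func (s *cmSketch) Increment(key string) {
	for row := 0; row < s.depth; row++ {
		i := s.index(key, row)
		if s.rows[row][i] < 255 {
			s.rows[row][i]++
		}
	}
}

// Estimate returns the minimum counter across rows: never below the true
// count, possibly above it when other keys collide into the same slots.
func (s *cmSketch) Estimate(key string) uint64 {
	est := uint8(255)
	for row := 0; row < s.depth; row++ {
		if v := s.rows[row][s.index(key, row)]; v < est {
			est = v
		}
	}
	return uint64(est)
}

func main() {
	s := newCMSketch(1<<20, 4) // ~1M counters per row, 4 rows
	for i := 0; i < 5; i++ {
		s.Increment("jwt_user_123")
	}
	s.Increment("jwt_other_456")
	fmt.Println(s.Estimate("jwt_user_123")) // 5, or slightly higher if collisions inflate every row
}
```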
Why It Works: Hash Collisions
Different tokens can hash to the same positions (collisions), but the Count-Min Sketch handles this elegantly:
Collision Example:
Token A and Token B both hash to position 42 in Row1
Row1[42] stores: countA + countB = 15
        ↓
When querying Token A:
├─ Row1: 15 (includes Token B) → Overestimate
├─ Row2: 12 (includes Token C) → Overestimate
├─ Row3:  9 (no collision)     → True count
└─ Row4: 11 (includes Token D) → Overestimate

Result: min(15, 12, 9, 11) = 9

Key Property:
├─ Estimates can be HIGHER than the true count
├─ Estimates are NEVER LOWER than the true count
└─ Using multiple rows reduces overestimation
The Guarantee
Count-Min Sketch provides mathematical guarantees:
True frequency:      f
Estimated frequency: f'

Guarantee: f ≤ f' ≤ f + ε × N

Where:
├─ ε (epsilon): Error parameter (e.g., 0.01 = 1%)
├─ N: Total number of items seen
└─ f' never underestimates

Example with ε = 0.01:
├─ True count: 100
├─ Total items: 1,000,000
├─ Max error: 0.01 × 1,000,000 = 10,000
└─ Estimated: somewhere between 100 and 10,100
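For reference, the standard Count-Min Sketch analysis ties ε and the failure probability δ directly to the sketch's width and depth; this is the textbook bound, not something specific to TinyLFU:

```latex
w = \left\lceil \frac{e}{\varepsilon} \right\rceil, \qquad
d = \left\lceil \ln \frac{1}{\delta} \right\rceil
\quad\Longrightarrow\quad
f \le f' \quad\text{and}\quad \Pr\bigl[f' \le f + \varepsilon N\bigr] \ge 1 - \delta
```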
For TinyLFU caching decisions, this is perfect because we're making relative comparisons:
Admission Decision:
Token A frequency: ~500 (might be 500-600)
Token B frequency: ~50 (might be 50-100)
Decision: A > B is correct regardless of small errors ✓
Memory Efficiency
Compare memory usage for tracking 1 million tokens:
Exact HashMap:
├─ 1M entries × 64 bytes (key)
├─ 1M entries × 8 bytes (counter)
└─ Total: ~72 MB

Count-Min Sketch (w = 10M, d = 4):
├─ 10M counters × 4 rows
├─ 40M counters × 4 bits each
├─ 160M bits = 20 MB
└─ ~72% memory savings ✓

Even Better - 4-bit Counters:
├─ Each counter: 0 to 15
├─ When a counter reaches 15, halve all counters
├─ Maintains relative frequencies
└─ Prevents overflow
Counter Decay in Action
TinyLFU uses periodic decay to keep frequencies fresh:
Before Decay:
Row1: [15][12][ 8][15]...
Row2: [13][15][ 6][ 9]...
Row3: [15][ 7][11][15]...
Row4: [10][15][15][ 4]...
Decay Event (right shift by 1 bit = divide by 2):
Row1: [ 7][ 6][ 4][ 7]...
Row2: [ 6][ 7][ 3][ 4]...
Row3: [ 7][ 3][ 5][ 7]...
Row4: [ 5][ 7][ 7][ 2]...
Benefits:
├─ Recent activity matters more
├─ Old patterns fade away
├─ Prevents counter overflow
└─ Extremely fast (bit shift operation)
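In code, the decay step is a single pass over the counters. Extending the cmSketch sketch from the previous section (the method name Reset is an assumption, not a standard API):

```go
// Reset halves every counter so that old activity fades while the relative
// ordering of hot and cold keys is preserved. With packed 4-bit counters this
// is a bit shift over whole words; with uint8 counters it is the same idea.
func (s *cmSketch) Reset() {
	for _, row := range s.rows {
		for i := range row {
			row[i] >>= 1 // divide by two
		}
	}
}
```

TinyLFU-style implementations typically trigger this after a fixed number of recorded increments (a sampling window) rather than on a timer.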
Real-World Example: JWT Cache
Scenario: 100k req/sec, tracking 5M unique tokens
Configuration:
├─ Width: 10M counters
├─ Depth: 4 hash functions
├─ Counter size: 4 bits
└─ Total memory: 20 MB
Token Types:
┌─────────────────────────────────────┐
│ Service Account (Token A)           │
│ True frequency: 2000                │
│ Sketch estimate: 2003               │
│ Error: 0.15%                        │
└─────────────────────────────────────┘
┌─────────────────────────────────────┐
│ Batch Job Token (Token B)           │
│ True frequency: 1                   │
│ Sketch estimate: 1                  │
│ Error: 0%                           │
└─────────────────────────────────────┘
┌─────────────────────────────────────┐
│ Random Attack Token (Token C)       │
│ True frequency: 1                   │
│ Sketch estimate: 2 (collision)      │
│ Error: 100% (but still tiny!)       │
└─────────────────────────────────────┘
Admission Decision:
├─ Token A (2003) vs Token B (1) → A wins ✓
├─ Token A (2003) vs Token C (2) → A wins ✓
└─ Small errors don't affect decisions
Why This Matters for TinyLFU
The Count-Min Sketch enables TinyLFU to:
- Track millions of items with minimal memory
- Make instant decisions with O(1) operations
- Handle high throughput without becoming a bottleneck
- Adapt to changing patterns through decay
- Tolerate approximation because relative comparisons still work
Without Count-Min Sketch, TinyLFU would need exact frequency counters for every item ever seen, making it impractical for high-scale systems. The sketch transforms TinyLFU from a theoretical idea into a production-ready cache algorithm.
The Trade-off:
├─ Give up: Exact counts
├─ Gain: 70-90% memory savings
├─ Guarantee: Never underestimate
└─ Result: Practical frequency-based caching ✓
Attack Resilience
LRU Under Attack
Attacker Strategy: 50k random JWTs/sec
LRU Response:
┌───────────────────────────────────┐
│ Random Token 1     → ADMIT        │
│ Random Token 2     → ADMIT        │
│ Random Token 3     → ADMIT        │
│ ... (evicting legitimate tokens)  │
│ Random Token 50000 → ADMIT        │
└───────────────────────────────────┘
Result:
├─ Cache fills with garbage
├─ Legitimate tokens evicted
├─ Hit rate collapses
└─ CPU usage spikes
TinyLFU Under Attack
Same Attack: 50k random JWTs/sec
TinyLFU Response:
┌───────────────────────────────────┐
│ Random Token 1 (freq=1) → REJECT  │
│ Random Token 2 (freq=1) → REJECT  │
│ Random Token 3 (freq=1) → REJECT  │
│ ... (legitimate tokens safe)      │
│ Random Token 50000      → REJECT  │
└───────────────────────────────────┘
Meanwhile:
├─ Legitimate Token A (freq=2000) → stays cached
├─ Service Account B (freq=1500)  → stays cached
└─ Hit rate remains: 92%
As demonstrated in the JWT authentication case study, the cache stays healthy under this attack because random tokens can never build the frequency needed to displace legitimate hot tokens.
Configuration Deep Dive
LRU Configuration
Simple Parameters:
cache := NewLRU(100000) // Just capacity
Optional:
├─ TTL per entry
├─ Expiration callback
└─ Memory limit (bytes)
Rule of Thumb:
Size = ActiveUsers × AverageSessionRequests × SafetyMargin
     = 10,000 × 50 × 2
     = 1,000,000 entries
TinyLFU Configuration
Critical Parameters:
cache := NewTinyLFU(TinyLFUConfig{
    MaxSize:     100000,  // Main cache capacity
    NumCounters: 1000000, // Frequency sketch size (10x)
    BufferItems: 1000,    // Admission window (1%)
})
Parameter Breakdown:
NumCounters:
├─ Rule: 10x cache size
├─ Why: Reduces hash collisions
├─ Memory: counters × 4 bits
└─ Example: 1M counters = 500 KB

BufferItems:
├─ Rule: 1% of MaxSize
├─ Purpose: Cold-start protection
├─ Behavior: New items land here first
└─ Graduation: Build frequency → main cache

MaxCost (optional):
├─ For variable-sized entries
├─ Memory-bounded eviction
└─ Example: 50 MB budget for mixed sizes
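For a concrete library, Ristretto (linked under Further Reading) exposes equivalent knobs. A minimal sketch assuming its v0.1.x Go API; check the project's current documentation before copying, since the newer generic API differs.

```go
package main

import (
	"fmt"

	"github.com/dgraph-io/ristretto"
)

func main() {
	cache, err := ristretto.NewCache(&ristretto.Config{
		NumCounters: 1_000_000, // frequency counters, ~10x the expected number of unique items
		MaxCost:     100_000,   // total cost budget; with cost=1 per entry, roughly 100k entries
		BufferItems: 64,        // per-Get buffer size recommended by the project's README
	})
	if err != nil {
		panic(err)
	}

	cache.Set("jwt_user_123", "verified claims", 1) // cost 1 per entry
	cache.Wait()                                    // Set is asynchronous; wait so the value is visible

	if v, ok := cache.Get("jwt_user_123"); ok {
		fmt.Println("cached:", v)
	}
}
```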
Conclusion
Both TinyLRU and TinyLFU have their place in modern systems:
TinyLRU is your default choice. It's simple, fast, and works well for most caching scenarios. Use it when recency correlates with value in your workload.
TinyLFU is your specialist tool. It shines when you have frequency-skewed workloads, face burst traffic, or need resilience against cache pollution. The JWT authentication scenario demonstrates how critical this becomes at scale where poor cache decisions directly translate to wasted CPU cores and degraded latency.
The right choice depends on understanding your access patterns. Monitor your cache behavior, measure hit rates under realistic load, and choose the algorithm that matches your workload characteristics.
Further Reading:
- TinyLFU: A Highly Efficient Cache Admission Policy
- TinyLFU in JWT Authentication Systems
- Ristretto: A fast, concurrent TinyLFU cache
- golang-lru: Simple LRU implementation
- Count-Min Sketch: The probabilistic data structure powering TinyLFU
Choose wisely, cache efficiently, and may your hit rates stay high!