When building high-performance applications, choosing the right cache eviction policy can make or break your system's performance. While both TinyLRU and TinyLFU aim to keep your hot data cached, they take fundamentally different approaches. Understanding when to use each can save you from performance disasters and wasted CPU cycles.
The Fundamentals
What is LRU?
LRU (Least Recently Used) is one of the most intuitive cache eviction policies. The principle is simple: when the cache is full and you need to make room for new data, evict the item that was accessed longest ago.
Think of it like organizing books on a small shelf. Every time you read a book, you put it at the front. When the shelf fills up, you remove the book from the back that you haven't touched in the longest time.
LRU Cache Operations:
┌─────────────────────────────┐
│ Get(key)                    │
│   └─ Move to front          │
│                             │
│ Set(key, value)             │
│   ├─ Add to front           │
│   └─ Evict oldest if full   │
└─────────────────────────────┘
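To make the mechanics concrete, here is a minimal, non-thread-safe LRU sketch in Go built on the standard library's container/list. The lruCache type and its method names are illustrative, not taken from any particular library.

```go
package main

import (
	"container/list"
	"fmt"
)

// entry is what each list element stores, so eviction can find the key again.
type entry struct {
	key   string
	value string
}

// lruCache is a minimal, non-thread-safe LRU sketch (illustrative only).
type lruCache struct {
	capacity int
	order    *list.List               // front = most recently used, back = least
	items    map[string]*list.Element // key -> element in order
}

func newLRUCache(capacity int) *lruCache {
	return &lruCache{
		capacity: capacity,
		order:    list.New(),
		items:    make(map[string]*list.Element),
	}
}

// Get returns the cached value and refreshes the key's recency.
func (c *lruCache) Get(key string) (string, bool) {
	el, ok := c.items[key]
	if !ok {
		return "", false
	}
	c.order.MoveToFront(el) // touching an item makes it most recently used
	return el.Value.(*entry).value, true
}

// Set inserts or updates a key, evicting the least recently used entry when full.
func (c *lruCache) Set(key, value string) {
	if el, ok := c.items[key]; ok {
		el.Value.(*entry).value = value
		c.order.MoveToFront(el)
		return
	}
	if c.order.Len() >= c.capacity {
		oldest := c.order.Back() // the item untouched for the longest time
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
	c.items[key] = c.order.PushFront(&entry{key: key, value: value})
}

func main() {
	c := newLRUCache(3)
	c.Set("C", "3")
	c.Set("B", "2")
	c.Set("A", "1") // order (MRU -> LRU): A, B, C
	c.Get("B")      // order: B, A, C
	c.Set("D", "4") // cache full: evicts C, the least recently used
	_, ok := c.Get("C")
	fmt.Println("C still cached?", ok) // false
}
```

Both operations are O(1): the map gives direct access to a node, and the list keeps recency order without any scanning.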
What is TinyLFU?
TinyLFU (Tiny Least Frequently Used) takes a different approach: it keeps least-frequently-used decisions practical by backing them with a compact frequency sketch. Instead of just tracking when items were accessed, it tracks how often they've been accessed. Before admitting a new item, TinyLFU asks: "Is this item more valuable than what I'd have to evict?"
TinyLFU Cache Operations:
┌──────────────────────────────┐
│ Get(key)                     │
│   ├─ Increment frequency     │
│   └─ Return if cached        │
│                              │
│ Set(key, value)              │
│   ├─ Compare frequencies     │
│   ├─ Admit if worthy         │
│   └─ Reject if not valuable  │
└──────────────────────────────┘
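The admission check itself is nearly a one-liner once a frequency estimate is available. A minimal sketch, assuming an estimate function (for example, backed by the Count-Min Sketch described later); the names are hypothetical, not a real library's API:

```go
// admit is the heart of TinyLFU's gate: a candidate only enters the cache if
// it appears at least as valuable as the entry it would displace.
// estimate(key) is assumed to return an approximate access count.
func admit(estimate func(key string) uint64, candidate, victim string) bool {
	return estimate(candidate) > estimate(victim)
}
```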
The Critical Difference: Recency vs Frequency
The fundamental distinction between these algorithms is what they optimize for:
LRU optimizes for: RECENCY (when was it last used?)
TinyLFU optimizes for: FREQUENCY (how often is it used?)
This difference seems subtle but has massive implications for real-world systems.
Visual Comparison
LRU Behavior
Timeline of Cache Operations:
Initial:       [A] ↔ [B] ↔ [C]
                ↑           ↑
               MRU         LRU
Access B:      [B] ↔ [A] ↔ [C]
Add D (full):  [D] ↔ [B] ↔ [A]
                ↑           ↑
               MRU     Evicted → C
Result: C removed (oldest), regardless of access count
TinyLFU Behavior
Cache State:
Item A: frequency = 500
Item B: frequency = 300
Item C: frequency = 100   ← LFU victim

New Item D arrives (frequency = 5)
          ↓
Compare: D(5) vs C(100)
          ↓
Decision: REJECT ✗
          ↓
Result: D never enters cache
        C stays (more valuable)

New Item E arrives (frequency = 150)
          ↓
Compare: E(150) vs C(100)
          ↓
Decision: ADMIT ✓
          ↓
Result: E enters cache
        C evicted
The Real-World Problem: JWT Authentication
Let's examine a production scenario that perfectly illustrates why the choice matters. This example is drawn from real JWT authentication systems handling 100,000 requests per second.
The Setup
You're running an auth gateway. Every incoming request carries a JWT. Verifying that JWT costs 200 microseconds. At 100,000 requests per second, that's 20 full CPU cores doing nothing but crypto.
So you add caching. Problem solved... until it isn't.
Your Traffic Pattern
Hot Tokens (70% of traffic)
├─ Long-lived sessions
├─ Service accounts: 20 req/sec each
├─ Browser sessions: 50 req/min each
└─ Thousands of hits per 20-min lifetime

Cold Tokens (25% of traffic)
├─ One-off API calls
├─ Batch jobs with unique tokens
└─ Each token: 1-2 hits, never again

Noise Tokens (5% of traffic)
├─ Scanners
├─ Malformed requests
├─ Expired tokens from broken clients
└─ Pure garbage
The LRU Failure Mode
Time: 3:00 AM
Event: Batch job spawns 1000 workers
Each worker: unique JWT, single request
LRU Decision Tree:
┌─────────────────────────────┐
│ New token arrives           │
│ Cache is full               │
└──────────────┬──────────────┘
               │
               ▼
      ┌─────────────────┐
      │ Admit token     │ ← Always admits
      │ Evict oldest    │ ← Evicts hot tokens
      └─────────────────┘
               │
               ▼
┌─────────────────────────────┐
│ Cache now full of 1000      │
│ tokens that will NEVER      │
│ be seen again               │
└─────────────────────────────┘
Impact:
├─ Service account tokens evicted
├─ Active browser sessions evicted
├─ Hit rate: 92% → 61%
└─ CPU usage doubles
As the author of the TinyLFU JWT authentication analysis notes, this scenario transforms a stable caching system into an unpredictable performance disaster. Hot tokens that should remain cached for their entire 20-minute lifetime get evicted by cold tokens that will never be seen again.
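The pollution is easy to reproduce with the toy lruCache sketch from earlier; the numbers below are illustrative, not measurements from a production gateway.

```go
// simulateBatchBurst reuses the lruCache sketch above: a 1,000-entry cache
// holding hot tokens is hit by 1,000 single-use batch tokens.
func simulateBatchBurst() {
	c := newLRUCache(1000)
	for i := 0; i < 1000; i++ {
		c.Set(fmt.Sprintf("hot-token-%d", i), "verified claims") // long-lived, reused constantly
	}
	for i := 0; i < 1000; i++ {
		c.Set(fmt.Sprintf("batch-token-%d", i), "verified claims") // used once, never seen again
	}

	hits := 0
	for i := 0; i < 1000; i++ {
		if _, ok := c.Get(fmt.Sprintf("hot-token-%d", i)); ok {
			hits++
		}
	}
	fmt.Printf("hot tokens still cached: %d / 1000\n", hits) // prints 0: every hot token was evicted
}
```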
The TinyLFU Solution
Same scenario: 1000 batch job tokens arrive
TinyLFU Decision for Each Token:
┌─────────────────────────────┐
│ New batch token             │
│ Frequency: 1                │
└──────────────┬──────────────┘
               │
               ▼
      ┌─────────────────┐
      │ Compare with    │
      │ LFU victim      │
      │ (freq: 500)     │
      └────────┬────────┘
               │
               ▼
      ┌─────────────────┐
      │ 1 < 500         │
      │ REJECT ✗        │
      └─────────────────┘
               │
               ▼
┌─────────────────────────────┐
│ Token never enters cache    │
│ Hot tokens remain cached    │
│ Hit rate stays: 92%         │
└─────────────────────────────┘
TinyLFU's admission gate prevents cache pollution. Batch tokens never achieve sufficient frequency to compete with legitimate hot tokens.
Data Structure Comparison
LRU Implementation
Components:
┌─────────────────────────────────┐
│ HashMap: key → node             │
│ (O(1) lookups)                  │
└─────────────────────────────────┘
                │
                ▼
┌─────────────────────────────────┐
│ Doubly Linked List              │
│ (maintains access order)        │
│                                 │
│ [Head] ↔ [Node] ↔ [Tail]        │
│  (MRU)            (LRU)         │
└─────────────────────────────────┘
Operations:
Get: HashMap lookup + move to head = O(1)
Set: HashMap insert + add to head = O(1)
(evict tail if full)
Memory: ~100 bytes/entry overhead
TinyLFU Implementation
Components:
┌─────────────────────────────────┐
│ Main Cache (LRU)                │
│ 99% of capacity                 │
└─────────────────────────────────┘
                │
                ▼
┌─────────────────────────────────┐
│ Admission Window (tiny LRU)     │
│ 1% of capacity                  │
│ (for cold-start protection)     │
└─────────────────────────────────┘
                │
                ▼
┌─────────────────────────────────┐
│ Count-Min Sketch                │
│ (frequency tracking)            │
│ 10x cache size                  │
│ ~4 bits per counter             │
└─────────────────────────────────┘
Operations:
Get: Increment sketch + cache lookup = O(1)
Set: Compare frequencies + conditional admit = O(1)
Memory: ~500 KB for 100k cache with 1M counters
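A rough Go sketch of how those pieces might be wired together; the type and field names are placeholders, and the frequency estimator is whatever structure provides approximate counts (the Count-Min Sketch covered in the next section).

```go
// frequencyEstimator is anything that can approximate access counts, such as
// the Count-Min Sketch described in the next section.
type frequencyEstimator interface {
	Increment(key string)
	Estimate(key string) uint64
}

// tinyLFUCache sketches the composition (illustrative only): a small LRU
// admission window, a large LRU main segment, and a shared frequency sketch.
type tinyLFUCache struct {
	window *lruCache          // ~1% of capacity: lets brand-new keys build up frequency
	main   *lruCache          // ~99% of capacity: protected by the admission gate
	freq   frequencyEstimator // approximate counts consulted on admission decisions
}

// Get records the access in the sketch and checks both segments.
func (c *tinyLFUCache) Get(key string) (string, bool) {
	c.freq.Increment(key)
	if v, ok := c.window.Get(key); ok {
		return v, true
	}
	return c.main.Get(key)
}
```

A Set method would then compare the candidate's estimated frequency against the main segment's eviction victim, as in the admit sketch earlier, and drop the write when the candidate loses.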
Count-Min Sketch: The Magic Behind TinyLFU
The Count-Min Sketch is the secret sauce that makes TinyLFU practical. It's a probabilistic data structure that tracks frequencies for millions of items using only a tiny amount of memory.
The Problem It Solves
Imagine you need to track access frequencies for every JWT token you've ever seen:
Naive Approach:
HashMap: token → counter

Problems:
├─ Need to store every unique token
├─ 1 million tokens × 64 bytes = 64 MB just for keys
├─ Plus counter storage
└─ Total: ~100 MB+ for frequency tracking alone
Count-Min Sketch solves this with a clever tradeoff: instead of exact counts, it provides approximate counts with strong guarantees.
How Count-Min Sketch Works
Think of it as a 2D grid of counters with multiple hash functions:
Count-Min Sketch Structure:
Hash1 → [ 3][ 1][ 5][ 2][ 8][ 0][ 4]...
Hash2 → [ 2][ 4][ 1][ 7][ 3][ 1][ 2]...
Hash3 → [ 1][ 2][ 6][ 3][ 1][ 4][ 5]...
Hash4 → [ 4][ 1][ 3][ 5][ 2][ 1][ 8]...
              ↑
        Counters (4 bits each)

Dimensions:
├─ Width (w): Number of counters per row
├─ Depth (d): Number of hash functions
└─ Total memory: w × d × 4 bits
Operations Visualized
Incrementing a Token:
Token: "jwt_user_123"
        ↓
Apply 4 hash functions:
├─ Hash1(token) = 42  → Increment row1[42]
├─ Hash2(token) = 157 → Increment row2[157]
├─ Hash3(token) = 891 → Increment row3[891]
└─ Hash4(token) = 23  → Increment row4[23]

Before:
Row1: ...[ 3][ 5]...  (position 42)
Row2: ...[10][ 2]...  (position 157)
Row3: ...[ 7][ 1]...  (position 891)
Row4: ...[ 2][ 8]...  (position 23)

After:
Row1: ...[ 4][ 5]...  ← Incremented
Row2: ...[11][ 2]...  ← Incremented
Row3: ...[ 8][ 1]...  ← Incremented
Row4: ...[ 3][ 8]...  ← Incremented
Time: O(1) - Just 4 increments
Querying Frequency:
Token: "jwt_user_123"
        ↓
Apply same 4 hash functions:
├─ Hash1(token) = 42  → Read row1[42]  = 4
├─ Hash2(token) = 157 → Read row2[157] = 11
├─ Hash3(token) = 891 → Read row3[891] = 8
└─ Hash4(token) = 23  → Read row4[23]  = 3

Estimated frequency: min(4, 11, 8, 3) = 3

Why minimum?
├─ Counters can be inflated by collisions
└─ Taking the minimum gives the most conservative estimate
Time: O(1) - Just 4 lookups + min operation
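The increment and query paths above translate almost directly into code. Here is a compact Go sketch; it uses uint8 counters instead of packed 4-bit ones to stay readable, and seeding FNV-1a with the row number is just one simple way to get four different hash functions.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// cmSketch is a simplified Count-Min Sketch: depth rows of width counters.
type cmSketch struct {
	width uint64
	depth int
	rows  [][]uint8
}

func newCMSketch(width uint64, depth int) *cmSketch {
	rows := make([][]uint8, depth)
	for i := range rows {
		rows[i] = make([]uint8, width)
	}
	return &cmSketch{width: width, depth: depth, rows: rows}
}

// index picks the counter for a key in a given row.
func (s *cmSketch) index(key string, row int) uint64 {
	h := fnv.New64a()
	h.Write([]byte{byte(row)}) // per-row seed, so each row behaves like a different hash
	h.Write([]byte(key))
	return h.Sum64() % s.width
}

// Increment bumps one counter per row, saturating instead of overflowing.
func (s *cmSketch) Increment(key string) {
	for row := 0; row < s.depth; row++ {
		i := s.index(key, row)
		if s.rows[row][i] < 255 {
			s.rows[row][i]++
		}
	}
}

// Estimate returns the minimum counter across rows: never below the true
// count, possibly above it when other keys collide into the same slots.
func (s *cmSketch) Estimate(key string) uint64 {
	est := uint8(255)
	for row := 0; row < s.depth; row++ {
		if v := s.rows[row][s.index(key, row)]; v < est {
			est = v
		}
	}
	return uint64(est)
}

func main() {
	s := newCMSketch(1<<20, 4) // ~1M counters per row, 4 rows
	for i := 0; i < 5; i++ {
		s.Increment("jwt_user_123")
	}
	s.Increment("jwt_other_456")
	fmt.Println(s.Estimate("jwt_user_123")) // 5, or slightly higher if collisions inflate every row
}
```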
Why It Works: Hash Collisions
Different tokens can hash to the same positions (collisions), but the Count-Min Sketch handles this elegantly:
Collision Example:
Token A and Token B both hash to position 42 in Row1
Row1[42] stores: countA + countB = 15
        ↓
When querying Token A:
├─ Row1: 15 (includes Token B) → Overestimate
├─ Row2: 12 (includes Token C) → Overestimate
├─ Row3:  9 (no collision)     → True count
└─ Row4: 11 (includes Token D) → Overestimate

Result: min(15, 12, 9, 11) = 9

Key Property:
├─ Estimates can be HIGHER than the true count
├─ Estimates are NEVER LOWER than the true count
└─ Using multiple rows reduces overestimation
The Guarantee
Count-Min Sketch provides mathematical guarantees:
True frequency:      f
Estimated frequency: f'

Guarantee: f ≤ f' ≤ f + ε × N

Where:
├─ ε (epsilon): Error parameter (e.g., 0.01 = 1%)
├─ N: Total number of items seen
└─ f' never underestimates

Example with ε = 0.01:
├─ True count: 100
├─ Total items: 1,000,000
├─ Max error: 0.01 × 1,000,000 = 10,000
└─ Estimated: somewhere between 100 and 10,100
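For reference, the standard Count-Min Sketch analysis ties ε and the failure probability δ directly to the sketch's width and depth; this is the textbook bound, not something specific to TinyLFU:

```latex
w = \left\lceil \frac{e}{\varepsilon} \right\rceil, \qquad
d = \left\lceil \ln \frac{1}{\delta} \right\rceil
\quad\Longrightarrow\quad
f \le f' \quad\text{and}\quad \Pr\bigl[f' \le f + \varepsilon N\bigr] \ge 1 - \delta
```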
For TinyLFU caching decisions, this is perfect because we're making relative comparisons:
Admission Decision:
Token A frequency: ~500 (might be 500-600)
Token B frequency: ~50 (might be 50-100)
Decision: A > B is correct regardless of small errors ✓
Memory Efficiency
Compare memory usage for tracking 1 million tokens:
Exact HashMap:
├─ 1M entries × 64 bytes (key)
├─ 1M entries × 8 bytes (counter)
└─ Total: ~72 MB

Count-Min Sketch (w = 10M, d = 4):
├─ 10M counters × 4 rows
├─ 40M counters × 4 bits each
├─ 160M bits = 20 MB
└─ ~72% memory savings ✓

Even Better - 4-bit Counters:
├─ Each counter: 0 to 15
├─ When a counter reaches 15, halve all counters
├─ Maintains relative frequencies
└─ Prevents overflow
Counter Decay in Action
TinyLFU uses periodic decay to keep frequencies fresh:
Before Decay:
Row1: [15][12][ 8][15]...
Row2: [13][15][ 6][ 9]...
Row3: [15][ 7][11][15]...
Row4: [10][15][15][ 4]...
Decay Event (right shift by 1 bit = divide by 2):
Row1: [ 7][ 6][ 4][ 7]...
Row2: [ 6][ 7][ 3][ 4]...
Row3: [ 7][ 3][ 5][ 7]...
Row4: [ 5][ 7][ 7][ 2]...
Benefits:
├─ Recent activity matters more
├─ Old patterns fade away
├─ Prevents counter overflow
└─ Extremely fast (bit shift operation)
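In code, the decay step is a single pass over the counters. Extending the cmSketch sketch from the previous section (the method name Reset is an assumption, not a standard API):

```go
// Reset halves every counter so that old activity fades while the relative
// ordering of hot and cold keys is preserved. With packed 4-bit counters this
// is a bit shift over whole words; with uint8 counters it is the same idea.
func (s *cmSketch) Reset() {
	for _, row := range s.rows {
		for i := range row {
			row[i] >>= 1 // divide by two
		}
	}
}
```

TinyLFU-style implementations typically trigger this after a fixed number of recorded increments (a sampling window) rather than on a timer.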
Real-World Example: JWT Cache
Scenario: 100k req/sec, tracking 5M unique tokens
Configuration:
├─ Width: 10M counters
├─ Depth: 4 hash functions
├─ Counter size: 4 bits
└─ Total memory: 20 MB
Token Types:
┌─────────────────────────────────────┐
│ Service Account (Token A)           │
│ True frequency: 2000                │
│ Sketch estimate: 2003               │
│ Error: 0.15%                        │
└─────────────────────────────────────┘
┌─────────────────────────────────────┐
│ Batch Job Token (Token B)           │
│ True frequency: 1                   │
│ Sketch estimate: 1                  │
│ Error: 0%                           │
└─────────────────────────────────────┘
┌─────────────────────────────────────┐
│ Random Attack Token (Token C)       │
│ True frequency: 1                   │
│ Sketch estimate: 2 (collision)      │
│ Error: 100% (but still tiny!)       │
└─────────────────────────────────────┘
Admission Decision:
├─ Token A (2003) vs Token B (1) → A wins ✓
├─ Token A (2003) vs Token C (2) → A wins ✓
└─ Small errors don't affect decisions
Why This Matters for TinyLFU
The Count-Min Sketch enables TinyLFU to:
- Track millions of items with minimal memory
- Make instant decisions with O(1) operations
- Handle high throughput without becoming a bottleneck
- Adapt to changing patterns through decay
- Tolerate approximation because relative comparisons still work
Without Count-Min Sketch, TinyLFU would need exact frequency counters for every item ever seen, making it impractical for high-scale systems. The sketch transforms TinyLFU from a theoretical idea into a production-ready cache algorithm.
The Trade-off:
├─ Give up: Exact counts
├─ Gain: 70-90% memory savings
├─ Guarantee: Never underestimate
└─ Result: Practical frequency-based caching ✓
Attack Resilience
LRU Under Attack
Attacker Strategy: 50k random JWTs/sec
LRU Response:
┌───────────────────────────────────┐
│ Random Token 1     → ADMIT        │
│ Random Token 2     → ADMIT        │
│ Random Token 3     → ADMIT        │
│ ... (evicting legitimate tokens)  │
│ Random Token 50000 → ADMIT        │
└───────────────────────────────────┘
Result:
├─ Cache fills with garbage
├─ Legitimate tokens evicted
├─ Hit rate collapses
└─ CPU usage spikes
TinyLFU Under Attack
Same Attack: 50k random JWTs/sec
TinyLFU Response:
┌───────────────────────────────────┐
│ Random Token 1 (freq=1) → REJECT  │
│ Random Token 2 (freq=1) → REJECT  │
│ Random Token 3 (freq=1) → REJECT  │
│ ... (legitimate tokens safe)      │
│ Random Token 50000      → REJECT  │
└───────────────────────────────────┘
Meanwhile:
├─ Legitimate Token A (freq=2000) → stays cached
├─ Service Account B (freq=1500)  → stays cached
└─ Hit rate remains: 92%
As demonstrated in the JWT authentication case study, the cache stays healthy under this attack because random tokens can never build the frequency needed to displace legitimate hot tokens.
Configuration Deep Dive
LRU Configuration
Simple Parameters:
cache := NewLRU(100000) // Just capacity
Optional:
├─ TTL per entry
├─ Expiration callback
└─ Memory limit (bytes)
Rule of Thumb:
Size = ActiveUsers × AverageSessionRequests × SafetyMargin
     = 10,000 × 50 × 2
     = 1,000,000 entries
TinyLFU Configuration
Critical Parameters:
cache := NewTinyLFU(TinyLFUConfig{
    MaxSize:     100000,  // Main cache capacity
    NumCounters: 1000000, // Frequency sketch size (10x)
    BufferItems: 1000,    // Admission window (1%)
})
Parameter Breakdown:
NumCounters:
├─ Rule: 10x cache size
├─ Why: Reduces hash collisions
├─ Memory: counters × 4 bits
└─ Example: 1M counters = 500 KB

BufferItems:
├─ Rule: 1% of MaxSize
├─ Purpose: Cold-start protection
├─ Behavior: New items land here first
└─ Graduation: Build frequency → main cache

MaxCost (optional):
├─ For variable-sized entries
├─ Memory-bounded eviction
└─ Example: 50 MB budget for mixed sizes
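For a concrete library, Ristretto (linked under Further Reading) exposes equivalent knobs. A minimal sketch assuming its v0.1.x Go API; check the project's current documentation before copying, since the newer generic API differs.

```go
package main

import (
	"fmt"

	"github.com/dgraph-io/ristretto"
)

func main() {
	cache, err := ristretto.NewCache(&ristretto.Config{
		NumCounters: 1_000_000, // frequency counters, ~10x the expected number of unique items
		MaxCost:     100_000,   // total cost budget; with cost=1 per entry, roughly 100k entries
		BufferItems: 64,        // per-Get buffer size recommended by the project's README
	})
	if err != nil {
		panic(err)
	}

	cache.Set("jwt_user_123", "verified claims", 1) // cost 1 per entry
	cache.Wait()                                    // Set is asynchronous; wait so the value is visible

	if v, ok := cache.Get("jwt_user_123"); ok {
		fmt.Println("cached:", v)
	}
}
```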
Conclusion
Both TinyLRU and TinyLFU have their place in modern systems:
TinyLRU is your default choice. It's simple, fast, and works well for most caching scenarios. Use it when recency correlates with value in your workload.
TinyLFU is your specialist tool. It shines when you have frequency-skewed workloads, face burst traffic, or need resilience against cache pollution. The JWT authentication scenario demonstrates how critical this becomes at scale where poor cache decisions directly translate to wasted CPU cores and degraded latency.
The right choice depends on understanding your access patterns. Monitor your cache behavior, measure hit rates under realistic load, and choose the algorithm that matches your workload characteristics.
Further Reading:
- TinyLFU: A Highly Efficient Cache Admission Policy
- TinyLFU in JWT Authentication Systems
- Ristretto: A fast, concurrent TinyLFU cache
- golang-lru: Simple LRU implementation
- Count-Min Sketch: The probabilistic data structure powering TinyLFU
Choose wisely, cache efficiently, and may your hit rates stay high!