You're running an auth gateway. Every incoming request carries a JWT. Verifying that JWT (checking the signature, validating claims, maybe hitting a revocation list) costs you 200 microseconds. At 100,000 requests per second, that's 20 full CPU cores doing nothing but crypto. 💸
So you cache verified tokens. Problem solved. ✅
Except three weeks later, your P99 latency spikes during a product launch. Cache hit rate drops from 92% to 61%. CPU usage doubles. You scale horizontally, and the problem... gets worse. 📉
This isn't a configuration issue. It's not a bug. It's LRU doing exactly what it was designed to do, which happens to be the wrong thing for your workload.
The Problem With LRU in Auth Systems 🚨
LRU (Least Recently Used) evicts whatever you touched longest ago. In theory, this keeps "hot" data in cache. In practice, at high throughput with diverse traffic patterns, LRU makes decisions that actively hurt you.
Here's why: your JWT traffic isn't uniform. You have three categories of tokens hitting your gateway:
Your Token Traffic Breakdown 📊
🔥 Hot Tokens (70% of traffic)
└─ Long-lived sessions
└─ Browsers, mobile apps, service accounts
└─ Single user = 50 req/min
└─ Service account = 20 req/sec
└─ Thousands of hits per 20-min lifetime
❄️ Cold Tokens (25% of traffic)
└─ One-off API calls
└─ Scheduled jobs
└─ Infrequent users
└─ Each token appears 1-2 times, never again
🎲 Random Tokens (5% of traffic)
└─ Scanners
└─ Malformed requests
└─ Expired tokens from broken clients
└─ Attackers probing endpoints
└─ Pure noise
The LRU Failure Mode:
Time: 3:00 AM
Event: Batch job spawns 1000 workers
Each worker: unique JWT, single request
LRU Decision Tree:
┌─────────────────────────────┐
│ New token arrives │
│ Cache is full │
└──────────┬──────────────────┘
│
▼
┌──────────────┐
│ Admit token │ ← Always admits
│ Evict oldest │ ← Evicts your hot tokens
└──────────────┘
│
▼
┌─────────────────────────────┐
│ Result: Cache now full of │
│ 1000 tokens that will │
│ NEVER be seen again │
└─────────────────────────────┘
Now your hot tokens miss the cache. You're verifying the same legitimate user tokens over and over, burning CPU, while your cache is full of one-time tokens that will never be seen again.
Your hit rate collapses. Your latency spikes. And because this happens gradually as the cache churns, you don't see a clean error; you see slow, expensive drift. 🐌
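To make the churn concrete, here's a minimal Go sketch. It's a toy LRU built on container/list, not a production cache, and the token names are made up: a hundred hot tokens sit comfortably in a 1,000-entry cache until a batch of 2,000 one-time tokens blows them all out.

```go
package main

import (
	"container/list"
	"fmt"
)

// lru is a toy LRU cache: a map for lookups plus a doubly linked list
// ordered most-recently-used first.
type lru struct {
	cap   int
	order *list.List
	items map[string]*list.Element
}

func newLRU(cap int) *lru {
	return &lru{cap: cap, order: list.New(), items: map[string]*list.Element{}}
}

// access returns true on a hit. On a miss it always admits the key,
// evicting the least recently used entry if the cache is full.
func (c *lru) access(key string) bool {
	if el, ok := c.items[key]; ok {
		c.order.MoveToFront(el)
		return true
	}
	if c.order.Len() >= c.cap {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(string))
	}
	c.items[key] = c.order.PushFront(key)
	return false
}

func main() {
	cache := newLRU(1000)

	// Steady state: 100 hot tokens, all comfortably cached.
	hot := make([]string, 100)
	for i := range hot {
		hot[i] = fmt.Sprintf("hot-token-%d", i)
		cache.access(hot[i])
	}

	// 3 AM: a batch job spawns 2000 workers, each with a unique JWT.
	for j := 0; j < 2000; j++ {
		cache.access(fmt.Sprintf("batch-token-%d", j))
	}

	// Next pass of hot traffic: every single token now misses.
	hits := 0
	for _, k := range hot {
		if cache.access(k) {
			hits++
		}
	}
	fmt.Printf("hot-token hits after the burst: %d/%d\n", hits, len(hot)) // 0/100
}
```

The LRU never asks whether the batch tokens are worth keeping; it just admits them and pays for it with your hot sessions.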
The Core Problem 🎯
LRU optimizes for recency, but recency is the wrong signal when traffic has massive frequency skew.
What TinyLFU Actually Changes 🔄
TinyLFU splits caching into two distinct decisions: admission and eviction.
LRU vs TinyLFU Flow
Traditional LRU:
Token arrives → Miss → ALWAYS admit → Evict something old
↓
No questions asked
TinyLFU:
Token arrives → Miss → Is this token valuable enough?
├─ Yes → Admit & evict
└─ No → Reject, verify once, move on
TinyLFU adds a gate. When a JWT arrives and misses the cache, TinyLFU asks:
"Is this token more valuable than what I'd have to evict?"
If the answer is no, the token doesn't enter the cache at all. You verify it, return the response, and move on.
The Signal Shift 📡
- LRU measures: Recency (when was it last accessed?)
- TinyLFU measures: Frequency (how often has it been accessed?)
For auth workloads, frequency is the correct signal:
Token A: 5000 accesses > Token B: 1 access
↑ ↑
More valuable Less valuable
(regardless of which one you saw most recently)
TinyLFU Without Jargon 🧮
TinyLFU keeps approximate frequency counts for every item it's seen recently. Not exact counts, just approximations. This matters for performance and memory.
The Admission Algorithm
JWT arrives
↓
1. Increment counter for this token
↓
2. Already in cache? → HIT ✅ (done)
↓
3. Not in cache? → MISS
↓
4. Compare frequencies:
┌─────────────────────────────────────┐
│ New token frequency: 3 │
│ Victim token frequency: 500 │
│ │
│ Decision: REJECT (3 < 500) │
│ Action: Verify once, don't cache │
└─────────────────────────────────────┘
OR
┌─────────────────────────────────────┐
│ New token frequency: 520 │
│ Victim token frequency: 480 │
│ │
│ Decision: ADMIT (520 > 480) │
│ Action: Cache new, evict victim │
└─────────────────────────────────────┘
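In code, the admission rule is just a comparison. Here's a deliberately naive Go sketch (plain maps, a linear scan for the victim, no Count-Min Sketch or decay) that only exists to show the rule itself; the token names and the verify callback are placeholders, not any library's API.

```go
package main

import "fmt"

// jwtCache is a naive illustration of frequency-based admission:
// freq plays the role of the frequency sketch, cache holds verified claims.
type jwtCache struct {
	freq  map[string]int
	cache map[string]string // token -> verified claims (stand-in)
	cap   int
}

// onRequest bumps the token's frequency, serves hits, and on a miss applies
// the TinyLFU admission rule: admit only if the new token's frequency beats
// that of the least-frequent cached token (the eviction victim).
func (c *jwtCache) onRequest(token string, verify func(string) string) string {
	c.freq[token]++

	if claims, ok := c.cache[token]; ok {
		return claims // HIT
	}

	claims := verify(token) // MISS: always verify once

	if len(c.cache) < c.cap {
		c.cache[token] = claims // spare room, no admission battle needed
		return claims
	}

	// Find the least-frequent cached token. A linear scan is fine for a
	// sketch; real implementations avoid this with sampling or segments.
	victim, victimFreq := "", int(^uint(0)>>1)
	for t := range c.cache {
		if c.freq[t] < victimFreq {
			victim, victimFreq = t, c.freq[t]
		}
	}

	if c.freq[token] > victimFreq {
		delete(c.cache, victim) // ADMIT: evict the victim,
		c.cache[token] = claims //        cache the new token
	}
	// Otherwise REJECT: the cache is untouched, the response still goes out.
	return claims
}

func main() {
	c := &jwtCache{freq: map[string]int{}, cache: map[string]string{}, cap: 1}
	verify := func(t string) string { return "claims-for-" + t }

	for i := 0; i < 10; i++ {
		c.onRequest("hot-token", verify) // builds frequency, stays cached
	}
	c.onRequest("cold-token", verify) // frequency 1: verified but rejected
	fmt.Println(c.cache)              // map[hot-token:claims-for-hot-token]
}
```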
Counter Decay ⏰
Counters decay over time to keep frequency reflecting recent history:
Counter Values Before Decay:
Token A: 1000
Token B: 500
Token C: 200
[Decay event: halve all counters]
Counter Values After Decay:
Token A: 500
Token B: 250
Token C: 100
Why this matters: A token that was hot yesterday but hasn't been seen in an hour shouldn't still dominate. Decay is fast (a bit-shift over the counter array) and keeps the frequency signal fresh.
At 100k TPS, "recent history" typically means the last few seconds to a minute. That's enough to distinguish genuinely hot tokens from noise, but short enough that stale patterns don't stick around.
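The decay step itself is tiny. Here's a sketch that assumes counters are stored as plain bytes; real implementations pack 4-bit counters and trigger the reset after a fixed number of increments (the "sample size"):

```go
// decay halves every frequency counter in place. Because every counter is
// halved at once, the relative ordering of tokens is preserved; only stale
// totals shrink, so recent traffic dominates the estimates.
func decay(counters []uint8) {
	for i := range counters {
		counters[i] >>= 1 // halve via a single bit-shift
	}
}
```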
Applying TinyLFU to a JWT Cache 🎯
Let's trace what happens to each category of token:
🔥 Hot Tokens (Browsers, Service Accounts)
Timeline of a Browser Token:
Request #1 → Miss, frequency=1 → Verify, reject admission
Request #2 → Miss, frequency=2 → Verify, reject admission
Request #3 → Miss, frequency=3 → Verify, reject admission
...
Request #8 → Miss, frequency=8 → Verify, WIN admission ✅
Request #9 → HIT ✅
Request #10 → HIT ✅
Request #50 → HIT ✅ (frequency=50+)
...
Request #3000 → HIT ✅ (frequency=3000+)
Result: Cached for entire 20-min lifetime
Hot tokens naturally win admission after a few misses and stay cached because every access keeps their frequency elevated. 🎉
❄️ Cold Tokens (One-off Requests)
Timeline of a Batch Job Token:
Request #1 → Miss, frequency=1 → Verify
↓
Compare for admission:
New token frequency: 1
Victim frequency: 500
Decision: REJECT ❌
↓
Cache unchanged
↓
Token never seen again
Result: Never pollutes the cache
Cold tokens never build enough frequency to compete. They bounce off the admission gate. ✋
🎲 Random/Attack Tokens
Attacker floods 50k requests/sec with random JWTs:
Random Token 1: frequency=1 → REJECT ❌
Random Token 2: frequency=1 → REJECT ❌
Random Token 3: frequency=1 → REJECT ❌
...
Random Token 50000: frequency=1 → REJECT ❌
Meanwhile:
Legitimate Token A: frequency=2000 → ADMIT ✅ (stays cached)
Attackers can't displace your hot tokens because random tokens never achieve sufficient frequency. The flood bounces off the admission gate, and your hit rate barely moves. 🛡️
The Key Insight 💡
TinyLFU doesn't need to be told what's hot. It infers value from access patterns:
- Service account polling every 100ms → naturally high frequency → stays cached
- Batch job with 1000 one-time tokens → 1000 low-frequency items → never admitted
The policy emerges from the data.
Admission Decisions in Practice ⚖️
Scenario: Cache holds 100k tokens. It's full. New JWT arrives.
┌─────────────────────────────────────────┐
│ New JWT arrives (miss) │
│ Token: eyJhbGc... │
└──────────────┬──────────────────────────┘
│
▼
┌─────────────────────────────────────────┐
│ Lookup frequency │
│ New token: 3 hits │
│ LFU victim (least frequent cached): 480 │
└──────────────┬──────────────────────────┘
│
▼
┌─────────────────────────────────────────┐
│ 3 < 480 │
│ Decision: REJECT ❌ │
│ Action: Verify token, return response │
│ Cache remains unchanged │
└─────────────────────────────────────────┘
Another token arrives:
┌─────────────────────────────────────────┐
│ New JWT arrives (miss) │
└──────────────┬──────────────────────────┘
│
▼
┌─────────────────────────────────────────┐
│ Lookup frequency │
│ New token: 520 hits │
│ LFU victim: 480 │
└──────────────┬──────────────────────────┘
│
▼
┌─────────────────────────────────────────┐
│ 520 > 480 │
│ Decision: ADMIT ✅ │
│ Action: Evict victim, cache new token │
└─────────────────────────────────────────┘
This comparison happens on every miss. It's fast: TinyLFU uses a probabilistic data structure (a Count-Min Sketch) to track frequencies in O(1) time with a tiny memory footprint.
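For intuition, here's a stripped-down Count-Min Sketch in Go: fixed depth and width, FNV hashing with a cheap per-row seed, no 4-bit packing or decay. It illustrates the idea, not any particular library's implementation.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// cmSketch is a minimal Count-Min Sketch: depth rows of width counters.
// Each key hashes into one counter per row; the estimate is the minimum
// across rows, so collisions can only inflate counts, never deflate them.
type cmSketch struct {
	rows  [][]uint32
	width uint64
}

func newCMSketch(depth, width int) *cmSketch {
	s := &cmSketch{rows: make([][]uint32, depth), width: uint64(width)}
	for i := range s.rows {
		s.rows[i] = make([]uint32, width)
	}
	return s
}

// index hashes the key with a per-row seed and maps it into [0, width).
func (s *cmSketch) index(row int, key string) uint64 {
	h := fnv.New64a()
	h.Write([]byte{byte(row)}) // cheap per-row "seed"
	h.Write([]byte(key))
	return h.Sum64() % s.width
}

// Increment bumps the key's counter in every row.
func (s *cmSketch) Increment(key string) {
	for row := range s.rows {
		s.rows[row][s.index(row, key)]++
	}
}

// Estimate returns the minimum counter across rows: an upper bound on the
// true frequency.
func (s *cmSketch) Estimate(key string) uint32 {
	min := uint32(^uint32(0))
	for row := range s.rows {
		if c := s.rows[row][s.index(row, key)]; c < min {
			min = c
		}
	}
	return min
}

func main() {
	s := newCMSketch(4, 1<<16)
	for i := 0; i < 5000; i++ {
		s.Increment("hot-token")
	}
	s.Increment("cold-token")
	fmt.Println(s.Estimate("hot-token"), s.Estimate("cold-token")) // ~5000 vs ~1
}
```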
Why This Matters Under Attack 🛡️
Attack scenario: Adversary sends 50k req/sec with random JWTs
| Cache Type | Result |
|---|---|
| LRU | ❌ Random tokens churn cache, evict legitimate tokens, cache hit rate collapses |
| TinyLFU | ✅ Each random token has frequency=1, legitimate tokens have frequency=100s-1000s, random tokens never win admission, cache remains stable |
Burst scenario: Batch job spawns 5000 workers at midnight
| Cache Type | Result |
|---|---|
| LRU | ❌ 5000 one-time tokens flush out service accounts and browser sessions |
| TinyLFU | ✅ Batch tokens never achieve sufficient frequency, hot tokens remain cached |
Why Configuration Matters ⚙️
TinyLFU has a few knobs. They're not arbitrary; they map directly to your workload.
🎛️ NumCounters
What it controls: Size of the frequency sketch
Rule of thumb: 10x your cache size
Example:
Cache capacity: 100k tokens
NumCounters: 1 million
Why?
├─ Sketch uses hash functions to map tokens → counters
├─ Too few counters = frequent collisions
│ └─ Different tokens share same counter
│ └─ Frequency estimates degrade
└─ Enough counters = rare collisions = accurate tracking
Memory cost: 1M counters × 4 bits = 500 KB ✅
At 100k TPS with 20-min token lifetimes:
- You might see 5 million distinct tokens in an hour
- Using 1-2M counters gives low collision rates
- Total memory cost: ~1 MB (cheap!)
🎛️ BufferItems
What it is: TinyLFU's admission buffer (small LRU window)
Typical size: 1% of total cache size
Why it exists: Protects against cold-start problem
Problem without buffer:
New token arrives → frequency=0 → loses every admission battle
Solution with buffer:
New token arrives → lands in admission buffer
→ builds frequency over next few hits
→ graduates to main cache once proven
Critical for bursts:
1000 browser clients log in simultaneously
↓
Their tokens start cold (frequency=0)
↓
Buffer holds them while they accumulate hits
↓
Once they've proven frequency → graduate to main cache
🎛️ MaxCost
What it controls: Memory-bounded eviction
When it matters: Variable-sized cache entries
Scenario 1: Uniform token size
├─ Most JWTs: ~500 bytes
├─ MaxCost = 100k tokens × 500 bytes = 50 MB
└─ Simple: count-based eviction works fine
Scenario 2: Variable token size + metadata
├─ Small JWT: 300 bytes
├─ JWT + user roles: 2 KB
├─ JWT + org context: 10 KB
├─ MaxCost = 50 MB total budget
└─ Cost-based eviction keeps memory bounded
(might cache 100k small tokens OR 5k large ones)
For most auth caches, token size is uniform, so MaxCost is just token_count × avg_size. But if you're caching metadata with each token, costs matter.
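If you're using Ristretto (the Go implementation linked at the end), these knobs map onto its Config struct. One caveat: in Ristretto, BufferItems is the size of the internal access buffers, and its README suggests 64. Here's a sketch with the numbers from the 100k-token example above; treat them as starting points, not gospel.

```go
package main

import (
	"log"

	"github.com/dgraph-io/ristretto"
)

func main() {
	cache, err := ristretto.NewCache(&ristretto.Config{
		NumCounters: 1_000_000, // ~10x cache capacity: frequency counters for admission
		MaxCost:     50 << 20,  // 50 MB budget, assuming ~500-byte entries
		BufferItems: 64,        // per-buffer key count recommended by Ristretto's docs
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cache.Close()

	// Cost is per entry: here, the approximate serialized size in bytes.
	cache.Set("some-verified-token", "claims-go-here", 500)
}
```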
What TinyLFU Does Not Guarantee ⚠️
❌ Read-Your-Writes on Set
Your code:
cache.Set("token123", verifiedJWT, 1)
value, found := cache.Get("token123") // found might be false!
Why? Admission can fail.
TinyLFU might drop your Set on the floor. This is by design: a tradeoff for resilience.
In practice, this doesn't matter for auth:
- JWTs are ephemeral
- Verification is deterministic
- If a token misses cache → verify it → return response
- User doesn't know the difference
- Over a few seconds, hot tokens accumulate frequency and stabilize
The system converges to the right steady state. ✅
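The practical pattern is to treat the cache as best effort: serve hits, verify misses, attempt a Set, and never assume it stuck. A sketch against Ristretto's v1 API; Claims and verifyJWT are stand-ins for your own types and verification logic.

```go
// Claims is a stand-in for whatever your verifier returns.
type Claims struct{ Subject string }

// getOrVerify treats the cache as best effort: serve hits, verify misses,
// attempt a Set, and never assume it stuck (admission may reject it).
func getOrVerify(cache *ristretto.Cache, token string) (Claims, error) {
	if v, ok := cache.Get(token); ok {
		return v.(Claims), nil // HIT: skip the ~200µs verification
	}
	claims, err := verifyJWT(token) // MISS: verify once (your own code)
	if err != nil {
		return Claims{}, err
	}
	// Cost 1 here (count-based); use the entry's byte size if MaxCost
	// is a memory budget. TinyLFU may drop this Set, and that's fine.
	cache.Set(token, claims, 1)
	return claims, nil
}
```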
📊 Approximate Counts
Count-Min Sketch uses hash collisions:
- Counts can be overestimated
- Counts are never underestimated
True frequency: 100
Reported frequency: 105 or 110 (due to collision)
Impact: None
└─ You're making relative comparisons
└─ Error is small and consistent
└─ Doesn't affect admission quality
⏰ No Explicit TTL Tracking
TinyLFU doesn't track TTLs; that happens independently.
Your auth cache setup:
├─ TinyLFU: controls admission + frequency-based eviction
└─ TTL: handles expiration (20 min = JWT lifetime)
Result:
├─ Tokens naturally age out via TTL
└─ TinyLFU ensures what stays cached is worth keeping
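If you're on Ristretto, the TTL half is a single call. A small sketch extending the getOrVerify example above (it needs the "time" import); the expiresAt value is assumed to be parsed from the token's exp claim, which isn't shown here.

```go
// cacheWithTokenTTL stores verified claims but never beyond the JWT's own
// expiry, so an entry ages out even if it stays hot. expiresAt is assumed
// to come from the token's exp claim.
func cacheWithTokenTTL(cache *ristretto.Cache, token string, claims Claims, expiresAt time.Time) {
	if ttl := time.Until(expiresAt); ttl > 0 {
		cache.SetWithTTL(token, claims, 1, ttl) // best effort, just like Set
	}
}
```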
How to tell if you need it:
Have you ever seen your cache hit rate drop during traffic bursts, despite having enough memory? That's LRU churning under cold traffic. TinyLFU fixes it. 🔧
Final Mental Model 🧠
Think of TinyLFU as a nightclub bouncer. 🕴️
LRU Bouncer 👎
Policy: Let everyone in, kick out whoever's been
standing around longest
Bus of tourists arrives
↓
All tourists get in
↓
Regulars get kicked out to make room
↓
Tourists leave after 5 minutes
↓
Regulars are outside
↓
Club full of strangers who don't want to be there
TinyLFU Bouncer 👍
Policy: Check how many times you've visited this month
Regulars get in. Tourists stay out.
Bus of tourists arrives
↓
Bouncer checks visit history:
├─ Regulars: 50+ visits → Come in ✅
└─ Tourists: 0 visits → Sorry, no room ❌
↓
Even if there's technically space, letting tourists
in means kicking out regulars
↓
Club stays full of people who actually want to be there
For a JWT Auth Cache 🔐
🔥 "Regulars" = Active user sessions, service accounts
❄️ "Tourists" = One-off tokens, batch jobs, scanner garbage
TinyLFU keeps regulars in and tourists out.
The result:
- ✅ Stable hit rates
- ✅ Predictable CPU usage
- ✅ Cache does what you wanted: keep hot stuff hot, let cold stuff stay cold
TL;DR 📝
TinyLFU isn't magic. It's a policy that matches the reality of how traffic actually behaves.
LRU: Designed for workloads where recency = value
TinyLFU: Designed for workloads where frequency = value
In high-throughput auth workloads, frequency is the signal that actually tracks value. TinyLFU makes that tradeoff explicit, and your cache gets better as a result.
If you're running an auth service at scale, it's worth the switch. 🚀
Further Reading:
- 📄 Original TinyLFU paper: "TinyLFU: A Highly Efficient Cache Admission Policy"
- 🔧 Popular implementation: github.com/dgraph-io/ristretto
- 📊 Count-Min Sketch: the probabilistic data structure behind frequency tracking
