
Akarshan Gandotra

TinyLFU: Why Your JWT Auth Cache Needs Better Eviction 🔐

You're running an auth gateway. Every incoming request carries a JWT. Verifying that JWT (checking the signature, validating claims, maybe hitting a revocation list) costs you 200 microseconds. At 100,000 requests per second, that's 20 full CPU cores doing nothing but crypto. 💸

So you cache verified tokens. Problem solved. ✅

Except three weeks later, your P99 latency spikes during a product launch. Cache hit rate drops from 92% to 61%. CPU usage doubles. You scale horizontally, and the problem... gets worse. 📉

This isn't a configuration issue. It's not a bug. It's LRU doing exactly what it was designed to do, which happens to be the wrong thing for your workload.

The Problem With LRU in Auth Systems 🚨

LRU (Least Recently Used) evicts whatever you touched longest ago. In theory, this keeps "hot" data in cache. In practice, at high throughput with diverse traffic patterns, LRU makes decisions that actively hurt you.

Here's why: your JWT traffic isn't uniform. You have three categories of tokens hitting your gateway:

Your Token Traffic Breakdown 📊

🔥 Hot Tokens (70% of traffic)
   └─ Long-lived sessions
   └─ Browsers, mobile apps, service accounts
   └─ Single user = 50 req/min
   └─ Service account = 20 req/sec
   └─ Thousands of hits per 20-min lifetime

❄️  Cold Tokens (25% of traffic)
   └─ One-off API calls
   └─ Scheduled jobs
   └─ Infrequent users
   └─ Each token appears 1-2 times, never again

🎲 Random Tokens (5% of traffic)
   └─ Scanners
   └─ Malformed requests
   └─ Expired tokens from broken clients
   └─ Attackers probing endpoints
   └─ Pure noise

The LRU Failure Mode:

Time: 3:00 AM
Event: Batch job spawns 1000 workers
Each worker: unique JWT, single request

LRU Decision Tree:
┌─────────────────────────────┐
│ New token arrives           │
│ Cache is full               │
└──────────┬──────────────────┘
           │
           ▼
    ┌──────────────┐
    │ Admit token  │  ← Always admits
    │ Evict oldest │  ← Evicts your hot tokens
    └──────────────┘
           │
           ▼
┌─────────────────────────────┐
│ Result: Cache now full of   │
│ 1000 tokens that will       │
│ NEVER be seen again         │
└─────────────────────────────┘

Now your hot tokens miss the cache. You're verifying the same legitimate user tokens over and over, burning CPU, while your cache is full of one-time tokens that will never be seen again.

Your hit rate collapses. Your latency spikes. And because this happens gradually as the cache churns, you don't see a clean error; you see slow, expensive drift. 🐌

The Core Problem 🎯

LRU optimizes for recency, but recency is the wrong signal when traffic has massive frequency skew.

What TinyLFU Actually Changes 🔄

TinyLFU splits caching into two distinct decisions: admission and eviction.

LRU vs TinyLFU Flow

Traditional LRU:

Token arrives → Miss → ALWAYS admit → Evict something old
                                           ↓
                                    No questions asked

TinyLFU:

Token arrives → Miss → Is this token valuable enough?
                              ├─ Yes → Admit & evict
                              └─ No  → Reject, verify once, move on

TinyLFU adds a gate. When a JWT arrives and misses the cache, TinyLFU asks:

"Is this token more valuable than what I'd have to evict?"

If the answer is no, the token doesn't enter the cache at all. You verify it, return the response, and move on.

The Signal Shift 📡

  • LRU measures: Recency (when was it last accessed?)
  • TinyLFU measures: Frequency (how often has it been accessed?)

For auth workloads, frequency is the correct signal:

Token A: 5000 accesses  >  Token B: 1 access
    ↑                           ↑
  More valuable              Less valuable

(regardless of which one you saw most recently)

TinyLFU Without Jargon 🧮

TinyLFU keeps approximate frequency counts for every item it's seen recently. Not exact counts, just approximations. This matters for performance and memory.

The Admission Algorithm

JWT arrives
    ↓
1. Increment counter for this token
    ↓
2. Already in cache? → HIT ✅ (done)
    ↓
3. Not in cache? → MISS
    ↓
4. Compare frequencies:
   ┌─────────────────────────────────────┐
   │ New token frequency: 3              │
   │ Victim token frequency: 500         │
   │                                     │
   │ Decision: REJECT (3 < 500)          │
   │ Action: Verify once, don't cache    │
   └─────────────────────────────────────┘

   OR

   ┌─────────────────────────────────────┐
   │ New token frequency: 520            │
   │ Victim token frequency: 480         │
   │                                     │
   │ Decision: ADMIT (520 > 480)         │
   │ Action: Cache new, evict victim     │
   └─────────────────────────────────────┘
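
Here is that admission loop as a minimal Go sketch. It's for intuition only: real implementations keep frequencies in a Count-Min Sketch rather than an exact map, and the victim comes from the eviction policy rather than a linear scan.

package main

import "fmt"

// tinyLFUGate is a deliberately simplified model of TinyLFU admission.
// An exact map stands in for the frequency sketch so the logic stays visible.
type tinyLFUGate struct {
    freq     map[string]int    // how often has this token been seen?
    cache    map[string]string // token -> verified claims (stand-in type)
    capacity int
}

// victim returns the least-frequently-seen cached token: what an
// admission would have to displace.
func (t *tinyLFUGate) victim() (string, int) {
    minToken, minFreq := "", int(^uint(0)>>1)
    for token := range t.cache {
        if f := t.freq[token]; f < minFreq {
            minToken, minFreq = token, f
        }
    }
    return minToken, minFreq
}

// onRequest returns true on a cache hit. On a miss it applies the
// admission rule: the newcomer only enters if it is hotter than the victim.
func (t *tinyLFUGate) onRequest(token, claims string) bool {
    t.freq[token]++ // every arrival bumps the counter, hit or miss

    if _, ok := t.cache[token]; ok {
        return true // already cached
    }
    if len(t.cache) < t.capacity {
        t.cache[token] = claims // room to spare: admit freely
        return false
    }
    if victim, victimFreq := t.victim(); t.freq[token] > victimFreq {
        delete(t.cache, victim) // newcomer is hotter: evict and admit
        t.cache[token] = claims
    }
    return false // miss either way; the caller verifies the JWT once
}

func main() {
    gate := &tinyLFUGate{freq: map[string]int{}, cache: map[string]string{}, capacity: 1}
    for i := 0; i < 10; i++ {
        gate.onRequest("hot-token", "claims-A") // builds frequency, stays cached
    }
    gate.onRequest("cold-token", "claims-B") // frequency=1 < 10: rejected
    fmt.Println(gate.cache)                  // map[hot-token:claims-A]
}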

Counter Decay ⏰

Counters decay over time to keep frequency reflecting recent history:

Counter Values Before Decay:
Token A: 1000
Token B: 500
Token C: 200

[Decay event: halve all counters]

Counter Values After Decay:
Token A: 500
Token B: 250
Token C: 100

Why this matters: A token that was hot yesterday but hasn't been seen in an hour shouldn't still dominate. Decay is fast (just bit-shifting counters in memory) and keeps the frequency signal fresh.

At 100k TPS, "recent history" typically means the last few seconds to a minute. That's enough to distinguish genuinely hot tokens from noise, but short enough that stale patterns don't stick around.
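
The decay pass itself is tiny. A sketch, assuming plain uint32 counters (real implementations usually pack 4-bit counters, but the operation is the same halving):

// reset halves every frequency counter in place. TinyLFU runs this
// periodically, typically once the total number of observations reaches
// a sample-size threshold, so yesterday's hot tokens stop outranking
// today's traffic.
func reset(counters []uint32) {
    for i := range counters {
        counters[i] >>= 1 // 1000 becomes 500, 1 becomes 0
    }
}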

Applying TinyLFU to a JWT Cache 🎯

Let's trace what happens to each category of token:

🔥 Hot Tokens (Browsers, Service Accounts)

Timeline of a Browser Token:

Request #1  → Miss, frequency=1   → Verify, reject admission
Request #2  → Miss, frequency=2   → Verify, reject admission
Request #3  → Miss, frequency=3   → Verify, reject admission
...
Request #8  → Miss, frequency=8   → Verify, WIN admission ✅
Request #9  → HIT ✅
Request #10 → HIT ✅
Request #50 → HIT ✅ (frequency=50+)
...
Request #3000 → HIT ✅ (frequency=3000+)

Result: Cached for entire 20-min lifetime

Hot tokens naturally win admission after a few misses and stay cached because every access keeps their frequency elevated. 🎉

❄️ Cold Tokens (One-off Requests)

Timeline of a Batch Job Token:

Request #1 → Miss, frequency=1 → Verify
             ↓
         Compare for admission:
         New token frequency: 1
         Victim frequency: 500
         Decision: REJECT ❌
             ↓
         Cache unchanged
             ↓
         Token never seen again

Result: Never pollutes the cache

Cold tokens never build enough frequency to compete. They bounce off the admission gate. ✋

🎲 Random/Attack Tokens

Attacker floods 50k requests/sec with random JWTs:

Random Token 1: frequency=1 → REJECT ❌
Random Token 2: frequency=1 → REJECT ❌
Random Token 3: frequency=1 → REJECT ❌
...
Random Token 50000: frequency=1 → REJECT ❌

Meanwhile:
Legitimate Token A: frequency=2000 → ADMIT ✅ (stays cached)

Attackers can't displace your hot tokens because random tokens never achieve sufficient frequency. The flood bounces off the admission gate while the hot set stays cached, so the cache stays stable instead of collapsing. 🛡️

The Key Insight 💡

TinyLFU doesn't need to be told what's hot. It infers value from access patterns:

  • Service account polling every 100ms → naturally high frequency → stays cached
  • Batch job with 1000 one-time tokens → 1000 low-frequency items → never admitted

The policy emerges from the data.

Admission Decisions in Practice ⚖️

Scenario: Cache holds 100k tokens. It's full. New JWT arrives.

┌─────────────────────────────────────────┐
│ New JWT arrives (miss)                  │
│ Token: eyJhbGc...                       │
└──────────────┬──────────────────────────┘
               │
               ▼
┌─────────────────────────────────────────┐
│ Lookup frequency                        │
│ New token: 3 hits                       │
│ LFU victim (least frequent cached): 480 │
└──────────────┬──────────────────────────┘
               │
               ▼
┌─────────────────────────────────────────┐
│ 3 < 480                                 │
│ Decision: REJECT ❌                     │
│ Action: Verify token, return response   │
│         Cache remains unchanged          │
└─────────────────────────────────────────┘

Another token arrives:

┌─────────────────────────────────────────┐
│ New JWT arrives (miss)                  │
└──────────────┬──────────────────────────┘
               │
               ▼
┌─────────────────────────────────────────┐
│ Lookup frequency                        │
│ New token: 520 hits                     │
│ LFU victim: 480                         │
└──────────────┬──────────────────────────┘
               │
               ▼
┌─────────────────────────────────────────┐
│ 520 > 480                               │
│ Decision: ADMIT ✅                      │
│ Action: Evict victim, cache new token   │
└─────────────────────────────────────────┘

This comparison happens on every miss. It's fast: TinyLFU uses a probabilistic data structure (a Count-Min Sketch) to track frequencies in O(1) time with a tiny memory footprint.
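
For reference, a bare-bones Count-Min Sketch fits in a few dozen lines of Go. The hashing scheme here (FNV with a per-row seed byte) is only illustrative; what matters is that increment and estimate each touch a fixed number of counters no matter how many distinct tokens exist:

package main

import (
    "fmt"
    "hash/fnv"
)

// countMinSketch keeps approximate frequencies in a small fixed-size table.
// Collisions can only inflate a count, never deflate it, which is why
// estimates may overshoot but never undershoot.
type countMinSketch struct {
    width    int
    counters [][]uint32 // one row per hash function
}

func newSketch(rows, width int) *countMinSketch {
    c := make([][]uint32, rows)
    for i := range c {
        c[i] = make([]uint32, width)
    }
    return &countMinSketch{width: width, counters: c}
}

// index hashes the key with a per-row seed so each row disagrees about
// which counter a key lands in.
func (s *countMinSketch) index(row int, key string) int {
    h := fnv.New64a()
    h.Write([]byte{byte(row)})
    h.Write([]byte(key))
    return int(h.Sum64() % uint64(s.width))
}

// increment bumps one counter in every row for this key.
func (s *countMinSketch) increment(key string) {
    for r := range s.counters {
        s.counters[r][s.index(r, key)]++
    }
}

// estimate takes the minimum across rows: the least-inflated view of how
// often this key has been seen.
func (s *countMinSketch) estimate(key string) uint32 {
    min := ^uint32(0)
    for r := range s.counters {
        if v := s.counters[r][s.index(r, key)]; v < min {
            min = v
        }
    }
    return min
}

func main() {
    s := newSketch(4, 1<<16)
    for i := 0; i < 500; i++ {
        s.increment("hot-token")
    }
    s.increment("cold-token")
    fmt.Println(s.estimate("hot-token"), s.estimate("cold-token")) // ~500 vs ~1
}

Production implementations pack 4-bit counters instead of uint32s, which is how a million counters fits in roughly 500 KB (more on sizing in the configuration section below).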

Why This Matters Under Attack 🛡️

Attack scenario: Adversary sends 50k req/sec with random JWTs

Cache Type | Result
LRU        | ❌ Random tokens churn the cache, evict legitimate tokens, and the hit rate collapses
TinyLFU    | ✅ Each random token has frequency=1 while legitimate tokens sit at frequencies in the hundreds to thousands; random tokens never win admission and the cache remains stable

Burst scenario: Batch job spawns 5000 workers at midnight

Cache Type | Result
LRU        | ❌ 5000 one-time tokens flush out service accounts and browser sessions
TinyLFU    | ✅ Batch tokens never achieve sufficient frequency; hot tokens remain cached

Why Configuration Matters ⚙️

TinyLFU has a few knobs. They're not arbitrary; they map directly to your workload.

🎛️ NumCounters

What it controls: Size of the frequency sketch

Rule of thumb: 10x your cache size

Example:

Cache capacity: 100k tokens
NumCounters: 1 million

Why?
├─ Sketch uses hash functions to map tokens → counters
├─ Too few counters = frequent collisions
│  └─ Different tokens share same counter
│  └─ Frequency estimates degrade
└─ Enough counters = rare collisions = accurate tracking

Memory cost: 1M counters × 4 bits = 500 KB ✅

At 100k TPS with 20-min token lifetimes:

  • You might see 5 million distinct tokens in an hour
  • Using 1-2M counters gives low collision rates
  • Total memory cost: ~1 MB (cheap!)

🎛️ BufferItems

What it is: TinyLFU's admission buffer (small LRU window)

Typical size: 1% of total cache size

Why it exists: Protects against cold-start problem

Problem without buffer:
New token arrives → frequency=0 → loses every admission battle

Solution with buffer:
New token arrives → lands in admission buffer
                 → builds frequency over next few hits
                 → graduates to main cache once proven

Critical for bursts:

1000 browser clients log in simultaneously
    ↓
Their tokens start cold (frequency=0)
    ↓
Buffer holds them while they accumulate hits
    ↓
Once they've proven frequency → graduate to main cache

🎛️ MaxCost

What it controls: Memory-bounded eviction

When it matters: Variable-sized cache entries

Scenario 1: Uniform token size
├─ Most JWTs: ~500 bytes
├─ MaxCost = 100k tokens × 500 bytes = 50 MB
└─ Simple: count-based eviction works fine

Scenario 2: Variable token size + metadata
├─ Small JWT: 300 bytes
├─ JWT + user roles: 2 KB
├─ JWT + org context: 10 KB
├─ MaxCost = 50 MB total budget
└─ Cost-based eviction keeps memory bounded
    (might cache 100k small tokens OR 5k large ones)

For most auth caches, token size is uniform, so MaxCost is just token_count × avg_size. But if you're caching metadata with each token, costs matter.
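
Putting the three knobs together: here's what a Ristretto setup for the 100k-token cache above might look like. The numbers mirror the rules of thumb in this section and are illustrative, not a tuned recommendation.

package main

import (
    "log"

    "github.com/dgraph-io/ristretto"
)

func main() {
    cache, err := ristretto.NewCache(&ristretto.Config{
        NumCounters: 1_000_000,     // ~10x the 100k tokens we expect to cache
        MaxCost:     100_000 * 500, // ~50 MB budget at ~500 bytes per entry
        BufferItems: 64,            // a commonly used value for Ristretto
    })
    if err != nil {
        log.Fatal(err)
    }
    defer cache.Close()

    // Cost is supplied per entry on Set; with uniform JWT sizes a flat
    // estimate per token is fine.
    cache.Set("token:abc", "verified-claims", 500)
    cache.Wait() // Sets are buffered; Wait is only needed to read back immediately

    if v, ok := cache.Get("token:abc"); ok {
        log.Println("cached:", v)
    }
}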

What TinyLFU Does Not Guarantee ⚠️

❌ Eventual Consistency of Set

Your code:
cache.Set("token123", verifiedJWT)
value := cache.Get("token123")  // Might be nil!

Why? Admission can fail.

TinyLFU might drop your Set on the floor if admission fails. This is by design: a tradeoff for resilience.

In practice, this doesn't matter for auth:

  • JWTs are ephemeral
  • Verification is deterministic
  • If a token misses cache → verify it → return response
  • User doesn't know the difference
  • Over a few seconds, hot tokens accumulate frequency and stabilize

The system converges to the right steady state. ✅
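
The calling code therefore treats the cache as best-effort. A sketch of the pattern, with Claims and verifyJWT as hypothetical stand-ins for your real types and verification logic:

package authcache

import (
    "errors"

    "github.com/dgraph-io/ristretto"
)

// Claims is a hypothetical stand-in for whatever your verifier returns.
type Claims struct{ Subject string }

// verifyJWT is a hypothetical placeholder for full signature and claims
// verification: the ~200µs of work the cache exists to avoid.
func verifyJWT(token string) (Claims, error) {
    if token == "" {
        return Claims{}, errors.New("empty token")
    }
    return Claims{Subject: "user-123"}, nil
}

// getClaims looks up the token, verifies on a miss, and tries to cache the
// result. If TinyLFU rejects the Set, nothing breaks: the next request just
// verifies again, and genuinely hot tokens win admission within a few hits.
func getClaims(cache *ristretto.Cache, token string) (Claims, error) {
    if v, ok := cache.Get(token); ok {
        return v.(Claims), nil // hit: skip verification entirely
    }
    claims, err := verifyJWT(token)
    if err != nil {
        return Claims{}, err
    }
    cache.Set(token, claims, 1) // admission may reject this, and that's fine
    return claims, nil
}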

📊 Approximate Counts

Count-Min Sketch uses hash collisions:

  • Counts can be overestimated
  • Counts are never underestimated

True frequency: 100
Reported frequency: 105 or 110 (due to collision)

Impact: None
└─ You're making relative comparisons
└─ Error is small and consistent
└─ Doesn't affect admission quality

⏰ No Explicit TTL Tracking

TinyLFU doesn't track TTLs; that happens independently.

Your auth cache setup:
├─ TinyLFU: controls admission + frequency-based eviction
└─ TTL: handles expiration (20 min = JWT lifetime)

Result:
├─ Tokens naturally age out via TTL
└─ TinyLFU ensures what stays cached is worth keeping
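
Continuing the Ristretto sketch from earlier: a common way to wire the two together is to derive each entry's TTL from the token's own exp claim, so a cached verification can never outlive the JWT it belongs to. The expiresAt value is assumed to have been extracted during verification.

// cacheVerified stores a verified token until the JWT itself expires.
// expiresAt comes from the token's exp claim; TinyLFU still decides
// whether the entry is admitted at all.
func cacheVerified(cache *ristretto.Cache, token string, claims interface{}, expiresAt time.Time) {
    ttl := time.Until(expiresAt)
    if ttl <= 0 {
        return // already expired: never worth caching
    }
    cache.SetWithTTL(token, claims, 1, ttl)
}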

How to tell if you need it:

Have you ever seen your cache hit rate drop during traffic bursts, despite having enough memory? That's LRU churning under cold traffic. TinyLFU fixes it. 🔧

Final Mental Model 🧠

Think of TinyLFU as a nightclub bouncer. 🕴️

LRU Bouncer 👎

Policy: Let everyone in, kick out whoever's been 
        standing around longest

Bus of tourists arrives
    ↓
All tourists get in
    ↓
Regulars get kicked out to make room
    ↓
Tourists leave after 5 minutes
    ↓
Regulars are outside
    ↓
Club full of strangers who don't want to be there

TinyLFU Bouncer 👍

Policy: Check how many times you've visited this month
        Regulars get in. Tourists stay out.

Bus of tourists arrives
    ↓
Bouncer checks visit history:
├─ Regulars: 50+ visits → Come in ✅
└─ Tourists: 0 visits → Sorry, no room ❌
    ↓
Even if there's technically space, letting tourists 
in means kicking out regulars
    ↓
Club stays full of people who actually want to be there

For a JWT Auth Cache 🔐

🔥 "Regulars" = Active user sessions, service accounts
❄️  "Tourists" = One-off tokens, batch jobs, scanner garbage

TinyLFU keeps regulars in and tourists out.

The result:

  • ✅ Stable hit rates
  • ✅ Predictable CPU usage
  • ✅ Cache does what you wanted: keep hot stuff hot, let cold stuff stay cold

TL;DR 📝

TinyLFU isn't magic. It's a policy that matches the reality of how traffic actually behaves.

LRU:     Designed for workloads where recency = value
TinyLFU: Designed for workloads where frequency = value

In modern high-throughput auth systems, frequency is what actually predicts value. TinyLFU makes that tradeoff explicit, and your cache gets better as a result.

If you're running an auth service at scale, it's worth the switch. 🚀


Further Reading:

  • 📄 Original TinyLFU paper: "TinyLFU: A Highly Efficient Cache Admission Policy"
  • 🔧 Popular implementation: github.com/dgraph-io/ristretto
  • 📊 Count-Min Sketch: The probabilistic data structure behind frequency tracking
