
Erich

Cache policy implementations vs the original papers

Cache eviction papers are full of hit rate tables. Running the same workloads against your own code is the fastest way to find out if your implementation actually does what the paper describes, or if you read the algorithm wrong. InterchangeDB has six eviction policies behind a single trait, so I ran all six against five access patterns and compared my numbers to the published results.

The setup

FIFO, Clock, LRU, LRU-K, 2Q, and ARC. Five access patterns. Sequential and Random as baselines, Zipfian with 80/20 skew, CyclicScan for a repeating scan slightly larger than the pool, and ScanPlusHot for a hot working set under scan pressure. Pool size is 64, total pages 256, 100,000 accesses per run.

These run against a SimulatedPool with no disk I/O. The same policies plug into the real buffer pool manager, but for clean comparison the policy logic is isolated from disk noise.
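The post doesn't show the trait itself, so here's a minimal sketch of what a shared eviction interface can look like, with FIFO as the simplest implementor. The names (`EvictionPolicy`, `on_access`, `evict`) are hypothetical, not InterchangeDB's actual API:

```rust
use std::collections::VecDeque;

// Hypothetical interface -- the real InterchangeDB trait may differ.
trait EvictionPolicy {
    /// Record an access to `page` (hit or fresh insertion).
    fn on_access(&mut self, page: u64);
    /// Pick a resident page to evict, if any.
    fn evict(&mut self) -> Option<u64>;
}

#[derive(Default)]
struct Fifo {
    queue: VecDeque<u64>,
}

impl EvictionPolicy for Fifo {
    fn on_access(&mut self, page: u64) {
        // FIFO ignores re-references: only the first insertion matters.
        if !self.queue.contains(&page) {
            self.queue.push_back(page);
        }
    }
    fn evict(&mut self) -> Option<u64> {
        self.queue.pop_front()
    }
}

fn main() {
    let mut policy = Fifo::default();
    for page in [1, 2, 3, 2] {
        policy.on_access(page);
    }
    // Page 2 was re-accessed, but FIFO still evicts the oldest insertion.
    println!("victim = {:?}", policy.evict()); // prints "victim = Some(1)"
}
```

Swapping a policy then means swapping one `impl EvictionPolicy` for another while the benchmark harness stays fixed.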

Hit rate by policy and workload:

| Policy | Sequential | Random | Zipfian | CyclicScan | Scan+Hot |
| ------ | ---------- | ------ | ------- | ---------- | -------- |
| FIFO   | 0.0%       | 24.9%  | 60.2%   | 0.0%       | 16.7%    |
| Clock  | 0.0%       | 24.9%  | 68.6%   | 0.0%       | 16.7%    |
| LRU    | 0.0%       | 24.9%  | 69.0%   | 0.0%       | 16.7%    |
| LRU-K  | 0.0%       | 24.9%  | 81.3%   | 0.0%       | 33.3%    |
| 2Q     | 0.0%       | 25.1%  | 77.2%   | 77.7%      | 33.3%    |
| ARC    | 0.0%       | 24.9%  | 76.5%   | 0.0%       | 33.3%    |

Sequential and Random are exactly what you'd expect. Sequential is 0% for every policy because every access is a brand-new page. Random sits at about 25% for every policy because with 64 slots and 256 pages, the ceiling is pool size divided by total pages. Bélády (1966) derived this in his probabilistic model: the probability of referencing a block in memory is c/s, and 64/256 = 0.25. When the access pattern has no structure, no policy can beat coin-flipping.
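The 25% ceiling is easy to check without any eviction logic at all: under uniform random access, any fixed set of c resident pages hits with probability c/s, so a fixed resident set does as well as any policy. A sketch using an inline xorshift PRNG to stay crate-free (the function name is mine, not the benchmark's):

```rust
// Under uniform random access over `total` pages, any fixed set of
// `pool` resident pages hits with probability pool/total -- no
// replacement choice can do better.
fn random_hit_rate(pool: u64, total: u64, accesses: u64) -> f64 {
    let mut state: u64 = 0x2545_F491_4F6C_DD1D; // arbitrary nonzero seed
    let mut hits = 0u64;
    for _ in 0..accesses {
        // xorshift64: good enough for a sanity check.
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;
        if state % total < pool {
            hits += 1;
        }
    }
    hits as f64 / accesses as f64
}

fn main() {
    // 64-slot pool, 256 pages: expect roughly 0.25, Bélády's c/s.
    println!("{:.3}", random_hit_rate(64, 256, 100_000));
}
```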

Zipfian

Zipfian skew is the shape of real OLTP traffic. Trending product catalogs, social media feeds, any workload where a small set of hot items gets most of the reads.

LRU-K wins at 81.3%. Why by such a margin? Its K-th most recent reference acts as a frequency estimator. A page touched once last week and once today looks different from a page touched twice in the last millisecond, and LRU-K can tell them apart. That's 12 percentage points over plain LRU. 2Q was designed as an O(1) approximation of LRU-K (Johnson and Shasha, 1994) and lands at 77.2%, right below it. ARC at 76.5%. The ordering matches what O'Neil et al. (1993) predicted on database buffer pool workloads.

Clock and LRU come in within half a point of each other, 68.6% and 69.0%. Corbató (1968) showed the same thing on Multics. Clock trades a sliver of hit rate for cheaper bookkeeping, which matters when the eviction path runs on every buffer pool access.
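The cheaper bookkeeping is concrete: Clock keeps one reference bit per frame and a sweeping hand, so a hit sets a bit instead of reordering a list. A minimal second-chance sketch (names are illustrative; a real implementation would find the frame through the page table instead of scanning):

```rust
// Minimal second-chance sketch: one reference bit per frame and a
// sweeping hand. A hit sets a bit; eviction sweeps, clearing set bits,
// until it finds an unreferenced frame.
struct ClockSketch {
    pages: Vec<u64>,
    referenced: Vec<bool>,
    hand: usize,
}

impl ClockSketch {
    fn new(pages: Vec<u64>) -> Self {
        let n = pages.len();
        ClockSketch { pages, referenced: vec![false; n], hand: 0 }
    }

    /// A hit is just a bit set -- nothing is reordered.
    /// (The linear scan here is only for the sketch.)
    fn on_hit(&mut self, page: u64) {
        if let Some(i) = self.pages.iter().position(|&p| p == page) {
            self.referenced[i] = true;
        }
    }

    /// Sweep until a frame with a cleared bit is found.
    fn evict(&mut self) -> u64 {
        loop {
            if self.referenced[self.hand] {
                self.referenced[self.hand] = false; // second chance
                self.hand = (self.hand + 1) % self.pages.len();
            } else {
                return self.pages[self.hand];
            }
        }
    }
}

fn main() {
    let mut clock = ClockSketch::new(vec![10, 11, 12]);
    clock.on_hit(10); // sets page 10's reference bit
    // The hand gives page 10 a second chance and evicts page 11.
    println!("victim = {}", clock.evict()); // prints "victim = 11"
}
```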

When the workload has obvious hot keys, frequency tracking buys you 10+ percentage points over LRU. When the skew isn't there, plain LRU is within a point of the best and costs less to run. If you know your keyspace is skewed and you can afford the bookkeeping, use LRU-K or 2Q. If you don't know, start with LRU.

Scan resistance

Scan-plus-hot is the mixed OLTP-OLAP pattern. A hot transactional working set coexists with reporting queries that scan large ranges. What happens to the hot pages when a scan blows through?

The benchmark fills the cache with 16 hot pages, lets them age into the frequent region of each policy, then runs a one-time scan of 176 cold pages. How many of the original 16 survive?

| Policy | Survived | Survival rate |
| ------ | -------- | ------------- |
| FIFO   | 0/16     | 0%            |
| Clock  | 0/16     | 0%            |
| LRU    | 0/16     | 0%            |
| LRU-K  | 16/16    | 100%          |
| 2Q     | 16/16    | 100%          |
| ARC    | 16/16    | 100%          |

ARC isolates hot pages in a frequent list that scans never touch. 2Q promotes hot pages to a separate list and evicts from the recent queue first. LRU-K is simpler. Pages with only one access have infinite backward-K distance, so they always evict first.
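LRU-K's rule fits in a few lines. With K=2, the victim is the page whose second-most-recent reference is oldest; a page seen only once has no second reference at all, which is what "infinite backward-K distance" means in practice. A sketch with hypothetical timestamps:

```rust
// LRU-2 victim selection: evict the page whose 2nd-most-recent
// reference is oldest. A page referenced fewer than 2 times has
// infinite backward-K distance and always loses the comparison.
fn lru2_victim(history: &[(u64, Vec<u64>)]) -> u64 {
    history
        .iter()
        .min_by_key(|(_, times)| {
            // Key = timestamp of the 2nd-most-recent reference;
            // a single-reference page gets 0, i.e. "infinitely long ago".
            if times.len() < 2 { 0 } else { times[times.len() - 2] }
        })
        .map(|(page, _)| *page)
        .unwrap()
}

fn main() {
    // Hot page 7: referenced at t=1 and t=90. Scan page 42: once, at t=95.
    let history = vec![(7, vec![1, 90]), (42, vec![95])];
    // The scan page was touched most recently, yet it evicts first.
    println!("victim = {}", lru2_victim(&history)); // prints "victim = 42"
}
```

Plain LRU sees only the most recent reference, so it would evict the hot page here.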

Theorem 4 in the ARC paper (Megiddo and Modha, 2003) proves this formally. A one-time scan through more novel pages than the cache can hold will not evict pages already in the frequent list.

The overall hit rate across the trace confirms it. Scan-resistant policies hit 33.3%, non-scan-resistant ones hit 16.7%. But hit rate is an average over the entire trace. It won't show you the moment a hot page gets flushed to disk and the next read has to wait for a disk fetch. Survival rate catches what hit rate hides.

If your database runs reporting queries against a live transactional set, you need scan resistance. Without it, a single scan can flush the hot set to disk.

Scan resistance is not cycle resistance

CyclicScan is a deterministic repeating loop through 72 unique pages with a pool of 64. The working set overflows the cache by 12.5% and the same pattern repeats. Recurring batch jobs, scheduled report generation, periodic cache-warming sweeps, anything that touches the same keys in the same order on a schedule.

LRU, FIFO, and Clock all hit 0%. Classic sequential flooding from Stonebraker (1981). The page LRU evicts is exactly the page needed next.

2Q hits 77.7%. Once a page gets touched twice it promotes to the hot list and stays until the hot list overflows. The hot list isn't bounded by the ghost queue, so the stable working set survives repeated scans.
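The promotion rule is the whole trick. Here's a simplified sketch of 2Q's three structures, using the paper's names; the real algorithm also bounds each queue with the Kin and Kout parameters, which this omits:

```rust
use std::collections::{HashSet, VecDeque};

// Simplified 2Q sketch: a first touch lands in a probation FIFO (A1in);
// a touch while in probation, or while remembered by the ghost queue
// (A1out), promotes the page to the hot LRU list (Am).
struct TwoQSketch {
    a1in: VecDeque<u64>, // probation: resident, seen once
    a1out: HashSet<u64>, // ghost: evicted from probation, metadata only
    am: VecDeque<u64>,   // hot list in LRU order (front = coldest)
}

impl TwoQSketch {
    fn access(&mut self, page: u64) {
        if let Some(i) = self.am.iter().position(|&p| p == page) {
            // Hot hit: move to the MRU end of Am.
            let _ = self.am.remove(i);
            self.am.push_back(page);
        } else if self.a1in.contains(&page) || self.a1out.remove(&page) {
            // Second touch or ghost hit: promote to the hot list.
            self.a1in.retain(|&p| p != page);
            self.am.push_back(page);
        } else {
            // First touch: probation only.
            self.a1in.push_back(page);
        }
    }
}

fn main() {
    let mut q = TwoQSketch {
        a1in: VecDeque::new(),
        a1out: HashSet::new(),
        am: VecDeque::new(),
    };
    q.access(5); // first touch: goes to probation
    q.access(5); // second touch: promoted -- a scan evicts from A1in, not Am
    println!("hot list = {:?}", q.am); // prints "hot list = [5]"
}
```

Because scan pages never get a second touch, they live and die in A1in while Am holds the working set.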

ARC hits 0%. Why?

ARC learns through ghost hits. When a page evicted from the recent list gets accessed again, it shows up in the recent ghost list, and ARC grows the target size for recent pages. Ghost hits are the feedback signal.

But ARC has an invariant. The recent list plus the recent ghost list together hold at most 64 entries, the pool size. When the recent list saturates at 64 during a cold-start scan, the ghost list gets forced to zero. No ghost entries, no ghost hits on the next loop, no adaptation. ARC degenerates to LRU. LRU is 0% on this workload.
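The invariant is just arithmetic, but the two regimes are worth seeing side by side (T1 and B1 are the paper's names for the recent list and its ghost list; this is the budget math, not an ARC implementation):

```rust
// ARC's invariant: |T1| + |B1| <= c, where T1 is the recent list, B1
// its ghost list, and c the pool size (Megiddo and Modha, 2003).
fn ghost_capacity(c: usize, t1_len: usize) -> usize {
    c.saturating_sub(t1_len)
}

fn main() {
    let c = 64;
    // Steady state: a half-full recent list leaves room for 32 ghosts.
    println!("{}", ghost_capacity(c, 32)); // prints "32"
    // Cold-start scan: T1 saturates at c, forcing the ghost list to
    // zero. No ghosts, no ghost hits on the next loop, no adaptation.
    println!("{}", ghost_capacity(c, 64)); // prints "0"
}
```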

2Q doesn't have this coupling. Its ghost queue has an independent size bound. Even while the recent queue fills during the first scan, ghost entries accumulate. Second pass starts, ghosts are still there, pages promote to the hot list, working set survives.

Theorem 4 proves ARC is scan-resistant. It protects an existing hot set from a one-time scan. It does not claim cycle resistance, where the entire access pattern is a repeating loop with no pre-existing hot set to protect. The ghost-list coupling in ARC's invariant means cycle resistance was never on the table.

Scan resistance is not cycle resistance. If your workload has recurring batch jobs that loop through the same pages on a schedule, 2Q is the only policy here that handles it.

Latency

ARC is the fastest policy on three of five workloads despite being the most conceptually complex. The paper describes it as constant-time per request, and the benchmarks agree.

LRU-K is the slowest by a wide margin, 623ms on Random against 38ms for 2Q. Maintaining K-distance history per page costs time. The paper acknowledged this. It's why LRU-K rarely shows up in production despite winning on hit rate.

2Q is the most consistent: among the two fastest policies on every workload, and in the top three on hit rate on every workload. Never the best at anything, never bad at anything.

What production chose

PostgreSQL's buffer manager walks through nearly every policy in this post. LRU through 7.4. ARC in 8.0, pulled in 8.0.2 over IBM patent concerns. 2Q as the emergency replacement. Clock-sweep in 8.1 for lower lock contention under concurrent access.

RocksDB uses sharded LRU with a high-priority pool that keeps index and filter blocks from being evicted by data block scans. Same principle as 2Q's hot list. They also ship HyperClockCache as a lock-free alternative for high-concurrency reads.

SQLite uses plain LRU. For an embedded database where the working set usually fits in memory and concurrent writers aren't fighting over the buffer pool, that's enough.

The pattern is the same everywhere. Simple workloads got simple policies. Mixed read-write workloads with scans drove the scan-resistant variants. Lock contention under concurrent access pushed PostgreSQL and RocksDB toward clock. Nobody ships the policy with the best hit rate. Production picks the policy that solves the biggest problem the system actually faces, and that problem is almost never "we need 3 more percentage points of cache hits."

Where this goes

No policy dominates.

The point of building all six behind a single trait is that you don't have to predict your workload. You characterize it, swap the policy, and let the numbers decide. The buffer pool is validated. Next is the storage engine.

Code and benchmarks: InterchangeDB. Run `cargo bench --bench eviction_policies -- summary` to reproduce.


References

  • Bélády, L. A. (1966). A Study of Replacement Algorithms for a Virtual-Storage Computer. IBM Systems Journal.
  • Corbató, F. J. (1968). A Paging Experiment with the Multics System.
  • O'Neil, E. J., O'Neil, P. E., & Weikum, G. (1993). The LRU-K Page Replacement Algorithm for Database Disk Buffering. SIGMOD.
  • Johnson, T., & Shasha, D. (1994). 2Q: A Low Overhead High Performance Buffer Management Replacement Algorithm. VLDB.
  • Megiddo, N., & Modha, D. S. (2003). ARC: A Self-Tuning, Low Overhead Replacement Cache. FAST.
  • Stonebraker, M. (1981). Operating System Support for Database Management. Communications of the ACM.
