imduchuyyy 🐬
I created an LRU caching server in Rust

I recently built a mini caching server in Rust, implementing the LRU (Least Recently Used) eviction policy in under 1000 lines of code. By applying zero-copy techniques and careful memory management, I got high performance without compromising on stability.

Here is a breakdown of how I built it and why it is fast.

The Core Design

The heart of the cache is the Cache struct. Instead of a classic pointer-based linked list (which in safe Rust requires Rc<RefCell<...>> indirection, or unsafe code), I used a vector-based arena allocator approach.

Key Data Structures

  1. Zero-Copy with bytes::Bytes:
    I used bytes::Bytes for both keys and values. This is a crucial optimization. Bytes acts as a reference-counted slice of memory (similar to Arc<[u8]>). When we clone a Key or Value, we aren't copying the underlying data; we are just incrementing a reference count. This makes passing data around extremely cheap.

  2. Index-based Linked List:
    The LRU queue is implemented as a doubly linked list, but the nodes are stored in a Vec<Entry>. The prev and next pointers are simply u32 indices into this vector.

    • Memory Efficiency: No memory fragmentation from allocating individual nodes.
    • Cache Locality: The Vec keeps entries close in memory, improving CPU cache hits.
type Key = Bytes;
type Value = Bytes;
type Index = u32;

#[derive(Debug)]
struct Entry {
    key: Key,
    value: Value,
    prev: Option<Index>,
    next: Option<Index>,
}

#[derive(Debug)]
pub struct Cache {
    capacity: usize,
    map: HashMap<Key, Index>, // Maps Key -> Index in the entries vector
    entries: Vec<Entry>,      // The "Arena"
    head: Option<Index>,      // MRU (Most Recently Used)
    tail: Option<Index>,      // LRU (Least Recently Used)
    free: Vec<Index>,         // Reusable slots
}
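
To make the zero-copy point concrete, here is a minimal standalone sketch showing that cloning a Bytes handle shares the underlying buffer instead of copying it:

use bytes::Bytes;

fn main() {
    // A 1 MiB buffer owned by a single Bytes handle.
    let value = Bytes::from(vec![0u8; 1024 * 1024]);

    // Cloning only bumps a reference count; no bytes are copied.
    let cheap_copy = value.clone();

    // Both handles point at the same underlying memory.
    assert_eq!(value.as_ptr(), cheap_copy.as_ptr());
}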

How It Works

The get Operation: O(1)

When retrieving an item, we look up the index in the HashMap. If found, we move the node to the head of the list (marking it as most recently used) and return a cheap clone of the value.

pub fn get(&mut self, key: &Key) -> Option<Value> {
    let idx = *self.map.get(key)?;
    self.move_to_head(idx);
    Some(self.entries[idx as usize].value.clone()) // Zero-copy clone!
}
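
The move_to_head helper isn't shown in the post. Inside impl Cache it boils down to two smaller helpers that rewire the u32 links; this is a sketch of how they can look (the real code lives in the repo):

// Inside impl Cache. Unlink `idx` from wherever it currently sits in the list.
fn detach(&mut self, idx: Index) {
    let (prev, next) = {
        let e = &self.entries[idx as usize];
        (e.prev, e.next)
    };
    match prev {
        Some(p) => self.entries[p as usize].next = next,
        None => self.head = next, // idx was the head
    }
    match next {
        Some(n) => self.entries[n as usize].prev = prev,
        None => self.tail = prev, // idx was the tail
    }
}

// Push `idx` to the front of the list, making it the MRU entry.
fn attach_head(&mut self, idx: Index) {
    self.entries[idx as usize].prev = None;
    self.entries[idx as usize].next = self.head;
    if let Some(old_head) = self.head {
        self.entries[old_head as usize].prev = Some(idx);
    }
    self.head = Some(idx);
    if self.tail.is_none() {
        self.tail = Some(idx); // list was empty: idx is also the tail
    }
}

fn move_to_head(&mut self, idx: Index) {
    if self.head == Some(idx) {
        return; // already the most recently used entry
    }
    self.detach(idx);
    self.attach_head(idx);
}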

The put Operation: O(1)

Inserting a value handles three cases:

  1. Update: If the key exists, update value and move to head.
  2. Insert (Full): If capacity is reached, evict the tail (LRU), reuse its slot, and insert the new item at the head.
  3. Insert (Space Available): Allocate a new slot (or reuse a freed one) and insert at the head.

The eviction logic is particularly clean because we can directly reuse the index from the evicted tail for the new entry, avoiding reallocation.

pub fn put(&mut self, key: Key, value: Value) {
    // Case 1: the key already exists — update in place and mark it MRU.
    if let Some(&idx) = self.map.get(&key) {
        self.entries[idx as usize].value = value;
        self.move_to_head(idx);
        return;
    }

    // Case 2: the cache is full — evict the LRU tail, freeing its slot.
    if self.map.len() == self.capacity {
        self.evict();
    }

    // Case 3: allocate a new slot (or reuse a freed one) and attach at the head.
    let idx = self.alloc(Entry {
        key: key.clone(), // cheap: a Bytes clone only bumps a refcount
        value,
        prev: None,
        next: None,
    });

    self.map.insert(key, idx);
    self.attach_head(idx);
}
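
evict and alloc aren't shown either; here is a minimal sketch of the slot-reuse idea described above, built on the free list from the Cache struct:

// Inside impl Cache. Drop the least recently used entry and recycle its slot.
fn evict(&mut self) {
    if let Some(tail) = self.tail {
        self.detach(tail);
        let key = self.entries[tail as usize].key.clone(); // cheap Bytes clone
        self.map.remove(&key);
        self.free.push(tail); // the slot will be reused by the next alloc()
    }
}

// Place `entry` in a recycled slot if one is available, otherwise grow the arena.
fn alloc(&mut self, entry: Entry) -> Index {
    match self.free.pop() {
        Some(idx) => {
            self.entries[idx as usize] = entry; // reuse: no new allocation
            idx
        }
        None => {
            self.entries.push(entry);
            (self.entries.len() - 1) as Index
        }
    }
}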

Benchmarks

I ran a benchmark on my local machine to test the raw throughput of the cache implementation (excluding network overhead). The results are impressive:

Environment: cargo run --release, MacBook Pro (Apple Silicon).
Parameters: 500k Capacity, 1M Operations.

Operation       Throughput                 Total Time
PUT (Insert)    ~5.38 million ops/sec      185ms (for 1M items)
GET (Hit)       ~12.09 million ops/sec     8.27ms (for 100k items)
GET (Miss)      ~6.98 million ops/sec      14.33ms (for 100k items)

The use of Bytes and the index-based approach yields massive performance benefits, making this simple implementation competitive with production-grade systems.
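
The timing loop itself is only a few lines; a simplified sketch of the kind of harness behind numbers like these (not the exact code used above) looks roughly like this, with keys pre-generated so formatting cost isn't part of the measurement:

use bytes::Bytes;
use std::time::Instant;

fn main() {
    let n = 1_000_000usize;
    let mut cache = Cache::new(500_000);

    // Pre-generate keys so key construction isn't measured.
    let keys: Vec<Bytes> = (0..n).map(|i| Bytes::from(format!("key-{i}"))).collect();
    let value = Bytes::from_static(b"value");

    // PUT: insert 1M items into a 500k-capacity cache (the oldest half gets evicted).
    let start = Instant::now();
    for key in &keys {
        cache.put(key.clone(), value.clone());
    }
    println!("PUT: {} ops in {:?}", n, start.elapsed());

    // GET (hit): the most recent 500k keys are still resident; read 100k of them.
    let start = Instant::now();
    for key in &keys[n - 100_000..] {
        cache.get(key);
    }
    println!("GET (hit): 100k ops in {:?}", start.elapsed());
}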

The Server

Finally, I wrapped this Cache in an HTTP server using axum. To share the cache across threads, I used Arc<Mutex<Cache>>. While the Mutex introduces some contention, the raw speed of the underlying cache keeps the critical section extremely short.

use axum::{
    extract::{Path, State},
    routing::get,
    Router,
    body::Bytes,
};
use std::sync::{Arc, Mutex};
use std::env;

type SharedCache = Arc<Mutex<Cache>>;

#[tokio::main]
async fn main() {
    let capacity: usize = env::var("CAPACITY")
        .unwrap_or_else(|_| "100".to_string())
        .parse()
        .expect("CAPACITY must be a number");

    let port = env::var("PORT").unwrap_or_else(|_| "3000".to_string());
    let addr = format!("127.0.0.1:{}", port);
    let shared_cache = Arc::new(Mutex::new(Cache::new(capacity)));

    let app = Router::new()
        .route("/{key}", 
            get(handle_get).put(handle_put)
        )
        .with_state(shared_cache);

    let listener = tokio::net::TcpListener::bind(&addr).await.unwrap();
    println!("Simple Cache Server running on {}", addr);
    axum::serve(listener, app).await.unwrap();
}
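
handle_get and handle_put aren't shown above; here is a minimal sketch of the two handlers, assuming GET returns the raw cached bytes (404 on a miss) and PUT stores the request body under the path key:

use axum::{http::StatusCode, response::IntoResponse};

// GET /{key}: look the key up and return the cached bytes, or 404 on a miss.
async fn handle_get(
    State(cache): State<SharedCache>,
    Path(key): Path<String>,
) -> impl IntoResponse {
    let mut cache = cache.lock().unwrap();
    match cache.get(&Bytes::from(key)) {
        Some(value) => (StatusCode::OK, value).into_response(),
        None => StatusCode::NOT_FOUND.into_response(),
    }
}

// PUT /{key}: store the raw request body as the value.
async fn handle_put(
    State(cache): State<SharedCache>,
    Path(key): Path<String>,
    body: Bytes,
) -> StatusCode {
    cache.lock().unwrap().put(Bytes::from(key), body);
    StatusCode::OK
}

Because the handlers never await while holding the lock, the std Mutex is fine here; the critical section is just the O(1) map lookup and a couple of index writes.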

The source code is available on GitHub.
