imduchuyyy 🐬
I created an LRU caching server in Rust

I recently built a mini caching server in Rust, implementing the LRU (Least Recently Used) eviction policy in under 1000 lines of code. By applying zero-copy techniques and careful memory management, I got high performance without compromising on stability.

Here is a breakdown of how I built it and why it is fast.

The Core Design

The heart of the cache is the Cache struct. Instead of a classic pointer-based linked list (which in safe Rust requires Rc<RefCell<...>> indirection, or unsafe code), I used a vector-based arena allocator approach.

Key Data Structures

  1. Zero-Copy with bytes::Bytes:
    I used bytes::Bytes for both keys and values. This is a crucial optimization. Bytes acts as a reference-counted slice of memory (similar to Arc<[u8]>). When we clone a Key or Value, we aren't copying the underlying data; we are just incrementing a reference count. This makes passing data around extremely cheap.

  2. Index-based Linked List:
    The LRU queue is implemented as a doubly linked list, but the nodes are stored in a Vec<Entry>. The prev and next pointers are simply u32 indices into this vector.

    • Memory Efficiency: No memory fragmentation from allocating individual nodes.
    • Cache Locality: The Vec keeps entries close in memory, improving CPU cache hits.
type Key = Bytes;
type Value = Bytes;
type Index = u32;

#[derive(Debug)]
struct Entry {
    key: Key,
    value: Value,
    prev: Option<Index>,
    next: Option<Index>,
}

#[derive(Debug)]
pub struct Cache {
    capacity: usize,
    map: HashMap<Key, Index>, // Maps Key -> Index in the entries vector
    entries: Vec<Entry>,      // The "Arena"
    head: Option<Index>,      // MRU (Most Recently Used)
    tail: Option<Index>,      // LRU (Least Recently Used)
    free: Vec<Index>,         // Reusable slots
}
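
To make the zero-copy point concrete, here is a minimal standalone sketch showing that cloning a Bytes handle shares the underlying buffer instead of copying it:

use bytes::Bytes;

fn main() {
    // A 1 MiB buffer owned by a single Bytes handle.
    let value = Bytes::from(vec![0u8; 1024 * 1024]);

    // Cloning only bumps a reference count; no bytes are copied.
    let cheap_copy = value.clone();

    // Both handles point at the same underlying memory.
    assert_eq!(value.as_ptr(), cheap_copy.as_ptr());
}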

How It Works

The get Operation: O(1)

When retrieving an item, we look up the index in the HashMap. If found, we move the node to the head of the list (marking it as most recently used) and return a cheap clone of the value.

pub fn get(&mut self, key: &Key) -> Option<Value> {
    let idx = *self.map.get(key)?;
    self.move_to_head(idx);
    Some(self.entries[idx as usize].value.clone()) // Zero-copy clone!
}
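
The move_to_head helper isn't shown in the post. Inside impl Cache it boils down to two smaller helpers that rewire the u32 links; this is a sketch of how they can look (the real code lives in the repo):

// Inside impl Cache. Unlink `idx` from wherever it currently sits in the list.
fn detach(&mut self, idx: Index) {
    let (prev, next) = {
        let e = &self.entries[idx as usize];
        (e.prev, e.next)
    };
    match prev {
        Some(p) => self.entries[p as usize].next = next,
        None => self.head = next, // idx was the head
    }
    match next {
        Some(n) => self.entries[n as usize].prev = prev,
        None => self.tail = prev, // idx was the tail
    }
}

// Push `idx` to the front of the list, making it the MRU entry.
fn attach_head(&mut self, idx: Index) {
    self.entries[idx as usize].prev = None;
    self.entries[idx as usize].next = self.head;
    if let Some(old_head) = self.head {
        self.entries[old_head as usize].prev = Some(idx);
    }
    self.head = Some(idx);
    if self.tail.is_none() {
        self.tail = Some(idx); // list was empty: idx is also the tail
    }
}

fn move_to_head(&mut self, idx: Index) {
    if self.head == Some(idx) {
        return; // already the most recently used entry
    }
    self.detach(idx);
    self.attach_head(idx);
}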

The put Operation: O(1)

Inserting a value handles three cases:

  1. Update: If the key exists, update value and move to head.
  2. Insert (Full): If capacity is reached, evict the tail (LRU), reuse its slot, and insert the new item at the head.
  3. Insert (Space Available): Allocate a new slot (or reuse a freed one) and insert at the head.

The eviction logic is particularly clean because we can directly reuse the index from the evicted tail for the new entry, avoiding reallocation.

pub fn put(&mut self, key: Key, value: Value) {
    // Case 1: the key already exists — update in place and mark it MRU.
    if let Some(&idx) = self.map.get(&key) {
        self.entries[idx as usize].value = value;
        self.move_to_head(idx);
        return;
    }

    // Case 2: the cache is full — evict the LRU tail, freeing its slot.
    if self.map.len() == self.capacity {
        self.evict();
    }

    // Case 3: allocate a new slot (or reuse a freed one) and attach at the head.
    let idx = self.alloc(Entry {
        key: key.clone(), // cheap: a Bytes clone only bumps a refcount
        value,
        prev: None,
        next: None,
    });

    self.map.insert(key, idx);
    self.attach_head(idx);
}
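
evict and alloc aren't shown either; here is a minimal sketch of the slot-reuse idea described above, built on the free list from the Cache struct:

// Inside impl Cache. Drop the least recently used entry and recycle its slot.
fn evict(&mut self) {
    if let Some(tail) = self.tail {
        self.detach(tail);
        let key = self.entries[tail as usize].key.clone(); // cheap Bytes clone
        self.map.remove(&key);
        self.free.push(tail); // the slot will be reused by the next alloc()
    }
}

// Place `entry` in a recycled slot if one is available, otherwise grow the arena.
fn alloc(&mut self, entry: Entry) -> Index {
    match self.free.pop() {
        Some(idx) => {
            self.entries[idx as usize] = entry; // reuse: no new allocation
            idx
        }
        None => {
            self.entries.push(entry);
            (self.entries.len() - 1) as Index
        }
    }
}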

Benchmarks

I ran a benchmark on my local machine to test the raw throughput of the cache implementation (excluding network overhead). The results are impressive:

Environment: cargo run --release, MacBook Pro (Apple Silicon).
Parameters: 500k Capacity, 1M Operations.

Operation       Throughput                 Total Time
PUT (Insert)    ~5.38 million ops/sec      185ms (for 1M items)
GET (Hit)       ~12.09 million ops/sec     8.27ms (for 100k items)
GET (Miss)      ~6.98 million ops/sec      14.33ms (for 100k items)

The use of Bytes and the index-based approach yields massive performance benefits, making this simple implementation competitive with production-grade systems.
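
The timing loop itself is only a few lines; a simplified sketch of the kind of harness behind numbers like these (not the exact code used above) looks roughly like this, with keys pre-generated so formatting cost isn't part of the measurement:

use bytes::Bytes;
use std::time::Instant;

fn main() {
    let n = 1_000_000usize;
    let mut cache = Cache::new(500_000);

    // Pre-generate keys so key construction isn't measured.
    let keys: Vec<Bytes> = (0..n).map(|i| Bytes::from(format!("key-{i}"))).collect();
    let value = Bytes::from_static(b"value");

    // PUT: insert 1M items into a 500k-capacity cache (the oldest half gets evicted).
    let start = Instant::now();
    for key in &keys {
        cache.put(key.clone(), value.clone());
    }
    println!("PUT: {} ops in {:?}", n, start.elapsed());

    // GET (hit): the most recent 500k keys are still resident; read 100k of them.
    let start = Instant::now();
    for key in &keys[n - 100_000..] {
        cache.get(key);
    }
    println!("GET (hit): 100k ops in {:?}", start.elapsed());
}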

The Server

Finally, I wrapped this Cache in an HTTP server using axum. To share the cache across threads, I used Arc<Mutex<Cache>>. While the Mutex introduces some contention, the raw speed of the underlying cache keeps the critical section extremely short.

use axum::{
    extract::{Path, State},
    routing::get,
    Router,
    body::Bytes,
};
use std::sync::{Arc, Mutex};
use std::env;

type SharedCache = Arc<Mutex<Cache>>;

#[tokio::main]
async fn main() {
    let capacity: usize = env::var("CAPACITY")
        .unwrap_or_else(|_| "100".to_string())
        .parse()
        .expect("CAPACITY must be a number");

    let port = env::var("PORT").unwrap_or_else(|_| "3000".to_string());
    let addr = format!("127.0.0.1:{}", port);
    let shared_cache = Arc::new(Mutex::new(Cache::new(capacity)));

    let app = Router::new()
        .route("/{key}", 
            get(handle_get).put(handle_put)
        )
        .with_state(shared_cache);

    let listener = tokio::net::TcpListener::bind(&addr).await.unwrap();
    println!("Simple Cache Server running on {}", addr);
    axum::serve(listener, app).await.unwrap();
}
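
handle_get and handle_put aren't shown above; here is a minimal sketch of the two handlers, assuming GET returns the raw cached bytes (404 on a miss) and PUT stores the request body under the path key:

use axum::{http::StatusCode, response::IntoResponse};

// GET /{key}: look the key up and return the cached bytes, or 404 on a miss.
async fn handle_get(
    State(cache): State<SharedCache>,
    Path(key): Path<String>,
) -> impl IntoResponse {
    let mut cache = cache.lock().unwrap();
    match cache.get(&Bytes::from(key)) {
        Some(value) => (StatusCode::OK, value).into_response(),
        None => StatusCode::NOT_FOUND.into_response(),
    }
}

// PUT /{key}: store the raw request body as the value.
async fn handle_put(
    State(cache): State<SharedCache>,
    Path(key): Path<String>,
    body: Bytes,
) -> StatusCode {
    cache.lock().unwrap().put(Bytes::from(key), body);
    StatusCode::OK
}

Because the handlers never await while holding the lock, the std Mutex is fine here; the critical section is just the O(1) map lookup and a couple of index writes.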

The source code is available on GitHub.
