<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Saksham Kapoor</title>
    <description>The latest articles on DEV Community by Saksham Kapoor (@saksham_kapoor).</description>
    <link>https://dev.to/saksham_kapoor</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2834327%2F7ca9545c-31a0-4742-a111-d7e9974e4b4c.jpeg</url>
      <title>DEV Community: Saksham Kapoor</title>
      <link>https://dev.to/saksham_kapoor</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/saksham_kapoor"/>
    <language>en</language>
    <item>
      <title>I Built a Redis Server in Rust — and Found Where It Breaks</title>
      <dc:creator>Saksham Kapoor</dc:creator>
      <pubDate>Tue, 24 Mar 2026 11:56:58 +0000</pubDate>
      <link>https://dev.to/saksham_kapoor/i-built-a-redis-server-in-rust-and-found-where-it-breaks-3a2o</link>
      <guid>https://dev.to/saksham_kapoor/i-built-a-redis-server-in-rust-and-found-where-it-breaks-3a2o</guid>
      <description>&lt;p&gt;Most developers use Redis like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;SET key value
GET key
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It feels instant. Effortless.&lt;/p&gt;

&lt;p&gt;But once you try building Redis yourself, you realize:&lt;br&gt;
    • concurrency is the real problem&lt;br&gt;
    • locks kill performance faster than logic&lt;br&gt;
    • observability itself can become a bottleneck&lt;/p&gt;

&lt;p&gt;So I built &lt;strong&gt;RustRedis&lt;/strong&gt; — a Redis-compatible server in Rust — to understand what actually happens under load.&lt;/p&gt;

&lt;p&gt;This wasn’t about features.&lt;/p&gt;

&lt;p&gt;It was about answering one question:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Where does performance actually break under concurrency?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;
  
  
  &lt;strong&gt;🔗 Full Project&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Code + benchmarks: &lt;a href="https://github.com/Saksham932007/RustRedis" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  &lt;strong&gt;1. System Design&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The server follows a task-per-connection model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Client → TCP → Tokio Task → Command Execution → Shared DB
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each connection:&lt;br&gt;
    • parses RESP protocol&lt;br&gt;
    • executes commands&lt;br&gt;
    • returns responses&lt;/p&gt;

&lt;p&gt;All tasks share a central database.&lt;/p&gt;
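&lt;p&gt;As a sketch of that first step, here is the shape of RESP parsing in Python (illustrative only; the actual parser is asynchronous Rust, and &lt;code&gt;parse_resp_array&lt;/code&gt; is a hypothetical name):&lt;/p&gt;

```python
# Minimal RESP (REdis Serialization Protocol) array decoder.
# Sketch only: a real server parses incrementally from a socket buffer.

def parse_resp_array(data):
    """Decode one RESP array of bulk strings, e.g. b"*2\r\n$3\r\nGET\r\n$3\r\nkey\r\n"."""
    lines = data.split(b"\r\n")
    assert lines[0].startswith(b"*"), "expected an array header"
    count = int(lines[0][1:])
    parts, i = [], 1
    for _ in range(count):
        assert lines[i].startswith(b"$"), "expected a bulk-string header"
        length = int(lines[i][1:])
        value = lines[i + 1]
        assert len(value) == length, "length header mismatch"
        parts.append(value.decode())
        i += 2
    return parts
```

&lt;p&gt;A command like &lt;code&gt;GET key&lt;/code&gt; arrives on the wire as &lt;code&gt;*2\r\n$3\r\nGET\r\n$3\r\nkey\r\n&lt;/code&gt; and decodes to the list &lt;code&gt;["GET", "key"]&lt;/code&gt;.&lt;/p&gt;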

&lt;p&gt;Two implementations were tested:&lt;br&gt;
    • Mutex (global lock)&lt;br&gt;
    • DashMap (sharded locks)&lt;/p&gt;

&lt;p&gt;This allows direct comparison of locking strategies.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;2. The Real Problem: Concurrency&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;At low load, everything works fine.&lt;/p&gt;

&lt;p&gt;At high load, everything changes.&lt;/p&gt;

&lt;p&gt;The bottleneck is not:&lt;/p&gt;

&lt;p&gt;❌ parsing&lt;br&gt;
❌ networking&lt;/p&gt;

&lt;p&gt;It is:&lt;/p&gt;

&lt;p&gt;👉 shared state contention&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;3. Lock Contention (Where It Breaks)&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;With a global Mutex:&lt;br&gt;
    • all writes serialize&lt;br&gt;
    • threads queue behind each other&lt;br&gt;
    • throughput collapses&lt;/p&gt;

&lt;p&gt;At high concurrency:&lt;br&gt;
    • p99 latency explodes&lt;br&gt;
    • throughput drops significantly&lt;/p&gt;

&lt;p&gt;This is called:&lt;/p&gt;

&lt;p&gt;👉 lock convoy effect&lt;/p&gt;

&lt;p&gt;Even short critical sections become slow under contention.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;4. DashMap vs Mutex&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Replacing the global lock with DashMap (sharded locking):&lt;br&gt;
    • reduces contention&lt;br&gt;
    • allows parallel writes&lt;br&gt;
    • improves throughput significantly&lt;/p&gt;

&lt;p&gt;At high concurrency:&lt;br&gt;
    • ~60% higher throughput&lt;br&gt;
    • ~40% lower latency&lt;/p&gt;

&lt;p&gt;But:&lt;/p&gt;

&lt;p&gt;👉 not free&lt;/p&gt;

&lt;p&gt;Trade-offs:&lt;br&gt;
    • more overhead per operation&lt;br&gt;
    • complexity for full-scan operations&lt;/p&gt;
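&lt;p&gt;The sharding idea itself is easy to sketch. A toy Python version (not DashMap's actual implementation; the shard count and method names are made up for illustration):&lt;/p&gt;

```python
import threading

class ShardedMap:
    """Toy sketch of sharded locking, the idea behind DashMap: each key hashes
    to one of N independent shards, so writes to different shards never queue
    behind a single global lock."""

    def __init__(self, shards=16):
        self._shards = [dict() for _ in range(shards)]
        self._locks = [threading.Lock() for _ in range(shards)]

    def _index(self, key):
        return hash(key) % len(self._shards)

    def set(self, key, value):
        i = self._index(key)
        with self._locks[i]:
            self._shards[i][key] = value

    def get(self, key):
        i = self._index(key)
        with self._locks[i]:
            return self._shards[i].get(key)
```

&lt;p&gt;The trade-off shows up immediately: a full scan now has to visit every shard and take every shard lock, which is exactly the complexity cost noted above.&lt;/p&gt;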

&lt;h2&gt;
  
  
  &lt;strong&gt;5. Observability Became a Bottleneck&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;This was unexpected.&lt;/p&gt;

&lt;p&gt;Tracking metrics per command introduced:&lt;/p&gt;

&lt;p&gt;👉 another shared structure&lt;/p&gt;

&lt;p&gt;Three approaches were tested:&lt;/p&gt;

&lt;p&gt;Global Mutex&lt;br&gt;
    • simple&lt;br&gt;
    • but severe contention&lt;/p&gt;

&lt;p&gt;Sharded Metrics&lt;br&gt;
    • better scalability&lt;br&gt;
    • reduced lock contention&lt;/p&gt;

&lt;p&gt;Thread-Local Batching&lt;br&gt;
    • no locks on hot path&lt;br&gt;
    • near-zero overhead&lt;/p&gt;

&lt;p&gt;Key insight:&lt;/p&gt;

&lt;p&gt;Observability can become a primary bottleneck under load.&lt;/p&gt;

&lt;p&gt;At high concurrency:&lt;br&gt;
    • telemetry alone caused ~30% performance drop&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;6. Persistence Trade-offs (AOF)&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Three persistence modes:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Mode&lt;/th&gt;
&lt;th&gt;Behavior&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Always&lt;/td&gt;
&lt;td&gt;fsync every write&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;EverySecond&lt;/td&gt;
&lt;td&gt;background flush&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;OS-managed&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Results:&lt;br&gt;
    • Always → ~80% throughput drop&lt;br&gt;
    • EverySecond → minimal overhead&lt;br&gt;
    • No → fastest but unsafe&lt;/p&gt;

&lt;p&gt;Insight:&lt;/p&gt;

&lt;p&gt;👉 durability always costs performance&lt;/p&gt;
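&lt;p&gt;A minimal Python sketch of the three modes (policy logic only; the real server is Rust and runs the EverySecond flush on a background task):&lt;/p&gt;

```python
import os, time

class AppendOnlyFile:
    """Sketch of the three AOF durability modes from the table above."""

    def __init__(self, path, mode="everysecond"):
        self.f = open(path, "ab")
        self.mode = mode
        self.last_sync = time.monotonic()

    def append(self, record):
        self.f.write(record)
        if self.mode == "always":
            self.f.flush()
            os.fsync(self.f.fileno())             # durable, but one fsync per write
        elif self.mode == "everysecond":
            now = time.monotonic()
            if int(now) != int(self.last_sync):   # at most one fsync per second
                self.f.flush()
                os.fsync(self.f.fileno())
                self.last_sync = now
        # mode "no": leave flushing entirely to the OS page cache
```

&lt;p&gt;The ~80% drop under Always follows directly from the structure: every write waits on a disk sync before the next can proceed.&lt;/p&gt;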

&lt;h2&gt;
  
  
  &lt;strong&gt;7. Throughput Scaling&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Performance peaks early:&lt;br&gt;
    • ~10–100 clients → optimal&lt;br&gt;
    • 500+ clients → contention dominates&lt;br&gt;
    • 1000 clients → system becomes unstable&lt;/p&gt;

&lt;p&gt;Why?&lt;/p&gt;

&lt;p&gt;👉 lock contention grows non-linearly&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;8. Redis vs RustRedis&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Compared with a Redis-compatible system:&lt;/p&gt;

&lt;p&gt;At low concurrency:&lt;br&gt;
    • Redis is faster (no locking overhead)&lt;/p&gt;

&lt;p&gt;At high concurrency:&lt;br&gt;
    • RustRedis shows more stable behavior&lt;br&gt;
    • lower tail latency&lt;/p&gt;

&lt;p&gt;Reason:&lt;/p&gt;

&lt;p&gt;👉 multi-threaded I/O vs single-threaded event loop&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;9. The Most Important Insight&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;This project changed how I think about systems:&lt;/p&gt;

&lt;p&gt;👉 performance is not about code speed&lt;br&gt;
👉 it’s about contention management&lt;/p&gt;

&lt;p&gt;Key takeaways:&lt;br&gt;
    • shared state is the real bottleneck&lt;br&gt;
    • locks don’t scale linearly&lt;br&gt;
    • batching removes contention&lt;br&gt;
    • observability must be designed carefully&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;10. What I’d Improve Next&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;actor-based architecture (no shared state)&lt;/li&gt;
&lt;li&gt;lock-free structures&lt;/li&gt;
&lt;li&gt;better persistence batching&lt;/li&gt;
&lt;li&gt;distributed sharding&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Conclusion&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Building a Redis-like system reveals something important:&lt;/p&gt;

&lt;p&gt;The hardest part of systems design is not correctness — it’s managing contention under load.&lt;/p&gt;

&lt;p&gt;Most systems don’t fail because they are wrong.&lt;/p&gt;

&lt;p&gt;They fail because they don’t scale under pressure.&lt;/p&gt;

</description>
      <category>systems</category>
      <category>ai</category>
      <category>programming</category>
      <category>opensource</category>
    </item>
    <item>
      <title>I Built a Key-Value Database from Scratch — Here’s What I Learned</title>
      <dc:creator>Saksham Kapoor</dc:creator>
      <pubDate>Thu, 19 Mar 2026 21:15:32 +0000</pubDate>
      <link>https://dev.to/saksham_kapoor/designing-a-log-structured-key-value-storage-engine-from-first-principles-kestrelcache-1f2m</link>
      <guid>https://dev.to/saksham_kapoor/designing-a-log-structured-key-value-storage-engine-from-first-principles-kestrelcache-1f2m</guid>
      <description>&lt;p&gt;Modern storage systems are built around a small set of fundamental principles: optimize for sequential I/O, minimize in-place mutation, and design for recoverability from failure.&lt;/p&gt;

&lt;p&gt;Systems such as Bitcask, Kafka, and LSM-based databases all embrace these ideas in different forms. To better understand these trade-offs at a systems level, I built KestrelCache — a minimal persistent key-value storage engine implemented in C# and .NET.&lt;/p&gt;

&lt;p&gt;This article focuses on the core mechanics of the storage engine, including its architecture, data layout, write/read paths, failure handling, and the practical trade-offs that emerge from a log-structured design.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;1. Problem Framing&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;At its core, a storage engine must answer a simple question:&lt;/p&gt;

&lt;p&gt;How do we store and retrieve data efficiently while preserving correctness under failure?&lt;/p&gt;

&lt;p&gt;Traditional designs rely on in-place updates, but these introduce complexity:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;random disk I/O&lt;/li&gt;
&lt;li&gt;fragmentation&lt;/li&gt;
&lt;li&gt;complicated concurrency control&lt;/li&gt;
&lt;li&gt;difficult crash recovery&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;KestrelCache instead adopts a log-structured approach, where all writes are appended to disk. This eliminates in-place mutation entirely.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;2. System Architecture&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The entire system is defined by two components:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;In-Memory Index (HashMap&amp;lt;Key, Offset&amp;gt;)
                ↓
Append-Only Log File (Disk)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The responsibilities are clearly separated:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Disk provides durability and write throughput&lt;/li&gt;
&lt;li&gt;Memory provides fast lookup&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There are no secondary indexes or complex data structures. The append-only log is the source of truth.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;3. Write Path&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The write path (Put) is designed for simplicity and performance:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Serialize the key and value into bytes&lt;/li&gt;
&lt;li&gt;Compute a CRC32 checksum&lt;/li&gt;
&lt;li&gt;Append the record to the end of the log file&lt;/li&gt;
&lt;li&gt;Update the in-memory index with the new offset&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This design guarantees:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;strictly sequential disk writes&lt;/li&gt;
&lt;li&gt;no in-place updates&lt;/li&gt;
&lt;li&gt;inherent crash safety (append-only)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Sequential writes are significantly more efficient than random writes, especially under sustained workloads.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;4. Read Path&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Reads (Get) avoid scanning the log entirely:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Lookup the key in the in-memory index&lt;/li&gt;
&lt;li&gt;Retrieve the corresponding file offset&lt;/li&gt;
&lt;li&gt;Seek directly to the location on disk&lt;/li&gt;
&lt;li&gt;Read and return the value&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This results in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;constant-time lookup in memory&lt;/li&gt;
&lt;li&gt;a single disk seek&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The performance of reads is therefore predictable and efficient.&lt;/p&gt;
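&lt;p&gt;Both paths fit in a few lines. A simplified Python sketch (the engine itself is C#; checksums and tombstones are omitted here, and &lt;code&gt;TinyEngine&lt;/code&gt; is a made-up name):&lt;/p&gt;

```python
import os

class TinyEngine:
    """Simplified put/get paths: append records to a log, keep a key-to-offset
    map in memory. CRC32 checksums and tombstones are left out for brevity."""

    def __init__(self, path):
        self.f = open(path, "a+b")
        self.index = {}                      # key mapped to offset of latest record

    def put(self, key, value):
        self.f.seek(0, os.SEEK_END)
        offset = self.f.tell()
        k, v = key.encode(), value.encode()
        header = len(k).to_bytes(4, "big") + len(v).to_bytes(4, "big")
        self.f.write(header + k + v)         # strictly sequential append
        self.f.flush()
        self.index[key] = offset             # update the in-memory index last

    def get(self, key):
        if key not in self.index:
            return None
        self.f.seek(self.index[key])         # single seek straight to the record
        ksize = int.from_bytes(self.f.read(4), "big")
        vsize = int.from_bytes(self.f.read(4), "big")
        self.f.read(ksize)                   # skip the key bytes
        return self.f.read(vsize).decode()
```

&lt;p&gt;Note that overwriting a key simply appends a newer record; the index points at the latest offset, and the old bytes become garbage for compaction to reclaim.&lt;/p&gt;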

&lt;h2&gt;
  
  
  &lt;strong&gt;5. Delete Semantics (Tombstones)&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Deletes are implemented using tombstones, rather than removing data from disk.&lt;/p&gt;

&lt;p&gt;Instead of removing the record, the engine appends a delete marker after it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[previous value]
[tombstone record]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A tombstone is represented by a record where:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ValueSize = -1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The in-memory index removes the key, but the log retains the history.&lt;/p&gt;

&lt;p&gt;This approach:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;avoids rewriting files&lt;/li&gt;
&lt;li&gt;maintains write performance&lt;/li&gt;
&lt;li&gt;simplifies concurrency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;However, it introduces the need for periodic compaction.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;6. On-Disk Record Format&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Each record is stored in a compact binary format:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[CRC32][KeySize][ValueSize][Key][Value]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CRC32 ensures data integrity&lt;/li&gt;
&lt;li&gt;KeySize and ValueSize define record boundaries&lt;/li&gt;
&lt;li&gt;ValueSize = -1 indicates a tombstone&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This format enables:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;corruption detection&lt;/li&gt;
&lt;li&gt;safe parsing during recovery&lt;/li&gt;
&lt;li&gt;validation of partial writes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In storage systems, integrity checks are not optional — they are foundational.&lt;/p&gt;
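&lt;p&gt;A Python sketch of the record codec (the engine itself is C#; the 4-byte big-endian field widths are an assumption for illustration):&lt;/p&gt;

```python
import struct, zlib

# [CRC32][KeySize][ValueSize][Key][Value]; ValueSize of -1 marks a tombstone.
HEADER = struct.Struct("!Iii")               # 12 bytes: unsigned CRC, two signed sizes

def encode_record(key, value):
    k = key.encode()
    v = b"" if value is None else value.encode()
    vsize = -1 if value is None else len(v)
    body = struct.pack("!ii", len(k), vsize) + k + v
    return struct.pack("!I", zlib.crc32(body)) + body

def decode_record(data):
    crc, ksize, vsize = HEADER.unpack_from(data)
    if zlib.crc32(data[4:]) != crc:
        raise ValueError("checksum mismatch: corrupt or partial record")
    key = data[12:12 + ksize].decode()
    if vsize == -1:
        return key, None                     # tombstone
    return key, data[12 + ksize:12 + ksize + vsize].decode()
```

&lt;p&gt;A record truncated by a crash mid-write fails the CRC check, which is how partial writes are detected and skipped during recovery.&lt;/p&gt;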

&lt;h2&gt;
  
  
  &lt;strong&gt;7. Memory Index Design&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The in-memory index is a simple:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="n"&gt;Dictionary&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;long&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It maps each key to its latest offset in the log file.&lt;/p&gt;

&lt;p&gt;This design provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;O(1) lookup time&lt;/li&gt;
&lt;li&gt;minimal overhead&lt;/li&gt;
&lt;li&gt;simplicity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;However, it introduces a key limitation:&lt;/p&gt;

&lt;p&gt;The index must fit entirely in memory.&lt;/p&gt;

&lt;p&gt;This makes the system memory-bound as the dataset grows.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;8. Startup and Recovery&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;On startup, the system reconstructs its state by scanning the log file:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Read records sequentially&lt;/li&gt;
&lt;li&gt;Validate checksums&lt;/li&gt;
&lt;li&gt;Update the index for each key&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The final state reflects the most recent record for each key.&lt;/p&gt;

&lt;p&gt;Advantages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;simple and reliable recovery&lt;/li&gt;
&lt;li&gt;no need for separate metadata&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trade-off:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;startup time increases with log size&lt;/li&gt;
&lt;/ul&gt;
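&lt;p&gt;The startup scan can be sketched in Python (checksum validation is omitted for brevity, and the simple key-size/value-size framing is an assumption for illustration):&lt;/p&gt;

```python
def rebuild_index(log):
    """Replay an append-only log (bytes) and rebuild the key-to-offset index.
    Record layout assumed here: 4-byte key size, 4-byte value size, key, value.
    The last record seen for a key wins; assumes a well-formed log."""
    index, offset = {}, 0
    while offset != len(log):
        ksize = int.from_bytes(log[offset:offset + 4], "big")
        vsize = int.from_bytes(log[offset + 4:offset + 8], "big")
        key = log[offset + 8:offset + 8 + ksize].decode()
        index[key] = offset                  # later records overwrite earlier ones
        offset = offset + 8 + ksize + vsize
    return index
```

&lt;p&gt;Because the scan is strictly sequential, recovery cost is one linear pass over the file, which is exactly why startup time grows with log size.&lt;/p&gt;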

&lt;h2&gt;
  
  
  &lt;strong&gt;9. Observability and Logging&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Unlike kernel-level systems, user-space storage engines can implement structured logging.&lt;/p&gt;

&lt;p&gt;Typical log events include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;write operations with offsets&lt;/li&gt;
&lt;li&gt;tombstone creation&lt;/li&gt;
&lt;li&gt;recovery progress&lt;/li&gt;
&lt;li&gt;checksum failures&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;[INFO] Appended key=user:1 at offset=0
[INFO] Tombstone written for key=user:1
[WARN] Checksum mismatch at offset=170
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Observability is critical for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;debugging&lt;/li&gt;
&lt;li&gt;detecting corruption&lt;/li&gt;
&lt;li&gt;understanding system behavior&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without it, diagnosing issues becomes extremely difficult.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;10. Failure Scenarios&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;This design handles several failure modes naturally:&lt;/p&gt;

&lt;p&gt;Crash During Write&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Partial writes can be detected via checksum mismatch&lt;/li&gt;
&lt;li&gt;Invalid records can be skipped during recovery&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Data Corruption&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CRC32 validation prevents silent data corruption&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Power Loss&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Since writes are append-only, previously written data remains intact&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;However, durability depends on the underlying filesystem and flush behavior.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;11. Performance Characteristics&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Strengths&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;high write throughput (sequential I/O)&lt;/li&gt;
&lt;li&gt;simple and predictable behavior&lt;/li&gt;
&lt;li&gt;efficient point lookups&lt;/li&gt;
&lt;li&gt;strong crash recovery guarantees&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Limitations&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;unbounded file growth&lt;/li&gt;
&lt;li&gt;memory-bound index&lt;/li&gt;
&lt;li&gt;no range queries&lt;/li&gt;
&lt;li&gt;no transactions or isolation&lt;/li&gt;
&lt;li&gt;no concurrent write optimization&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These limitations reflect deliberate design simplicity.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;12. Compaction Strategy&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Over time, the log accumulates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;outdated values&lt;/li&gt;
&lt;li&gt;tombstones&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Compaction reclaims space:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;scan log → keep latest entries → rewrite new file
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This reduces:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;disk usage&lt;/li&gt;
&lt;li&gt;read amplification&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Compaction is a core requirement in all log-structured systems.&lt;/p&gt;
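&lt;p&gt;In Python terms, the keep-latest pass looks like this (a sketch over already-decoded records; the real engine streams bytes from the old file into a new one):&lt;/p&gt;

```python
def compact(records):
    """One-pass compaction sketch: keep only each key's newest record and drop
    tombstones (represented here as a value of None)."""
    latest = {}
    for key, value in records:               # iterate oldest to newest
        latest[key] = value                  # newer records shadow older ones
    return [(k, v) for k, v in latest.items() if v is not None]
```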

&lt;h2&gt;
  
  
  &lt;strong&gt;13. Design Trade-offs&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;This project highlights several key trade-offs:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Design Choice&lt;/th&gt;
&lt;th&gt;Benefit&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Append-only writes&lt;/td&gt;
&lt;td&gt;Fast, simple&lt;/td&gt;
&lt;td&gt;Requires compaction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;In-memory index&lt;/td&gt;
&lt;td&gt;Fast reads&lt;/td&gt;
&lt;td&gt;Memory usage&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tombstones&lt;/td&gt;
&lt;td&gt;Cheap deletes&lt;/td&gt;
&lt;td&gt;Disk growth&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Simple format&lt;/td&gt;
&lt;td&gt;Easy recovery&lt;/td&gt;
&lt;td&gt;Limited features&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Storage systems are ultimately about choosing the right trade-offs.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;14. Lessons Learned&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Building a storage engine from scratch reveals several important insights:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Performance is primarily driven by I/O patterns&lt;/li&gt;
&lt;li&gt;Sequential access is the most important optimization&lt;/li&gt;
&lt;li&gt;Simplicity reduces failure modes&lt;/li&gt;
&lt;li&gt;Observability must be designed early&lt;/li&gt;
&lt;li&gt;Data integrity must be enforced at the lowest level&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most importantly:&lt;/p&gt;

&lt;p&gt;Reliable systems are not built by adding features, but by controlling complexity.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Conclusion&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;KestrelCache is intentionally minimal, but it captures the essence of modern storage engine design.&lt;/p&gt;

&lt;p&gt;Even a simple log-structured key-value store demonstrates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;how durability is achieved&lt;/li&gt;
&lt;li&gt;how performance is optimized&lt;/li&gt;
&lt;li&gt;how systems recover from failure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Understanding these fundamentals is essential for building scalable backend systems and infrastructure.&lt;/p&gt;

</description>
      <category>database</category>
      <category>googlecloud</category>
      <category>backend</category>
      <category>systemdesign</category>
    </item>
    <item>
      <title>Building a Minimal Operating System Kernel with Interrupts, Paging, and Device Drivers</title>
      <dc:creator>Saksham Kapoor</dc:creator>
      <pubDate>Mon, 16 Mar 2026 18:02:31 +0000</pubDate>
      <link>https://dev.to/saksham_kapoor/building-a-minimal-operating-system-kernel-with-interrupts-paging-and-device-drivers-dc</link>
      <guid>https://dev.to/saksham_kapoor/building-a-minimal-operating-system-kernel-with-interrupts-paging-and-device-drivers-dc</guid>
      <description>&lt;p&gt;Modern operating systems often feel like black boxes. We interact with processes, files, and network sockets without thinking about what actually happens underneath.&lt;/p&gt;

&lt;p&gt;To better understand how low-level systems work, I built a small experimental kernel that runs on x86 architecture, implementing several core subsystems including interrupt handling, paging, device drivers, and a simple shell.&lt;/p&gt;

&lt;p&gt;This article explains the architecture of that kernel and the main design decisions behind it.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Why Build a Kernel?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Operating systems are one of the most complex pieces of software ever created. But the fundamental ideas behind them are surprisingly understandable when explored step by step.&lt;/p&gt;

&lt;p&gt;The motivation behind this project was to understand:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How CPUs transition from bootloader to kernel&lt;/li&gt;
&lt;li&gt;How interrupts allow hardware to communicate with software&lt;/li&gt;
&lt;li&gt;How paging enables virtual memory&lt;/li&gt;
&lt;li&gt;How device drivers interact with hardware&lt;/li&gt;
&lt;li&gt;How kernel services can be exposed through system calls&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Rather than studying these concepts in isolation, the goal was to implement them together inside a working kernel.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;System Overview&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The kernel runs on 32-bit x86 architecture and boots through a Multiboot2-compatible bootloader such as GRUB.&lt;/p&gt;

&lt;p&gt;At a high level the architecture looks like this:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Bootloader (GRUB)&lt;br&gt;
        ↓&lt;br&gt;
Kernel Entry (Assembly)&lt;br&gt;
        ↓&lt;br&gt;
Interrupt Descriptor Table&lt;br&gt;
        ↓&lt;br&gt;
Memory Management (Paging)&lt;br&gt;
        ↓&lt;br&gt;
Hardware Drivers&lt;br&gt;
        ↓&lt;br&gt;
Kernel Shell&lt;br&gt;
&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Each subsystem interacts with the others to form a minimal operating system environment.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Boot Process&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The boot process begins when the bootloader loads the kernel into memory and transfers control to its entry point.&lt;/p&gt;

&lt;p&gt;The kernel performs several early initialization steps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Initialize descriptor tables&lt;/li&gt;
&lt;li&gt;Set up memory paging&lt;/li&gt;
&lt;li&gt;Configure interrupt handling&lt;/li&gt;
&lt;li&gt;Initialize hardware drivers&lt;/li&gt;
&lt;li&gt;Start the kernel shell&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This early boot stage is critical because the system is still running without many safety guarantees.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Interrupt Handling&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Hardware interrupts are one of the most fundamental mechanisms in operating systems.&lt;/p&gt;

&lt;p&gt;Devices such as keyboards, timers, and disks signal events to the CPU using interrupts.&lt;/p&gt;

&lt;p&gt;The kernel implements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Interrupt Descriptor Table (IDT)&lt;/li&gt;
&lt;li&gt;Exception handlers&lt;/li&gt;
&lt;li&gt;IRQ handlers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The IDT maps interrupt numbers to handler functions, allowing the CPU to transfer control to the correct kernel routine.&lt;/p&gt;

&lt;p&gt;Example flow:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Keyboard Key Press&lt;br&gt;
        ↓&lt;br&gt;
Keyboard Controller Interrupt&lt;br&gt;
        ↓&lt;br&gt;
CPU triggers interrupt handler&lt;br&gt;
        ↓&lt;br&gt;
Kernel keyboard driver processes input&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Without interrupts, the CPU would have to constantly poll devices, which would waste enormous amounts of processing time.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Memory Management&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Memory management is implemented using paging.&lt;/p&gt;

&lt;p&gt;Paging allows the operating system to map virtual memory addresses to physical memory addresses.&lt;/p&gt;

&lt;p&gt;The kernel initializes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Page directory&lt;/li&gt;
&lt;li&gt;Page tables&lt;/li&gt;
&lt;li&gt;Paging control registers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once paging is enabled, memory can be organized into virtual address spaces.&lt;/p&gt;

&lt;p&gt;Benefits of paging include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;memory protection&lt;/li&gt;
&lt;li&gt;flexible address mapping&lt;/li&gt;
&lt;li&gt;foundation for user processes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this kernel the paging system establishes a simple flat memory model for kernel execution.&lt;/p&gt;
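&lt;p&gt;The translation structure is easiest to see in the address layout itself. A Python illustration of how 32-bit x86 paging splits a virtual address (standard layout; bit shifts written as arithmetic):&lt;/p&gt;

```python
# Top 10 bits index the page directory, the next 10 bits index a page table,
# and the low 12 bits are the offset within the 4 KiB page.

def split_virtual_address(vaddr):
    offset = vaddr % 4096                    # bits 0-11
    table = (vaddr // 4096) % 1024           # bits 12-21
    directory = vaddr // (4096 * 1024)       # bits 22-31
    return directory, table, offset
```

&lt;p&gt;The directory entry selects a page table, the table entry supplies the physical frame, and the offset is carried over unchanged.&lt;/p&gt;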

&lt;h2&gt;
  
  
  &lt;strong&gt;Kernel Heap and Dynamic Memory&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Operating systems cannot rely on standard library functions like &lt;code&gt;malloc&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Instead, the kernel implements its own heap allocator.&lt;/p&gt;

&lt;p&gt;This allows kernel subsystems to dynamically allocate memory for structures such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;device buffers&lt;/li&gt;
&lt;li&gt;process data&lt;/li&gt;
&lt;li&gt;driver state&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Although simple, this allocator demonstrates the fundamental idea behind dynamic memory management in kernel space.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Device Drivers&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Hardware interaction happens through device drivers.&lt;/p&gt;

&lt;p&gt;Two drivers are implemented in the kernel:&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;VGA Display Driver&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The kernel writes directly to VGA memory at:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;0xB8000&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;This allows text to be displayed on screen without relying on BIOS services.&lt;/p&gt;

&lt;p&gt;The VGA driver implements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;character output&lt;/li&gt;
&lt;li&gt;cursor control&lt;/li&gt;
&lt;li&gt;color support&lt;/li&gt;
&lt;/ul&gt;
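&lt;p&gt;The addressing math is worth seeing concretely. A Python illustration (the kernel does this in C by writing 16-bit values into the mapped region):&lt;/p&gt;

```python
VGA_BASE = 0xB8000
WIDTH = 80                                   # standard 80x25 text mode

def vga_cell(row, col, char, fg, bg):
    """Compute the memory address and 16-bit value for one text cell.
    Each cell is two bytes: the ASCII code, then an attribute byte
    (low nibble foreground colour, high nibble background)."""
    address = VGA_BASE + 2 * (row * WIDTH + col)
    attribute = bg * 16 + fg
    value = attribute * 256 + ord(char)
    return address, value
```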

&lt;h3&gt;
  
  
  &lt;strong&gt;PS/2 Keyboard Driver&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The keyboard driver handles hardware interrupts generated by the PS/2 controller.&lt;/p&gt;

&lt;p&gt;The driver:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;reads scancodes from the keyboard port&lt;/li&gt;
&lt;li&gt;translates them into characters&lt;/li&gt;
&lt;li&gt;forwards input to the kernel shell&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This demonstrates how hardware input flows through the interrupt system into software.&lt;/p&gt;
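&lt;p&gt;The translation step can be sketched in Python (only a handful of set-1 make codes are mapped here; key-release codes and modifier state such as Shift are ignored):&lt;/p&gt;

```python
# Partial scancode set 1 map: make code to character.
SCANCODE_MAP = {0x1E: "a", 0x30: "b", 0x2E: "c", 0x20: "d", 0x1C: "\n"}

def handle_scancode(scancode, line_buffer):
    """Called from the keyboard IRQ handler with the byte read from port 0x60;
    unknown scancodes are silently dropped in this sketch."""
    char = SCANCODE_MAP.get(scancode)
    if char is not None:
        line_buffer.append(char)             # forward the character to the shell
    return line_buffer
```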

&lt;h2&gt;
  
  
  &lt;strong&gt;Timer Support&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The kernel configures the Programmable Interval Timer (PIT) to generate periodic interrupts.&lt;/p&gt;

&lt;p&gt;Timers are essential for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;scheduling&lt;/li&gt;
&lt;li&gt;time measurement&lt;/li&gt;
&lt;li&gt;system responsiveness&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Even in a simple kernel, timer interrupts provide the foundation for multitasking systems.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;System Call Interface&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;A minimal system call interface is implemented to expose kernel services.&lt;/p&gt;

&lt;p&gt;System calls act as the boundary between:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;user programs&lt;br&gt;
        ↕&lt;br&gt;
kernel services&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Although this kernel does not yet implement full user-mode processes, the syscall interface lays the groundwork for future expansion.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Interactive Kernel Shell&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;To make the system usable, the kernel includes a small command-line shell.&lt;/p&gt;

&lt;p&gt;The shell allows interaction with kernel services and debugging of internal behavior.&lt;/p&gt;

&lt;p&gt;Typical commands might include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;memory inspection&lt;/li&gt;
&lt;li&gt;system information&lt;/li&gt;
&lt;li&gt;device testing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This interface helps verify that the kernel subsystems are functioning correctly.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Limitations&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;This kernel is intentionally minimal and designed primarily as a learning platform.&lt;/p&gt;

&lt;p&gt;Some important limitations include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;single-core execution&lt;/li&gt;
&lt;li&gt;no filesystem&lt;/li&gt;
&lt;li&gt;no process isolation&lt;/li&gt;
&lt;li&gt;no SMP support&lt;/li&gt;
&lt;li&gt;no preemptive multitasking&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Despite these limitations, the project demonstrates how core operating system components interact in a real execution environment.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;What This Project Taught Me&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Building even a small kernel provides deep insights into how computers work.&lt;/p&gt;

&lt;p&gt;Key lessons from this project include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;hardware interrupts are fundamental to system responsiveness&lt;/li&gt;
&lt;li&gt;memory management is the foundation of modern OS design&lt;/li&gt;
&lt;li&gt;driver design requires careful interaction with hardware&lt;/li&gt;
&lt;li&gt;observability and debugging are essential for low-level development&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most importantly, operating systems are not magic: they are carefully structured layers of software interacting directly with hardware.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Conclusion&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Building an operating system kernel is one of the most educational experiences for understanding systems engineering.&lt;/p&gt;

&lt;p&gt;This project demonstrates how several essential subsystems (interrupts, memory management, device drivers, and system calls) combine to form the foundation of an operating system.&lt;/p&gt;

&lt;p&gt;Future work could include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;implementing user processes&lt;/li&gt;
&lt;li&gt;adding a filesystem&lt;/li&gt;
&lt;li&gt;supporting multitasking&lt;/li&gt;
&lt;li&gt;expanding hardware driver support&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Even a small kernel reveals just how fascinating low-level systems development can be.&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>computerscience</category>
      <category>showdev</category>
    </item>
  </channel>
</rss>
