Most backend tutorials stop at the same point: put some data in a map, wrap it in an HTTP handler, call it a key-value store.
That's not wrong — it's just incomplete. It skips the question that separates backend engineers from storage engineers:
If the process dies mid-write, what do you still have on disk?
I built go-durable-kv to practice that question deliberately — Go standard library only, no external database, no ORM, no shortcuts.
## Why This Problem Is Worth Practicing
Every production system that writes data eventually has to answer for a crash. Whether it's a database, a message queue, or a cache with persistence — the difference between "we lost some writes" and "we lost nothing" comes down to decisions made before the crash happened.
Those decisions are what this project is about.
## What I Built
| Component | What it does |
|---|---|
| Write-Ahead Log (WAL) | Persists every mutation to disk before updating memory |
| CRC32 checksums | Detects corruption during replay so recovery stops cleanly |
| Snapshots | Periodic full-state checkpoints that keep start-up time bounded |
| WAL Compaction | Resets the log after a snapshot so it doesn't grow forever |
| Multiple transports | HTTP, TCP, CLI — all backed by the same engine core |
## The One Invariant That Everything Else Depends On
Before walking through the implementation, this rule needs to be stated plainly:
**WAL append happens before the in-memory map is updated. Always.**
If you flip that order — update memory first, then write to disk — you create a window where the server can crash after acknowledging a write to the client, but before that write hits the log. The client thinks the data is safe. It isn't.
This ordering is the foundation. Everything else is built on top of it.
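As a concrete shape, here's a minimal sketch of that ordering (the `Engine` fields and the `log.Append` method are illustrative names, not the repo's actual identifiers):

```go
package kv

import "sync"

// Op tags a WAL record; the concrete values are illustrative.
type Op byte

const (
	OpSet Op = iota + 1
	OpDelete
)

// wal is the only capability the invariant needs: a durable append.
type wal interface {
	Append(op Op, key string, value []byte) error
}

type Engine struct {
	mu   sync.RWMutex
	log  wal
	data map[string][]byte
}

// Set upholds the invariant: the WAL append must succeed before the
// in-memory map changes.
func (e *Engine) Set(key string, value []byte) error {
	e.mu.Lock()
	defer e.mu.Unlock()

	// Durability first: if the append fails, memory is untouched and the
	// caller never receives a false acknowledgment.
	if err := e.log.Append(OpSet, key, value); err != nil {
		return err
	}

	// Only now does the write become visible to readers.
	e.data[key] = value
	return nil
}
```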
## The Write Path
For a `Set` operation:
1. **Validate** — enforce constraints like maximum value size before touching disk.
2. **Append to WAL** — write a binary record with this layout:
| Field | Size |
|---|---|
| Op (Set or Delete) | 1 byte |
| Key length | 4 bytes |
| Value length | 4 bytes |
| Key bytes | variable |
| Value bytes | variable |
| CRC32 checksum | 4 bytes |
The CRC covers everything before it in the record. During recovery, any record that fails this check stops replay at the last clean prefix — no silent corruption, no poisoned state. (The append step is sketched in code after this list.)
3. **Apply durability policy** — three modes, each a different tradeoff:
   - `SyncAlways` — flush + fsync after every write. Safest. Slowest.
   - `SyncPeriodic` — a background goroutine syncs on a ticker. Middle ground.
   - `SyncNone` — relies on flush/close behavior. Fastest. Most data at risk in a crash.
This is the part many engineers treat as a binary ("durable" or "not durable"). It's actually a spectrum, and the right choice depends on what your application can tolerate losing.
4. **Update the map** — only after the WAL append succeeds.
5. **Check compaction threshold** — if the WAL has grown past `MaxWALSizeBytes`, trigger a snapshot and reset.
`Delete` follows the same path: append a tombstone record (zero-length value), then remove the key from the map.
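Here's a sketch of what the append step can look like, combining the record layout above with the sync-policy switch. The encoding order matches the table, but the big-endian choice, the buffering, and all names are assumptions rather than the repo's actual code:

```go
package kv

import (
	"bufio"
	"encoding/binary"
	"hash/crc32"
	"os"
)

// SyncPolicy mirrors the three durability modes described above.
type SyncPolicy int

const (
	SyncAlways SyncPolicy = iota
	SyncPeriodic
	SyncNone
)

type WAL struct {
	f      *os.File
	w      *bufio.Writer
	policy SyncPolicy
}

// Append encodes one record in the layout from the table above:
// op (1B) | key len (4B) | value len (4B) | key | value | CRC32 (4B).
// The checksum covers every byte before it.
func (w *WAL) Append(op byte, key string, value []byte) error {
	buf := make([]byte, 0, 1+4+4+len(key)+len(value)+4)
	buf = append(buf, op)
	buf = binary.BigEndian.AppendUint32(buf, uint32(len(key)))
	buf = binary.BigEndian.AppendUint32(buf, uint32(len(value)))
	buf = append(buf, key...)
	buf = append(buf, value...)
	buf = binary.BigEndian.AppendUint32(buf, crc32.ChecksumIEEE(buf))

	if _, err := w.w.Write(buf); err != nil {
		return err
	}

	// The durability policy decides how hard the record is pushed to disk.
	switch w.policy {
	case SyncAlways:
		if err := w.w.Flush(); err != nil {
			return err
		}
		return w.f.Sync() // fsync: survives power loss, not just a process crash
	default:
		// SyncPeriodic: a background goroutine flushes and fsyncs on a ticker.
		// SyncNone: rely on flush at close. Fastest, most exposed.
		return nil
	}
}
```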
## The Read Path
`Get` acquires a read lock and returns directly from the in-memory map.
It does not touch disk.
This is intentional. Reads are the hot path. Once state is recovered on startup, they should be as fast as possible — a lock acquisition and a map lookup.
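In sketch form, extending the illustrative `Engine` from earlier:

```go
// Get never touches disk: a read lock and a map lookup.
func (e *Engine) Get(key string) ([]byte, bool) {
	e.mu.RLock()
	defer e.mu.RUnlock()
	v, ok := e.data[key]
	return v, ok
}
```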
## Recovery: How the Engine Restarts
On startup:
1. **Load the snapshot** — deserialize `snapshot.gob` into the in-memory map. This gives a fast baseline without replaying the entire log history.
2. **Replay the WAL** — read `wal.log` from the beginning, re-applying each record in order.
The tail policy matters here. If the WAL ends cleanly, replay succeeds. If it ends with a partial record or a CRC mismatch — which can happen on a hard crash — replay stops at the last known-good position. The partially-written record is discarded. This is how real storage engines handle torn writes, and it's a meaningful design choice: stop safely rather than panic the process.
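A sketch of that replay loop, matching the record layout from the write path (the op byte values, error handling, and the truncate-by-offset contract are all assumptions):

```go
package kv

import (
	"bufio"
	"encoding/binary"
	"hash/crc32"
	"io"
)

const (
	opSet    byte = 1 // record type values are illustrative
	opDelete byte = 2
)

// replay re-applies WAL records from r to data and stops at the first
// partial or corrupt record, returning the byte offset of the last clean
// prefix so the caller can truncate the torn tail.
func replay(r io.Reader, data map[string][]byte) int64 {
	br := bufio.NewReader(r)
	var good int64
	for {
		hdr := make([]byte, 9) // op (1) + key len (4) + value len (4)
		if _, err := io.ReadFull(br, hdr); err != nil {
			return good // clean EOF, or a torn header: stop here
		}
		klen := int(binary.BigEndian.Uint32(hdr[1:5]))
		vlen := int(binary.BigEndian.Uint32(hdr[5:9]))
		// A real implementation would sanity-check klen and vlen here
		// before allocating.
		body := make([]byte, klen+vlen+4) // key + value + CRC32
		if _, err := io.ReadFull(br, body); err != nil {
			return good // torn record body from a mid-write crash
		}
		crc := crc32.Update(crc32.ChecksumIEEE(hdr), crc32.IEEETable, body[:klen+vlen])
		if crc != binary.BigEndian.Uint32(body[klen+vlen:]) {
			return good // corrupt tail: discard it, don't panic the process
		}
		key := string(body[:klen])
		switch hdr[0] {
		case opSet:
			data[key] = append([]byte(nil), body[klen:klen+vlen]...)
		case opDelete:
			delete(data, key) // tombstone: zero-length value
		default:
			return good // unknown op byte: treat as corruption
		}
		good += int64(9 + klen + vlen + 4)
	}
}
```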
## Snapshots and Compaction
A snapshot is a full serialization of current state, written to disk in a sequence built around an atomic rename:
1. Serialize state → `snapshot.tmp`
2. `fsync` `snapshot.tmp`
3. Atomically rename `snapshot.tmp` → `snapshot.gob`
4. Reset the WAL
The atomic rename is the key. If the process crashes anywhere before the rename in step 3 completes, the old `snapshot.gob` is still intact. You never read a partial snapshot because a partial snapshot never becomes the canonical file.
After the snapshot is confirmed, the WAL is reset. This bounds how much replay work the engine has to do on the next startup — no matter how long the process has been running.
**Windows caveat:** You can't truncate an open append file handle on Windows the same way you can on POSIX systems. The compaction step closes and reopens the WAL file rather than truncating in place. OS semantics are part of the implementation, not an afterthought.
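In sketch form, assuming `gob` encoding of the whole map (the file names match the steps above; the function name and error handling are illustrative):

```go
package kv

import (
	"encoding/gob"
	"os"
	"path/filepath"
)

// writeSnapshot persists the full map with the write-temp, fsync, rename
// sequence: a crash at any point leaves either the old snapshot or the
// new one on disk, never a half-written file.
func writeSnapshot(dir string, data map[string][]byte) error {
	tmp := filepath.Join(dir, "snapshot.tmp")
	final := filepath.Join(dir, "snapshot.gob")

	f, err := os.Create(tmp)
	if err != nil {
		return err
	}
	if err := gob.NewEncoder(f).Encode(data); err != nil {
		f.Close()
		return err
	}
	// fsync before rename: the bytes must be durable before the name flips.
	if err := f.Sync(); err != nil {
		f.Close()
		return err
	}
	if err := f.Close(); err != nil {
		return err
	}
	// Atomic replacement on POSIX (a fully paranoid version would also
	// fsync the directory so the rename itself survives power loss).
	return os.Rename(tmp, final)
}
```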
## Concurrency
A single `sync.RWMutex` protects the in-memory map and coordinates with the engine lifecycle:
- Mutations acquire the write lock — they serialize
- `Get` acquires the read lock — multiple readers proceed concurrently
- The background sync goroutine must be stopped before the WAL closes — if it's still running when shutdown happens and tries to acquire a lock that the shutdown path holds, you get a deadlock
The HTTP server uses `Server.Shutdown` for graceful drain — in-flight requests complete before the engine flushes and closes.
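A sketch of that shutdown ordering, with illustrative names for the stop channels and the WAL interface:

```go
package kv

import (
	"sync"
	"time"
)

type syncer interface {
	Sync() error  // flush buffered bytes and fsync
	Close() error // final flush + close
}

type Engine struct {
	mu       sync.RWMutex
	log      syncer
	stopSync chan struct{}
	syncDone chan struct{}
}

// syncLoop is the SyncPeriodic background goroutine: fsync on a ticker
// until told to stop.
func (e *Engine) syncLoop(interval time.Duration) {
	defer close(e.syncDone)
	t := time.NewTicker(interval)
	defer t.Stop()
	for {
		select {
		case <-t.C:
			e.mu.Lock()
			_ = e.log.Sync() // error handling elided in this sketch
			e.mu.Unlock()
		case <-e.stopSync:
			return
		}
	}
}

// Close stops the sync goroutine *before* taking the write lock.
// Reversing that order can deadlock: Close would hold the lock while
// syncLoop blocks trying to acquire it, so syncLoop never exits.
func (e *Engine) Close() error {
	close(e.stopSync)
	<-e.syncDone

	e.mu.Lock()
	defer e.mu.Unlock()
	return e.log.Close()
}
```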
## Transports Are Adapters, Not the Engine
The core engine lives in `internal/engine`. It has no knowledge of HTTP or TCP. Transports are thin wrappers:
- HTTP — `PUT`/`GET`/`DELETE` on `/keys/{key}`, plus `/health` and `/metrics`, using Go 1.22+ route patterns
- TCP — newline-delimited JSON (`set`, `get`, `delete`, `ping`)
- CLI — one TCP request per invocation; useful for scripts and manual testing
This separation is a design discipline, not just an organizational preference. It means the engine can be tested without spinning up a server. It means durability logic can't accidentally bleed into transport logic. And it means you can add a new transport without touching the storage layer.
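As an illustration of how thin the HTTP adapter can be with Go 1.22+ route patterns (the `Store` interface and handler details here are assumptions, not the repo's code):

```go
package httpapi

import (
	"io"
	"net/http"
)

// Store is the only thing the transport knows about the engine. The
// concrete type lives in internal/engine; this sketch assumes an
// interface shaped like it.
type Store interface {
	Set(key string, value []byte) error
	Get(key string) ([]byte, bool)
	Delete(key string) error
}

// NewMux wires Go 1.22+ route patterns to engine calls. No durability
// logic lives here: the handlers translate HTTP to method calls and back.
func NewMux(s Store) *http.ServeMux {
	mux := http.NewServeMux()

	mux.HandleFunc("PUT /keys/{key}", func(w http.ResponseWriter, r *http.Request) {
		body, err := io.ReadAll(r.Body)
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		if err := s.Set(r.PathValue("key"), body); err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		w.WriteHeader(http.StatusNoContent)
	})

	mux.HandleFunc("GET /keys/{key}", func(w http.ResponseWriter, r *http.Request) {
		v, ok := s.Get(r.PathValue("key"))
		if !ok {
			http.NotFound(w, r)
			return
		}
		w.Write(v)
	})

	mux.HandleFunc("DELETE /keys/{key}", func(w http.ResponseWriter, r *http.Request) {
		if err := s.Delete(r.PathValue("key")); err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		w.WriteHeader(http.StatusNoContent)
	})

	return mux
}
```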
## Observability
A metrics struct tracks operation counters using `sync/atomic` — lock-free increments from multiple goroutines (handlers, replay, compaction, sync loop). `/metrics` returns JSON.
Atomics were chosen over a mutex because the counters sit on the hot path, and contention on a dedicated metrics lock would be unnecessary overhead for what is essentially fire-and-forget accounting.
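A sketch of that pattern with `atomic.Uint64` counters (the field names and JSON shape are illustrative):

```go
package kv

import (
	"encoding/json"
	"net/http"
	"sync/atomic"
)

// Metrics uses atomic counters: handlers, replay, compaction, and the
// sync loop can all increment concurrently without sharing a lock.
type Metrics struct {
	Sets    atomic.Uint64
	Gets    atomic.Uint64
	Deletes atomic.Uint64
}

// ServeHTTP renders a point-in-time JSON view, e.g. for GET /metrics.
func (m *Metrics) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(map[string]uint64{
		"sets":    m.Sets.Load(),
		"gets":    m.Gets.Load(),
		"deletes": m.Deletes.Load(),
	})
}
```

Each hot-path site does a single `m.Sets.Add(1)`; readers get a slightly stale but internally consistent snapshot via `Load`.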
## Testing
Every durability claim has a test. The test suite covers:
- `Set`/`Get`/`Delete` correctness, closed-engine behavior, value size limits
- WAL encode/decode and full WAL integration paths
- Replay across simulated process restarts (shared temp data directory)
- Truncated tail and corrupt tail handling — not just happy-path recovery
- Snapshot + compaction + "snapshot then WAL tail" restart scenarios
- HTTP transport behavior including `/metrics`
- Benchmarks in `engine_bench_test.go` for baseline performance numbers
The goal wasn't coverage for its own sake. It was: every claim this system makes about durability has a test that would catch a regression.
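The restart tests roughly follow this shape (the `Open` constructor, the `SyncAlways` option wiring, and the import path are assumptions about the API, not its actual signatures):

```go
package kv_test

import (
	"testing"

	kv "github.com/carissaayo/go-durable-kv/internal/engine" // illustrative import path
)

func TestReplayAfterRestart(t *testing.T) {
	dir := t.TempDir()

	// First lifetime: write through an engine opened with the always-sync
	// policy, so the append is on disk before the simulated crash.
	e1, err := kv.Open(dir, kv.SyncAlways) // hypothetical constructor
	if err != nil {
		t.Fatal(err)
	}
	if err := e1.Set("mykey", []byte("hello")); err != nil {
		t.Fatal(err)
	}
	// Simulate a crash: no Close, no snapshot. Recovery must come from the WAL.

	// Second lifetime: same data directory, fresh in-memory state.
	e2, err := kv.Open(dir, kv.SyncAlways)
	if err != nil {
		t.Fatal(err)
	}
	defer e2.Close()

	got, ok := e2.Get("mykey")
	if !ok || string(got) != "hello" {
		t.Fatalf("want %q after restart, got %q (ok=%v)", "hello", got, ok)
	}
}
```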
## How to Run It
```bash
# Tests
go test ./...

# HTTP server on :4000
go run ./cmd/server

# TCP server on :5000
go run ./cmd/tcpserver

# CLI
go run ./cmd/cli --addr 127.0.0.1:5000 ping
go run ./cmd/cli --addr 127.0.0.1:5000 set mykey hello
go run ./cmd/cli --addr 127.0.0.1:5000 get mykey
```
Don't point both servers at the same `./data` directory simultaneously — there's no cross-process locking, and a single engine owner per data directory is assumed.
## What This Is Not
To be precise about scope:
- Not a distributed system — no replication, no consensus
- Not tuned for high throughput at scale
- No multi-key transactions
- `SyncNone`/`SyncPeriodic` leave some writes at risk in a hard crash — `SyncAlways` is the mode to reach for when durability matters more than write speed
## What I Took Away
**Durability is a policy, not a boolean.** `fsync` costs something real. The right answer depends on what your application can tolerate losing, and that's a product decision as much as a technical one.

**Recovery is where the real complexity lives.** The write path is straightforward once you have the invariant. Recovery — especially torn writes, corrupt tails, and the interaction between snapshot state and WAL replay — is where edge cases accumulate.

**Thin transports pay off in testing.** When the engine is a pure Go struct with no HTTP knowledge, you can test every failure scenario directly without mocking servers or parsing HTTP responses.

**OS behavior is part of the implementation.** File handle semantics on Windows, fsync guarantees on different filesystems, test cleanup when files are still open — these aren't platform footnotes. They're part of writing correct systems code.
## Repo
👉 github.com/carissaayo/go-durable-kv
More detail in `docs/architecture.md` and the README.
If you're a senior engineer reading this — I'd be curious how you'd approach the compaction trigger. Size-based thresholds are simple, but time-based or operation-count-based have their own trade-offs. Drop it in the comments.
If you're earlier in your career: building something that has to survive a crash is one of the fastest ways to understand what databases are actually doing. You don't need to build PostgreSQL. You need to build something where the data has to still be there on restart.