Most backend tutorials stop at the same point: put some data in a map, wrap it in an HTTP handler, call it a key-value store.
That's not wrong — it's just incomplete. It skips the question that separates backend engineers from storage engineers:
If the process dies mid-write, what do you still have on disk?
I built go-durable-kv to practice that question deliberately — Go standard library only, no external database, no ORM, no shortcuts.
## Why This Problem Is Worth Practicing
Every production system that writes data eventually has to answer for a crash. Whether it's a database, a message queue, or a cache with persistence — the difference between "we lost some writes" and "we lost nothing" comes down to decisions made before the crash happened.
Those decisions are what this project is about.
## What I Built
| Component | What it does |
|---|---|
| Write-Ahead Log (WAL) | Persists every mutation to disk before updating memory |
| CRC32 checksums | Detects corruption during replay so recovery stops cleanly |
| Snapshots | Periodic full-state checkpoints that keep start-up time bounded |
| WAL Compaction | Resets the log after a snapshot so it doesn't grow forever |
| Multiple transports | HTTP, TCP, CLI — all backed by the same engine core |
## The One Invariant That Everything Else Depends On
Before walking through the implementation, this rule needs to be stated plainly:
**WAL append happens before the in-memory map is updated. Always.**
If you flip that order — update memory first, then write to disk — you create a window where the server can crash after acknowledging a write to the client, but before that write hits the log. The client thinks the data is safe. It isn't.
This ordering is the foundation. Everything else is built on top of it.
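As a concrete shape, here's a minimal sketch of that ordering (the `Engine` fields and the `log.Append` method are illustrative names, not the repo's actual identifiers):

```go
package kv

import "sync"

// Op tags a WAL record; the concrete values are illustrative.
type Op byte

const (
	OpSet Op = iota + 1
	OpDelete
)

// wal is the only capability the invariant needs: a durable append.
type wal interface {
	Append(op Op, key string, value []byte) error
}

type Engine struct {
	mu   sync.RWMutex
	log  wal
	data map[string][]byte
}

// Set upholds the invariant: the WAL append must succeed before the
// in-memory map changes.
func (e *Engine) Set(key string, value []byte) error {
	e.mu.Lock()
	defer e.mu.Unlock()

	// Durability first: if the append fails, memory is untouched and the
	// caller never receives a false acknowledgment.
	if err := e.log.Append(OpSet, key, value); err != nil {
		return err
	}

	// Only now does the write become visible to readers.
	e.data[key] = value
	return nil
}
```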
## The Write Path
For a `Set` operation:
1. **Validate** — enforce constraints like maximum value size before touching disk.
2. **Append to WAL** — write a binary record with this layout:
| Field | Size |
|---|---|
| Op (Set or Delete) | 1 byte |
| Key length | 4 bytes |
| Value length | 4 bytes |
| Key bytes | variable |
| Value bytes | variable |
| CRC32 checksum | 4 bytes |
The CRC covers everything before it in the record. During recovery, any record that fails this check stops replay at the last clean prefix — no silent corruption, no poisoned state. (The append step is sketched in code after this list.)
3. **Apply durability policy** — three modes, each a different tradeoff:
   - `SyncAlways` — flush + fsync after every write. Safest. Slowest.
   - `SyncPeriodic` — a background goroutine syncs on a ticker. Middle ground.
   - `SyncNone` — relies on flush/close behavior. Fastest. Most data at risk in a crash.
This is the part many engineers treat as a binary ("durable" or "not durable"). It's actually a spectrum, and the right choice depends on what your application can tolerate losing.
4. **Update the map** — only after the WAL append succeeds.
5. **Check compaction threshold** — if the WAL has grown past `MaxWALSizeBytes`, trigger a snapshot and reset.
`Delete` follows the same path: append a tombstone record (zero-length value), then remove the key from the map.
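Here's a sketch of what the append step can look like, combining the record layout above with the sync-policy switch. The encoding order matches the table, but the big-endian choice, the buffering, and all names are assumptions rather than the repo's actual code:

```go
package kv

import (
	"bufio"
	"encoding/binary"
	"hash/crc32"
	"os"
)

// SyncPolicy mirrors the three durability modes described above.
type SyncPolicy int

const (
	SyncAlways SyncPolicy = iota
	SyncPeriodic
	SyncNone
)

type WAL struct {
	f      *os.File
	w      *bufio.Writer
	policy SyncPolicy
}

// Append encodes one record in the layout from the table above:
// op (1B) | key len (4B) | value len (4B) | key | value | CRC32 (4B).
// The checksum covers every byte before it.
func (w *WAL) Append(op byte, key string, value []byte) error {
	buf := make([]byte, 0, 1+4+4+len(key)+len(value)+4)
	buf = append(buf, op)
	buf = binary.BigEndian.AppendUint32(buf, uint32(len(key)))
	buf = binary.BigEndian.AppendUint32(buf, uint32(len(value)))
	buf = append(buf, key...)
	buf = append(buf, value...)
	buf = binary.BigEndian.AppendUint32(buf, crc32.ChecksumIEEE(buf))

	if _, err := w.w.Write(buf); err != nil {
		return err
	}

	// The durability policy decides how hard the record is pushed to disk.
	switch w.policy {
	case SyncAlways:
		if err := w.w.Flush(); err != nil {
			return err
		}
		return w.f.Sync() // fsync: survives power loss, not just a process crash
	default:
		// SyncPeriodic: a background goroutine flushes and fsyncs on a ticker.
		// SyncNone: rely on flush at close. Fastest, most exposed.
		return nil
	}
}
```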
## The Read Path
`Get` acquires a read lock and returns directly from the in-memory map.
It does not touch disk.
This is intentional. Reads are the hot path. Once state is recovered on startup, they should be as fast as possible — a lock acquisition and a map lookup.
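In sketch form, extending the illustrative `Engine` from earlier:

```go
// Get never touches disk: a read lock and a map lookup.
func (e *Engine) Get(key string) ([]byte, bool) {
	e.mu.RLock()
	defer e.mu.RUnlock()
	v, ok := e.data[key]
	return v, ok
}
```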
## Recovery: How the Engine Restarts
On startup:
1. **Load the snapshot** — deserialize `snapshot.gob` into the in-memory map. This gives a fast baseline without replaying the entire log history.
2. **Replay the WAL** — read `wal.log` from the beginning, re-applying each record in order.
The tail policy matters here. If the WAL ends cleanly, replay succeeds. If it ends with a partial record or a CRC mismatch — which can happen on a hard crash — replay stops at the last known-good position. The partially-written record is discarded. This is how real storage engines handle torn writes, and it's a meaningful design choice: stop safely rather than panic the process.
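A sketch of that replay loop, matching the record layout from the write path (the op byte values, error handling, and the truncate-by-offset contract are all assumptions):

```go
package kv

import (
	"bufio"
	"encoding/binary"
	"hash/crc32"
	"io"
)

const (
	opSet    byte = 1 // record type values are illustrative
	opDelete byte = 2
)

// replay re-applies WAL records from r to data and stops at the first
// partial or corrupt record, returning the byte offset of the last clean
// prefix so the caller can truncate the torn tail.
func replay(r io.Reader, data map[string][]byte) int64 {
	br := bufio.NewReader(r)
	var good int64
	for {
		hdr := make([]byte, 9) // op (1) + key len (4) + value len (4)
		if _, err := io.ReadFull(br, hdr); err != nil {
			return good // clean EOF, or a torn header: stop here
		}
		klen := int(binary.BigEndian.Uint32(hdr[1:5]))
		vlen := int(binary.BigEndian.Uint32(hdr[5:9]))
		// A real implementation would sanity-check klen and vlen here
		// before allocating.
		body := make([]byte, klen+vlen+4) // key + value + CRC32
		if _, err := io.ReadFull(br, body); err != nil {
			return good // torn record body from a mid-write crash
		}
		crc := crc32.Update(crc32.ChecksumIEEE(hdr), crc32.IEEETable, body[:klen+vlen])
		if crc != binary.BigEndian.Uint32(body[klen+vlen:]) {
			return good // corrupt tail: discard it, don't panic the process
		}
		key := string(body[:klen])
		switch hdr[0] {
		case opSet:
			data[key] = append([]byte(nil), body[klen:klen+vlen]...)
		case opDelete:
			delete(data, key) // tombstone: zero-length value
		default:
			return good // unknown op byte: treat as corruption
		}
		good += int64(9 + klen + vlen + 4)
	}
}
```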
## Snapshots and Compaction
A snapshot is a full serialization of current state, written to disk in a sequence built around an atomic rename:
1. Serialize state → `snapshot.tmp`
2. `fsync` `snapshot.tmp`
3. Atomically rename `snapshot.tmp` → `snapshot.gob`
4. Reset the WAL
The atomic rename is the key. If the process crashes anywhere before the rename in step 3 completes, the old `snapshot.gob` is still intact. You never read a partial snapshot because a partial snapshot never becomes the canonical file.
After the snapshot is confirmed, the WAL is reset. This bounds how much replay work the engine has to do on the next startup — no matter how long the process has been running.
**Windows caveat:** You can't truncate an open append file handle on Windows the same way you can on POSIX systems. The compaction step closes and reopens the WAL file rather than truncating in place. OS semantics are part of the implementation, not an afterthought.
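In sketch form, assuming `gob` encoding of the whole map (the file names match the steps above; the function name and error handling are illustrative):

```go
package kv

import (
	"encoding/gob"
	"os"
	"path/filepath"
)

// writeSnapshot persists the full map with the write-temp, fsync, rename
// sequence: a crash at any point leaves either the old snapshot or the
// new one on disk, never a half-written file.
func writeSnapshot(dir string, data map[string][]byte) error {
	tmp := filepath.Join(dir, "snapshot.tmp")
	final := filepath.Join(dir, "snapshot.gob")

	f, err := os.Create(tmp)
	if err != nil {
		return err
	}
	if err := gob.NewEncoder(f).Encode(data); err != nil {
		f.Close()
		return err
	}
	// fsync before rename: the bytes must be durable before the name flips.
	if err := f.Sync(); err != nil {
		f.Close()
		return err
	}
	if err := f.Close(); err != nil {
		return err
	}
	// Atomic replacement on POSIX (a fully paranoid version would also
	// fsync the directory so the rename itself survives power loss).
	return os.Rename(tmp, final)
}
```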
## Concurrency
A single `sync.RWMutex` protects the in-memory map and coordinates with the engine lifecycle:
- Mutations acquire the write lock — they serialize
- `Get` acquires the read lock — multiple readers proceed concurrently
- The background sync goroutine must be stopped before the WAL closes — if it's still running when shutdown happens and tries to acquire a lock that the shutdown path holds, you get a deadlock
The HTTP server uses `Server.Shutdown` for graceful drain — in-flight requests complete before the engine flushes and closes.
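A sketch of that shutdown ordering, with illustrative names for the stop channels and the WAL interface:

```go
package kv

import (
	"sync"
	"time"
)

type syncer interface {
	Sync() error  // flush buffered bytes and fsync
	Close() error // final flush + close
}

type Engine struct {
	mu       sync.RWMutex
	log      syncer
	stopSync chan struct{}
	syncDone chan struct{}
}

// syncLoop is the SyncPeriodic background goroutine: fsync on a ticker
// until told to stop.
func (e *Engine) syncLoop(interval time.Duration) {
	defer close(e.syncDone)
	t := time.NewTicker(interval)
	defer t.Stop()
	for {
		select {
		case <-t.C:
			e.mu.Lock()
			_ = e.log.Sync() // error handling elided in this sketch
			e.mu.Unlock()
		case <-e.stopSync:
			return
		}
	}
}

// Close stops the sync goroutine *before* taking the write lock.
// Reversing that order can deadlock: Close would hold the lock while
// syncLoop blocks trying to acquire it, so syncLoop never exits.
func (e *Engine) Close() error {
	close(e.stopSync)
	<-e.syncDone

	e.mu.Lock()
	defer e.mu.Unlock()
	return e.log.Close()
}
```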
## Transports Are Adapters, Not the Engine
The core engine lives in `internal/engine`. It has no knowledge of HTTP or TCP. Transports are thin wrappers:
- HTTP — `PUT`/`GET`/`DELETE` on `/keys/{key}`, plus `/health` and `/metrics`, using Go 1.22+ route patterns
- TCP — newline-delimited JSON (`set`, `get`, `delete`, `ping`)
- CLI — one TCP request per invocation; useful for scripts and manual testing
This separation is a design discipline, not just an organizational preference. It means the engine can be tested without spinning up a server. It means durability logic can't accidentally bleed into transport logic. And it means you can add a new transport without touching the storage layer.
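As an illustration of how thin the HTTP adapter can be with Go 1.22+ route patterns (the `Store` interface and handler details here are assumptions, not the repo's code):

```go
package httpapi

import (
	"io"
	"net/http"
)

// Store is the only thing the transport knows about the engine. The
// concrete type lives in internal/engine; this sketch assumes an
// interface shaped like it.
type Store interface {
	Set(key string, value []byte) error
	Get(key string) ([]byte, bool)
	Delete(key string) error
}

// NewMux wires Go 1.22+ route patterns to engine calls. No durability
// logic lives here: the handlers translate HTTP to method calls and back.
func NewMux(s Store) *http.ServeMux {
	mux := http.NewServeMux()

	mux.HandleFunc("PUT /keys/{key}", func(w http.ResponseWriter, r *http.Request) {
		body, err := io.ReadAll(r.Body)
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		if err := s.Set(r.PathValue("key"), body); err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		w.WriteHeader(http.StatusNoContent)
	})

	mux.HandleFunc("GET /keys/{key}", func(w http.ResponseWriter, r *http.Request) {
		v, ok := s.Get(r.PathValue("key"))
		if !ok {
			http.NotFound(w, r)
			return
		}
		w.Write(v)
	})

	mux.HandleFunc("DELETE /keys/{key}", func(w http.ResponseWriter, r *http.Request) {
		if err := s.Delete(r.PathValue("key")); err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		w.WriteHeader(http.StatusNoContent)
	})

	return mux
}
```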
## Observability
A metrics struct tracks operation counters using `sync/atomic` — lock-free increments from multiple goroutines (handlers, replay, compaction, sync loop). `/metrics` returns JSON.
Atomics were chosen over a mutex because the counters sit on the hot path, and contention on a dedicated metrics lock would be unnecessary overhead for what is essentially fire-and-forget accounting.
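A sketch of that pattern with `atomic.Uint64` counters (the field names and JSON shape are illustrative):

```go
package kv

import (
	"encoding/json"
	"net/http"
	"sync/atomic"
)

// Metrics uses atomic counters: handlers, replay, compaction, and the
// sync loop can all increment concurrently without sharing a lock.
type Metrics struct {
	Sets    atomic.Uint64
	Gets    atomic.Uint64
	Deletes atomic.Uint64
}

// ServeHTTP renders a point-in-time JSON view, e.g. for GET /metrics.
func (m *Metrics) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(map[string]uint64{
		"sets":    m.Sets.Load(),
		"gets":    m.Gets.Load(),
		"deletes": m.Deletes.Load(),
	})
}
```

Each hot-path site does a single `m.Sets.Add(1)`; readers get a slightly stale but internally consistent snapshot via `Load`.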
## Testing
Every durability claim has a test. The test suite covers:
- `Set`/`Get`/`Delete` correctness, closed-engine behavior, value size limits
- WAL encode/decode and full WAL integration paths
- Replay across simulated process restarts (shared temp data directory)
- Truncated tail and corrupt tail handling — not just happy-path recovery
- Snapshot + compaction + "snapshot then WAL tail" restart scenarios
- HTTP transport behavior including `/metrics`
- Benchmarks in `engine_bench_test.go` for baseline performance numbers
The goal wasn't coverage for its own sake. It was: every claim this system makes about durability has a test that would catch a regression.
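The restart tests roughly follow this shape (the `Open` constructor, the `SyncAlways` option wiring, and the import path are assumptions about the API, not its actual signatures):

```go
package kv_test

import (
	"testing"

	kv "github.com/carissaayo/go-durable-kv/internal/engine" // illustrative import path
)

func TestReplayAfterRestart(t *testing.T) {
	dir := t.TempDir()

	// First lifetime: write through an engine opened with the always-sync
	// policy, so the append is on disk before the simulated crash.
	e1, err := kv.Open(dir, kv.SyncAlways) // hypothetical constructor
	if err != nil {
		t.Fatal(err)
	}
	if err := e1.Set("mykey", []byte("hello")); err != nil {
		t.Fatal(err)
	}
	// Simulate a crash: no Close, no snapshot. Recovery must come from the WAL.

	// Second lifetime: same data directory, fresh in-memory state.
	e2, err := kv.Open(dir, kv.SyncAlways)
	if err != nil {
		t.Fatal(err)
	}
	defer e2.Close()

	got, ok := e2.Get("mykey")
	if !ok || string(got) != "hello" {
		t.Fatalf("want %q after restart, got %q (ok=%v)", "hello", got, ok)
	}
}
```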
## How to Run It
```bash
# Tests
go test ./...

# HTTP server on :4000
go run ./cmd/server

# TCP server on :5000
go run ./cmd/tcpserver

# CLI
go run ./cmd/cli --addr 127.0.0.1:5000 ping
go run ./cmd/cli --addr 127.0.0.1:5000 set mykey hello
go run ./cmd/cli --addr 127.0.0.1:5000 get mykey
```
Don't point both servers at the same `./data` directory simultaneously — there's no cross-process locking, and a single engine owner per data directory is assumed.
## What This Is Not
To be precise about scope:
- Not a distributed system — no replication, no consensus
- Not tuned for high throughput at scale
- No multi-key transactions
- `SyncNone`/`SyncPeriodic` leave some writes at risk in a hard crash — `SyncAlways` is the mode to reach for when durability matters more than write speed
## What I Took Away
**Durability is a policy, not a boolean.** `fsync` costs something real. The right answer depends on what your application can tolerate losing, and that's a product decision as much as a technical one.

**Recovery is where the real complexity lives.** The write path is straightforward once you have the invariant. Recovery — especially torn writes, corrupt tails, and the interaction between snapshot state and WAL replay — is where edge cases accumulate.

**Thin transports pay off in testing.** When the engine is a pure Go struct with no HTTP knowledge, you can test every failure scenario directly without mocking servers or parsing HTTP responses.

**OS behavior is part of the implementation.** File handle semantics on Windows, fsync guarantees on different filesystems, test cleanup when files are still open — these aren't platform footnotes. They're part of writing correct systems code.
## Repo
👉 github.com/carissaayo/go-durable-kv
More detail in `docs/architecture.md` and the README.
If you're a senior engineer reading this — I'd be curious how you'd approach the compaction trigger. Size-based thresholds are simple, but time-based or operation-count-based have their own trade-offs. Drop it in the comments.
If you're earlier in your career: building something that has to survive a crash is one of the fastest ways to understand what databases are actually doing. You don't need to build PostgreSQL. You need to build something where the data has to still be there on restart.