Building Sentry: A Distributed Message Broker in Go (From Scratch)
I’m learning systems engineering and system design by doing —
not by watching tutorials, not by drawing diagrams, but by actually building infrastructure.
This blog documents my journey building Sentry, a Kafka-inspired distributed message broker written in Go, from scratch.
Not a wrapper.
Not a framework experiment.
Not a CRUD backend.
A real broker: raw TCP, binary wire protocol, append-only logs, offset indexing, crash recovery, and concurrency at scale.
Why I Started This Project
Most backend engineers (myself included, until recently) live behind abstractions:
HTTP frameworks
ORMs
Message queues as black boxes
Managed services that “just work”
But at some point I realized something uncomfortable:
I could use Kafka, Redis, and RabbitMQ…
but I had no real idea how they worked internally.
So I asked myself:
How does a broker actually store messages on disk?
How are offsets tracked?
What happens when the process crashes mid-write?
How do consumers resume safely?
How does a binary protocol work over raw TCP?
What breaks under network lag or partial writes?
And that’s where Sentry was born.
What Is Sentry?
Sentry is a high-performance, Kafka-like distributed message broker written in Go.
It focuses on:
Simplicity over features
Deterministic behavior
Low-level correctness
Failure-first design
Observability via logs
Learning by implementation
The goal is not to replace Kafka.
The goal is to understand Kafka-class systems by building one.
High-Level Architecture
At a high level, Sentry has these core layers:
Network Layer
Raw TCP server
Per-connection goroutines
Binary protocol decoding
Protocol Layer
Custom wire protocol
Length-prefixed frames
Correlation IDs
Versioning support
Broker Core
Message routing
Partition selection
Offset assignment
Persistence Layer
Append-only log segments
Offset → byte index files
Time-based index files
Crash-safe replay
Consumer Layer
Offset-based reads
Replay semantics
Deterministic fetch order
Each layer is explicit.
Nothing is hidden behind magic.
Custom Binary Wire Protocol
One of the first things I built was the wire protocol.
Instead of HTTP or gRPC, Sentry uses a custom binary protocol over raw TCP.
Why?
Because production brokers don’t speak JSON over HTTP.
They speak:
Framed binary messages
Big-endian encoded integers
Compact payloads
Deterministic layouts
Versioned schemas
Protocol Design
Each request frame looks like:
```
| Frame Length   (4 bytes) |
| Message Type   (2 bytes) |
| Correlation ID (4 bytes) |
| Payload        (N bytes) |
```
Key features:
Length-prefixed framing
So frames split across partial TCP reads can be reassembled correctly.
Correlation IDs
So responses can be matched to their requests, even with many in flight on one connection.
Big-endian encoding
For deterministic cross-platform decoding.
Version field (planned)
For forward compatibility.
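To make the framing concrete, here is a minimal encoder/decoder sketch for this layout. The type names and helpers are mine, not Sentry's actual code, and I'm assuming the length field counts the bytes that follow it:

```go
// broker/protocol.go: framing sketch, not Sentry's actual code.
package broker

import (
	"encoding/binary"
	"fmt"
	"io"
)

// Frame mirrors the wire layout above.
type Frame struct {
	MsgType       uint16
	CorrelationID uint32
	Payload       []byte
}

// WriteFrame encodes one length-prefixed, big-endian frame.
func WriteFrame(w io.Writer, f Frame) error {
	// The length field counts everything after itself:
	// 2-byte type + 4-byte correlation ID + payload.
	buf := make([]byte, 4+2+4+len(f.Payload))
	binary.BigEndian.PutUint32(buf[0:4], uint32(2+4+len(f.Payload)))
	binary.BigEndian.PutUint16(buf[4:6], f.MsgType)
	binary.BigEndian.PutUint32(buf[6:10], f.CorrelationID)
	copy(buf[10:], f.Payload)
	_, err := w.Write(buf)
	return err
}

// ReadFrame reassembles exactly one frame; io.ReadFull handles
// partial TCP reads for us.
func ReadFrame(r io.Reader) (Frame, error) {
	var lenBuf [4]byte
	if _, err := io.ReadFull(r, lenBuf[:]); err != nil {
		return Frame{}, err
	}
	frameLen := binary.BigEndian.Uint32(lenBuf[:])
	if frameLen < 6 {
		return Frame{}, fmt.Errorf("frame too short: %d bytes", frameLen)
	}
	body := make([]byte, frameLen)
	if _, err := io.ReadFull(r, body); err != nil {
		return Frame{}, err
	}
	return Frame{
		MsgType:       binary.BigEndian.Uint16(body[0:2]),
		CorrelationID: binary.BigEndian.Uint32(body[2:6]),
		Payload:       body[6:],
	}, nil
}
```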
Concurrency Model
Sentry uses Go’s strengths:
Goroutines
Channels
Mutexes
Worker pools
Ingress Model
One goroutine per TCP connection
Each connection reads frames
Frames are pushed into a bounded worker pool
Workers decode and route requests
This prevents:
Unbounded goroutine growth
Memory pressure
Head-of-line blocking
Backpressure
If the worker pool queue is full:
New requests block
Producers slow down
The system protects itself
No silent overload.
No hidden memory leaks.
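Here is a minimal sketch of this ingress-plus-backpressure model, reusing the Frame type and ReadFrame from the protocol sketch above. The handle stub and package layout are assumptions, not Sentry's real structure:

```go
// broker/ingress.go: one goroutine per connection feeding a bounded
// worker pool. Sketch only.
package broker

import (
	"log"
	"net"
)

type request struct {
	conn  net.Conn
	frame Frame
}

// handle is a stand-in for decoding the payload and routing the request.
func handle(req request) { _ = req }

func serve(ln net.Listener, workers, queueDepth int) {
	// Bounded queue: when it fills, connection readers block, TCP
	// receive buffers fill, and producers slow down. That is the
	// backpressure described above.
	queue := make(chan request, queueDepth)

	// Fixed pool of workers, so goroutine count stays bounded.
	for i := 0; i < workers; i++ {
		go func() {
			for req := range queue {
				handle(req)
			}
		}()
	}

	for {
		conn, err := ln.Accept()
		if err != nil {
			log.Println("accept:", err)
			return
		}
		// One reader goroutine per connection.
		go func(c net.Conn) {
			defer c.Close()
			for {
				f, err := ReadFrame(c)
				if err != nil {
					return // EOF or broken connection
				}
				queue <- request{conn: c, frame: f} // blocks when the pool is saturated
			}
		}(conn)
	}
}
```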
Persistence: Append-Only Log Segments
This is the heart of the system.
Every topic partition in Sentry is backed by:
A .log file → message bytes
A .index file → offset → byte position
A .timeindex file → timestamp → offset
Why Append-Only?
Because append-only logs give you:
Crash safety
Sequential disk writes
High throughput
Simple replay
Deterministic ordering
Segment Structure
Each segment is defined by:
```go
type Segment struct {
	baseOffset uint64
	log        *os.File
	index      *os.File
	timeIndex  *os.File
	topic      string
	partition  uint32
	active     bool
}
```
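To make the later sketches self-contained, here is one plausible way to open the three files backing a segment. The zero-padded base-offset naming scheme and the helper itself are my assumptions (same package; assumes "fmt", "os", and "path/filepath" are imported):

```go
// openSegment opens (or creates) the three files backing one segment.
func openSegment(dir string, baseOffset uint64, topic string, partition uint32) (*Segment, error) {
	base := filepath.Join(dir, fmt.Sprintf("%020d", baseOffset))
	flags := os.O_RDWR | os.O_CREATE | os.O_APPEND

	logF, err := os.OpenFile(base+".log", flags, 0o644)
	if err != nil {
		return nil, err
	}
	idxF, err := os.OpenFile(base+".index", flags, 0o644)
	if err != nil {
		logF.Close()
		return nil, err
	}
	timeF, err := os.OpenFile(base+".timeindex", flags, 0o644)
	if err != nil {
		logF.Close()
		idxF.Close()
		return nil, err
	}
	return &Segment{
		baseOffset: baseOffset,
		log:        logF,
		index:      idxF,
		timeIndex:  timeF,
		topic:      topic,
		partition:  partition,
		// active is set by the caller: only one segment accepts writes.
	}, nil
}
```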
Writing a Message
The write path:
Append message bytes to .log
Record (relativeOffset, bytePosition) in .index
Record (timestamp, relativeOffset) in .timeindex
fsync the log file
Return offset to producer
This ensures:
Durability
Deterministic offsets
Replay safety
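Here is a sketch of that write path as a method on the Segment struct above (same package; assumes "encoding/binary" and "io" are imported). The caller supplies the relative offset, and the 4-byte length prefix before each message is my assumption about the on-disk format, not confirmed by the post:

```go
// Append performs the five write-path steps and returns the absolute
// offset. relOffset is nextOffset - baseOffset, computed by the caller.
func (s *Segment) Append(msg []byte, timestamp int64, relOffset uint32) (uint64, error) {
	// The current end of the .log file is this message's byte position.
	pos, err := s.log.Seek(0, io.SeekEnd)
	if err != nil {
		return 0, err
	}

	// 1. Append the length-prefixed message bytes to .log.
	buf := make([]byte, 4+len(msg))
	binary.BigEndian.PutUint32(buf[:4], uint32(len(msg)))
	copy(buf[4:], msg)
	if _, err := s.log.Write(buf); err != nil {
		return 0, err
	}

	// 2. Record (relativeOffset, bytePosition) in .index.
	var idx [8]byte
	binary.BigEndian.PutUint32(idx[0:4], relOffset)
	binary.BigEndian.PutUint32(idx[4:8], uint32(pos)) // 4-byte positions cap a segment at 4 GiB
	if _, err := s.index.Write(idx[:]); err != nil {
		return 0, err
	}

	// 3. Record (timestamp, relativeOffset) in .timeindex.
	var tidx [12]byte
	binary.BigEndian.PutUint64(tidx[0:8], uint64(timestamp))
	binary.BigEndian.PutUint32(tidx[8:12], relOffset)
	if _, err := s.timeIndex.Write(tidx[:]); err != nil {
		return 0, err
	}

	// 4. fsync the log so the message survives a crash.
	if err := s.log.Sync(); err != nil {
		return 0, err
	}

	// 5. Return the absolute offset to the producer.
	return s.baseOffset + uint64(relOffset), nil
}
```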
Offset Indexing
To avoid scanning entire log files, each message write appends an index entry:
```
| Relative Offset (4 bytes) |
| Byte Position   (4 bytes) |
```
Stored in .index.
This allows:
O(1) offset → file seek
Fast consumer reads
Predictable latency
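A lookup over that fixed-width format might look like this. The helper name is mine, and it assumes one entry per message, stored densely in offset order (assumes "encoding/binary" and "os" are imported):

```go
// lookupPosition maps a relative offset to its byte position. Because
// every message gets one fixed-width 8-byte entry, entry N lives at
// byte N*8, so the lookup is a single ReadAt: O(1), no scanning.
func lookupPosition(index *os.File, relOffset uint32) (uint32, error) {
	const entrySize = 8 // 4-byte offset + 4-byte position
	entry := make([]byte, entrySize)
	if _, err := index.ReadAt(entry, int64(relOffset)*entrySize); err != nil {
		return 0, err // io.EOF here means the offset is past the segment's end
	}
	return binary.BigEndian.Uint32(entry[4:8]), nil
}
```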
Time-Based Indexing
Each message write also appends an entry to the .timeindex file:
```
| Timestamp       (8 bytes) |
| Relative Offset (4 bytes) |
```
This enables future features like:
Fetch by timestamp
Log compaction
Retention policies
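Fetch-by-timestamp isn't implemented yet, but here is a sketch of how this format could support it (helper name mine; assumes "encoding/binary", "errors", "io", and "os" are imported):

```go
// offsetForTimestamp scans 12-byte .timeindex entries for the first
// offset whose timestamp is >= target. Entries are appended in time
// order, so a real implementation could binary-search instead.
func offsetForTimestamp(timeIndex *os.File, target uint64) (uint32, bool, error) {
	entry := make([]byte, 12)
	for pos := int64(0); ; pos += 12 {
		if _, err := timeIndex.ReadAt(entry, pos); err != nil {
			if errors.Is(err, io.EOF) {
				return 0, false, nil // no message at or after target
			}
			return 0, false, err
		}
		if ts := binary.BigEndian.Uint64(entry[0:8]); ts >= target {
			return binary.BigEndian.Uint32(entry[8:12]), true, nil
		}
	}
}
```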
Crash Recovery
This is where systems engineering actually begins.
When the broker restarts:
It scans the partition directory
Discovers all segment files
Loads .index files
Replays .log files if needed
Rebuilds in-memory offsets
Marks last segment as active
This ensures:
No data loss
No duplicate offsets
Safe resume for consumers
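A sketch of that recovery scan, reusing the openSegment helper from earlier (assumes "path/filepath", "sort", "strconv", and "strings" are imported). A real implementation would also verify that .log and .index agree, truncating any torn write at the tail:

```go
// recoverPartition rescans a partition directory on startup and
// rebuilds the in-memory segment list and next offset.
func recoverPartition(dir, topic string, partition uint32) ([]*Segment, uint64, error) {
	paths, err := filepath.Glob(filepath.Join(dir, "*.log"))
	if err != nil {
		return nil, 0, err
	}
	sort.Strings(paths) // zero-padded names sort in base-offset order

	var segments []*Segment
	var nextOffset uint64
	for _, p := range paths {
		name := strings.TrimSuffix(filepath.Base(p), ".log")
		base, err := strconv.ParseUint(name, 10, 64)
		if err != nil {
			return nil, 0, err
		}
		seg, err := openSegment(dir, base, topic, partition)
		if err != nil {
			return nil, 0, err
		}
		// One 8-byte index entry per message => surviving message count.
		info, err := seg.index.Stat()
		if err != nil {
			return nil, 0, err
		}
		nextOffset = base + uint64(info.Size()/8)
		segments = append(segments, seg)
	}
	if n := len(segments); n > 0 {
		segments[n-1].active = true // only the newest segment accepts writes
	}
	return segments, nextOffset, nil
}
```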
Failure-First Design
Everything in Sentry is built assuming failure:
Disk write failures
Partial TCP reads
Process crashes
Power loss
Client disconnects
Examples:
Writes check active segment state
Index lookups validate bounds
Reads handle EOF explicitly
Recovery paths are first-class
Consumer Fetch Logic
Consumers fetch messages using:
Topic
Partition
Offset
Max batch size
The broker:
Looks up offset in .index
Seeks to byte position
Reads message bytes
Returns batch to consumer
Advances consumer offset
This allows:
Replay from any offset
Exactly-once semantics (client-side)
Crash-safe resumes
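Putting the pieces together, the fetch path might look like this, reusing lookupPosition and the per-message length prefix assumed in the write-path sketch (assumes "bufio", "encoding/binary", "errors", and "io" are imported):

```go
// Fetch reads up to maxMessages starting at relOffset. A production
// version would use a separate read-only file handle so reads never
// disturb the writer's position.
func (s *Segment) Fetch(relOffset uint32, maxMessages int) ([][]byte, error) {
	pos, err := lookupPosition(s.index, relOffset) // O(1) index lookup
	if err != nil {
		return nil, err
	}
	if _, err := s.log.Seek(int64(pos), io.SeekStart); err != nil {
		return nil, err
	}

	r := bufio.NewReader(s.log)
	var batch [][]byte
	for len(batch) < maxMessages {
		// Each message is stored with a 4-byte length prefix
		// (the same assumption as in the write-path sketch).
		var lenBuf [4]byte
		if _, err := io.ReadFull(r, lenBuf[:]); err != nil {
			if errors.Is(err, io.EOF) {
				break // clean end of segment: return what we have
			}
			return nil, err
		}
		msg := make([]byte, binary.BigEndian.Uint32(lenBuf[:]))
		if _, err := io.ReadFull(r, msg); err != nil {
			return nil, err
		}
		batch = append(batch, msg)
	}
	return batch, nil
}
```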
Observability via Logs
Every major action logs explicitly:
```
[INFO] Sentry Broker Starting...
[INFO] Data directory ready: ./data
[INFO] Loading log segments from disk...
[INFO] Loaded 3 log segments from disk
[INFO] Recovered offsets up to 10432
[INFO] Listening for producer connections...
```
Why this matters:
Debuggability
Replay verification
Crash analysis
Trust in the system
What This Project Taught Me
- Abstractions Hide Complexity
Before Sentry, “Kafka stores messages” was a black box. Now I know:
How bytes hit disk
How offsets are assigned
How indexes are structured
How recovery actually works
- Concurrency Is Not Free
Goroutines are cheap — but not infinite. Without:
Worker pools
Backpressure
Bounded queues
your system will explode under load.
- Failure Is the Default
If you don’t explicitly design for:
crashes
restarts
partial writes
corrupt state
your system is already broken.
- Performance Is a Trade-Off
Every choice has consequences:
fsync = durability vs throughput
memory buffers = speed vs safety
batching = latency vs efficiency
There is no “best” choice. Only conscious trade-offs.
What’s Next for Sentry
This is only the beginning. Planned improvements:
Consumer groups
Replication across nodes
Leader election
Segment compaction
Retention policies
Snapshotting
Metrics & dashboards
Raft-based metadata layer
Why I’m Building This in Public
Because:
It keeps me accountable
I get feedback from real engineers
It forces me to document decisions
It creates a learning trail others can follow
Final Thoughts
Building Sentry taught me something critical:
Building features is easy. Designing systems that don’t break is the real challenge.
This project took me far beyond CRUD apps and APIs. It forced me to think in:
bytes
offsets
failure modes
recovery paths
concurrency limits
durability guarantees
And honestly? This has been the most educational project I’ve ever built.
Repo & Resources
GitHub: https://github.com/tejas2428cse990-svg/Sentry.git
Blog series (coming soon): 👉 Dev.to link
Feedback Welcome