I built a distributed job queue in Go to understand how they actually work

#architecture #distributedsystems #go #showdev

I have used job queues my whole developer life without knowing what was inside them.

So I built one.

Not a wrapper around an existing queue. A full implementation from scratch
with Redis, PostgreSQL, goroutines, and real failure handling.

Here is everything I learned.

Why Dual Storage

Many job queues rely on a single store. Redis is fast. PostgreSQL is durable. I wanted both.

Redis handles dispatch via a sorted set priority queue. Fast enqueue, fast dequeue.

PostgreSQL is the source of truth. Every job lives there permanently.

The rule: no critical state lives only in Redis. If Redis is wiped completely,
no job is lost, because PostgreSQL still has everything.
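
To make the rule concrete, here is a minimal in-memory sketch, not the project's code. A map stands in for PostgreSQL and a sorted slice stands in for the Redis sorted set. The names Job, Queue, WipeDispatch, and Rebuild are all hypothetical, and so is the convention that a lower Priority value is dispatched first. The point is only that the dispatch index can be thrown away and rebuilt from the durable store.

```go
package main

import (
	"fmt"
	"sort"
)

// Job is a hypothetical job record; the field names are assumptions.
type Job struct {
	ID       string
	Priority int // lower values are dispatched first (an assumption)
	Done     bool
}

// Queue pairs a durable store with a rebuildable dispatch index.
// The map stands in for PostgreSQL, the slice for a Redis sorted set.
type Queue struct {
	durable  map[string]*Job // source of truth
	dispatch []string        // job IDs ordered by (Priority, ID)
}

func NewQueue() *Queue {
	return &Queue{durable: make(map[string]*Job)}
}

// sortDispatch orders the index by priority, breaking ties by ID.
func (q *Queue) sortDispatch() {
	sort.Slice(q.dispatch, func(a, b int) bool {
		ja, jb := q.durable[q.dispatch[a]], q.durable[q.dispatch[b]]
		if ja.Priority != jb.Priority {
			return ja.Priority < jb.Priority
		}
		return ja.ID < jb.ID
	})
}

// Enqueue writes to the durable store first, then to the dispatch index.
func (q *Queue) Enqueue(j Job) {
	q.durable[j.ID] = &j
	q.dispatch = append(q.dispatch, j.ID)
	q.sortDispatch()
}

// Dequeue pops the next job ID from the dispatch index.
func (q *Queue) Dequeue() (string, bool) {
	if len(q.dispatch) == 0 {
		return "", false
	}
	id := q.dispatch[0]
	q.dispatch = q.dispatch[1:]
	return id, true
}

// WipeDispatch simulates losing the fast store entirely.
func (q *Queue) WipeDispatch() { q.dispatch = nil }

// Rebuild recreates the dispatch index from the durable store,
// skipping jobs that are already finished.
func (q *Queue) Rebuild() {
	q.dispatch = nil
	for id, j := range q.durable {
		if !j.Done {
			q.dispatch = append(q.dispatch, id)
		}
	}
	q.sortDispatch()
}

func main() {
	q := NewQueue()
	q.Enqueue(Job{ID: "email-1", Priority: 5})
	q.Enqueue(Job{ID: "report-7", Priority: 1})

	q.WipeDispatch() // the fast store is gone
	q.Rebuild()      // nothing is lost; the index comes back from the durable store

	for {
		id, ok := q.Dequeue()
		if !ok {
			break
		}
		fmt.Println("dispatching", id)
	}
}
```

In a real deployment the write ordering matters for the same reason: if the durable write happens first, a crash between the two writes leaves a job that can still be re-indexed later.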

Three Things Running Concurrently

A worker pool that executes jobs
A scheduler that promotes jobs from PostgreSQL into Redis when their time arrives
A stale reaper that detects crashed workers and requeues their jobs automatically

All three run as goroutines. All three coordinate without stepping on each other.

What Happens When a Worker Crashes

This is the part most tutorials skip.

When a worker picks up a job, it marks it as in-progress. If that worker crashes
mid-execution, the job stays marked in-progress forever unless something intervenes.

The stale reaper scans for jobs that have been in-progress longer than their timeout.
It requeues them automatically with exponential backoff.

No manual intervention. No lost jobs.

The Numbers

Metric	Result
Job registration	52 ns/op, 0 allocations
Job execution	950 ns/op

Benchmarked with Go's built-in benchmark tooling.
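
For readers who have not used that tooling, here is a sketch of what such a measurement looks like. The Registry type and its Register method are hypothetical stand-ins, not the project's API, and the numbers this prints will depend on the machine. Benchmarks usually live in a _test.go file and run via go test -bench; this version calls testing.Benchmark from main only so that it is runnable as a single file.

```go
package main

import (
	"fmt"
	"testing"
)

// Handler and Registry are hypothetical stand-ins for a job-type registry.
type Handler func(payload []byte) error

type Registry struct {
	handlers map[string]Handler
}

func NewRegistry() *Registry {
	return &Registry{handlers: make(map[string]Handler)}
}

// Register associates a job type with its handler.
func (r *Registry) Register(jobType string, h Handler) {
	r.handlers[jobType] = h
}

func noop([]byte) error { return nil }

// benchRegister measures repeated registration of a single job type.
func benchRegister(b *testing.B) {
	r := NewRegistry()
	b.ReportAllocs() // include allocation counts in the result
	b.ResetTimer()   // exclude setup from the measurement
	for i := 0; i < b.N; i++ {
		r.Register("send_email", noop)
	}
}

func main() {
	res := testing.Benchmark(benchRegister)
	fmt.Printf("%d iterations, %d ns/op, %d allocs/op\n",
		res.N, res.NsPerOp(), res.AllocsPerOp())
}
```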

Ships with a Prometheus metrics endpoint and a pre-built Grafana dashboard
covering queue depth, throughput, and failure rates by job type.

What I Actually Understand Now

Why Redis alone is not enough for a job queue
Why crashed-worker recovery needs to be a first-class feature, not an afterthought
Why exponential backoff matters more than immediate retries

The project is open source with one external contributor already.