TL;DR: I open-sourced rock288/go-mongo-boilerplate, a Go 1.25 service template that ships the boring production stuff (observability, retry, DLQ, SSRF-safe HTTP, health splits, graceful shutdown) so you don't write it for the 7th time. Click "Use this template" and start with make scaffold name=Order.
The problem with every Go boilerplate I've used
I've started six Go services in the last five years. The first 200 lines were always the same:
- Wire up a slog handler that auto-injects trace_id
- Pick a config library (koanf? viper? envconfig?) and write a Validate()
- Add Mongo + an event.CommandMonitor that doesn't leak memory
- Build a Kafka consumer with retry topic + DLQ
- Hand-roll graceful shutdown for two consumer goroutines
- Realize the HTTP client is wide-open to SSRF and patch the dialer
- Split /healthz (liveness) from /readyz (readiness)
- Spend an afternoon making golangci-lint, mockery, and Wire all happy together
Most Go boilerplates on GitHub solve one of these. None of the ones I found solved all of them in a way that felt opinionated rather than kitchen-sink-y.
So I wrote one. Then I wrote it again. Then I extracted the third iteration into a public template.
This post is a tour of the parts that took me the most rewrites to get right — and the trade-offs I'd argue with you about.
1. Package-by-feature + a tiny platform/ layer
The most common boilerplate mistake: models/, controllers/, services/ directories. That layout scales to maybe four features before you grep for User and get 14 files in 7 packages.
internal/
  user/                ← one feature, all in one place
    model.go
    repository.go
    service.go
    handler.go
    dto.go
    routes.go
    errors.go
    wire.go
  role/                ← another feature
  platform/            ← cross-feature infrastructure
    config/ logger/ database/ httpserver/ httpclient/
    kafka/ sqs/ health/ observability/ resilience/
    pagination/ httperror/
The hard rule: features call each other only through exported interfaces (UserService, not userService). Cross-feature dependencies on concrete types are a code smell. A lint rule that enforces this would be nice; I'm open to suggestions.
The internal/platform/ layer is everything that doesn't belong to a single feature. Any feature may import platform; features may not import each other's internal symbols.
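A minimal sketch of that boundary, collapsed into one file so it runs on its own. In the template the interface and its implementation would sit in internal/user/ and the caller in a different feature package; OrderService, GetName, and the in-memory map are hypothetical names used only for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrUserNotFound is a feature-local sentinel error (hypothetical name).
var ErrUserNotFound = errors.New("user not found")

// UserService is the exported contract other features depend on.
type UserService interface {
	GetName(id string) (string, error)
}

// userService is the unexported implementation; other features never see it.
type userService struct {
	names map[string]string
}

// NewUserService returns the interface, not the concrete type.
func NewUserService(names map[string]string) UserService {
	return &userService{names: names}
}

func (s *userService) GetName(id string) (string, error) {
	name, ok := s.names[id]
	if !ok {
		return "", ErrUserNotFound
	}
	return name, nil
}

// OrderService stands in for a second feature. It holds the interface only.
type OrderService struct {
	users UserService
}

// Describe builds a label for an order, falling back when the user is unknown.
func (o *OrderService) Describe(userID string) string {
	name, err := o.users.GetName(userID)
	if err != nil {
		return "order for unknown user"
	}
	return "order for " + name
}

func main() {
	orders := &OrderService{users: NewUserService(map[string]string{"u1": "Ada"})}
	fmt.Println(orders.Describe("u1"))
	fmt.Println(orders.Describe("u2"))
}
```

Because OrderService only sees UserService, a test can hand it a mock without touching the user package's internals.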
Adding a new feature is one command:
make scaffold name=Order
A bash script clones internal/user/, runs perl to rename identifiers, and prints the wire-up checklist. From make scaffold to a working GET /orders endpoint is ~10 minutes.
2. MongoDB v2 driver — and why you probably haven't migrated yet
The go.mongodb.org/mongo-driver/v2 driver has gone GA, but most templates on GitHub are still on v1. The v2 driver has a cleaner cursor API and generics support. The catch, and it matters for production, is that there's still no official OTel instrumentation package for it.
So I wrote a custom event.CommandMonitor:
// internal/platform/database/mongo_otel.go
client, err := mongo.Connect(options.Client().
ApplyURI(uri).
SetMonitor(NewCommandMonitor(tracer)))
The monitor keeps a RequestID → span map populated on CommandStarted and consumed on CommandSucceeded / CommandFailed. The catch: if a command never completes (network drop, server restart) the entry leaks. So there's a janitor goroutine that purges entries older than 2 minutes.
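// Sketch; full impl in the repo
type spanEntry struct {
	span    trace.Span
	started time.Time
}

type CommandMonitor struct {
	tracer   trace.Tracer
	inflight sync.Map // RequestID (int64) → spanEntry
}

// Started opens a span and remembers it under the command's RequestID.
func (m *CommandMonitor) Started(ctx context.Context, evt *event.CommandStartedEvent) {
	_, span := m.tracer.Start(ctx, "mongo."+evt.CommandName)
	m.inflight.Store(evt.RequestID, spanEntry{span: span, started: time.Now()})
}

// Succeeded ends the matching span; sync.Map stores values as any, hence the assertion.
func (m *CommandMonitor) Succeeded(_ context.Context, evt *event.CommandSucceededEvent) {
	if v, ok := m.inflight.LoadAndDelete(evt.RequestID); ok {
		v.(spanEntry).span.End()
	}
}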
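The janitor itself is not shown in the sketch above. Here is one way it could look, using only the standard library: the ender interface is a stand-in for trace.Span, and the janitor type, purge, run, and seedAndPurge are hypothetical names, not the repo's actual API.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// ender is a stand-in for trace.Span so the sketch needs no OTel import.
type ender interface{ End() }

type spanEntry struct {
	span    ender
	started time.Time
}

type janitor struct {
	inflight sync.Map // RequestID (int64) -> spanEntry
	maxAge   time.Duration
}

// purge ends and removes every entry older than maxAge, returning the count.
func (j *janitor) purge(now time.Time) int {
	dropped := 0
	j.inflight.Range(func(key, value any) bool {
		entry := value.(spanEntry)
		if now.Sub(entry.started) > j.maxAge {
			// LoadAndDelete avoids double-ending a span if a late
			// Succeeded/Failed callback races with the janitor.
			if _, ok := j.inflight.LoadAndDelete(key); ok {
				entry.span.End()
				dropped++
			}
		}
		return true
	})
	return dropped
}

// run purges on every tick until ctx is cancelled.
func (j *janitor) run(ctx context.Context, every time.Duration) {
	ticker := time.NewTicker(every)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case now := <-ticker.C:
			j.purge(now)
		}
	}
}

type fakeSpan struct{ ended bool }

func (s *fakeSpan) End() { s.ended = true }

// seedAndPurge stores one stale and one fresh entry, then purges once.
func seedAndPurge() (dropped int, staleEnded, freshKept bool) {
	j := &janitor{maxAge: 2 * time.Minute}
	now := time.Now()
	stale, fresh := &fakeSpan{}, &fakeSpan{}
	j.inflight.Store(int64(1), spanEntry{span: stale, started: now.Add(-3 * time.Minute)})
	j.inflight.Store(int64(2), spanEntry{span: fresh, started: now})
	dropped = j.purge(now)
	_, freshKept = j.inflight.Load(int64(2))
	return dropped, stale.ended, freshKept
}

func main() {
	fmt.Println(seedAndPurge())
}
```

In a real service, run would be started once next to the Mongo client and stopped through the same context that drives graceful shutdown.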
Hot take: every project I've seen that uses MongoDB in Go has at least one production memory leak in their tracing layer. Mine probably did too until I wrote that janitor.
3. Kafka retry + DLQ that actually works under load
This is the part of the template I've rewritten the most times across jobs.
Rules I now live by:
- Three topics per logical event: events, events.retry, events.dlq. No magic, no in-memory queues.
- Retry topic has its own consumer group (<group>-retry). When it polls a message, it sleeps min(base * 2^n, max) before republishing to the main topic. This separates retry latency from main-topic throughput.
- x-retry-count and x-error-reason are headers, not body fields. The body is opaque to the platform.
- Inbound x-retry-count and x-error-reason are stripped on republish. Attacker can't pre-set retry count to force DLQ. (Spotted on a real pentest. Yes, it was me.)
- Idempotency key auto-generated on publish if not provided. Consumers dedupe via this key, not message offset.
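Two of the rules above are small enough to sketch directly: the capped exponential delay and the stripping of caller-supplied control headers. The base and max values here are hypothetical defaults, and header is a stand-in for kgo.RecordHeader so the example stays dependency-free.

```go
package main

import (
	"fmt"
	"time"
)

// Hypothetical defaults; real values would come from configuration.
const (
	baseDelay = 500 * time.Millisecond
	maxDelay  = 30 * time.Second
)

// retryDelay returns min(base * 2^n, max) for the n-th retry (n starts at 0).
func retryDelay(n int, base, max time.Duration) time.Duration {
	if n < 0 {
		n = 0
	}
	d := base
	for i := 0; i < n; i++ {
		d *= 2
		if d >= max || d <= 0 { // d <= 0 catches integer overflow
			return max
		}
	}
	if d > max {
		return max
	}
	return d
}

// header is a stand-in for kgo.RecordHeader.
type header struct {
	Key   string
	Value []byte
}

// stripControlHeaders drops platform-owned headers a caller may have set.
func stripControlHeaders(in []header) []header {
	out := make([]header, 0, len(in))
	for _, h := range in {
		if h.Key == "x-retry-count" || h.Key == "x-error-reason" {
			continue
		}
		out = append(out, h)
	}
	return out
}

func main() {
	for n := 0; n <= 7; n++ {
		fmt.Println(n, retryDelay(n, baseDelay, maxDelay))
	}
	in := []header{{Key: "x-retry-count", Value: []byte("99")}, {Key: "tenant", Value: []byte("a")}}
	fmt.Println(len(stripControlHeaders(in)))
}
```

Returning max as soon as the doubled value reaches the cap keeps the loop safe for very large retry counts.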
// The contract the platform exposes
type MessageHandler interface {
Handle(ctx context.Context, record *kgo.Record) error
}
// Return:
// nil → commit offset (ACK)
// errors.Is(err, context.Canceled) → no commit, no republish (shutdown)
// errors.Is(err, kafka.ErrNonRetryable) → straight to DLQ
// any other error → retry topic with incremented count
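The return contract above maps naturally onto a single switch. This is a sketch of how a consumer loop might classify a handler's error, not the template's actual code: ErrNonRetryable is redeclared locally, and route and the action names are hypothetical.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// ErrNonRetryable mirrors kafka.ErrNonRetryable from the contract above.
var ErrNonRetryable = errors.New("non-retryable")

type action string

const (
	actionCommit action = "commit"
	actionSkip   action = "skip" // no commit, no republish
	actionDLQ    action = "dlq"
	actionRetry  action = "retry"
)

// route decides what the platform does with a record after Handle returns.
// errors.Is is used so wrapped errors are classified correctly.
func route(err error) action {
	switch {
	case err == nil:
		return actionCommit
	case errors.Is(err, context.Canceled):
		return actionSkip
	case errors.Is(err, ErrNonRetryable):
		return actionDLQ
	default:
		return actionRetry
	}
}

// Sample errors a handler might return.
var (
	errShutdown   = fmt.Errorf("handler interrupted: %w", context.Canceled)
	errBadPayload = fmt.Errorf("decode event: %w", ErrNonRetryable)
	errTransient  = errors.New("mongo: connection reset")
)

func main() {
	for _, err := range []error{nil, errShutdown, errBadPayload, errTransient} {
		fmt.Printf("%v -> %s\n", err, route(err))
	}
}
```

Checking context.Canceled before ErrNonRetryable means a shutdown never pushes a half-processed record to the DLQ.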
W3C traceparent is injected on the producer side and extracted on the consumer side so distributed traces don't break when crossing the broker.
producer.Publish(ctx, "user-events", key, body)
// ↑
// span context auto-propagated into the record header
The failure path calls ForceSample() so retry/DLQ traffic always shows up on the dashboard — sampling shouldn't hide errors.
4. SQS as a second broker — same API, same routing
Plenty of teams pick SQS over Kafka when they're AWS-only and don't need replay history. So the template ships both, side-by-side, with deliberately identical routing semantics:
// Kafka: declared in package kafka
type MessageHandler interface { Handle(ctx context.Context, record *kgo.Record) error }
// SQS: declared in package sqs, same shape
type MessageHandler interface { Handle(ctx context.Context, msg *types.Message) error }
// Same return contract:
// nil → ACK
// context.Canceled → no ACK, no republish (shutdown)
// ErrNonRetryable → DLQ
// other err → retry queue with backoff
The same make scaffold walkthrough works for both. The cmd/worker binary runs four goroutines via errgroup.WithContext (Kafka main+retry, SQS main+retry); the first non-nil error cancels the shared ctx, and all consumers shut down together.
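To make the "first error cancels everyone" behavior concrete, here is a standard-library approximation. The template uses errgroup.WithContext; this sketch swaps in context.WithCancelCause plus sync.WaitGroup so it runs without extra modules. The consumer type and runAll are hypothetical names.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

// consumer is a stand-in for a blocking consume loop.
type consumer func(ctx context.Context) error

// runAll starts every consumer, cancels the shared context on the first
// non-nil error, waits for all of them, and returns that first error.
func runAll(parent context.Context, consumers ...consumer) error {
	ctx, cancel := context.WithCancelCause(parent)
	defer cancel(nil)

	var wg sync.WaitGroup
	for _, c := range consumers {
		wg.Add(1)
		go func(c consumer) {
			defer wg.Done()
			if err := c(ctx); err != nil {
				cancel(err) // only the first call records its cause
			}
		}(c)
	}
	wg.Wait()
	return context.Cause(ctx) // nil if nothing failed and parent is still live
}

var errBroker = errors.New("kafka main: broker unreachable")

// blockUntilDone simulates a healthy consumer that exits only on shutdown.
func blockUntilDone(ctx context.Context) error {
	<-ctx.Done()
	return nil
}

// failFast simulates a consumer that dies immediately.
func failFast(context.Context) error { return errBroker }

// demo runs one failing consumer next to three healthy ones.
func demo() error {
	return runAll(context.Background(), failFast, blockUntilDone, blockUntilDone, blockUntilDone)
}

// demoClean runs a single consumer that returns without error.
func demoClean() error {
	return runAll(context.Background(), func(context.Context) error { return nil })
}

func main() {
	fmt.Println(demo())
	fmt.Println(demoClean())
}
```

The three healthy consumers only return because the failing one cancelled the shared context, which is the property that keeps a half-dead worker from lingering.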
Gotchas the docs warn you about:
- SQS MessageDelaySeconds caps at 900s. Backoff > 15 min silently clamps. (For longer backoffs you need an external scheduler.)
- MessageAttributes limit is 10 keys; 3 are reserved (traceparent, tracestate, x-idempotency-key). Producer rejects with ErrTooManyAttributes if the caller exceeds the budget.
- Standard SQS does NOT guarantee ordering, so don't expect Kafka-partition semantics.
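The first two gotchas above reduce to a clamp and a budget check. A sketch under stated assumptions: clampDelay, checkAttributeBudget, and nAttrs are hypothetical helpers, ErrTooManyAttributes is redeclared locally, and the int32 return type assumes the delay ends up in the AWS SDK v2 SendMessageInput.DelaySeconds field.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

const (
	maxSQSDelay   = 900 * time.Second // SQS per-message delay ceiling
	maxAttributes = 10                // SQS message attribute limit
)

// reservedKeys are attributes the platform sets on every message.
var reservedKeys = []string{"traceparent", "tracestate", "x-idempotency-key"}

// ErrTooManyAttributes mirrors the producer error named in the list above.
var ErrTooManyAttributes = errors.New("too many message attributes")

// clampDelay converts a backoff into whole seconds, capped at 900.
func clampDelay(backoff time.Duration) int32 {
	if backoff < 0 {
		return 0
	}
	if backoff > maxSQSDelay {
		backoff = maxSQSDelay
	}
	return int32(backoff / time.Second)
}

// checkAttributeBudget rejects caller attributes that would not fit
// alongside the reserved keys.
func checkAttributeBudget(callerAttrs map[string]string) error {
	budget := maxAttributes - len(reservedKeys)
	if len(callerAttrs) > budget {
		return fmt.Errorf("%w: %d caller keys, budget is %d",
			ErrTooManyAttributes, len(callerAttrs), budget)
	}
	return nil
}

// nAttrs builds a map with n distinct keys for the demo.
func nAttrs(n int) map[string]string {
	m := make(map[string]string, n)
	for i := 0; i < n; i++ {
		m[fmt.Sprintf("k%d", i)] = "v"
	}
	return m
}

func main() {
	fmt.Println(clampDelay(20 * time.Minute))
	fmt.Println(checkAttributeBudget(nAttrs(7)))
	fmt.Println(checkAttributeBudget(nAttrs(8)))
}
```

Logging whenever the clamp actually changes the value would make the silent truncation visible in production.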
LocalStack for dev:
make localstack-up
make sqs-create-queues # provisions events / events-retry / events-dlq + RedrivePolicy
Production: queues are provisioned by your IaC (Terraform/CDK). The worker only needs sqs:SendMessage, sqs:ReceiveMessage, sqs:DeleteMessage, sqs:GetQueueUrl, and sqs:ChangeMessageVisibility; it does not need sqs:CreateQueue. The boilerplate ships infra code only; you wire your own sqs.MessageHandler.
5. Compile-time DI with google/wire — once you try it you can't go back
Runtime DI containers feel clever until you spend two hours debugging "nil pointer" because container key resolution failed at startup.
google/wire generates the wiring at compile time. If a provider is missing, the build fails. There's no reflection, no runtime overhead, and the generated file is grep-able.
//go:build wireinject
func InitializeServer(cfgPath string) (*ServerApp, func(), error) {
wire.Build(
config.Load,
ProvideMongoConfig,
database.NewClient,
user.ProviderSet, // one line wires the whole feature
role.ProviderSet,
ProvideRouter,
wire.Struct(new(ServerApp), "*"),
)
return nil, nil, nil
}
make wire regenerates wire_gen.go. CI fails on drift — so a contributor adding a feature without running make wire gets blocked.
The trade-off: there's a learning curve. But after the first feature, every subsequent addition follows the same ProviderSet pattern and Claude Code (or any agent) can copy it.
Which brings me to —
6. The vibe-coding angle: CLAUDE.md + AGENTS.md
If you've used Claude Code, Cursor, Aider, or Windsurf agent mode, you know they all read the repo for context. Most repos give them nothing useful, so the agent ends up cargo-culting whatever pattern it finds first.
I committed two files specifically for AI agents:
- CLAUDE.md (215 lines): commands, architecture, coding conventions, package boundaries, "Adding a feature" checklist
- AGENTS.md: multi-agent compat redirect (Cursor and Aider both read this convention)
## Coding Conventions
**Naming**
- Interfaces: no `I` prefix. `UserService` (interface) / `userService` (impl)
- Constructors `NewUserService` return the interface
- Errors: sentinel `ErrXxx` in feature-local `errors.go`
...
## Adding an SQS-consuming feature
1. Create `internal/<feature>/event_handler_sqs.go` implementing `sqs.MessageHandler`
2. Replace `ProvideSQSHandler` in `cmd/worker/providers.go`
3. Add `<feature>.NewSQSEventHandler` to `cmd/worker/wire.go` ProviderSet
4. `make wire mocks test`
The payoff is real: I can paste a Linear ticket into Claude Code, and it produces an 80%-correct PR because it knows the repo's conventions, the test patterns, and which file to edit. The agent doesn't waste context on discovery; the discovery is one file.
This is a controversial design choice. Some people don't want their repo "locked into" Claude. But CLAUDE.md is just markdown — re-readable by humans, ignorable by anyone who doesn't use AI. Cost: 215 lines. Benefit: agents move from "best-guess" to "follows your conventions".
7. The boring stuff you wish someone had written for you
Things in the template you'd otherwise reinvent on day 4 of your project:
- /healthz (liveness) vs /readyz (readiness) with cached checkers and fail-after-N hysteresis so a transient blip doesn't flap your load balancer.
- SSRF-safe HTTP client (httpclient.NewExternalClient): a custom Dialer rejects loopback, RFC1918, link-local, and AWS metadata IPs (169.254.169.254). The default for any non-hardcoded URL.
- http.Server timeouts wired correctly (ReadHeaderTimeout, ReadTimeout, WriteTimeout, IdleTimeout). These CANNOT be replaced by handler-level timeouts because the request headers are read before the handler runs, so a slow client can hold a connection without ever reaching your code.
- Middleware ordering documented with rationale:
  otelgin → Recovery → RequestID → RequestLogger → RateLimit → BodyLimit → Timeout → CORS → handler
- Cursor pagination as a platform-level helper: opaque base64(JSON) cursor, generic Page[T] envelope, ClampLimit capped at 100.
- Shared error envelope {error: {code, message}} via httperror.BadRequest(c, err) so handlers don't redefine gin.H{"error": err.Error()} 47 times.
- X-Request-ID propagated through ctx, response headers, and slog fields: debug correlation from client log → server trace.
- Distroless Docker with healthcheck via the binary's healthcheck subcommand (because distroless has no curl/wget).
- make ci runs lint + test + race + govulncheck + gitleaks. The CI workflow blocks merges on any of these.
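For the SSRF-safe client in the list above, one common approach is to hook net.Dialer.Control, which runs after DNS resolution and just before connect, so the check sees the real destination IP. This is a sketch of that technique, not the template's httpclient package: isBlocked, guard, blockedHost, newExternalClient, and the timeout values are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"net/http"
	"syscall"
	"time"
)

// ErrBlockedAddress is returned when a dial targets an internal address.
var ErrBlockedAddress = errors.New("destination address is not allowed")

// isBlocked reports whether ip is loopback, private (RFC 1918 or IPv6 ULA),
// link-local (which covers 169.254.169.254), or unspecified.
func isBlocked(ip net.IP) bool {
	return ip.IsLoopback() || ip.IsPrivate() || ip.IsLinkLocalUnicast() ||
		ip.IsLinkLocalMulticast() || ip.IsUnspecified()
}

// blockedHost treats anything that does not parse as an IP as blocked.
func blockedHost(host string) bool {
	ip := net.ParseIP(host)
	return ip == nil || isBlocked(ip)
}

// guard is a net.Dialer Control hook. It receives the resolved ip:port,
// so a hostname that resolves to an internal IP is still rejected.
func guard(network, address string, _ syscall.RawConn) error {
	host, _, err := net.SplitHostPort(address)
	if err != nil {
		return err
	}
	if blockedHost(host) {
		return fmt.Errorf("%w: %s", ErrBlockedAddress, address)
	}
	return nil
}

// newExternalClient builds an http.Client whose every connection passes guard.
func newExternalClient() *http.Client {
	dialer := &net.Dialer{Timeout: 5 * time.Second, Control: guard}
	return &http.Client{
		Timeout:   10 * time.Second,
		Transport: &http.Transport{DialContext: dialer.DialContext},
	}
}

func main() {
	_ = newExternalClient()
	for _, h := range []string{"127.0.0.1", "10.0.0.5", "169.254.169.254", "::1", "93.184.216.34"} {
		fmt.Println(h, blockedHost(h))
	}
}
```

Checking at dial time rather than validating the URL string up front is what closes the DNS-rebinding gap, since redirects and re-resolved names pass through the same hook.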
None of these are revolutionary. Together they save you the first two sprints.
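Of the items in that list, the cursor helper is the easiest to show in isolation. The sketch below assumes a cursor that carries only the last seen ID and a default limit of 20; the Cursor fields, EncodeCursor, DecodeCursor, roundTrip, and defaultLimit are hypothetical, while Page[T] and ClampLimit follow the names used above.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

const (
	defaultLimit = 20 // hypothetical default
	maxLimit     = 100
)

// Cursor is the JSON payload hidden inside the opaque token.
type Cursor struct {
	LastID string `json:"last_id"`
}

// Page is a generic response envelope.
type Page[T any] struct {
	Items      []T    `json:"items"`
	NextCursor string `json:"next_cursor,omitempty"`
}

// EncodeCursor serializes c as URL-safe base64(JSON).
func EncodeCursor(c Cursor) (string, error) {
	raw, err := json.Marshal(c)
	if err != nil {
		return "", err
	}
	return base64.RawURLEncoding.EncodeToString(raw), nil
}

// DecodeCursor reverses EncodeCursor and rejects malformed tokens.
func DecodeCursor(s string) (Cursor, error) {
	var c Cursor
	raw, err := base64.RawURLEncoding.DecodeString(s)
	if err != nil {
		return c, fmt.Errorf("invalid cursor: %w", err)
	}
	if err := json.Unmarshal(raw, &c); err != nil {
		return c, fmt.Errorf("invalid cursor: %w", err)
	}
	return c, nil
}

// ClampLimit applies the default for non-positive values and caps at 100.
func ClampLimit(n int) int {
	switch {
	case n <= 0:
		return defaultLimit
	case n > maxLimit:
		return maxLimit
	default:
		return n
	}
}

// roundTrip encodes an ID into a cursor and decodes it back.
func roundTrip(id string) (string, error) {
	token, err := EncodeCursor(Cursor{LastID: id})
	if err != nil {
		return "", err
	}
	c, err := DecodeCursor(token)
	return c.LastID, err
}

func main() {
	token, _ := EncodeCursor(Cursor{LastID: "abc123"})
	page := Page[string]{Items: []string{"a", "b"}, NextCursor: token}
	fmt.Println(page.NextCursor, ClampLimit(500))
}
```

Keeping the token opaque lets the cursor payload change later (for example, adding a sort key) without breaking clients.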
What's NOT in the template (and why)
I'm allergic to "kitchen sink" boilerplates. Things deliberately excluded:
- Auth / AuthZ — every team has different requirements (session vs JWT, OPA vs Casbin vs hand-rolled). Adding any choice is a wrong choice for 80% of users.
- Integration tests with testcontainers — go-kit philosophy: build the integration harness when you have a real feature that needs it, not on speculation.
- CQRS / event sourcing — demand-driven. If your domain needs it, you know.
- API versioning gymnastics (v1/, v2/): the boilerplate has /api/v1 baked in; multi-version is the user's call.
- GraphQL: different concern.
- Frontend — wrong repo.
If you need any of these, the template is a starting point, not a finished product.
How to use it
# 1. Click "Use this template" on GitHub, or:
git clone https://github.com/rock288/go-mongo-boilerplate my-service
cd my-service
# 2. Rename the module path (everywhere it appears)
NEW=github.com/<your-org>/<your-repo>
OLD=github.com/rock288/go-mongo-boilerplate
go mod edit -module=$NEW
grep -rl "$OLD" --include='*.go' --include='*.yml' --include='*.yaml' --include='Makefile' --include='*.md' . \
| xargs perl -pi -e "s|$OLD|$NEW|g"
make wire && make mocks
# 3. Start your first feature
make scaffold name=Order
# → internal/order/ ready; edit model.go, dto.go, wire it up
# 4. Run
docker-compose up -d
make migrate-up
make run # http://localhost:8002
make ci should be green. If it isn't, that's a bug — open an issue.
What I'd like feedback on
- Is internal/middleware/ cleanly separated, or should it move under internal/platform/httpserver/middleware/?
- The SQS app-level retry queue costs an extra SendMessage per failure compared to native SQS redrive. I chose it for Kafka-symmetry. Worth the cost?
- MessageDelaySeconds 900s cap forces clamping. For backoffs >15min I think the right answer is "use Step Functions", but it's not built in. Should it be?
- CLAUDE.md + AGENTS.md: pro or con for a public template? I've heard both.
Open an issue if you have opinions or — better — a PR.
Links
- Repo: https://github.com/rock288/go-mongo-boilerplate
- Use this template button: top-right on GitHub
- Architecture doc: docs/architecture.md
- Why no auth? CLAUDE.md — last section
If this saved you an afternoon, leave a star or share with a teammate who's about to start their seventh Go service.
If you spot a bug, an outdated dep, or a convention I should fight you about, open an issue. The repo gets better when people argue.