rock288

Posted on May 17

After 5 years of Go services, here's the boilerplate I wish existed

#go #opensource #microservices #observability

TL;DR: I open-sourced rock288/go-mongo-boilerplate — a Go 1.25 service template that ships the boring production stuff (observability, retry, DLQ, SSRF-safe HTTP, health splits, graceful shutdown) so you don't write it for the 7th time. Click "Use this template" and start with make scaffold name=Order.

The problem with every Go boilerplate I've used

I've started six Go services in the last five years. The first 200 lines were always the same:

Wire up a slog handler that auto-injects trace_id
Pick a config library (koanf? viper? envconfig?) and write a Validate()
Add Mongo + an event.CommandMonitor that doesn't leak memory
Build a Kafka consumer with retry topic + DLQ
Hand-roll graceful shutdown for two consumer goroutines
Realize the HTTP client is wide-open to SSRF and patch the dialer
Split /healthz (liveness) from /readyz (readiness)
Spend an afternoon making golangci-lint, mockery, and Wire all happy together

Most Go boilerplates on GitHub solve one of these. None of the ones I found solved all of them in a way that felt opinionated rather than kitchen-sink-y.

So I wrote one. Then I wrote it again. Then I extracted the third iteration into a public template.

This post is a tour of the parts that took me the most rewrites to get right — and the trade-offs I'd argue with you about.

1. Package-by-feature + a tiny `platform/` layer

The most common boilerplate mistake: models/, controllers/, services/ directories. That layout scales to maybe four features before you grep for User and get 14 files in 7 packages.

internal/
  user/                 ← one feature, all in one place
    model.go
    repository.go
    service.go
    handler.go
    dto.go
    routes.go
    errors.go
    wire.go
  role/                 ← another feature
  platform/             ← cross-feature infrastructure
    config/  logger/  database/  httpserver/  httpclient/
    kafka/   sqs/      health/   observability/  resilience/
    pagination/         httperror/

The hard rule: features call each other only through exported interfaces (UserService, not userService). A cross-feature dependency on a concrete type is a code smell. A lint rule enforcing this would be nice; I'm open to suggestions.

The internal/platform/ layer is everything that doesn't belong to a single feature. Any feature may import platform; features may not import each other's internal symbols.

Adding a new feature is one command:

make scaffold name=Order

A bash script clones internal/user/, runs perl to rename identifiers, and prints the wire-up checklist. From make scaffold to a working GET /orders endpoint is ~10 minutes.
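The rename step is the only clever part of the script. Below is a rough Go equivalent of what the perl pass does. It assumes the copied files only use the "User" and "user" casings, and renameFeature is a hypothetical name. The sketch is deliberately naive: it would also rewrite unrelated words such as "username".

```go
package main

import "strings"

// renameFeature rewrites identifiers in a file copied from internal/user/.
// Naive on purpose: "username" would become "ordername" too.
func renameFeature(src, name string) string {
	if name == "" {
		return src
	}
	lower := strings.ToLower(name[:1]) + name[1:]
	// "User" → "Order", "user" → "order"; plurals like "users" follow automatically.
	return strings.NewReplacer("User", name, "user", lower).Replace(src)
}
```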

2. MongoDB v2 driver — and why you probably haven't migrated yet

The go.mongodb.org/mongo-driver/v2 driver has gone GA, but most templates on GitHub are still on v1. v2 has a cleaner cursor API and generics support. Crucially for production, though, there's still no official OTel instrumentation package for it.

So I wrote a custom event.CommandMonitor:

// internal/platform/database/mongo_otel.go
client, err := mongo.Connect(options.Client().
    ApplyURI(uri).
    SetMonitor(NewCommandMonitor(tracer)))

The monitor keeps a RequestID → span map populated on CommandStarted and consumed on CommandSucceeded / CommandFailed. The catch: if a command never completes (network drop, server restart) the entry leaks. So there's a janitor goroutine that purges entries older than 2 minutes.

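// Sketch: full impl in the repo
type spanEntry struct {
    span    trace.Span
    started time.Time // the janitor purges entries older than 2 minutes
}

type CommandMonitor struct {
    tracer   trace.Tracer
    inflight sync.Map // RequestID (int64) → spanEntry
}

func (m *CommandMonitor) Started(ctx context.Context, evt *event.CommandStartedEvent) {
    _, span := m.tracer.Start(ctx, "mongo."+evt.CommandName)
    m.inflight.Store(evt.RequestID, spanEntry{span: span, started: time.Now()})
}

func (m *CommandMonitor) Succeeded(ctx context.Context, evt *event.CommandSucceededEvent) {
    if v, ok := m.inflight.LoadAndDelete(evt.RequestID); ok {
        v.(spanEntry).span.End()
    }
}

// Failed mirrors Succeeded, recording the failure on the span before End().
// The driver takes plain funcs: &event.CommandMonitor{Started: m.Started, Succeeded: m.Succeeded, Failed: m.Failed}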

Hot take: every project I've seen that uses MongoDB in Go has at least one production memory leak in their tracing layer. Mine probably did too until I wrote that janitor.

3. Kafka retry + DLQ that actually works under load

This is the part of the template I've rewritten the most times across jobs.

Rules I now live by:

Three topics per logical event: events, events.retry, events.dlq. No magic, no in-memory queues.
Retry topic has its own consumer group (<group>-retry). When it polls a message, it sleeps min(base * 2^n, max) before republishing to the main topic. This separates retry latency from main-topic throughput.
x-retry-count and x-error-reason are headers, not body fields. The body is opaque to the platform.
Inbound x-retry-count and x-error-reason are stripped on republish, so an attacker can't pre-set the retry count to force a message straight to the DLQ. (Spotted on a real pentest. Yes, it was me.)
Idempotency key auto-generated on publish if not provided. Consumers dedupe via this key, not message offset.

// The contract the platform exposes
type MessageHandler interface {
    Handle(ctx context.Context, record *kgo.Record) error
}

// Return:
//   nil                                  → commit offset (ACK)
//   errors.Is(err, context.Canceled)     → no commit, no republish (shutdown)
//   errors.Is(err, kafka.ErrNonRetryable) → straight to DLQ
//   any other error                      → retry topic with incremented count

W3C traceparent is injected on the producer side and extracted on the consumer side so distributed traces don't break when crossing the broker.

producer.Publish(ctx, "user-events", key, body)
//                  ↑
//                  span context auto-propagated into the record header

The failure path calls ForceSample() so retry/DLQ traffic always shows up on the dashboard — sampling shouldn't hide errors.

4. SQS as a second broker — same API, same routing

Plenty of teams pick SQS over Kafka when they're AWS-only and don't need replay history. So the template ships both, side-by-side, with deliberately identical routing semantics:

// package kafka
type MessageHandler interface {
    Handle(ctx context.Context, record *kgo.Record) error
}

// package sqs: same shape
type MessageHandler interface {
    Handle(ctx context.Context, msg *types.Message) error
}

// Same return contract:
//   nil                               → ACK (delete the message)
//   errors.Is(err, context.Canceled)  → no delete, no republish (shutdown)
//   errors.Is(err, ErrNonRetryable)   → DLQ
//   any other error                   → retry queue with backoff

The same make scaffold walkthrough works for both. The cmd/worker binary runs four goroutines via errgroup.WithContext (Kafka main+retry, SQS main+retry). The first non-nil error cancels the shared ctx, and all consumers shut down together.

Gotchas the docs warn you about:

SQS MessageDelaySeconds caps at 900s (15 min). Longer backoffs are silently clamped to 900s; for anything longer you need an external scheduler.
The MessageAttributes limit is 10 keys per message, and 3 are reserved (traceparent, tracestate, x-idempotency-key), leaving 7 for callers. The producer rejects the send with ErrTooManyAttributes if the caller exceeds that budget.
Standard SQS does NOT guarantee ordering — don't expect Kafka-partition semantics.

LocalStack for dev:

make localstack-up
make sqs-create-queues   # provisions events / events-retry / events-dlq + RedrivePolicy

Production: queues are provisioned by your IaC (Terraform/CDK). The worker only needs sqs:SendMessage, sqs:ReceiveMessage, sqs:DeleteMessage, sqs:GetQueueUrl and sqs:ChangeMessageVisibility; it never needs sqs:CreateQueue. The boilerplate ships only the infrastructure code; you wire your own sqs.MessageHandler.

5. Compile-time DI with `google/wire` — once you try it you can't go back

Runtime DI containers feel clever until you spend two hours debugging "nil pointer" because container key resolution failed at startup.

google/wire generates the wiring ahead of time with the wire tool. If a provider is missing, code generation fails, so the error surfaces before anything runs. There's no reflection, no runtime overhead, and the generated file is grep-able.

//go:build wireinject

func InitializeServer(cfgPath string) (*ServerApp, func(), error) {
    wire.Build(
        config.Load,
        ProvideMongoConfig,
        database.NewClient,
        user.ProviderSet,           // one line wires the whole feature
        role.ProviderSet,
        ProvideRouter,
        wire.Struct(new(ServerApp), "*"),
    )
    return nil, nil, nil
}

make wire regenerates wire_gen.go. CI fails on drift — so a contributor adding a feature without running make wire gets blocked.
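If you haven't seen wire output before, below is a hand-written sketch of roughly what wire_gen.go contains for an injector like the one above. The providers (loadConfig, newDBClient, newProducer) and types are hypothetical simplifications, not the template's real ones. The part worth noticing is how cleanups are chained.

```go
package main

import "errors"

// Hypothetical stand-ins for config.Load, database.NewClient and a Kafka producer.
type Config struct{ MongoURI, Brokers string }
type DBClient struct{ URI string }
type Producer struct{ Brokers string }
type ServerApp struct {
	DB       *DBClient
	Producer *Producer
}

var cleanupLog []string // records cleanup order so the example can show it

func loadConfig(path string) (*Config, error) {
	if path == "" {
		return nil, errors.New("config path required")
	}
	return &Config{MongoURI: "mongodb://localhost:27017", Brokers: "localhost:9092"}, nil
}

func newDBClient(cfg *Config) (*DBClient, func(), error) {
	return &DBClient{URI: cfg.MongoURI}, func() { cleanupLog = append(cleanupLog, "db") }, nil
}

func newProducer(cfg *Config) (*Producer, func(), error) {
	if cfg.Brokers == "" {
		return nil, nil, errors.New("no brokers configured")
	}
	return &Producer{Brokers: cfg.Brokers}, func() { cleanupLog = append(cleanupLog, "producer") }, nil
}

// InitializeServer has the shape wire generates: providers run in dependency
// order, a failing provider triggers the cleanups collected so far, and the
// returned cleanup tears everything down in reverse order.
func InitializeServer(cfgPath string) (*ServerApp, func(), error) {
	cfg, err := loadConfig(cfgPath)
	if err != nil {
		return nil, nil, err
	}
	db, dbCleanup, err := newDBClient(cfg)
	if err != nil {
		return nil, nil, err
	}
	producer, producerCleanup, err := newProducer(cfg)
	if err != nil {
		dbCleanup()
		return nil, nil, err
	}
	app := &ServerApp{DB: db, Producer: producer}
	return app, func() {
		producerCleanup()
		dbCleanup()
	}, nil
}
```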

The trade-off: there's a learning curve. But after the first feature, every subsequent addition follows the same ProviderSet pattern and Claude Code (or any agent) can copy it.

Which brings me to —

6. The vibe-coding angle: `CLAUDE.md` + `AGENTS.md`

If you've used Claude Code, Cursor, Aider, or Windsurf agent mode, you know they all read the repo for context. Most repos give them nothing useful, so the agent ends up cargo-culting whatever pattern it finds first.

I committed two files specifically for AI agents:

CLAUDE.md (215 lines) — commands, architecture, coding conventions, package boundaries, "Adding a feature" checklist
AGENTS.md — multi-agent compat redirect (Cursor and Aider both read this convention)

## Coding Conventions

**Naming**
- Interfaces: no `I` prefix. `UserService` (interface) / `userService` (impl)
- Constructors `NewUserService` return the interface
- Errors: sentinel `ErrXxx` in feature-local `errors.go`
...

## Adding an SQS-consuming feature

1. Create `internal/<feature>/event_handler_sqs.go` implementing `sqs.MessageHandler`
2. Replace `ProvideSQSHandler` in `cmd/worker/providers.go`
3. Add `<feature>.NewSQSEventHandler` to `cmd/worker/wire.go` ProviderSet
4. `make wire mocks test`

The payoff is real: I can paste a Linear ticket into Claude Code, and it produces an 80%-correct PR because it knows the repo's conventions, the test patterns, and which file to edit. The agent doesn't waste context on discovery; the discovery is one file.

This is a controversial design choice. Some people don't want their repo "locked into" Claude. But CLAUDE.md is just markdown — re-readable by humans, ignorable by anyone who doesn't use AI. Cost: 215 lines. Benefit: agents move from "best-guess" to "follows your conventions".

7. The boring stuff you wish someone had written for you

Things in the template you'd otherwise reinvent on day 4 of your project:

/healthz (liveness) vs /readyz (readiness) with cached checkers and fail-after-N hysteresis so a transient blip doesn't flap your load balancer.
SSRF-safe HTTP client (httpclient.NewExternalClient) — custom Dialer rejects loopback, RFC1918, link-local, and AWS metadata IPs (169.254.169.254). The default for any non-hardcoded URL.
http.Server timeouts wired correctly (ReadHeaderTimeout, ReadTimeout, WriteTimeout, IdleTimeout). These CANNOT be replaced by handler-level timeouts: headers are read before any handler runs, and a context-based handler timeout doesn't unblock a stalled body read.
Middleware ordering documented with rationale:

  otelgin → Recovery → RequestID → RequestLogger → RateLimit → BodyLimit → Timeout → CORS → handler

Cursor pagination as a platform-level helper — opaque base64(JSON) cursor, generic Page[T] envelope, ClampLimit capped at 100.
Shared error envelope {error: {code, message}} via httperror.BadRequest(c, err) so handlers don't redefine gin.H{"error": err.Error()} 47 times.
X-Request-ID propagated through ctx, response headers, and slog fields — debug correlation from client log → server trace.
Distroless Docker with healthcheck via the binary's healthcheck subcommand (because distroless has no curl/wget).
make ci runs lint + test + race + govulncheck + gitleaks. The CI workflow blocks merges on any of these.

None of these are revolutionary. Together they save you the first two sprints.

What's NOT in the template (and why)

I'm allergic to "kitchen sink" boilerplates. Things deliberately excluded:

Auth / AuthZ — every team has different requirements (session vs JWT, OPA vs Casbin vs hand-rolled). Adding any choice is a wrong choice for 80% of users.
Integration tests with testcontainers — go-kit philosophy: build the integration harness when you have a real feature that needs it, not on speculation.
CQRS / event sourcing — demand-driven. If your domain needs it, you know.
API versioning gymnastics (v1/, v2/) — the boilerplate has /api/v1 baked in; multi-version is the user's call.
GraphQL — different concern.
Frontend — wrong repo.

If you need any of these, the template is a starting point, not a finished product.

How to use it

# 1. Click "Use this template" on GitHub, or:
git clone https://github.com/rock288/go-mongo-boilerplate my-service
cd my-service

# 2. Rename the module path (everywhere it appears)
NEW=github.com/<your-org>/<your-repo>
OLD=github.com/rock288/go-mongo-boilerplate
go mod edit -module=$NEW
grep -rl "$OLD" --include='*.go' --include='*.yml' --include='*.yaml' --include='Makefile' --include='*.md' . \
  | xargs perl -pi -e "s|$OLD|$NEW|g"
make wire && make mocks

# 3. Start your first feature
make scaffold name=Order
# → internal/order/ ready; edit model.go, dto.go, wire it up

# 4. Run
docker-compose up -d
make migrate-up
make run                 # http://localhost:8002

make ci should be green. If it isn't, that's a bug — open an issue.

What I'd like feedback on

Is internal/middleware/ cleanly separated, or should it move under internal/platform/httpserver/middleware/?
The SQS app-level retry queue costs an extra SendMessage per failure compared to native SQS redrive. I chose it for Kafka-symmetry. Worth the cost?
MessageDelaySeconds 900s cap forces clamping. For backoffs >15min I think the right answer is "use Step Functions" — but it's not built in. Should it be?
CLAUDE.md + AGENTS.md — pro or con for a public template? I've heard both.

Open an issue if you have opinions or — better — a PR.

DEV Community
