
Aafaq Zahid

Originally published at substack.com

I built a Go microservices framework in 2017. Here's what 8 years of production taught it.

In 2017 I was maintaining a Node.js ecommerce server when a new project landed on my desk — an IoT platform for vehicle tracking devices. Thousands of devices, transmitting data continuously, with clients expecting real-time visibility into what the devices were reporting.

The Node.js server started showing its limits almost immediately.

Too many devices, too much real-time data, not enough throughput. Clients were seeing delays. Data was arriving late. The storage layer was struggling so badly the team was deleting the previous month's data every 30 days just to keep things running. And underneath all of it was a monolith — one large server doing everything, with no clean way to scale the parts that were struggling without scaling everything.

We needed microservices. We needed speed. We needed something built for this.


Why Go

I came to Go from C++ — not from Python or Ruby or JavaScript. I had spent years in structured, compiled languages where you thought carefully about types, memory, and concurrency before you wrote a line. Node.js had been a detour. Java had been a year. Go felt like coming home.

Go was young in 2017. The community was smaller. The ecosystem was thinner. But Google was behind it, the community was growing fast, and the performance characteristics were exactly what the IoT platform needed — compiled, concurrent, fast.

The decision was straightforward. The harder question was: how do you structure a Go microservices system from scratch?


The go-micro problem

The obvious answer was go-micro — the most complete Go microservices framework available at the time. I tried it. It worked. But it felt like too much.

go-micro has service discovery, plugins, a full RPC system, and abstractions for everything. For a team that needed to move fast on a real-time IoT system, it felt like learning a new language on top of a new language. The cognitive overhead was real.

What I actually needed was simpler. Some services needed to process messages from a queue. Some needed to respond without a queue at all — pure fire and forget. Some needed HTTP. Some needed raw socket connections for the devices. The system needed to be modular but not complicated.

go-micro could do all of this. But I kept feeling like I was fighting the framework instead of building the product.

So I stopped fighting it and started building my own.


The first version

The first version of Keel was almost embarrassingly simple. One main file. One config file. A way to define multiple services and run them inside a single Docker container.

The core idea: every service registers itself, the framework boots the right one based on a flag, and each service gets access to whatever it needs — HTTP, sockets, database, messaging — through a shared configuration and lifecycle system.

one binary
one config
multiple services
boot the one you need

That was it. No magic. No service discovery. No plugins. Just a clean way to structure a Go microservices project so you didn't have to rebuild the foundation every time you added a service.
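The register-then-boot idea can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not Keel's actual code: the `Service` interface, the `Register` and `Lookup` functions, the `-service` flag name, and the two toy services are all hypothetical names chosen for the example.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"sort"
)

// Service is a hypothetical lifecycle interface for this sketch.
type Service interface {
	Run() error
	Stop()
}

// registry maps a service name to a factory that builds the service.
var registry = map[string]func() Service{}

// Register adds a factory under a name; services call it from init().
func Register(name string, factory func() Service) {
	registry[name] = factory
}

// Lookup builds the named service, or reports which names are available.
func Lookup(name string) (Service, error) {
	factory, ok := registry[name]
	if !ok {
		names := make([]string, 0, len(registry))
		for n := range registry {
			names = append(names, n)
		}
		sort.Strings(names)
		return nil, fmt.Errorf("unknown service %q (registered: %v)", name, names)
	}
	return factory(), nil
}

// Two toy services that live in the same binary.
type trackerService struct{}

func (trackerService) Run() error { fmt.Println("tracker running"); return nil }
func (trackerService) Stop()      { fmt.Println("tracker stopped") }

type apiService struct{}

func (apiService) Run() error { fmt.Println("api running"); return nil }
func (apiService) Stop()      { fmt.Println("api stopped") }

func init() {
	Register("tracker", func() Service { return trackerService{} })
	Register("api", func() Service { return apiService{} })
}

func main() {
	// The flag decides which of the registered services this process boots.
	name := flag.String("service", "api", "name of the service to boot")
	flag.Parse()

	svc, err := Lookup(*name)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	err = svc.Run()
	svc.Stop()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

With this shape, the same container image can be started as `-service=tracker` in one deployment and `-service=api` in another, which is the "boot the one you need" part.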

The first services I built with it handled socket connections for the IoT devices. Raw TCP and Socket.IO, managed through the same lifecycle system, sharing the same configuration and database connections.

It worked. So I kept building on top of it.


Kafka to NATS

The first messaging system I wired into Keel was Kafka. It made sense at the time — Kafka was the standard for high-volume event streaming and the IoT platform was generating significant data volumes.

But Kafka was heavy. The operational overhead was real. And then I found NATS — a Go-native messaging system, lightweight, fast, and surprisingly simple to adopt.

The switch was straightforward. NATS gave me everything I needed — Pub/Sub for fire-and-forget event distribution, RPC for request/reply between services — without the operational weight of Kafka. I've used NATS in every project since.


Eight years of production problems, codified

The first version of Keel solved one problem. Every project after that added another layer.

The ecommerce project needed pagination — three different kinds of it, as it turned out: standard pagination, SQL pagination, and aggregate pagination for complex MongoDB queries. So I built all three into Keel.
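All three kinds of pagination rest on the same arithmetic: turn a page number and page size into an offset and a limit, then report how many pages exist. The sketch below shows only that shared step. The type names, the default size of 20, and the cap of 100 are assumptions made for the example and are not taken from Keel. In SQL the result would feed `LIMIT` and `OFFSET`; in a MongoDB aggregation it would feed the `$skip` and `$limit` stages.

```go
package main

import "fmt"

// PageRequest is what a client asks for (hypothetical type for this sketch).
type PageRequest struct {
	Page int // 1-based page number
	Size int // items per page
}

// PageInfo is the derived paging window plus totals.
type PageInfo struct {
	Page, Size, Offset     int
	TotalItems, TotalPages int
	HasNext                bool
}

const (
	defaultPageSize = 20  // assumed default
	maxPageSize     = 100 // assumed upper bound to protect the database
)

// Paginate normalizes the request and computes offset and page counts.
func Paginate(req PageRequest, totalItems int) PageInfo {
	page, size := req.Page, req.Size
	if page < 1 {
		page = 1
	}
	if size < 1 {
		size = defaultPageSize
	}
	if size > maxPageSize {
		size = maxPageSize
	}
	totalPages := (totalItems + size - 1) / size // ceiling division
	return PageInfo{
		Page:       page,
		Size:       size,
		Offset:     (page - 1) * size,
		TotalItems: totalItems,
		TotalPages: totalPages,
		HasNext:    page < totalPages,
	}
}

func main() {
	info := Paginate(PageRequest{Page: 3, Size: 25}, 120)
	// prints: LIMIT 25 OFFSET 50 (page 3 of 5)
	fmt.Printf("LIMIT %d OFFSET %d (page %d of %d)\n",
		info.Size, info.Offset, info.Page, info.TotalPages)
}
```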

A healthcare project needed FHIR compliance and soft deletes — records that were logically removed but never physically deleted from the database. So I built soft delete into the database layer, routing deleted records to _deleted collections automatically with deletedAt and deletedBy fields.
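To make the soft delete behavior concrete, here is a small in-memory sketch. It stands in for a real database driver and is not Keel's implementation: the `Store` type and its methods are hypothetical, and the `<collection>_deleted` naming is an assumption about how the `_deleted` collections are named. Only the `deletedAt` and `deletedBy` field names come from the description above.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Document is a schemaless record, similar in spirit to a BSON document.
type Document map[string]any

// Store is an in-memory stand-in for a document database.
type Store struct {
	collections map[string]map[string]Document
}

// ErrNotFound is returned when the record to delete does not exist.
var ErrNotFound = errors.New("document not found")

func NewStore() *Store {
	return &Store{collections: make(map[string]map[string]Document)}
}

// Insert stores doc under id, creating the collection on first use.
func (s *Store) Insert(collection, id string, doc Document) {
	if s.collections[collection] == nil {
		s.collections[collection] = make(map[string]Document)
	}
	s.collections[collection][id] = doc
}

// Find returns the document with the given id, if present.
func (s *Store) Find(collection, id string) (Document, bool) {
	doc, ok := s.collections[collection][id]
	return doc, ok
}

// SoftDelete copies the record into "<collection>_deleted" with audit
// fields, then removes it from the live collection. Nothing is destroyed.
func (s *Store) SoftDelete(collection, id, deletedBy string) error {
	doc, ok := s.collections[collection][id]
	if !ok {
		return ErrNotFound
	}
	archived := make(Document, len(doc)+2)
	for k, v := range doc {
		archived[k] = v
	}
	archived["deletedAt"] = time.Now().UTC()
	archived["deletedBy"] = deletedBy
	s.Insert(collection+"_deleted", id, archived)
	delete(s.collections[collection], id)
	return nil
}

func main() {
	s := NewStore()
	s.Insert("patients", "p1", Document{"name": "Ada"})
	if err := s.SoftDelete("patients", "p1", "admin"); err != nil {
		fmt.Println("soft delete failed:", err)
		return
	}
	_, live := s.Find("patients", "p1")
	archived, _ := s.Find("patients_deleted", "p1")
	fmt.Println("still live:", live)
	fmt.Println("archived by:", archived["deletedBy"])
}
```

Against a real database the copy and the removal would need to happen atomically, for example inside a transaction, so that a crash between the two steps cannot lose or duplicate a record.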

A social platform needed real-time features — Socket.IO rooms, broadcast to users, broadcast to rooms, custom events. So I built the Socket.IO server into Keel's lifecycle system alongside HTTP.

A fintech project needed reliability guarantees — circuit breakers, health checks, readiness probes. So those went in too.
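For readers who have not met the pattern, a circuit breaker stops calling a dependency after repeated failures and tries again only after a cooldown. The sketch below is a deliberately small version built on consecutive-failure counting. It is not Keel's breaker, and `Breaker`, `NewBreaker`, `ErrCircuitOpen`, and `errUpstream` are hypothetical names for this example.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// ErrCircuitOpen is returned when a call is rejected without being attempted.
var ErrCircuitOpen = errors.New("circuit open")

// errUpstream simulates a failing dependency in the demo.
var errUpstream = errors.New("upstream unavailable")

// Breaker opens after `threshold` consecutive failures and allows a trial
// call once `cooldown` has passed.
type Breaker struct {
	mu        sync.Mutex
	threshold int
	cooldown  time.Duration
	failures  int
	open      bool
	openedAt  time.Time
}

func NewBreaker(threshold int, cooldown time.Duration) *Breaker {
	return &Breaker{threshold: threshold, cooldown: cooldown}
}

// Call runs fn unless the breaker is open and still cooling down.
func (b *Breaker) Call(fn func() error) error {
	b.mu.Lock()
	if b.open && time.Since(b.openedAt) < b.cooldown {
		b.mu.Unlock()
		return ErrCircuitOpen
	}
	b.mu.Unlock()

	// Either closed, or the cooldown elapsed and this is a trial call.
	// Simplification: concurrent callers may all run a trial at once.
	err := fn()

	b.mu.Lock()
	defer b.mu.Unlock()
	if err != nil {
		b.failures++
		if b.open || b.failures >= b.threshold {
			b.open = true
			b.openedAt = time.Now() // (re)start the cooldown
		}
		return err
	}
	b.failures = 0
	b.open = false
	return nil
}

func main() {
	b := NewBreaker(3, 100*time.Millisecond)
	flaky := func() error { return errUpstream }
	for i := 1; i <= 5; i++ {
		fmt.Printf("call %d: %v\n", i, b.Call(flaky))
	}
}
```

In the demo the first three calls reach the failing dependency, after which the breaker rejects calls until the cooldown has passed.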

Every production problem that appeared in one project became a solved problem in the next one. Eight years of this, across multiple companies and industries, and Keel became something I no longer thought of as a framework. It was just how I built Go services.


What Keel is today

Register a service. Implement two methods. Boot.

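// MyService holds whatever state the service needs (fields omitted here).
type MyService struct{}

// init registers a factory under the name used to select this service.
func init() {
    servicehandler.Register("myservice", func() servicehandler.ServiceBase {
        return &MyService{}
    })
}

// Run contains the service logic.
func (s *MyService) Run() error {
    // your service logic here
    return nil
}

// Stop is where graceful shutdown happens.
func (s *MyService) Stop() {
    // graceful shutdown
}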

That's one registration and two methods. Everything else is configuration.

From that starting point, your service has access to:

HTTP routing through Gin with middleware tiers. Auto-generated OpenAPI/Swagger documentation. Built-in /health and version endpoints. API validation with human-readable error messages.

Three database drivers — MongoDB, MySQL, PostgreSQL — with a rich operation set including soft delete, index management, migrations, and three flavors of pagination. Redis caching and distributed locking. Meilisearch full-text search.

NATS messaging — Pub/Sub, RPC, topic registry, batched event publishing. Socket.IO with rooms, broadcasts, and custom events.

Circuit breakers. Health checks with readiness/liveness/startup probes. OpenTelemetry distributed tracing. Structured logging with zap. Structured errors with trace IDs.

FCM push notifications. Email via gomail. WhatsApp notifications. All behind a common interface with concurrency limits.

Panic recovery on every goroutine. Graceful shutdown on SIGINT/SIGTERM. Multi-service single binary with no shared global state.
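The last two items, panic recovery and signal-driven shutdown, can be illustrated with the standard library alone. This is a sketch of the general technique and makes no claim about how Keel implements it: `SafeGo` and `runWorker` are hypothetical helpers, and the two-second timeout exists only so the demo exits by itself.

```go
package main

import (
	"context"
	"fmt"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// SafeGo runs fn in a goroutine and converts a panic into an error.
// The returned channel yields exactly one value: nil or the recovered panic.
func SafeGo(fn func()) <-chan error {
	done := make(chan error, 1)
	go func() {
		defer func() {
			if r := recover(); r != nil {
				done <- fmt.Errorf("recovered panic: %v", r)
				return
			}
			done <- nil
		}()
		fn()
	}()
	return done
}

// runWorker does periodic work until the context is cancelled.
func runWorker(ctx context.Context) {
	ticker := time.NewTicker(500 * time.Millisecond)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			fmt.Println("working")
		}
	}
}

func main() {
	// Cancel the context on SIGINT or SIGTERM.
	ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
	defer stop()

	// Demo only: also stop after two seconds so the example terminates.
	ctx, cancel := context.WithTimeout(ctx, 2*time.Second)
	defer cancel()

	worker := SafeGo(func() { runWorker(ctx) })

	select {
	case <-ctx.Done():
		fmt.Println("shutdown requested, waiting for worker")
		select {
		case err := <-worker:
			if err != nil {
				fmt.Println(err)
			}
		case <-time.After(5 * time.Second):
			fmt.Println("worker did not stop in time")
		}
	case err := <-worker:
		fmt.Println("worker exited early:", err)
	}
}
```

A panic inside the worker surfaces as an error on the channel instead of taking down the whole process, which matters when several services share one binary.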

A new service, from zero, in under five minutes. Independently verified by Claude Sonnet in Cursor.


Why I'm open sourcing it now

Keel has lived inside private repositories for eight years. Every company I've worked with has used it. Every system I've built has run on it. It has handled millions of daily requests across IoT platforms, fintech systems, healthcare infrastructure, and social platforms.

I kept it private not out of secrecy but out of inertia. It was an internal tool. It worked. There was always something more urgent to build.

But an internal tool that solves real problems is more useful as a public one. If you're building Go microservices and you're tired of wiring the same fifteen things from scratch every time — Keel exists because I was tired of that too.

👉 github.com/glodb/keel

👉 keel-code — a complete working example


Aafaq Zahid is a software architect and founding engineer who has been building distributed systems and production infrastructure for 17 years. Creator of Keel and DBFusion. Previously wrote about designing a custom UDP protocol for 18,000 IoT devices.

LinkedIn: linkedin.com/in/aafaqzahid
