Introduction
If you missed the first post in this series, start here. It covers the foundation: five provider interfaces that decouple the Flames control plane from any specific infrastructure backend.
This time we're building the first thing that actually does something: the API server, the first real consumer of those provider interfaces. A running binary. `go run ./cmd/flames-api`, curl against it, create a VM. That kind of thing.
But what made this spec interesting wasn't the HTTP handlers, it was the design conversation I had with Claude Code along the way. I pushed back on several decisions, changed the architecture in meaningful ways, and the final result is quite different from what was initially proposed. That's the part worth documenting.
The Steering
Here's the thing about working with AI agents on architecture: the first draft is usually reasonable but generic. The value I bring is knowing where generic breaks down. Let me walk through the key moments where I steered the design.
"Make transport an interface"
The initial spec described an HTTP API server. Handlers, routes, JSON. Standard stuff. But I've been down this road. You build everything into HTTP handlers, then six months later someone asks for gRPC, and you're refactoring the entire service layer.
So I pushed back: the transport needs to be a swappable layer. The business logic should live in a Service struct that speaks only domain types. HTTP is just the first adapter. gRPC, WebSocket, whatever, they're all just different ways to call the same methods.
This changed the entire structure. Instead of one package with handlers that embed business logic, we got three clean layers:
- `api/`: Pure business logic. No `net/http`, no JSON tags, no transport concepts.
- `transport/httpapi/`: Thin HTTP adapter. Decodes requests, calls the service, encodes responses.
- `cmd/flames-api/`: Wires it all together.
"Why use the colon style?"
The initial design proposed `POST /v1/vms/{vm_id}:stop`, Google's API design convention with colons for custom actions. I asked a simple question: if it's unusual and requires careful `ServeMux` registration, why not just use `/stop`?

There was no good reason. We changed it to `POST /v1/vms/{vm_id}/stop`. Sometimes the right architectural decision is just removing complexity.
"Why not add limit to ListControllers now?"
The design had a risk section noting "No pagination on ListControllers, acceptable for now, will need it later." I pointed out that ListEvents already had filtering via EventFilter, so why defer the same pattern for controllers?
That led to adding `ControllerFilter` (with `Status` and `Limit` fields), which meant extending the `StateStore` interface from SPEC-001 with a new `ListControllers` method. A small change now that avoids a breaking change later.
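A filter following that pattern might look like the sketch below. Only the `Status` and `Limit` fields come from the spec discussion; the `Controller` type, status values, and the in-memory `listControllers` helper are hypothetical stand-ins for the real StateStore implementation:

```go
package main

import "fmt"

// ControllerStatus and Controller are illustrative domain types.
type ControllerStatus string

const (
	StatusHealthy   ControllerStatus = "healthy"
	StatusUnhealthy ControllerStatus = "unhealthy"
)

type Controller struct {
	ID     string
	Status ControllerStatus
}

// ControllerFilter mirrors the EventFilter pattern: the zero value means "no filtering".
type ControllerFilter struct {
	Status ControllerStatus // empty string matches any status
	Limit  int              // 0 means no limit
}

// listControllers shows how a StateStore implementation might apply the filter.
func listControllers(all []Controller, f ControllerFilter) []Controller {
	var out []Controller
	for _, c := range all {
		if f.Status != "" && c.Status != f.Status {
			continue
		}
		out = append(out, c)
		if f.Limit > 0 && len(out) == f.Limit {
			break
		}
	}
	return out
}

func main() {
	all := []Controller{
		{ID: "c1", Status: StatusHealthy},
		{ID: "c2", Status: StatusUnhealthy},
		{ID: "c3", Status: StatusHealthy},
	}
	got := listControllers(all, ControllerFilter{Status: StatusHealthy, Limit: 1})
	fmt.Println(len(got), got[0].ID) // 1 c1
}
```

Because the zero-value filter matches everything, existing callers of `ListControllers` keep working unchanged when the filter parameter is added.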
"In-memory for tests, SQLite for dev, Postgres for prod"
The idempotency store risk mentioned "when persistence is added." I corrected the framing. The project isn't going from "in-memory" to "persistent." It's a three-tier model: in-memory for tests, SQLite for local dev, Postgres (or other databases) for prod. Every stateful component should follow this progression.
This is the kind of context that lives in my head but needs to be explicit in the spec so the AI (and future contributors) make the right calls.
The Spec
Like SPEC-001, I shared the main ideas and core points (what the API should do, which endpoints, what the transport story should look like) and Claude Opus 4.6 turned that into three structured spec notes. I steer, it writes:
- Requirements: 8 user stories, 23 functional requirements split across service layer, HTTP transport, and infrastructure. Plus non-functional requirements and 13 acceptance criteria.
- Design: Package layout, dependency graph, service method signatures, HTTP route mapping, error mapping strategy, idempotency design.
- Tasks: 10 tasks across 4 phases with dependency tracking.
A few highlights from the spec:
User Stories: The transport-independence story was the one I added after steering the design:
- US-1: As an operator, I want to create a VM via an API...
- US-7: As a developer, I want to start a fully functional API server with `go run ./cmd/flames-api`...
- US-8: As a platform team, I want to swap the transport layer without rewriting business logic...
Functional Requirements: Split into three sections reflecting the layered architecture:
| Section | Count | Scope |
|---|---|---|
| Service Layer | FR-001 to FR-012 | Business logic, event emission, limit enforcement |
| HTTP Transport | FR-013 to FR-020 | Routes, error mapping, idempotency, healthz |
| Infrastructure | FR-021 to FR-023 | Entry point, flags, testability |
Non-Functional: The stdlib-only constraint got nuanced: the service layer and HTTP transport must be stdlib-only, but future transports (gRPC) may bring their own dependencies scoped to their package.
The Architecture
The service is a concrete struct, not an interface. This was a deliberate call in the design. There's only one implementation of the business logic; you don't need polymorphism for "the thing that does the work." Transports call its methods directly. If we ever need a service interface (middleware chaining, for example), we extract it then. Not before.
The API
All mutations return 202 Accepted because the desired state is recorded, but convergence happens asynchronously. This is fundamental to how Flames works: you tell the system what you want, and the reconciler makes it happen.
Route Map

| Method | Path | Service Method | Status |
|---|---|---|---|
| POST | `/v1/vms` | `CreateVM` | 202 |
| GET | `/v1/vms/{vm_id}` | `GetVM` | 200 |
| POST | `/v1/vms/{vm_id}/stop` | `StopVM` | 202 |
| DELETE | `/v1/vms/{vm_id}` | `DeleteVM` | 202 |
| POST | `/v1/controllers` | `RegisterController` | 201 |
| POST | `/v1/controllers/{id}/heartbeat` | `Heartbeat` | 200 |
| GET | `/v1/controllers` | `ListControllers` | 200 |
| GET | `/v1/events` | `ListEvents` | 200 |
| GET | `/healthz` | - | 200 |
Idempotency
Mutation endpoints support an `Idempotency-Key` header. The same key with the same body replays the original response; the same key with a different body returns 409. The implementation uses SHA-256 body hashing with a `ResponseWriter` wrapper to capture responses before they're sent. It's a transport concern: it lives entirely in `transport/httpapi/`, not in the service layer.
Error Responses
Errors from the provider layer map cleanly to HTTP: `ErrNotFound` becomes 404, `ErrAlreadyExists` and `ErrConflict` become 409, anything else is 500.

Every error response is structured JSON with `code`, `message`, `resource_type`, and `resource_id`. No string matching on the client side. This is the same `providererr` pattern from SPEC-001 surfaced through HTTP. The structured errors we designed in the foundation layer pay off immediately.
The Implementation
10 tasks, 4 phases, all executed in a single session. 7 files created, 4 existing files modified. Zero external dependencies. Every test passes with `-race`. The binary starts, you curl it, VMs get created.
Reflections on the Workflow
This spec was more interesting than SPEC-001 because the steering mattered more. Provider interfaces are fairly standard Go, the AI can nail those with minimal guidance. But an API server involves architectural judgment calls: where does business logic live? What's a transport concern vs. a service concern? Do you build for flexibility now or later?
The answer was different for different things. Transport abstraction? Build it now, the cost is near zero and the payoff is real. Colon-style routes? Don't bother. Pagination on controllers? Add it now, same pattern already exists.
These are 30-second decisions in a conversation, but they compound into a fundamentally different codebase. The AI proposes, I steer, the spec captures the decision, and the implementation follows. That's the loop.
What I like about the Spec-Driven workflow on ContextPin is that these decisions are recorded. The Requirements note shows what we decided. The Design note shows why. If someone reads the spec six months from now, they'll understand not just the code but the reasoning behind it. That's the difference between documentation and a spec.
What's Next
The API server is running but nobody's talking to it yet. The next specs will bring the system to life: controllers that actually boot Firecracker VMs, a scheduler that assigns VMs to controllers, and persistence so state survives a restart. The provider interfaces and service layer are ready for all of it.
This post is part of the Flames Spec-Driven Development series.
This entire project is being built using ContextPin + Claude Code. ContextPin is an ADE (Agentic Development Environment), a GPU-accelerated desktop app designed for AI-assisted development that integrates directly with Claude Code, Codex, Gemini and OpenCode. ContextPin is going open-source soon. If you want to get early access, you can join here.