DEV Community

anh
Anatomy of an OpenAI-compatible provider in Go

GoAI shipped Cloudflare Workers AI and FPT Smart Cloud providers in v0.7.0, then refactored the shared plumbing in v0.7.1. Chat-only providers come in at ~84 lines; the two new ones, which add embeddings and provider-specific routing, land at 126 and 132 lines. This post walks through that anatomy and the Go features that keep it small.

Starting point

OpenAI's Chat Completions and Embeddings shape is a de facto standard. Most inference vendors expose it. In GoAI, 18 of 24 providers speak this wire format. They differ only in URL, auth, and occasional routing. 14 of those share a single factory in internal/openaicompat. The other 4 are openai and vertex with custom routing, plus ollama and vllm which wrap the generic compat provider.

"How much code for a new one?" About 84 lines for a chat-only provider. Most of that is options boilerplate users see in their IDE. Providers with embeddings or custom routing land in the 120s.

The interface

A provider implements two interfaces from provider/:

type LanguageModel interface {
    ModelID() string
    DoGenerate(ctx context.Context, params GenerateParams) (*GenerateResult, error)
    DoStream(ctx context.Context, params GenerateParams) (*StreamResult, error)
}

type EmbeddingModel interface {
    ModelID() string
    DoEmbed(ctx context.Context, values []string, params EmbedParams) (*EmbedResult, error)
    MaxValuesPerCall() int
}

No base class, no registry, no lifecycle. Go's interfaces are satisfied implicitly, so adding a provider doesn't touch any other file.

What's shared

internal/openaicompat owns the wire format and the HTTP plumbing. Two factories do most of the work:

func NewChatModel(cfg ChatModelConfig) provider.LanguageModel
func NewEmbeddingModel(cfg EmbeddingModelConfig) provider.EmbeddingModel

The factory handles request building, streaming, response parsing, token resolution, error dispatch, and the embedding round-trip. Provider packages fill in a config struct and pass it.

internal/ is a Go convention: packages under it are importable only within the same module tree, not by external consumers. That lets 14 providers (plus Ollama and vLLM via the compat wrapper) share the factory without exposing a new public API surface.

Provider anatomy: user code → provider package → shared factory → HTTP

Provider packages stay thin and user-facing. The factory owns the plumbing. Two concrete providers show how this works.

Cloudflare

Cloudflare Workers AI is OpenAI-compatible, with one quirk: the URL embeds the account ID.

https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/v1/chat/completions

The provider-specific work is URL construction. Everything else comes from the shared factory.

const defaultAPIBase = "https://api.cloudflare.com/client/v4"

func WithAccountID(id string) Option {
    return func(o *options) { o.accountID = id }
}

// In resolveOptions, after reading env vars CLOUDFLARE_API_TOKEN / CLOUDFLARE_ACCOUNT_ID:
if o.baseURL == "" && o.accountID != "" {
    o.baseURL = fmt.Sprintf("%s/accounts/%s/ai/v1", defaultAPIBase, o.accountID)
}

Usage:

model := cloudflare.Chat("@cf/meta/llama-3.1-8b-instruct",
    cloudflare.WithAccountID("your-account-id"))

Total file: 126 lines including chat, embeddings, and 6 With* options. Cloudflare provider docs.

FPT Smart Cloud

FPT Smart Cloud's AI marketplace has a different quirk: two regions, Global and Japan, each with its own model catalog.

const (
    baseURLGlobal = "https://mkp-api.fptcloud.com/v1"
    baseURLJP     = "https://mkp-api.fptcloud.jp/v1"
)

func WithRegion(region string) Option {
    return func(o *options) { o.region = region }
}

func regionBaseURL(region string) string {
    switch region {
    case "jp":
        return baseURLJP
    default:
        return baseURLGlobal
    }
}

Usage:

model := fptcloud.Chat("Qwen3-32B", fptcloud.WithRegion("jp"))

The JP region hosts Qwen3-32B, Llama-3.3-70B-Instruct, gpt-oss-120b, GLM-4.7, among others. I verified generate and stream against Qwen3-32B. Total file: 132 lines including chat, embeddings, and region routing. FPT Smart Cloud provider docs.

Both providers follow the same shape: resolveOptions reads env vars (CLOUDFLARE_API_TOKEN, FPT_API_KEY, etc.) as fallback, computes the base URL, then Chat() passes a ChatModelConfig to openaicompat.NewChatModel. Only the URL-derivation bit above is unique.

Compile-time interface checks

The factory has this block near the top of internal/openaicompat/factory.go:

var (
    _ provider.LanguageModel  = (*chatModel)(nil)
    _ provider.CapableModel   = (*chatModel)(nil)
    _ provider.EmbeddingModel = (*embeddingModel)(nil)
)

It assigns a nil pointer of each concrete type into the interface variable. Renaming an interface method breaks the build immediately, not silently at runtime.

Idiomatic Go, not a GoAI invention. One check covers all 14 providers that route through the factory.

Testing

Every provider ships a _test.go using net/http/httptest.NewServer or a custom http.RoundTripper to capture outgoing requests:

// Sketch; roundTripperFunc and okResponse are local helpers in the test file.
var gotAuth, gotURL string
tr := roundTripperFunc(func(req *http.Request) (*http.Response, error) {
    gotAuth = req.Header.Get("Authorization")
    gotURL = req.URL.String()
    return okResponse(), nil
})
t.Setenv("CLOUDFLARE_API_TOKEN", "env-tok")
t.Setenv("CLOUDFLARE_ACCOUNT_ID", "env-acc")
m := Chat("m", WithHTTPClient(&http.Client{Transport: tr}))
_, err := m.DoGenerate(t.Context(), params)
// assert gotAuth == "Bearer env-tok", gotURL contains "env-acc"

No mocking library. The test server (or round-tripper) runs the same code path as production. Streaming tests work the same way, just with Server-Sent Events chunks instead of a JSON body.

All 14 OpenAI-compatible providers reach 100% statement coverage. Factory at 99.8%.

Functional options

Every provider exposes the same small set:

WithAPIKey(key string)
WithTokenSource(ts provider.TokenSource)
WithBaseURL(url string)
WithHeaders(h map[string]string)
WithHTTPClient(c *http.Client)

Plus one or two provider-specific ones (WithAccountID, WithRegion). The signature is always func(*options), so adding a knob doesn't change any constructor.

Not novel: Dave Cheney wrote about it in 2014. It's why the 14 providers feel consistent without sharing a base type.

What Go didn't give me

  • No default arguments. Every option is a separate With* function. The factory's config struct has 12 fields, most are optional. Zero-value defaults work but grow fragile at 20+ fields.
  • No decorator pattern. Telemetry and retry wrap explicitly via hooks, not annotations. Verbose but clear.
  • No pattern matching. Response parsing is if/switch on JSON shapes. Rust enums would be cleaner here.

By the numbers

Simple provider (deepinfra, groq, mistral, ...)    84 LOC
Complex (cloudflare, fptcloud with embeddings)     126-132 LOC
14 OpenAI-compat providers, total                  ~1,324 LOC
Shared factory in internal/openaicompat            334 LOC
Coverage                                           100% providers, 99.8% factory

Takeaway

The pattern that scales to 14 providers without bloat:

  • Split the public surface from the plumbing. User-facing names (cloudflare.WithAccountID, env var conventions) live in the provider package. HTTP dispatch, token resolution, error parsing live in internal/openaicompat. Changes to the shared code ripple across 14 providers at once without breaking any public API.
  • Variations as config, not plugins. Extra body fields, fixed headers, optional auth, account-ID URL building, each is a field on ChatModelConfig or a few lines in resolveOptions. No sub-classing, no registry.
  • Compile-time checks over documentation. The var _ LanguageModel = (*chatModel)(nil) assertion at the top of the factory guarantees every provider still satisfies the interface. No runtime surprises.

The factory is 334 lines. Each provider is a few dozen lines of declarations on top.

v0.7.1 is live. If an inference provider speaks OpenAI-compatible and isn't in GoAI yet, the Cloudflare and FPT diffs are reasonable templates.

Links

Originally published on https://blog.anh.sh/anatomy-of-an-openai-compatible-provider-in-go
