How I build and orchestrate LLM agents in Go

#go #ai #opensource #showdev

TL;DR: I built a Go framework for LLM agents called Galdor. This walks through how I build agents with it: one agent with tools, then a few specialist agents with a router in front. The code is real and runs against any of the providers Galdor supports.

Most of my agents start as one thing. A model, a few tools, and a loop that lets it call those tools until it has an answer. That covers a surprising amount. The mess shows up later, when that single agent has fifteen tools and a system prompt the length of a short story, and it keeps reaching for the wrong one. So this post covers the two stages I keep landing on and how they look in Galdor.

One agent

An agent here is a provider, a set of tools, and a config. A tool is just a Go function. The input struct is the schema, so you never write JSON schema by hand:

type weatherIn struct {
    City string `json:"city" jsonschema:"City to look up"`
}
type weatherOut struct {
    TempC int    `json:"temp_c"`
    Brief string `json:"brief"`
}

func weather(ctx context.Context, in weatherIn) (weatherOut, error) {
    return weatherOut{TempC: 21, Brief: "sunny in " + in.City}, nil
}

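ctx := context.Background()

// fail fast: a missing key or a bad tool definition should stop here,
// not surface later as a nil result
p, err := anthropic.New(anthropic.Config{APIKey: os.Getenv("ANTHROPIC_API_KEY")})
if err != nil {
    log.Fatal(err)
}

reg, err := tool.NewRegistry(
    tool.MustNewTool("weather", "Look up the weather for a city", weather),
    builtins.MustNewMathTool(),
)
if err != nil {
    log.Fatal(err)
}

cfg := agent.Config{Provider: p, Tools: reg, Model: "claude-haiku-4-5"}
final, err := agent.Run(ctx, cfg,
    "Weather in Quito, and what's that in Fahrenheit?",
    "Use the tools to answer.")
if err != nil {
    log.Fatal(err)
}

fmt.Println(final.FinalText)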

agent.Run drives a ReAct loop: the model calls weather, gets the result back, calls math, then answers. Swapping Anthropic for OpenAI, Gemini, Bedrock, or a local model through Ollama means changing the provider on that one line and the Model name in the config. The tools and the loop stay exactly as they are.

A few agents with a router

When one agent gets overloaded I split it into specialists and put a routing agent in front. Galdor calls that a supervisor: a small LLM that reads the request, picks a worker, and stops once it has an answer. Each worker is its own agent with its own tools, so they stay small.

The bit that makes this clean is that a worker is just a string-in, string-out function, and a whole ReAct agent fits behind that signature:

// each specialist is a normal ReAct agent
billing, _   := agent.NewReAct(agent.Config{Provider: p, Tools: billingTools, Model: model})
technical, _ := agent.NewReAct(agent.Config{Provider: p, Tools: techTools, Model: model})

// wrap one so the supervisor sees a single string in, single string out
run := func(r *graph.Runnable[agent.State]) func(context.Context, string) (string, error) {
    return func(ctx context.Context, task string) (string, error) {
        final, err := r.Invoke(ctx, agent.State{
            Messages: []schema.Message{schema.UserMessage(task)},
        })
        if err != nil {
            return "", err // don't hand a half-finished answer to the supervisor
        }
        return final.FinalText, nil
    }
}

supervisor, _ := council.NewSupervisor(council.SupervisorConfig{
    Provider: p,
    Model:    model,
    Workers: []council.Worker{
        {Name: "billing",   Description: "invoices, refunds, charges, subscriptions", Run: run(billing)},
        {Name: "technical", Description: "bugs, outages, login issues, system status", Run: run(technical)},
    },
    MaxHops: 4,
})

final, _ := supervisor.Invoke(ctx, council.SupervisorState{
    Input: "My last invoice charged me twice. Can you check?",
})

fmt.Println(final.Final) // the answer
for _, h := range final.History { // who got called, and with what task
    fmt.Printf("[%s] %s\n", h.Worker, h.Task)
}

The supervisor sends the billing question to the billing agent, and final.History shows exactly who it called and what it asked them. That history is the thing I stare at when a route goes wrong, because the bug is almost always a vague worker description, not the worker itself.

There's also a swarm mode, where the agents hand off to each other directly instead of going through a central router. Same idea, different shape.

Why it composes like this

Both the single agent and the supervisor compile down to the same thing: a graph running over goroutines and channels. So they're the same kind of value, and the graph features apply to either one. You get checkpointing and resume for human-in-the-loop steps, and every model call and tool call becomes an OpenTelemetry span you can inspect later, none of which you have to wire up per agent.

The same thing without writing Go

Not everything needs to be a Go program. A lot of my real workflows are a YAML file and a couple of CLI calls. You give up custom Go tools that way, but you keep the built-in and MCP tools, and every run still records a trace.

In YAML, a single agent is one agent: block. This one triages an incoming issue, and instead of guessing your project's conventions it reads them: the system prompt has it open CONTRIBUTING.md and README.md with file_read before deciding anything, and file_read is confined to base_dir, so it can only see files inside the project.

# triage.yaml — run: galdor cast triage.yaml "$(cat issue.txt)"
version: 1
agent:
  provider: anthropic
  model: claude-haiku-4-5
  max_iterations: 6
  system: |
    You triage one incoming issue for an open-source Go project.
    First read CONTRIBUTING.md and README.md with file_read, so your
    verdict reflects this project's real scope, not a generic guess.
    Then output: type, in-scope?, 1-3 labels, and a courteous draft reply.
  tools:
    builtins: [file_read]
    base_dir: ./project   # file_read can only touch files in here

To chain a few agents, each stage is its own cast and the shell feeds one's output into the next. A release pipeline I run turns raw commits into bullets, then into an announcement. The fiddly parts are worth knowing:

set -euo pipefail   # stop at the first failing stage instead of feeding an error into stage 2
DB=./pipeline.db

# stage 1: commits -> categorized bullets
raw=$(galdor cast digest.yaml "$(cat commits.txt)" \
        --trace --db "$DB" --run-id digest)

# strip <think> blocks: reasoning models (MiniMax, DeepSeek-R1) emit them
# inline, and that noise would leak into stage 2's input
bullets=$(printf '%s' "$raw" | perl -0pe 's{<think>.*?</think>}{}gs')

# stage 2: bullets -> release announcement
galdor cast announce.yaml "$bullets" \
        --trace --db "$DB" --run-id announce
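The perl line is a non-greedy, dot-matches-newline substitution. If a stage ever moves into Go, the same cleanup is a single regexp; this is a small sketch with a hypothetical stripThink helper. Go's RE2 engine supports both the lazy .*? and the (?s) flag, and like the perl version it leaves an unclosed <think> block (for example from a truncated response) untouched.

```go
package main

import (
	"fmt"
	"regexp"
)

// thinkBlock matches <think>...</think> lazily; (?s) lets . match newlines,
// like perl's /s flag in the shell pipeline.
var thinkBlock = regexp.MustCompile(`(?s)<think>.*?</think>`)

// stripThink removes every complete <think> block from s.
func stripThink(s string) string {
	return thinkBlock.ReplaceAllLiteralString(s, "")
}

func main() {
	raw := "<think>\nlet me group these\n</think>- Added X\n<think>hmm</think>- Fixed Y\n"
	fmt.Print(stripThink(raw))
}
```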

The answer goes to stdout and the trace logs go to stderr, so $(galdor cast ...) captures only the answer. Run the stages in sequence rather than through a | pipe, so they don't write to the SQLite trace DB at the same moment. And point both at the same --db but give each its own --run-id; otherwise the traces collide instead of sitting side by side in galdor scry list.

If you'd rather have a router decide who runs instead of a fixed order, that's galdor council: several workers in one YAML, in supervisor or swarm mode. (Heads up: council doesn't trace to the dashboard yet, so today the chained version is the one where you see every step in scry.)

Either way, galdor scry show <run-id> and galdor ui replay exactly what each agent read and did, from a local SQLite DB. No collector, no account, nothing hosted.

A bit about the project. Galdor is Go, Apache 2.0, single binary, self-hosted. Providers (Anthropic, OpenAI, anything OpenAI-compatible like Ollama or vLLM, Gemini, Bedrock) and memory backends are separate modules so the core stays small. Right now it's just me on it.

YasserCR / galdor

A Go-native framework for LLM agents, with OpenTelemetry observability built in.

galdor

galdor (n., Old English, c. 9th century): incantation, spell, a chanted word that bends reality.

A Go-native framework for building, orchestrating and observing AI agents. Native OpenTelemetry. Embedded dashboard. One binary. No external SaaS. Apache 2.0.

Why galdor

The table below was last verified against each project's repo, releases and official docs in May 2026. Sources are linked under the table; PRs welcome when something drifts.

galdor	LangChain Python + LangSmith	LangChainGo	Eino	Genkit Go
Latest release	v1.0.0 (Jun 2026)	langchain-core v1.4.0 (May 2026)	v0.1.14 (Oct 2025)	v0.8.13 stable, v0.9.0-alpha active (May 2026) — pre-1.0	mcp plugin v1.8.0 GA (May 2026)
Language / runtime	Go	Python	Go	Go	Go
Observability story	OTel-native, with an embedded SQLite trace store + dashboard served from the same binary	LangSmith (closed-source SaaS)	callbacks only, no OTel	callbacks; the shipped tracing target is Langfuse, not OTel	OTel-native; Genkit Monitoring (the hosted dashboard) is Google-Cloud

…


If you build multi-agent things in Go, I'm curious how you draw the line: one big agent with lots of tools, or specialists with a router in front, and where that choice has bitten you. There are runnable versions of these snippets in the repo.