Gabriel Anhaia

Why I'd Build My AI Agent in Go, Not Python, in 2026


I wrote an 18-chapter book on observability for LLM applications this year. The whole time I was writing the instrumentation examples, I kept imagining the agent backend I was instrumenting. The one I would actually reach for, in 2026, to put in front of users. It is not Python.

That sentence would have been wrong in 2023. The Python ecosystem was the only place a serious agent was going to live. LangChain was where the patterns were being invented. The SDKs landed in Python first. Every eval tool shipped a Python client before it shipped anything else. If you wanted to be within a week of the frontier, you were writing Python.

The thing that changed is that agents stopped being research code and started being services. An agent is no longer a notebook that demonstrates chain-of-thought. It is a long-lived process that calls tools, fans out to APIs, holds timeouts, handles cancellation, emits traces, and stays up when one of the providers it depends on goes down. That is a backend service shape. And the language most production backends in the world are already written in, for reasons that have nothing to do with AI, is Go.

Here is why I'd reach for Go for that service in 2026, and where I'd still reach for Python.

1. Goroutines beat asyncio for fan-out tool calls

An agent turn is rarely one LLM call and done. You get a tool-call response, you execute N tools, you feed the results back, you loop. Those N tools want to run in parallel. They are all I/O. A weather API, a search API, an internal Postgres call, a vector DB lookup. Waiting on them in sequence is the easiest way to turn a 400ms turn into a 4-second one.

In Python, you do this with asyncio.gather. It works, until something inside one of your tools is a blocking library, or you forget to await, or you find yourself threading an event loop through code that did not expect one. Async in Python is a color you have to paint every function in your call graph with, and the moment a library you depend on is not painted, you have a problem.

In Go, you do this with go and a sync.WaitGroup (or errgroup.Group). Every function is already async-capable because the runtime multiplexes goroutines onto OS threads for you. There is no colored-function problem. A tool that calls http.Get blocks the goroutine, not the process. You get ten thousand concurrent tool calls for the price of ten thousand small stack allocations. This is the thing Go was literally built to do.

2. Static typing catches JSON-shape bugs before they eat reliability

Tool calling is a contract negotiated in JSON. The model produces JSON that claims to match your schema. Your code parses that JSON and passes it to a function. Most agent bugs I have seen in the wild live at that seam.

In Python, you validate with Pydantic, which is genuinely good, but it is a runtime check. The bug finds you when a production user triggers the model to hallucinate a field name. Your stack trace is a ValidationError at 2am.

In Go, the tool definition, the handler signature, and the parsed arguments are the same type. If you change the schema and forget to update the handler, the program does not compile. You find the bug on your machine, before the agent ever talks to the model. This sounds like a small thing. In a codebase with twenty tools that evolves over six months, it is not a small thing.

3. Binaries deploy. langchain==0.3.9 does not.

This is the one that will be most familiar to anyone who has run a Python agent in production for more than a quarter.

You pin langchain==0.1.0. Four months later, the security team wants a CVE fix that only lives in langchain==0.3.9. You upgrade. Something in langchain-core renamed. Something in langchain-community moved. Your AgentExecutor subclass no longer subclasses the thing it used to. Your tests pass because you mocked the framework. Production does not, because the runtime behavior of the new tool-calling path subtly differs from the old one on exactly the edge case your biggest customer hits.

A Go binary is a single file. The dependency resolution happened at compile time, on the machine that built it. What runs in production is the same thing you tested, because there is no "the thing you tested" and "the runtime environment it installs itself into." Those are the same object. When you redeploy, you are shipping a binary, not an environment.

You can get close to this in Python with Docker, pinned wheels, and uv. You cannot get all the way there, because the shape of the Python packaging world still assumes the dependency graph is resolved at pip install time. Every production Python shop eventually builds the same Rube Goldberg machine to work around it. Go skips the machine.

4. Memory per running agent is an order of magnitude smaller

This matters less when you have ten concurrent agent sessions on one box. It matters a lot when you have ten thousand.

A Python process that has imported the OpenAI SDK, LangChain, Pydantic, httpx, and two vector DB clients sits somewhere north of 150MB of resident memory before it handles its first request. Every worker you spin up pays that tax again. Gunicorn with 8 workers is already 1.2GB of RAM you spent on imports.

A compiled Go binary that does the same job starts in the 15-25MB range. Each goroutine is a few KB. You can hold tens of thousands of in-flight agent sessions per machine without the memory profile looking scary. This shifts what your infrastructure looks like. Fewer, denser boxes. Less autoscaling churn. The cost curve at scale bends differently.

I am not claiming this matters for your weekend project. I am claiming it matters for the service you are going to run for the next three years.

5. OpenTelemetry GenAI semconv is first-class in Go now

In 2024 this would have been the counter-argument. The OTel GenAI semantic conventions were Python-first. The instrumentation libraries shipped in Python weeks or months before equivalents existed anywhere else.

That gap closed. go.opentelemetry.io/contrib/instrumentation/github.com/openai/openai-go and the sibling packages wrap the official SDKs and emit spans that match the gen_ai.* attribute set that Langfuse, Arize, and Phoenix already know how to read. Tool calls, token counts, model names, prompt and completion events, it is all there, shaped correctly.

If you had told me in 2023 I'd be writing this paragraph, I would not have believed you. The observability story for a Go agent in 2026 is actually cleaner than the Python one, because the Go instrumentation landed late and learned from the mess the Python one went through first.

A Go agent that actually runs

Here is the whole thing. One file, context cancellation, one tool, one LLM turn, OTel trace spans around both. This compiles on Go 1.22+ against the official openai-go SDK.

// agent.go
package main

import (
    "context"
    "encoding/json"
    "fmt"
    "time"

    "github.com/openai/openai-go"
    "github.com/openai/openai-go/option"
    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/attribute"
)

var tracer = otel.Tracer("agent")

func getWeather(ctx context.Context, city string) string {
    _, span := tracer.Start(ctx, "tool.get_weather")
    defer span.End()
    span.SetAttributes(attribute.String("city", city))
    return map[string]string{"Lisbon": "18C clear", "Berlin": "7C rain"}[city]
}

func runTurn(ctx context.Context, client *openai.Client, question string) (string, error) {
    ctx, span := tracer.Start(ctx, "agent.turn")
    defer span.End()

    tools := []openai.ChatCompletionToolParam{{
        Function: openai.FunctionDefinitionParam{
            Name:        "get_weather",
            Description: openai.String("Current weather for a city."),
            Parameters: openai.FunctionParameters{
                "type": "object",
                "properties": map[string]any{
                    "city": map[string]string{"type": "string"},
                },
                "required": []string{"city"},
            },
        },
    }}

    resp, err := client.Chat.Completions.New(ctx, openai.ChatCompletionNewParams{
        Model: openai.ChatModelGPT4oMini,
        Messages: []openai.ChatCompletionMessageParamUnion{
            openai.UserMessage(question),
        },
        Tools: tools,
    })
    if err != nil {
        return "", err
    }
    if len(resp.Choices) == 0 {
        return "", fmt.Errorf("no choices in response")
    }

    calls := resp.Choices[0].Message.ToolCalls
    if len(calls) == 0 {
        return resp.Choices[0].Message.Content, nil
    }
    var args struct {
        City string `json:"city"`
    }
    if err := json.Unmarshal([]byte(calls[0].Function.Arguments), &args); err != nil {
        return "", fmt.Errorf("bad tool arguments: %w", err)
    }
    return getWeather(ctx, args.City), nil
}

func main() {
    ctx, cancel := context.WithTimeout(context.Background(), 15*time.Second)
    defer cancel()

    client := openai.NewClient(option.WithAPIKey("..."))
    out, err := runTurn(ctx, &client, "weather in Lisbon?")
    if err != nil {
        panic(err)
    }
    fmt.Println(out)
}

Three things to notice.

The context.Context is threaded through everything. It carries the timeout, the cancellation signal, and the trace. If the user disconnects, if the turn takes too long, every downstream call learns about it through the same argument. You do not have to remember to check a flag anywhere.

The tool handler and the tool schema live next to each other and use the same types. When you add a second tool next month, the compiler will tell you if you forgot to wire anything up.

The OTel spans wrap the turn and the tool call. If you point this at Langfuse or Phoenix, you get a trace that shows the LLM call, the tool dispatch, the tool latency, and the token counts, already shaped to the GenAI semantic conventions. No custom span attributes.

This is 70 lines. A real agent adds the tool-result feedback loop, multiple tools, and retry logic. It does not add anything structurally different from what you see here.

Where I'd still reach for Python

Let me be honest about the counterargument. Python is still deeper in the parts of the stack below the agent layer, where the frontier is moving fastest.

If you are training or fine-tuning a reranker, Python. If you are prototyping a novel retrieval strategy with a research paper's reference implementation, Python. If you need SentenceTransformers, ColBERT, or whatever the current best open-source reranking model is, the first-class binding is in Python. Go has wrappers for most of them, but you are one abstraction removed from the community the models live in.

The split I would actually ship is the boring one. A Python service behind the agent, doing the embedding and reranking work, exposed over gRPC or HTTP. A Go service in front, running the agent loop, calling out to Python when it needs to embed or rerank, calling out to the model provider, fanning out tools, emitting traces. Each language where it is strong. The Python service does not need to be in the hot path of every user request, which is where Go's latency and memory profile matter most.

What this reframes

The question is not "which language is best for AI." Python will keep winning on the parts of the stack that look like notebooks and research code, because that is where the field is still being invented. The question is which language is best for the agent service, the thing that has to run 24/7 in front of your users.

Once you frame it that way, the answer stops being interesting. It is Go for the same reasons it was Go for the payment service, the authentication service, and the feed service. Strong concurrency, a compiler that catches schema drift, a deployable single binary, and a memory profile that lets you pack the fleet tight.

AI does not change any of that. It makes it matter more, because the failure modes at the LLM seam are subtler than the failure modes at the HTTP seam, and you want every tool the backend discipline has already built for you.


If this was useful

The book this post is orbiting is Observability for LLM Applications. Chapter 3 is the one on tracing a Go agent with OpenTelemetry end to end, chapter 16 is cost and token accounting with the same traces, and chapter 18 is the production-readiness checklist the skeleton above is a small piece of.

The Go side is covered in the Thinking in Go pair. The Complete Guide to Go Programming is the language foundation; Hexagonal Architecture in Go is the service shape I would use for the agent.

Observability for LLM Applications — the book

Thinking in Go — 2-book series on Go programming and hexagonal architecture
