DEV Community

Valerii Vainkop

Go for AI Agents: Why the Language Choice Matters at Production Scale

This week Google open-sourced adk-go — a Go port of their Agent Development Kit. The Hacker News thread that followed racked up ~150 points and surfaced something worth examining: a growing, serious argument that Go is better suited than Python for production AI agent infrastructure.

I want to lay out that argument honestly — including the real tradeoffs — because the default answer ("just use Python, that's where LangChain is") is becoming less automatic than it was a year ago.


Why Python Works, and Where It Starts to Crack

Python's dominance in AI agent development is not an accident. The LLM SDKs (OpenAI, Anthropic, Google) all ship Python-first. LangChain, LangGraph, CrewAI, AutoGen — the entire agent framework ecosystem grew up in Python. The iteration speed is genuinely fast. The community is enormous.

But production agents expose a specific failure mode that Python's dynamic typing doesn't handle well.

When an agent calls a tool, it passes arguments. Those arguments have expected types. An integer for a max_results parameter. A string for a query. A boolean for an include_archived flag. In a Python agent, if something upstream — a bad LLM output, a schema change, a version mismatch — causes the wrong type to land in that call, you find out at runtime.

For a web service, that's annoying but manageable. The request fails, you log it, you fix it.

For an agent running a multi-step workflow that started four hours ago, it's a different situation. You have partial state. Tools have already been called. Side effects may have already happened. Replaying the workflow from scratch means re-burning tokens and re-triggering all the real-world actions your agent took along the way.

The failure mode isn't just a bug. It's a debug session in a system with memory and history.


What Go's Type System Actually Does Here

Consider a simple tool definition in Python:

# Python agent tool — type errors discovered at runtime
def search_web(query: str, max_results: int = 10):
    """Search the web and return results."""
    # The annotations above are hints only — nothing enforces them at runtime.
    # What if max_results arrives as "10" (string) from a malformed LLM output?
    # What if query is None because an upstream tool returned null?
    # You find out here, mid-workflow, potentially hours in.
    return perform_search(query, max_results)

# Registration with a framework
tools = [
    {
        "name": "search_web",
        "description": "Search the web for current information",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string"},
                "max_results": {"type": "integer"}
            }
        },
        "function": search_web
    }
]

The schema says max_results is an integer. But nothing enforces that the Python function receives an integer. If the LLM generates "max_results": "10" instead of "max_results": 10, the mismatch lives in runtime land.

Now the Go equivalent using the adk-go pattern:

// Go agent tool — type mismatches caught at compile time
type SearchInput struct {
    Query      string `json:"query"`
    MaxResults int    `json:"max_results"`
}

type SearchOutput struct {
    Results []string `json:"results"`
    Count   int      `json:"count"`
}

var searchTool = adk.NewTool(
    adk.WithName("search_web"),
    adk.WithDescription("Search the web for current information"),
    adk.WithHandler(func(ctx context.Context, input SearchInput) (SearchOutput, error) {
        // input.Query is guaranteed to be a string at compile time
        // input.MaxResults is guaranteed to be an int at compile time
        // If the LLM output doesn't conform, the JSON unmarshaling fails
        // before your handler is ever called
        results, err := performSearch(input.Query, input.MaxResults)
        if err != nil {
            return SearchOutput{}, fmt.Errorf("search failed: %w", err)
        }
        return SearchOutput{Results: results, Count: len(results)}, nil
    }),
)

The difference: SearchInput is a typed struct. The framework deserializes the LLM's JSON output into it. If max_results can't be parsed as an integer, you get a clear, early error — before your tool logic runs, before side effects happen, before you're four steps deeper into a workflow.

This isn't a theoretical benefit. It's the difference between "we caught a schema mismatch on the first test run" and "we caught it at 3am in production."


The Concurrency Story

The second argument for Go is more architectural.

Production agents aren't sequential. A useful agent for DevOps work — say, one that checks Prometheus metrics, queries the Kubernetes API, reads recent Alertmanager alerts, and synthesizes an incident summary — needs to run those tool calls in parallel. Waiting for each one serially adds 3–8 seconds of latency to what could be a 1-second operation.

Goroutines handle this naturally:

// Parallel tool execution using goroutines
func runParallelTools(ctx context.Context, tools []adk.Tool, input any) ([]adk.ToolResult, error) {
    results := make([]adk.ToolResult, len(tools))
    errors := make([]error, len(tools))

    var wg sync.WaitGroup
    for i, tool := range tools {
        wg.Add(1)
        go func(idx int, t adk.Tool) {
            defer wg.Done()
            result, err := t.Execute(ctx, input)
            results[idx] = result
            errors[idx] = err
        }(i, tool)
    }

    wg.Wait()

    // Collect errors
    for _, err := range errors {
        if err != nil {
            return nil, err
        }
    }
    return results, nil
}

This isn't something Python can't do — asyncio exists, concurrent.futures exists. But goroutines are lightweight (~2KB stack), cheap to spawn by the thousands, and the model is built into the language rather than layered on top. For agents that fan out to many tools simultaneously, or orchestrate multiple sub-agents, the concurrency model isn't an afterthought.

There's also the memory footprint. A Python async worker running 20 concurrent agent tasks has a very different resource profile than a Go service doing the same. For teams running agents on Kubernetes without a dedicated GPU budget — which describes most of the teams I work with — this matters at the infrastructure billing layer.


The Go Agent Ecosystem, March 2026

A year ago, "Go for AI agents" was a theoretical argument. There wasn't much to point at. That has changed.

adk-go (github.com/google/adk-go) — Google's official Go Agent Development Kit. Released this week. Code-first approach with typed tool definitions, evaluation framework, and deployment adapters for Cloud Run and GKE. Still early, but it carries the weight of Google's internal agent work and signals that Go is a supported target for production deployment.

AgenticGoKit (github.com/AgenticGoKit/AgenticGoKit) — Community-built, production-focused. Includes MCP (Model Context Protocol) tool discovery built-in, DAG/parallel/loop orchestration patterns, and OpenTelemetry instrumentation from the start. The OTel integration is particularly important: tracing an agent workflow in production — which tools were called, what the LLM decided, where latency came from — is essential for debugging, and frameworks that bolt observability on later tend to have gaps.

Ingenimax agent-sdk-go and Jetify/ai — Additional community libraries filling out the ecosystem. Not all production-ready, but the ecosystem is building.

None of these match the breadth of LangChain's integrations yet. That's an honest gap. But for teams building bespoke agents with a defined tool set — rather than exploring the full LangChain integration catalog — the Go ecosystem is functional today.


The Real Tradeoffs

This would be dishonest without the other side.

The Python ecosystem is genuinely larger. LLM provider SDKs, evaluation frameworks, vector store integrations, fine-tuning tooling — almost all of it lands in Python first. If you're stitching together third-party tools, you'll hit more friction in Go.

LLM output validation is harder without Pydantic. Python's Pydantic library does structured output validation in a way that nothing in Go matches yet for developer ergonomics. The typed struct approach in Go is cleaner in theory but requires more boilerplate to achieve the same validation expressiveness.
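To make the boilerplate claim concrete: here's roughly what Pydantic-style constraint checking (required fields, value ranges) looks like hand-rolled in Go. The names DecodeSearchInput and Validate are illustrative, not from any framework:

```go
package main

import (
	"encoding/json"
	"fmt"
)

type SearchInput struct {
	Query      string `json:"query"`
	MaxResults int    `json:"max_results"`
}

// Validate enforces the constraints Pydantic would express declaratively
// (e.g. min_length=1, ge=1, le=100). In Go you write them out by hand.
func (s SearchInput) Validate() error {
	if s.Query == "" {
		return fmt.Errorf("query must not be empty")
	}
	if s.MaxResults < 1 || s.MaxResults > 100 {
		return fmt.Errorf("max_results must be in [1, 100], got %d", s.MaxResults)
	}
	return nil
}

// DecodeSearchInput combines both failure modes at one boundary:
// type mismatches surface from Unmarshal, constraint violations from Validate.
func DecodeSearchInput(raw []byte) (SearchInput, error) {
	var in SearchInput
	if err := json.Unmarshal(raw, &in); err != nil {
		return SearchInput{}, fmt.Errorf("decode: %w", err)
	}
	if err := in.Validate(); err != nil {
		return SearchInput{}, fmt.Errorf("validate: %w", err)
	}
	return in, nil
}

func main() {
	_, err := DecodeSearchInput([]byte(`{"query": "go agents", "max_results": 500}`))
	fmt.Println(err != nil) // true: 500 is out of range
}
```

It works, and it's explicit — but multiply this across every tool in an agent and the gap with Pydantic's declarative constraints becomes visible.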

Your team probably knows Python. Switching languages for a component of your stack has a real cost in onboarding, debugging unfamiliarity, and cognitive overhead. For a team of two backend engineers already running Python services, adding a Go agent layer means accepting that cost consciously.

Compile-time safety has limits. Go's type system helps at the tool interface layer, but the LLM's decision-making — which tools to call, in what order, with what intent — is still a runtime artifact you can't type-check. The hard part of agent reliability isn't the argument types. It's the reasoning quality.


What I'd Actually Do Today

If I were starting a new production agent project today:

For a small, well-defined agent with a fixed tool set — infrastructure monitoring, incident triage, internal automation — I'd seriously evaluate Go. The type safety at the tool interface, the goroutine concurrency model, and the single-binary deployment are genuine advantages for this class of problem.

For an exploratory agent, a research tool, or anything that needs to integrate with the LangChain ecosystem broadly — Python is still the faster path. The iteration speed advantage is real when you're not yet sure what the agent's tool set will look like in a month.

The honest position: Python isn't wrong for agents. It's just no longer the only sensible choice. The question is worth asking again for each new project rather than defaulting on autopilot.


A Note on What This Week Signals

Google releasing adk-go isn't just a toolkit release. It's a production signal from a team that runs AI agents internally at a scale that most teams will never approach. They chose to invest in Go tooling. The fact that the HN community had been independently building in that direction — AgenticGoKit, agent-sdk-go, the active debate — suggests this is convergent rather than top-down.

The Python-first era of AI agent development is not over. But it's no longer the only era running.

For teams making infrastructure decisions about their agent layer now, the language question is worth reopening — not as a rewrite, but as a deliberate choice for the next project.

What's your experience with Go for agent infrastructure? Have you hit the Python reliability problems I'm describing, or has your stack avoided them? Curious what patterns people are seeing in production.

