Jason Huang

Posted on May 18

Building an AI Agent in Go: What I Learned

#ai #llm #productivity #discuss

Hey DEV community! 👋

I'm an undergraduate developer who recently shipped OpenAgent — a local AI Agent that runs as a single binary. No dependencies, no Docker, just download and double-click.

This post isn't about marketing. It's about the technical decisions, mistakes, and lessons from building an AI Agent in a language most people don't associate with AI.

Project: github.com/the-open-agent/openagent

Why Go for an AI Agent?

The obvious question: "Isn't AI Python's territory?"

Here's what I realized: AI Agents spend roughly 90% of their time waiting on LLM APIs, not crunching numbers. The bottleneck isn't language performance; it's architecture: how you orchestrate tools, manage state, handle concurrency, and ship to users.

Go excels at exactly those things:

Static compilation → single distributable binary
Goroutines → handle concurrent tool calls without memory explosion
Cross-platform → Windows/Mac/Linux from one codebase
Zero runtime dependencies → users don't install anything

For a "download and run" experience, these strengths outweigh Python's ecosystem richness.

Lesson 1: Embedding Frontend into a Go Binary

I wanted a web UI without shipping separate static files. Go 1.16+ makes this trivial with the embed package:

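import (
    "embed"
    "io/fs"
    "log"
    "net/http"
)

//go:embed all:dist
var distFS embed.FS

func main() {
    // Embedded paths keep the "dist/" prefix, so re-root the FS at dist;
    // otherwise the build is only reachable under /dist/.
    uiFS, err := fs.Sub(distFS, "dist")
    if err != nil {
        log.Fatal(err)
    }
    http.Handle("/", http.FileServer(http.FS(uiFS)))
    log.Fatal(http.ListenAndServe(":14000", nil))
}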

What I learned: The all: prefix is crucial. Without it, files starting with _ or . get skipped, and your React build might mysteriously break.

Build process:

# Build React app
cd frontend && npm run build

# Go embeds the dist folder automatically
cd .. && go build -o openagent .

One file. Frontend and backend. No path resolution bugs, no "where did my assets go" issues.

Lesson 2: Shell Access Without Footguns

Giving an AI Agent shell access is powerful but dangerous. I spent significant time on safety boundaries in tool/shell.go:

type ShellConfig struct {
    DefaultTimeout time.Duration // 30s default
    MaxTimeout     time.Duration // 300s hard limit
    EnablePTY      bool          // Interactive mode
    AuditLog       bool          // Log every command
}

type ShellSession struct {
    ID      string
    State   SessionState // idle | running | waiting_input
    Timeout time.Time
}

Key design decisions:

Session-based flow — poll/write/submit pattern instead of streaming stdout
Timeouts at two levels — default prevents runaways, max prevents abuse
Optional PTY — interactive programs work, but only when explicitly enabled
Audit logging — every command logged with timestamp and output hash

What I learned: Users will accidentally run rm -rf / or fork bombs. Design for the worst case, not the happy path.

Lesson 3: Memory-Conscious Concurrency

I ran a stress test: 80 concurrent health checks. Memory grew by 10 MB.

Here's why Go's model works for Agents:

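// Each tool call gets its own goroutine.
// The timeout is passed in explicitly because Tool is an interface (see Lesson 5).
func (a *Agent) ExecuteTool(ctx context.Context, tool Tool, input json.RawMessage, timeout time.Duration) (Result, error) {
    ctx, cancel := context.WithTimeout(ctx, timeout)
    defer cancel()

    type outcome struct {
        result Result
        err    error
    }
    // Buffered so the goroutine can always send and exit, even after a timeout
    resultChan := make(chan outcome, 1)

    go func() {
        result, err := tool.Execute(ctx, input)
        resultChan <- outcome{result, err}
    }()

    select {
    case out := <-resultChan:
        return out.result, out.err
    case <-ctx.Done():
        return Result{}, ctx.Err()
    }
}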

What I learned:

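Goroutines are cheap (the initial stack is about 2 KB), but the result channel needs a buffer of 1; without it, the goroutine blocks forever on its send after a timeout and leaks
context.Context is your friend for cancellation and timeouts
Always defer cancel() so the context's timer is released; leaked goroutines and timers are subtle and painful to debug

Compare to Node.js: its event loop handles concurrent I/O well, but JavaScript runs on a single thread, so CPU-heavy work blocks everything else unless you move it to worker threads. Go's scheduler spreads goroutines across all CPU cores with no extra plumbing.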

Lesson 4: Streaming LLM Responses

Users expect typing effects, not wall-of-text responses. Here's how I implemented SSE (Server-Sent Events) in Go:

func (s *Server) ChatStream(w http.ResponseWriter, r *http.Request) {
    w.Header().Set("Content-Type", "text/event-stream")
    w.Header().Set("Cache-Control", "no-cache")
    w.Header().Set("Connection", "keep-alive")

    flusher, ok := w.(http.Flusher)
    if !ok {
        http.Error(w, "Streaming not supported", http.StatusInternalServerError)
        return
    }

    stream := s.agent.ChatStream(r.Context(), r.Body)
    for chunk := range stream {
        fmt.Fprintf(w, "data: %s\n\n", chunk)
        flusher.Flush()
    }
}

What I learned:

http.Flusher interface is required — not all ResponseWriters support it
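The blank line (\n\n) that ends each event is mandatory per the SSE spec; without it, the browser never dispatches the event. A chunk that itself contains a newline has to be split across multiple data: lines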
Always handle client disconnects — r.Context() cancels when they close the tab

Lesson 5: Tool Calling Architecture

The heart of an Agent is deciding which tool to use and when. I settled on this interface:

type Tool interface {
    Name() string
    Description() string
    Schema() json.RawMessage // JSON Schema for LLM
    Execute(ctx context.Context, input json.RawMessage) (Result, error)
}

type Agent struct {
    tools   map[string]Tool
    llm     LLMClient
    history []Message
}

func (a *Agent) Run(ctx context.Context, userInput string) error {
    // 1. Add user message to history
    a.history = append(a.history, UserMessage(userInput))

    for {
        // 2. Ask LLM what to do
        response, err := a.llm.Complete(ctx, a.history, a.availableTools())
        if err != nil {
            return err
        }

        // 3. If LLM wants to use a tool
        if response.ToolCall != nil {
            result := a.executeTool(ctx, response.ToolCall)
            a.history = append(a.history, ToolMessage(result))
            continue // Loop back for next decision
        }

        // 4. Final answer
        a.history = append(a.history, AssistantMessage(response.Content))
        return nil
    }
}

What I learned:

The loop pattern (LLM decides → tool executes → LLM decides again) is surprisingly robust
Tool schemas must be precise — vague descriptions lead to wrong tool selection
History management is critical — context windows fill up fast

Lesson 6: Error Handling in Distributed Systems

An Agent is essentially a distributed system: LLM API, local tools, browser automation, file I/O. Things fail constantly.

My error taxonomy:

var (
    ErrToolTimeout        = errors.New("tool execution timed out")
    ErrToolNotFound       = errors.New("tool not found")
    ErrLLMRateLimit       = errors.New("LLM rate limited")
    ErrLLMContextLimit    = errors.New("context window exceeded")
    ErrUserCancel         = errors.New("user cancelled")
    ErrMaxRetriesExceeded = errors.New("max retries exceeded")
)

func (a *Agent) executeWithRetry(ctx context.Context, tool Tool, input json.RawMessage) (Result, error) {
    for attempt := 0; attempt < 3; attempt++ {
        result, err := tool.Execute(ctx, input)
        if err == nil {
            return result, nil
        }

        // Don't retry cancellations (user-initiated or via the context)
        if errors.Is(err, ErrUserCancel) || ctx.Err() != nil {
            return Result{}, err
        }

        // Only rate limits are retryable; everything else fails fast
        if !errors.Is(err, ErrLLMRateLimit) {
            return Result{}, err
        }

        // Exponential backoff (1s, 2s, 4s) that stops early on cancellation
        select {
        case <-time.After(time.Duration(1<<attempt) * time.Second):
        case <-ctx.Done():
            return Result{}, ctx.Err()
        }
    }
    return Result{}, ErrMaxRetriesExceeded
}

What I learned:

errors.Is() and errors.As() are essential for error classification
Not all errors are retryable — know when to fail fast
Context cancellation should propagate immediately, not retry

Lesson 7: Testing Agents is Hard

How do you test something that calls external LLMs? My approach:

// Mock LLM for deterministic tests
type MockLLM struct {
    Responses []LLMResponse
    CallCount int
}

func (m *MockLLM) Complete(ctx context.Context, history []Message, tools []Tool) (LLMResponse, error) {
    if m.CallCount >= len(m.Responses) {
        return LLMResponse{}, errors.New("unexpected LLM call")
    }
    resp := m.Responses[m.CallCount]
    m.CallCount++
    return resp, nil
}

// assert comes from github.com/stretchr/testify/assert
func TestAgentToolLoop(t *testing.T) {
    mock := &MockLLM{
        Responses: []LLMResponse{
            {ToolCall: &ToolCall{Name: "calculator", Input: json.RawMessage(`{"expr": "2+2"}`)}},
            {Content: "The answer is 4"},
        },
    }

    agent := NewAgent(mock, []Tool{&CalculatorTool{}})
    err := agent.Run(context.Background(), "What's 2+2?")

    assert.NoError(t, err)
    assert.Equal(t, 2, mock.CallCount)
}

What I learned:

Mock the LLM, test the orchestration logic
Integration tests with real LLMs are flaky and slow — keep them minimal
Record/replay patterns work well for regression testing

What I'd Do Differently

1. Start with the binary constraint
I initially prototyped in Python, then rewrote in Go. Waste of time. If the constraint is "single binary," start with Go (or Rust).

2. Design state management earlier
I underestimated how complex conversation state gets. Tool results, errors, user corrections, context window management — it piles up fast.

3. Invest in observability from day one
Debugging an Agent is like debugging a distributed system blindfolded. Structured logging and tracing are non-negotiable.

The Bottom Line

Building an AI Agent in Go was the right call for this project. The language's strengths (static binaries, concurrency, simplicity) aligned perfectly with the goal of "download and run."

Is Go the right choice for every AI project? No. If you're training models or doing heavy data science, Python's ecosystem is unmatched. But for shipping a tool that uses AI? Go is surprisingly effective.

If you're curious, check out the code: github.com/the-open-agent/openagent

I'd love to hear your thoughts — especially if you've built Agents in other languages. What worked? What didn't?

Built with Go, excessive amounts of coffee, and the stubborn belief that software should just work. ☕