DEV Community

Lucas Neves Pereira

I Built an AI Agent Harness in Go

Hello there!

I've been using AI tools a lot lately. ChatGPT, Claude, local models with Ollama. They're great for answering questions, but I wanted something that could actually do things. Search the web, run code, save notes. Not just talk about it.

So I started building nevinho, a personal AI agent that lives in my Discord DMs. You send it a message, it figures out what tools to use, and it gets things done. All from my phone.

I wrote it in Go and I want to walk through how it works, what I got right, and what I still need to fix.

Why Go

I wanted a single binary I could drop on any machine and run. No runtime, no virtualenv, no node_modules. Go gives me that.

The standard library covers most of what I needed: HTTP clients, JSON encoding, crypto, process execution.

Go also makes concurrency explicit. Each user gets their own mutex, tool execution has panic recovery, and the OAuth polling runs in goroutines. It's not magic, but it's easy to reason about.

The architecture

The project has six packages, each with a single responsibility:

nevinho/
├── main.go           // entry point, provider detection
├── agent/            // the core loop
├── llm/              // provider abstraction (Anthropic, OpenAI, Ollama)
├── tools/            // web, code, file operations
├── discord/          // bot interface
├── auth/             // OAuth device flow + encrypted storage
└── logger/           // colored terminal output

The agent doesn't know about Discord. The tools don't know about the LLM. The LLM doesn't know about either. Everything connects through interfaces and simple function calls.

The LLM provider interface

I wanted to switch between Claude, GPT, and local Ollama models without changing any agent code. So I defined a provider interface:

type Provider interface {
    Complete(req *Request) (*Response, error)
    FormatUserMessage(text string) json.RawMessage
    FormatToolResults(results []ToolResult) []json.RawMessage
    Model() string
}

The FormatUserMessage and FormatToolResults methods exist because Anthropic and OpenAI structure their messages differently. Anthropic wraps tool results in a single user message with tool_result content blocks. OpenAI expects separate messages with a tool role. The provider handles that translation.
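
As an illustration of that translation (types and names are simplified stand-ins, not nevinho's actual adapters), the two formatters might look like this:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ToolResult mirrors the shape the agent hands back to the provider.
type ToolResult struct {
	ID     string
	Output string
}

// Anthropic style: all results go into ONE user message as tool_result blocks.
func anthropicToolResults(results []ToolResult) []json.RawMessage {
	blocks := make([]map[string]any, 0, len(results))
	for _, r := range results {
		blocks = append(blocks, map[string]any{
			"type":        "tool_result",
			"tool_use_id": r.ID,
			"content":     r.Output,
		})
	}
	msg, _ := json.Marshal(map[string]any{"role": "user", "content": blocks})
	return []json.RawMessage{msg}
}

// OpenAI style: each result is its OWN message with role "tool".
func openaiToolResults(results []ToolResult) []json.RawMessage {
	var msgs []json.RawMessage
	for _, r := range results {
		m, _ := json.Marshal(map[string]any{
			"role":         "tool",
			"tool_call_id": r.ID,
			"content":      r.Output,
		})
		msgs = append(msgs, m)
	}
	return msgs
}

func main() {
	rs := []ToolResult{{ID: "t1", Output: "ok"}, {ID: "t2", Output: "done"}}
	fmt.Println(len(anthropicToolResults(rs)), len(openaiToolResults(rs))) // prints "1 2"
}
```

Same tool results, different message counts, which is exactly why the interface pushes formatting down into each provider.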

Switching models at runtime is a slash command. The SwitchModel method detects the provider from the model name prefix:

func (a *Agent) SwitchModel(name string) error {
    var p llm.Provider
    switch {
    case strings.HasPrefix(name, "gpt-") || strings.HasPrefix(name, "o1-") ||
         strings.HasPrefix(name, "o3-") || strings.HasPrefix(name, "o4-"):
        if a.config.OpenAIKey == "" {
            return fmt.Errorf("OPENAI_API_KEY not configured")
        }
        p = llm.NewOpenAI(a.config.OpenAIKey, "", name)
    case strings.HasPrefix(name, "claude-"):
        if a.config.AnthropicKey == "" {
            return fmt.Errorf("ANTHROPIC_API_KEY not configured")
        }
        p = llm.NewAnthropic(a.config.AnthropicKey, "", name)
    default:
        if a.config.OllamaURL != "" {
            p = llm.NewOpenAI("", a.config.OllamaURL, name)
        } else {
            return fmt.Errorf("unknown model: %s", name)
        }
    }

    a.mu.Lock()
    a.llm = p
    a.history = make(map[string][]json.RawMessage)
    a.mu.Unlock()
    return nil
}

Ollama uses the OpenAI-compatible API, so I just reuse the OpenAI provider with a different base URL. The default fallback catches anything that isn't clearly GPT or Claude and routes it to Ollama if configured.

History gets cleared on switch because message formats differ between providers.

The agent loop

The core of the project is the Chat method. It sends the conversation to the LLM, and if the model wants to use tools, it executes them and loops.

func (a *Agent) Chat(userID, text string) (string, error) {
    lock := a.getUserLock(userID)
    lock.Lock()
    defer lock.Unlock()

    // If there's a pending approval and the user said "yes", handle it
    if p := a.tools.PendingApproval(userID); p != nil && looksLikeApproval(text) {
        switch p.Kind {
        case "path":
            a.tools.ApprovePending(userID)
            text = text + "\n[Access granted to " + p.Detail + ". Retry the file operation.]"
        case "code":
            output := a.tools.ExecutePendingCode(userID)
            text = text + "\n[Code execution approved. Output:\n" + output + "]"
        }
    }

    a.appendHistory(userID, a.llm.FormatUserMessage(text))

    for range maxLoops {
        resp, err := a.llm.Complete(&llm.Request{
            SystemPrompt: systemPrompt,
            Messages:     a.history[userID],
            Tools:        a.tools.Defs(),
            MaxTokens:    maxTokens,
        })
        if err != nil {
            return "", err
        }

        a.history[userID] = append(a.history[userID], resp.AssistantMessage)

        if len(resp.ToolCalls) == 0 {
            return resp.Text, nil
        }

        var results []llm.ToolResult
        needsApproval := false
        for _, tc := range resp.ToolCalls {
            output := a.executeTool(tc.Name, tc.Input, userID)
            results = append(results, llm.ToolResult{ID: tc.ID, Output: output})
            if strings.HasPrefix(output, "NEEDS_APPROVAL:") {
                needsApproval = true
            }
        }

        // The API requires tool results after every assistant message with tool_use
        for _, msg := range a.llm.FormatToolResults(results) {
            a.history[userID] = append(a.history[userID], msg)
        }

        if needsApproval {
            return approvalMessage(a.tools.PendingApproval(userID)), nil
        }
    }

    return "I hit my limit on tool calls. Try breaking it into smaller tasks.", nil
}

The loop runs up to 10 iterations. Each time, the model either responds with text (done) or asks to use tools (execute and loop again). This is what lets the agent chain actions: search the web, read a page, then summarize.

The approval check at the top is important. When a tool needs permission, the loop stops and asks the user. On their next message, if it looks like "yes" or "oui", the pending action runs and the result gets injected into the conversation.

Each user gets their own mutex so conversations don't corrupt each other's history.
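
The per-user locking can be sketched like this (field names are assumptions, not the real struct): one mutex per user ID, created lazily under a map-level lock.

```go
package main

import (
	"fmt"
	"sync"
)

// Agent here only shows the locking fields; the real struct holds more.
type Agent struct {
	mu    sync.Mutex             // guards the locks map itself
	locks map[string]*sync.Mutex // one mutex per user ID
}

func (a *Agent) getUserLock(userID string) *sync.Mutex {
	a.mu.Lock()
	defer a.mu.Unlock()
	if a.locks == nil {
		a.locks = make(map[string]*sync.Mutex)
	}
	if _, ok := a.locks[userID]; !ok {
		a.locks[userID] = &sync.Mutex{}
	}
	return a.locks[userID]
}

func main() {
	a := &Agent{}
	fmt.Println(a.getUserLock("alice") == a.getUserLock("alice")) // prints "true"
}
```

Two messages from the same user serialize on the same mutex, while different users proceed in parallel.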

Tools

The tool registry is a simple switch:

func (r *Registry) Execute(name string, input json.RawMessage, userID string) string {
    switch name {
    case "web_read":
        return r.webRead(input)
    case "web_search":
        return r.webSearch(input)
    case "run_code":
        return r.runCode(input, userID)
    case "file_read":
        return r.fileRead(input, userID)
    case "file_write":
        return r.fileWrite(input, userID)
    default:
        return fmt.Sprintf("unknown tool: %s", name)
    }
}

Each tool gets a json.RawMessage input and returns a string. Tool definitions with JSON schemas are sent to the LLM so it knows what's available.

Web search uses the Brave Search API if you have a key, with DuckDuckGo HTML scraping as a zero-config fallback. The DDG fallback parses the actual HTML result page, no API key needed.

Web read fetches a URL and strips the HTML down to readable text. It walks the DOM, skips script/style/nav/footer elements, and looks for a <main> or <article> tag to focus on content.

Code execution runs Python, Node, or bash with a 10-second timeout via exec.CommandContext. Output is capped at 5KB.
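
A minimal sketch of that pattern (the helper name is mine, not nevinho's): exec.CommandContext kills the process when the context expires, and the output is capped afterward.

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
	"time"
)

const maxOutput = 5 * 1024 // 5KB cap on tool output

// runWithTimeout runs a command, kills it after 10 seconds, and
// truncates anything past the output cap.
func runWithTimeout(name string, args ...string) string {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	out, err := exec.CommandContext(ctx, name, args...).CombinedOutput()
	if ctx.Err() == context.DeadlineExceeded {
		return "error: execution timed out after 10s"
	}
	if err != nil {
		return fmt.Sprintf("error: %v\n%s", err, out)
	}
	if len(out) > maxOutput {
		out = append(out[:maxOutput], []byte("\n[output truncated]")...)
	}
	return string(out)
}

func main() {
	fmt.Print(runWithTimeout("echo", "hello"))
}
```

Returning errors as strings instead of Go errors is deliberate: the string goes straight back to the model as a tool result, so the model can read the failure and adjust.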

File operations use a per-user workspace at ~/.config/nevinho/workspace/{userID}/. Relative paths stay sandboxed there. Absolute paths require explicit approval.
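
The sandboxing rule can be sketched as a small resolver (names are illustrative): relative paths are joined into the workspace and checked for `..` escapes, absolute paths are flagged for approval.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// resolvePath returns the resolved path and whether it is safe to use
// without approval. filepath.Join cleans ".." segments, so a traversal
// attempt resolves to something outside the workspace prefix.
func resolvePath(workspace, p string) (string, bool) {
	if filepath.IsAbs(p) {
		return p, false // absolute path: needs explicit approval
	}
	full := filepath.Join(workspace, p)
	if !strings.HasPrefix(full, filepath.Clean(workspace)+string(filepath.Separator)) {
		return full, false // escaped the workspace via ".."
	}
	return full, true
}

func main() {
	ws := "/home/me/.config/nevinho/workspace/u1"
	fmt.Println(resolvePath(ws, "notes.txt"))     // stays sandboxed
	fmt.Println(resolvePath(ws, "../../secrets")) // escapes: needs approval
	fmt.Println(resolvePath(ws, "/etc/passwd"))   // absolute: needs approval
}
```

The important detail is checking the prefix after cleaning, not before: `workspace/../../x` looks relative but resolves outside the sandbox.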

The approval system

Letting an AI run arbitrary code is a bad idea. So dangerous operations need approval.

The code scans for dangerous patterns using regex before executing anything:

var dangerousPatterns = []*regexp.Regexp{
    regexp.MustCompile(`\brm\b`),
    regexp.MustCompile(`\bsudo\b`),
    regexp.MustCompile(`\bchmod\b`),
    regexp.MustCompile(`\bkill\b`),
    regexp.MustCompile(`\bcurl\b.*\|`),
    regexp.MustCompile(`:\(\)\s*\{`), // fork bomb
    // ... and more
}

It also checks for sensitive paths like .ssh, .aws, .env, and credentials. If anything matches, the tool returns a NEEDS_APPROVAL: prefix instead of executing. The agent loop catches this, sends the user a preview of what it wants to do, and stops.

The user replies "yes" (or "oui") and the pending action runs. Approved directories persist to a JSON file on disk so you don't get asked twice.

There are two types of approval: path approval (for writing outside the workspace) and code approval (for dangerous commands). Path approvals are remembered. Code approvals are one-shot.

URL validation

Before fetching any URL, the web tools validate it to prevent SSRF:

func validateURL(rawURL string) error {
    u, err := url.Parse(rawURL)
    if err != nil {
        return fmt.Errorf("invalid URL")
    }
    if u.Scheme != "http" && u.Scheme != "https" {
        return fmt.Errorf("only http/https allowed")
    }

    ips, err := net.LookupIP(u.Hostname())
    if err != nil {
        return fmt.Errorf("cannot resolve host")
    }
    for _, ip := range ips {
        if ip.IsLoopback() || ip.IsPrivate() || ip.IsLinkLocalUnicast() {
            return fmt.Errorf("internal addresses not allowed")
        }
    }
    return nil
}

Only http and https are allowed, and a DNS lookup blocks requests to localhost, private IPs, and link-local addresses. Without this, the model could try to hit internal services.

OAuth with device flow

I wanted nevinho to connect to GitHub and Google. But OAuth on a bot is tricky because there's no browser to redirect to.

The solution is the OAuth Device Flow:

  1. The bot requests a device code from GitHub/Google
  2. It shows the user a short code and a URL
  3. The user opens the URL on any device, enters the code
  4. The bot polls until authorization completes

From Discord it looks like this:

> /connect github
Bot: **Connect GitHub**
     1. Open: https://github.com/login/device
     2. Enter code: ABCD-1234
     Waiting for authorization...

Bot: Connected to GitHub as **lucasnevespereira**!

The polling runs in a goroutine so it doesn't block the bot. If GitHub says "slow down", it bumps the poll interval by 5 seconds.

Right now the tokens are stored but not wired to any tools yet. That's next on the list.

Encrypted credential storage

OAuth tokens shouldn't sit in a plain JSON file. So I encrypt them with AES-256-GCM using Go's standard crypto package:

func encrypt(key [32]byte, plaintext []byte) ([]byte, error) {
    block, err := aes.NewCipher(key[:])
    if err != nil {
        return nil, err
    }
    gcm, err := cipher.NewGCM(block)
    if err != nil {
        return nil, err
    }
    nonce := make([]byte, gcm.NonceSize())
    if _, err := rand.Read(nonce); err != nil {
        return nil, err
    }
    return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

The encryption key comes from a NEVINHO_SECRET env var (hashed with SHA-256) or an auto-generated key file at ~/.config/nevinho/secret.key. Credentials are written with 0600 permissions.

Discord as the interface

I picked Discord because I already use it daily and it works on my phone. The bot only responds to DMs from a single owner configured via DISCORD_OWNER_ID.

Messages over 2000 characters get split at newline boundaries. Slash commands handle /new, /model, /status, /paths, /connect, /disconnect, /accounts, and /help. The same commands also work as plain text messages in the chat.

What's missing

I want to be honest about this. The project works, but it's not where I want it to be yet. Here's what a proper agent harness should have that I haven't built:

No retry logic. If the LLM API returns a 429 or 5xx, the conversation just fails. I need exponential backoff.

Sequential tool execution. When the model requests three tool calls at once, I run them one by one. They should run in parallel with goroutines and a WaitGroup.
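
The parallel version is the classic WaitGroup fan-out (a sketch with illustrative names; execute stands in for the real Registry.Execute). The one subtlety is keeping results in call order, since the API expects a result per tool call ID:

```go
package main

import (
	"fmt"
	"sync"
)

type ToolCall struct{ ID, Name string }

// runParallel executes all tool calls concurrently but returns results
// in the original call order: each goroutine writes only its own slot,
// so there is no data race and no need for a channel.
func runParallel(calls []ToolCall, execute func(ToolCall) string) []string {
	results := make([]string, len(calls))
	var wg sync.WaitGroup
	for i, tc := range calls {
		wg.Add(1)
		go func(i int, tc ToolCall) {
			defer wg.Done()
			results[i] = execute(tc)
		}(i, tc)
	}
	wg.Wait()
	return results
}

func main() {
	calls := []ToolCall{{ID: "1", Name: "web_search"}, {ID: "2", Name: "web_read"}}
	fmt.Println(runParallel(calls, func(tc ToolCall) string { return "done: " + tc.Name }))
}
```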

No streaming. The bot waits for the full LLM response before showing anything. On long responses, Discord's typing indicator expires after 10 seconds and the user thinks it's stuck.

Message-count history trimming. I trim conversation history at 20 messages regardless of size. A message with 10KB of tool output uses the same slot as a 50-char user message. This should be token-aware.

No tool result summarization. When web_read returns 10KB of text, all of it goes into the history. That eats through the context window fast. Long tool outputs should be summarized before going back to the model.
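
Even before real summarization, a cheap cap helps. This sketch (names and the ~4-characters-per-token heuristic are assumptions) keeps the head and tail of an oversized result and elides the middle, as a stand-in for a summarization pass through the model:

```go
package main

import "fmt"

// capToolOutput trims a tool result to a rough token budget, preserving
// the start and end, which usually carry the most signal.
func capToolOutput(s string, maxTokens int) string {
	maxChars := maxTokens * 4 // rough heuristic: ~4 chars per token
	if len(s) <= maxChars {
		return s
	}
	half := maxChars / 2
	return fmt.Sprintf("%s\n[... %d chars elided ...]\n%s",
		s[:half], len(s)-maxChars, s[len(s)-half:])
}

func main() {
	fmt.Println(capToolOutput("a short tool result", 200)) // passes through unchanged
}
```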

These are all solvable. I'm working through them.

What I learned

The LLM API differences are annoying but manageable. Anthropic and OpenAI have different message formats, different tool call structures, different system prompt handling. But once you define a clean interface, each adapter is about 100 lines.

Safety is where most of the thinking goes. The agent loop itself is straightforward. The hard part is approving dangerous operations, sandboxing file paths, validating URLs, preventing SSRF. Every tool I add needs its own threat model.

Go was the right choice for this. Single binary, good concurrency primitives, a rich standard library. No framework needed. The whole project is about 2800 lines across 12 files.

Try it out

The project is open source. Feel free to check it out, open issues, or contribute.

GitHub: github.com/lucasnevespereira/nevinho

Hope you find it useful!

Top comments (1)

Collapse
 
ali_muwwakkil_a776a21aa9c profile image
Ali Muwwakkil

One surprising insight we've observed is that it's not the complexity of AI agents that trips up teams, but rather how they're integrated into existing systems. A practical approach is to use event-driven architectures to allow agents to react to real-time data streams, making them more adaptable and efficient. In our experience, embracing tools like Kafka or NATS can significantly enhance the responsiveness and scalability of AI-driven solutions. - Ali Muwwakkil (ali-muwwakkil on LinkedIn)