Hello there!
A few months ago I built nevinho, a personal AI agent I run on my own machine. Bash, file edits, web search, voice input, the works. It taught me a lot, but the whole thing was hardcoded around my own use case. Anyone who wanted something similar had to fork it and rip it apart.
So I started over. Vikusha is the same idea, but as a Go framework. You bring your own system prompt, your own tools, your own transports, and the harness handles the rest.
Which means I'm writing the core agent loop again, this time in a form others can build their own agents on instead of forking mine.
This post is about that loop. The thing every AI coding tool, every chatbot with tools, every "AI agent" is doing under the hood. Once you see it, you can't unsee it.
What an agent actually does
When you ask an AI assistant "what's in this directory?", a lot looks like it's happening. The model "decides" to run a command, "reads" the output, "answers" you. It feels intelligent.
What's really happening is a loop. You send the model your message plus a list of tools it can call. The model replies with either text (it's answering you) or a tool call (it wants to run something). If it called a tool, you run it, send the result back, and ask again. Eventually it replies with text and you're done.
That's it. That's the agent.
loop (capped at max_iterations):
    response = provider.complete(system, messages, tools)
    text, tool_calls = split(response.content)
    if no tool_calls: return text
    messages += assistant(response.content)
    messages += user(run each tool → tool_result)
end
No "reasoning engine", no chain-of-thought magic. The model decides what to do, you execute, the model sees the output, the model decides again. The loop is the abstraction.
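To make the pseudocode concrete, here is a minimal Go sketch of the same loop. It is not Vikusha's actual code: the `Block`, `Message`, `Provider`, and `ToolFunc` types, the `Run` function, the `scripted` fake provider, and the cap of 10 iterations are all assumptions made up for illustration, so the example runs without an API key.

```go
package main

import (
	"errors"
	"fmt"
)

// Block is one piece of message content (hypothetical type).
// Kind is "text", "tool_use", or "tool_result".
type Block struct {
	Kind  string
	Text  string // text, or the tool output for a tool_result
	ID    string // tool_use id, echoed back by the matching tool_result
	Name  string // tool name, for tool_use
	Input string // raw JSON arguments, for tool_use
}

type Message struct {
	Role    string
	Content []Block
}

// Provider is a stand-in for a real LLM client.
type Provider interface {
	Complete(system string, msgs []Message) ([]Block, error)
}

// ToolFunc runs one tool and returns its output as text.
type ToolFunc func(input string) string

const maxIterations = 10 // arbitrary cap so a confused model cannot loop forever

func Run(p Provider, system, prompt string, tools map[string]ToolFunc) (string, error) {
	msgs := []Message{{Role: "user", Content: []Block{{Kind: "text", Text: prompt}}}}
	for i := 0; i < maxIterations; i++ {
		content, err := p.Complete(system, msgs)
		if err != nil {
			return "", err
		}
		var text string
		var results []Block
		for _, b := range content {
			switch b.Kind {
			case "text":
				text += b.Text
			case "tool_use":
				out := "unknown tool: " + b.Name
				if fn, ok := tools[b.Name]; ok {
					out = fn(b.Input)
				}
				results = append(results, Block{Kind: "tool_result", ID: b.ID, Text: out})
			}
		}
		if len(results) == 0 {
			return text, nil // no tool calls: the model is answering
		}
		// Append the assistant content unchanged, then all results in one user message.
		msgs = append(msgs,
			Message{Role: "assistant", Content: content},
			Message{Role: "user", Content: results},
		)
	}
	return "", errors.New("agent: iteration cap reached")
}

// scripted replays canned responses, one per Complete call.
type scripted struct {
	replies [][]Block
	calls   int
}

func (s *scripted) Complete(system string, msgs []Message) ([]Block, error) {
	if s.calls >= len(s.replies) {
		return nil, errors.New("scripted: out of replies")
	}
	r := s.replies[s.calls]
	s.calls++
	return r, nil
}

func main() {
	p := &scripted{replies: [][]Block{
		{{Kind: "tool_use", ID: "t1", Name: "ls", Input: `{}`}},
		{{Kind: "text", Text: "There are two files."}},
	}}
	tools := map[string]ToolFunc{"ls": func(string) string { return "a.go\nb.go" }}
	reply, err := Run(p, "You list files.", "What's in this directory?", tools)
	fmt.Println(reply, err)
}
```

The fake provider asks for one tool call and then answers, so the loop makes exactly two provider calls before returning.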
The four things that bit me
When I first wrote this I got it wrong in roughly four ways. Each one took me a confusing afternoon to figure out.
Exit on absence of tool_use, not on stop_reason. Anthropic's API returns a stop_reason field. It feels like the right exit condition. It isn't. stop_reason can be max_tokens while there are still tool calls in the response. The actual signal is whether the response content has any tool_use blocks. If yes, run them and loop. If no, you're done.
Send the assistant's full content back, unchanged. When the model returns text plus tool calls, you have to append both as one message in the conversation history. If you split them, the API rejects the next request because the tool result references a tool_use id that isn't in the previous message anymore.
All tool results go in one user message. If the model called three tools in parallel, all three results have to come back in a single user message containing three tool_result blocks. Putting them in three separate messages breaks the pairing.
Errors are data, not exceptions. If a tool crashes or returns garbage, don't abort the loop. Wrap the error in a tool_result with is_error: true and send it back to the model. The model sees the failure and either retries with different input or tells the user what happened. If you throw, the user gets nothing.
These four rules are the entire correctness of the loop. Everything else is wrapping.
Two providers, same loop
Here's where it gets interesting. Anthropic and OpenAI both support tool calling, but their wire formats are nothing alike.
Anthropic puts tool calls inside the assistant's content array, alongside text blocks. OpenAI puts them in a separate tool_calls field on the message. Anthropic sends tool arguments as a JSON object. OpenAI sends them as a JSON-encoded string. Anthropic puts the system prompt at the top level of the request. OpenAI prepends it as a message with role "system".
If you build the loop against one of them, the other looks like a totally different problem.
The fix is to have your own internal representation and translate at the edges. In Vikusha I have a generic llm.Block type with three variants: text, tool_use, tool_result. The agent loop only knows about blocks. Each provider has a Complete method that takes a generic request and returns a generic response. The translation lives inside the provider package, hidden behind the interface.
type Provider interface {
	Name() string
	Complete(ctx context.Context, req *Request) (*Response, error)
}
That's the whole contract. Plug in Anthropic, OpenAI, OpenRouter, Ollama, whatever. The loop doesn't care.
This abstraction earns its keep the moment you switch providers. I started building against Anthropic, ran out of API credit, switched to OpenRouter (which speaks the OpenAI dialect), and the agent code didn't change a line. Same loop, same Chat call, same tool execution. Just a different constructor.
Tools, the easy part
A tool in Vikusha is anything that satisfies this interface:
type Tool interface {
	Name() string
	Description() string
	Schema() json.RawMessage
	Run(ctx context.Context, input json.RawMessage) (string, error)
}
Name and description are what the model sees when deciding whether to call you. Schema is JSON schema for the input parameters. Run executes the thing and returns text.
The first real tool I built was file_read. It's about 30 lines.
func (r *Read) Name() string { return "file_read" }

func (r *Read) Description() string {
	return "Read the contents of a file at the given path."
}

func (r *Read) Schema() json.RawMessage {
	return json.RawMessage(`{
		"type": "object",
		"properties": {"path": {"type": "string"}},
		"required": ["path"]
	}`)
}

func (r *Read) Run(ctx context.Context, input json.RawMessage) (string, error) {
	// The model's arguments arrive as raw JSON matching Schema.
	var in struct {
		Path string `json:"path"`
	}
	if err := json.Unmarshal(input, &in); err != nil {
		return "", err
	}
	data, err := os.ReadFile(in.Path)
	if err != nil {
		return "", err
	}
	return string(data), nil
}
The model gets the name, description, and schema in the request. When it wants to read a file, it sends back {"name": "file_read", "input": {"path": "go.mod"}}. The loop looks the tool up by name, runs it, and feeds the result back as a tool_result block.
Same shape for any tool. Bash, web fetch, Notion, whatever.
The smallest working agent
Putting it all together looks like this:
reg := tool.NewRegistry()
reg.Register(file.NewRead())

a, err := agent.New(agent.Options{
	Name:         "reader",
	Model:        "openai/gpt-4o-mini",
	SystemPrompt: "You answer questions about files. Use file_read.",
	Provider:     llm.NewOpenRouter(apiKey),
	Tools:        reg,
})
if err != nil {
	log.Fatal(err)
}

reply, err := a.Chat(ctx, "lucas", "Read go.mod and tell me the module name.")
if err != nil {
	log.Fatal(err)
}
fmt.Println(reply)
That's a working agent. The model gets the question, sees it has a file_read tool, calls it with path: "go.mod", the loop reads the file, feeds the contents back, the model extracts the module name and answers in plain text.
No layers of indirection, no abstractions on top of abstractions. One interface per concept, one loop, one provider call per round.
What's next
Right now I have a single-turn agent that can read files. The obvious next step is bash. Every coding agent needs to run commands, and that's the difference between an agent that can look at things and an agent that can actually do them.
The interesting part about bash isn't the implementation. It's the safety wrap. A tool that runs arbitrary shell commands needs a timeout, an output cap, and some way to catch dangerous operations before they execute. That's where most of the design work goes, and it's the next thing I want to write about.
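As a rough idea of the first two pieces, here is a sketch of a timeout plus an output cap using only the standard library. It is an assumption about how such a wrapper could look, not the design Vikusha will ship: `capWriter` and `runBash` are hypothetical names, the limits are arbitrary, `WaitDelay` needs Go 1.20 or later, and catching dangerous operations is deliberately left out.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"os/exec"
	"time"
)

// capWriter keeps the first limit bytes and drops everything after that.
type capWriter struct {
	buf       bytes.Buffer
	limit     int
	truncated bool
}

func (w *capWriter) Write(p []byte) (int, error) {
	room := w.limit - w.buf.Len()
	if room < len(p) {
		w.truncated = true
		if room > 0 {
			w.buf.Write(p[:room])
		}
	} else {
		w.buf.Write(p)
	}
	// Report a full write so the command is not failed by a short write.
	return len(p), nil
}

// runBash runs command through sh with a deadline and a cap on combined output.
func runBash(ctx context.Context, command string, timeout time.Duration, maxOutput int) (string, error) {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()

	out := &capWriter{limit: maxOutput}
	cmd := exec.CommandContext(ctx, "sh", "-c", command)
	cmd.Stdout = out
	cmd.Stderr = out // same writer for both, so writes are serialized by os/exec
	// Give up on the output pipes shortly after the kill, even if a
	// grandchild process still holds them open (Go 1.20+).
	cmd.WaitDelay = time.Second

	err := cmd.Run()
	if ctx.Err() == context.DeadlineExceeded {
		err = fmt.Errorf("command timed out after %s", timeout)
	}
	text := out.buf.String()
	if out.truncated {
		text += "\n[output truncated]"
	}
	return text, err
}

func main() {
	out, err := runBash(context.Background(), "echo hello", 5*time.Second, 1024)
	fmt.Printf("%q %v\n", out, err)
}
```

Inside a tool's Run method, the returned error would still be sent back to the model as an error result rather than aborting the loop.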
The loop itself won't change. Same Tool interface, same Schema and Run pattern as file_read. Which is the point. The loop is the load-bearing part of the harness. Tools are just things you plug in.
The code is on GitHub, MIT licensed. There are two runnable examples in examples/ if you want to try it. The whole core is under 500 lines so far, including both providers. Feel free to open an issue or read along as it grows.
Hope this was useful!