Rue Matchaba

I built my first AI agent. It was mostly plumbing

I spent a weekend trying to understand how AI agents actually work. Not the pitch deck version. The code version. What does function calling look like in practice? What happens when two agents are chained together and one of them fails?

I built a multi-agent research assistant in TypeScript to find out. Three agents: an orchestrator, a summarizer, and a writer. Each one does one job and hands off to the next.

Running a model locally is weirder than it sounds.

I used Ollama, which runs the model directly on your machine. No API key, no remote server. The first time my Express app got a real response back from localhost, I sat there for a second. It’s just an HTTP call. Your code genuinely cannot tell whether there’s a llama3.2 process on your laptop behind it or a data centre somewhere.

Nobody told me that LLMs are just APIs. Text in, text out. Everything interesting happens inside that call, invisible to your code.
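That call really is this small. Here's a minimal sketch of it, assuming Ollama's default port (11434) and its `/api/generate` endpoint with `stream: false` so you get one complete JSON response back:

```typescript
// Minimal sketch of a local Ollama call. Assumes Ollama is running on
// its default localhost:11434 with llama3.2 pulled.
interface OllamaRequest {
  model: string;
  prompt: string;
  stream: boolean;
}

function buildRequest(model: string, prompt: string): OllamaRequest {
  // stream: false asks for a single complete response instead of chunks
  return { model, prompt, stream: false };
}

async function ask(prompt: string): Promise<string> {
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildRequest("llama3.2", prompt)),
  });
  const data = await res.json();
  return data.response; // text out, nothing more
}
```

Swap the URL for a hosted provider's endpoint and the calling code doesn't change shape at all, which is the whole point.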

Function calling is less magic than I expected, which was both a relief and slightly disappointing.

You describe available tools in the system prompt. The model decides whether to use one and returns structured output with a tool name and arguments. Your code runs the tool and feeds the result back. That’s the whole loop.

I kept waiting for something more mysterious. It’s just prompt engineering and plumbing.
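The loop above can be sketched in a few lines. The `{ tool, args }` shape and the `search` tool here are illustrative assumptions, not a standard protocol — the point is just that your code parses the model's reply, dispatches, and feeds the result back:

```typescript
// Sketch of the tool-call loop. The JSON shape and the example tool
// are made up for illustration, not any library's protocol.
type Tool = (args: Record<string, string>) => string;

const tools: Record<string, Tool> = {
  // hypothetical tool: a stand-in for a real web search
  search: (args) => `results for "${args.query}"`,
};

interface ToolCall {
  tool: string;
  args: Record<string, string>;
}

// The model's reply is either plain text (final answer) or a JSON
// tool call. done: true means the loop ends here.
function handleModelOutput(raw: string): { done: boolean; text: string } {
  let call: ToolCall;
  try {
    call = JSON.parse(raw);
  } catch {
    return { done: true, text: raw }; // plain text: final answer
  }
  const tool = tools[call.tool];
  if (!tool) return { done: true, text: raw };
  // Run the tool; the result becomes the next message to the model.
  return { done: false, text: tool(call.args) };
}
```

Everything else — which tools exist, when to use them — lives in the system prompt.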

The chaining part was mostly plumbing.

Agent 1 runs, its output becomes input for Agent 2, repeat. What actually took time was writing each agent’s system prompt tight enough that it wouldn’t drift into doing something adjacent to its job, and handling failures mid-chain without the whole thing collapsing silently.

I wrote more error handling code than AI code. I think that’s correct.
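The chain itself reduces to a loop; the error handling is where the code actually lives. A sketch of what mine boiled down to (the agent names and the empty-output check are my choices, not anything from a framework):

```typescript
// Sketch of the chain: each agent's output is the next agent's input.
// The failure handling is the point — fail loudly, naming the stage.
type Agent = { name: string; run: (input: string) => Promise<string> };

async function runChain(agents: Agent[], input: string): Promise<string> {
  let current = input;
  for (const agent of agents) {
    try {
      current = await agent.run(current);
    } catch (err) {
      throw new Error(
        `chain failed at "${agent.name}": ${(err as Error).message}`
      );
    }
    // A model quietly returning nothing is also a failure.
    if (!current.trim()) {
      throw new Error(`"${agent.name}" returned empty output`);
    }
  }
  return current;
}
```

With that in place, the research assistant is just `runChain([orchestrator, summarizer, writer], question)`.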

If I did it again.

llama3.2 via Ollama is free and runs locally, but it gets shaky on complex instruction-following. If your pipeline depends on consistent structured output, that inconsistency compounds across three agents fast. A hosted model with real function calling support would have saved me some debugging time.
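"Designing around it" mostly meant not trusting the model's JSON. A sketch of the defensive parsing I ended up with — the first-brace-to-last-brace extraction and the retry count are my own heuristics, not Ollama features:

```typescript
// Sketch: validate the model's structured output and retry on garbage.
// Local models often wrap JSON in prose, so grab the first {...} span.
function extractJson(raw: string): unknown | null {
  const match = raw.match(/\{[\s\S]*\}/);
  if (!match) return null;
  try {
    return JSON.parse(match[0]);
  } catch {
    return null;
  }
}

async function askForJson(
  ask: (prompt: string) => Promise<string>,
  prompt: string,
  retries = 2
): Promise<unknown> {
  for (let attempt = 0; attempt <= retries; attempt++) {
    const parsed = extractJson(await ask(prompt));
    if (parsed !== null) return parsed;
  }
  throw new Error("model never produced valid JSON");
}
```

A hosted model with native function calling makes most of this code unnecessary, which is the trade-off in a sentence.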

I also didn’t document anything while I was building. By the time I was done, I’d forgotten what actually confused me at the start. Writing this post took longer than it should have for that reason. If you want to understand agents without paying for API calls, Ollama gets you there. Just design around what the model can’t reliably do. Super excited to see what I can build.
