Mukunda Rao Katta

Five problems every agent loop has. No framework needed.

Most agent failure modes are not interesting. They are boring. They are the same five problems in different costumes. After eighteen months running agent loops in production, I keep meeting these five and only these five.

I do not build agent frameworks. I build small libraries that fix one failure mode each. You install the one you need. The composition emerges from your code, not a framework's architecture diagram.

Here they are in roughly the order you will hit them.

1. The JSON is not JSON

Your model returns `Sure, here you go:` followed by a `json` code fence around an object with a trailing comma. You parse the raw string as JSON. You crash.

Fix: repair before validate. Strip the fence. Extract the largest balanced JSON object from surrounding prose. Remove trailing commas. Then validate against your schema. If validation fails, send the model back a precise hint, not a generic "invalid JSON, please try again."

The hint is the trick. Smaller models self-correct beautifully on a structural complaint. They do not self-correct on a vague reprimand.
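As a minimal sketch of the repair step in plain Rust with no external crates (all function names here are invented for illustration, not a published API): extract the largest balanced object from the surrounding prose, drop trailing commas, and only then hand the result to a schema validator.

```rust
/// Find the largest balanced {...} span in the raw model output,
/// skipping braces inside string literals. This also discards code
/// fences and prose around the object.
fn extract_json_object(raw: &str) -> Option<String> {
    let mut best: Option<(usize, usize)> = None;
    let (mut depth, mut start) = (0usize, 0usize);
    let (mut in_str, mut escaped) = (false, false);
    for (i, c) in raw.char_indices() {
        if in_str {
            if escaped { escaped = false; }
            else if c == '\\' { escaped = true; }
            else if c == '"' { in_str = false; }
            continue;
        }
        match c {
            '"' => in_str = true,
            '{' => { if depth == 0 { start = i; } depth += 1; }
            '}' if depth > 0 => {
                depth -= 1;
                if depth == 0 && best.map_or(true, |(s, e)| i + 1 - start > e - s) {
                    best = Some((start, i + 1));
                }
            }
            _ => {}
        }
    }
    best.map(|(s, e)| raw[s..e].to_string())
}

/// Remove a comma that directly precedes a `}` or `]`, outside strings.
fn strip_trailing_commas(json: &str) -> String {
    let mut out = String::with_capacity(json.len());
    let (mut in_str, mut escaped) = (false, false);
    for c in json.chars() {
        if in_str {
            if escaped { escaped = false; }
            else if c == '\\' { escaped = true; }
            else if c == '"' { in_str = false; }
            out.push(c);
            continue;
        }
        match c {
            '"' => { in_str = true; out.push(c); }
            '}' | ']' => {
                while out.ends_with(|p: char| p.is_whitespace()) { out.pop(); }
                if out.ends_with(',') { out.pop(); }
                out.push(c);
            }
            _ => out.push(c),
        }
    }
    out
}

fn main() {
    let raw = "Sure, here you go:\n```json\n{\"units\": \"c\",}\n```";
    let repaired = strip_trailing_commas(&extract_json_object(raw).unwrap());
    println!("{repaired}"); // prints {"units": "c"}
}
```

Only after this repair pass should schema validation run, so a failure there is a genuine structural complaint worth sending back to the model.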

2. The tool args are wrong

The model picks the right tool. It calls it with units: "kelvin" against an enum of ["c", "f"]. You run the tool. Bad things happen.

Fix: validate every tool call against its schema before running it. Validation issues become the tool's response. Feed them back. The model fixes the call on the next turn.

Always return all validation issues at once, not just the first. The model fixes them in one retry instead of five.
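A sketch of that collect-everything validation in Rust (the `ToolSchema` shape and field names are invented for illustration; a real agent would validate against JSON Schema):

```rust
use std::collections::HashMap;

/// Hypothetical tool schema: required string fields, each with an
/// optional enum of allowed values.
struct ToolSchema {
    required: Vec<(&'static str, Option<Vec<&'static str>>)>,
}

/// Check every field and return ALL issues, not just the first,
/// so the model can fix the whole call in one retry.
fn validate_args(
    schema: &ToolSchema,
    args: &HashMap<String, String>,
) -> Result<(), Vec<String>> {
    let mut issues = Vec::new();
    for (field, allowed) in &schema.required {
        match args.get(*field) {
            None => issues.push(format!("missing required field `{field}`")),
            Some(v) => {
                if let Some(allowed) = allowed {
                    if !allowed.contains(&v.as_str()) {
                        issues.push(format!(
                            "`{field}` must be one of {allowed:?}, got {v:?}"
                        ));
                    }
                }
            }
        }
    }
    if issues.is_empty() { Ok(()) } else { Err(issues) }
}

fn main() {
    let schema = ToolSchema {
        required: vec![("units", Some(vec!["c", "f"])), ("city", None)],
    };
    // The model's call: wrong enum value AND a missing field.
    let args: HashMap<String, String> =
        [("units".to_string(), "kelvin".to_string())].into();
    if let Err(issues) = validate_args(&schema, &args) {
        // Both issues go back as the tool's response, in one turn.
        for issue in issues {
            println!("{issue}");
        }
    }
}
```

The `Err` side becomes the tool result verbatim; the model sees a structural complaint per field and corrects the call on the next turn.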

3. The agent wanders to the wrong network

The minute your agent can pick URLs, you have handed it network access. A confused-deputy bug or a prompt injection sends a fetch to a domain you never authorized. By the time you notice, an API key is in an attacker's log.

Fix: declarative domain allowlist. List the four hosts your agent legitimately needs. Block everything else with an error message at the HTTP layer.
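One way to sketch that check in plain Rust, run before every outbound fetch (the host parsing here is deliberately naive and the hostnames are invented; a real implementation should reuse a proper URL parser):

```rust
/// Pull the host out of an http(s) URL. Naive on purpose: anything
/// it cannot parse gets blocked rather than allowed.
fn host_of(url: &str) -> Option<&str> {
    let rest = url
        .strip_prefix("https://")
        .or_else(|| url.strip_prefix("http://"))?;
    let end = rest
        .find(|c| c == '/' || c == ':' || c == '?')
        .unwrap_or(rest.len());
    Some(&rest[..end])
}

/// Declarative allowlist: exact host match or a descriptive error.
fn check_allowed(url: &str, allowlist: &[&str]) -> Result<(), String> {
    match host_of(url) {
        Some(h) if allowlist.contains(&h) => Ok(()),
        Some(h) => Err(format!("blocked: host {h:?} is not on the allowlist")),
        None => Err(format!("blocked: could not parse host from {url:?}")),
    }
}

fn main() {
    let allow = ["api.weather.example", "api.search.example"];
    assert!(check_allowed("https://api.weather.example/v1/now", &allow).is_ok());
    assert!(check_allowed("https://attacker.example/exfil", &allow).is_err());
    println!("allowlist holds");
}
```

The important design choice is fail-closed: an unparseable URL is an error, not a pass-through, and the error message surfaces at the HTTP layer where the agent loop can log it.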

4. The context budget runs out

You stack five turns of chat history. You drop the system message accidentally during truncation. The agent forgets what it is doing. Or you drop the trailing user turn and the model answers a question you never asked.

Fix: anchored truncation. Preserve the leading system message and the trailing user turn. Drop from the middle.

Drop-oldest is the right default for chat. Drop-middle is better when you want both early grounding and recent context. Both keep the load-bearing pieces of the prompt.
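A drop-middle sketch in Rust, assuming a character budget for simplicity (a real fitter would count tokens; the `Msg` type is invented for illustration):

```rust
#[derive(Clone, Debug, PartialEq)]
struct Msg {
    role: &'static str,
    text: String,
}

/// Anchored drop-middle truncation: always keep the leading system
/// message and the trailing user turn, then refill the middle
/// newest-first so recent context survives.
fn fit(messages: &[Msg], budget: usize) -> Vec<Msg> {
    if messages.len() <= 2 {
        return messages.to_vec();
    }
    let first = &messages[0];
    let last = &messages[messages.len() - 1];
    // The anchors are unconditional: they are the load-bearing pieces.
    let mut used = first.text.len() + last.text.len();
    let mut middle: Vec<Msg> = Vec::new();
    for m in messages[1..messages.len() - 1].iter().rev() {
        if used + m.text.len() > budget {
            break;
        }
        used += m.text.len();
        middle.push(m.clone());
    }
    middle.reverse();
    let mut out = vec![first.clone()];
    out.extend(middle);
    out.push(last.clone());
    out
}

fn main() {
    let msgs = vec![
        Msg { role: "system", text: "You are an agent.".into() },
        Msg { role: "user", text: "q1".into() },
        Msg { role: "assistant", text: "a1".into() },
        Msg { role: "user", text: "final question".into() },
    ];
    let fitted = fit(&msgs, 33);
    // System and trailing user survive no matter what got dropped.
    assert_eq!(fitted.first().unwrap().role, "system");
    assert_eq!(fitted.last().unwrap().text, "final question");
}
```

Drop-oldest is the same loop walking the middle oldest-first; the anchors stay unconditional either way, which is what prevents the two failure modes above.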

5. Regressions sneak in

You tweak a system prompt. The agent now picks tools in a slightly different order. Sometimes that is fine. Sometimes it is a regression that breaks the deployed app and you only notice next Friday.

Fix: snapshot tests for agent traces. Record one run end-to-end. First test run writes the snapshot. Later runs diff and fail with a unified diff if anything changed. Refresh with an env var when the change is intentional.
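A bare-bones version of that record-then-diff flow in Rust with only the standard library (the function name and env var are invented for illustration, and the diff here is line-by-line rather than a proper unified diff):

```rust
use std::{env, fs};

/// First run records the trace as the snapshot; later runs fail on
/// any difference. Set UPDATE_SNAPSHOTS=1 to refresh intentionally.
fn assert_snapshot(name: &str, trace: &str) -> Result<(), String> {
    let path = env::temp_dir().join(format!("{name}.snap"));
    if env::var("UPDATE_SNAPSHOTS").is_ok() || !path.exists() {
        fs::write(&path, trace).map_err(|e| e.to_string())?;
        return Ok(()); // snapshot recorded
    }
    let expected = fs::read_to_string(&path).map_err(|e| e.to_string())?;
    if expected != trace {
        // A real version would emit a unified diff, including added
        // and removed trailing lines; this only pairs up changed ones.
        let diff: Vec<String> = expected
            .lines()
            .zip(trace.lines())
            .filter(|(a, b)| a != b)
            .map(|(a, b)| format!("- {a}\n+ {b}"))
            .collect();
        return Err(format!("snapshot `{name}` changed:\n{}", diff.join("\n")));
    }
    Ok(())
}

fn main() {
    let trace = "turn 1: tool_call get_weather\nturn 2: answer";
    assert!(assert_snapshot("agent_trace_demo", trace).is_ok()); // records
    assert!(assert_snapshot("agent_trace_demo", trace).is_ok()); // matches
    let changed = "turn 1: tool_call search\nturn 2: answer";
    assert!(assert_snapshot("agent_trace_demo", changed).is_err()); // regression caught
    println!("snapshot checks behaved as expected");
}
```

The trace you snapshot should be the serialized sequence of model turns and tool calls, not raw token output, so cosmetic wording changes do not fail the test while a reordered tool call does.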

The framework I did not write

A naive agent loop hitting all five of these:

```rust
let fitted = Fitter::new(8_000).fit(messages, Strategy::DropOldest);
let raw = call_model(&fitted).await?;
let action = caster.parse(&raw)?;
if action.kind == "tool" {
    validator(&action.tool)?.validate(&action.args)
        .map_err(|e| anyhow!(e.for_llm()))?;
    run_tool(&action).await
} else {
    Ok(action.text)
}
```

Ten lines. Five concerns. Each concern is a separate 200-line library that does one thing and ships independently.

The framework version of this is 2,000 lines, locks you to one HTTP client, is opinionated about which provider you use, and bundles all five concerns into a single API you cannot pry apart when one of them is wrong.

I have shipped both kinds. The small libraries win every time.

If you are building an agent and have not hit problems 1 or 2 yet, you will. Skip the framework. Pick the small library when the problem actually shows up. Compose.

That is the whole stack.
