A minimal agent — call the model, run the tool it asks for, feed the result back, repeat — is genuinely complete for a demo. I wrote one in ~150 readable lines: https://github.com/mnifzied-create/agentloop.
But the moment real users hit it, eight things break. None of them need a framework — each is a small, readable layer on top of the loop.
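1. The model asks for three tools at once, and you run them one at a time. Wrap the tool calls in Promise.all. It rejects as soon as any one call rejects, so have each tool resolve to an error result instead of throwing. Parallel by default.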
2. One flaky API call kills the whole turn. Wrap each tool in a retry with backoff, and return the error as a string to the model instead of throwing — it can recover on the next step.
3. It forgets everything between requests. Persist threads. Node's built-in node:sqlite (Node 22.5 and later) is enough: no service, no native build.
4. One user (or a runaway loop) runs up your bill. A token-bucket rate limiter, per user / IP.
5. The agent deletes a record / sends an email / charges a card — with no confirmation. Wrap irreversible tools in a human-in-the-loop approval gate.
6. You tweak the prompt and three behaviors silently regress. A tiny eval harness with pass/fail cases you run in CI.
7. One agent juggling twelve tools gets confused. Expose a whole agent as a single tool — a sub-agent — and let a parent delegate.
8. You're regex-parsing the model's prose for data. Force a tool call whose input_schema is your output type. Typed JSON, no parsing.
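Items 1 and 2 fit together naturally. Here is a minimal sketch, not the repo's actual code: the `ToolCall`, `ToolResult`, and `Tool` shapes and the function names are assumptions made for illustration. Each tool is retried with exponential backoff, and a final failure is turned into a string result, so the `Promise.all` over all calls never rejects.

```typescript
// Hypothetical shapes for illustration; the repo's real types may differ.
type ToolCall = { id: string; name: string; input: unknown };
type ToolResult = { id: string; content: string; isError: boolean };
type Tool = (input: unknown) => Promise<string>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retry a flaky async function, waiting baseMs, 2*baseMs, 4*baseMs, ... between attempts.
async function withRetry<T>(fn: () => Promise<T>, attempts = 3, baseMs = 200): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (i < attempts - 1) await sleep(baseMs * 2 ** i);
    }
  }
  throw lastError;
}

// Run one tool call. Never throws: failures come back as text the model can read.
async function runTool(tools: Map<string, Tool>, call: ToolCall, baseMs = 200): Promise<ToolResult> {
  const tool = tools.get(call.name);
  if (!tool) return { id: call.id, content: `Unknown tool: ${call.name}`, isError: true };
  try {
    const content = await withRetry(() => tool(call.input), 3, baseMs);
    return { id: call.id, content, isError: false };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { id: call.id, content: `Tool failed: ${message}`, isError: true };
  }
}

// Run every requested tool concurrently; results keep the order of the calls.
function runTools(tools: Map<string, Tool>, calls: ToolCall[], baseMs = 200): Promise<ToolResult[]> {
  return Promise.all(calls.map((call) => runTool(tools, call, baseMs)));
}
```

Because `runTool` catches everything, one dead API costs that tool's slot in the results, not the whole turn.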
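For item 4, a token bucket can be sketched in a few dozen lines. The class names, the per-key `Map`, and the injectable `now` clock below are assumptions for illustration rather than the repo's API. Each key (a user ID or an IP) gets a bucket that holds up to `capacity` tokens and refills at `refillPerSec`.

```typescript
// One bucket: holds up to `capacity` tokens, refilled continuously over time.
class TokenBucket {
  private tokens: number;
  private last: number;
  private readonly capacity: number;
  private readonly refillPerSec: number;
  private readonly now: () => number;

  constructor(capacity: number, refillPerSec: number, now: () => number) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.now = now;
    this.tokens = capacity; // start full so the first burst is allowed
    this.last = now();
  }

  // Refill for the elapsed time, then try to spend `cost` tokens.
  tryTake(cost = 1): boolean {
    const t = this.now();
    const elapsedSec = (t - this.last) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.last = t;
    if (this.tokens < cost) return false;
    this.tokens -= cost;
    return true;
  }
}

// One bucket per key (user ID or IP), created lazily on first use.
class RateLimiter {
  private readonly buckets = new Map<string, TokenBucket>();
  private readonly capacity: number;
  private readonly refillPerSec: number;
  private readonly now: () => number;

  constructor(capacity: number, refillPerSec: number, now: () => number = () => Date.now()) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.now = now;
  }

  allow(key: string, cost = 1): boolean {
    let bucket = this.buckets.get(key);
    if (!bucket) {
      bucket = new TokenBucket(this.capacity, this.refillPerSec, this.now);
      this.buckets.set(key, bucket);
    }
    return bucket.tryTake(cost);
  }
}
```

Calling `limiter.allow(userId)` before each model request is enough to cap a single user. In a long-running service the map would also need idle keys evicted, which this sketch leaves out.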
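The approval gate in item 5 can be a plain wrapper around a tool. This is a sketch under assumptions: `requireApproval` and the `Approver` callback are hypothetical names, and how the human is actually asked (CLI prompt, Slack message, web UI) is left to the callback.

```typescript
// Hypothetical shapes for illustration.
type Tool = (input: unknown) => Promise<string>;
type Approver = (toolName: string, input: unknown) => Promise<boolean>;

// Wrap an irreversible tool so it only runs after a human says yes.
// A refusal is returned as text, so the model sees it and can change course.
function requireApproval(name: string, tool: Tool, approve: Approver): Tool {
  return async (input) => {
    const approved = await approve(name, input);
    if (!approved) {
      return `Action "${name}" was declined by a human reviewer. Do not retry it.`;
    }
    return tool(input);
  };
}
```

The wrapped tool has the same signature as the original, so the loop does not need to know which tools are gated.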
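An eval harness for item 6 does not need much either. In this sketch the `Agent`, `EvalCase`, and `EvalReport` types and the `runEvals` name are assumptions; the agent is treated as a function from a prompt to its final text.

```typescript
// Hypothetical shapes for illustration.
type Agent = (prompt: string) => Promise<string>;
type EvalCase = { name: string; prompt: string; check: (output: string) => boolean };
type EvalReport = { passed: string[]; failed: string[] };

// Run each case against the agent; a thrown error counts as a failure.
async function runEvals(agent: Agent, cases: EvalCase[]): Promise<EvalReport> {
  const report: EvalReport = { passed: [], failed: [] };
  for (const c of cases) {
    let ok = false;
    try {
      ok = c.check(await agent(c.prompt));
    } catch {
      ok = false;
    }
    (ok ? report.passed : report.failed).push(c.name);
  }
  return report;
}
```

In CI, a small script can call `runEvals` and exit non-zero when `report.failed` is not empty, so a prompt tweak that breaks a case fails the build.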
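Item 7 comes down to an adapter: a whole agent presented to the parent as one more tool. A sketch, assuming the sub-agent is a function from a task string to a final answer and that the parent passes `{ task: string }` as input; `agentAsTool` is a hypothetical name.

```typescript
// Hypothetical shapes for illustration.
type Tool = (input: unknown) => Promise<string>;
type Agent = (task: string) => Promise<string>;

// Expose a sub-agent as a single tool the parent agent can call.
// Bad input is reported as text so the parent can correct itself.
function agentAsTool(agent: Agent): Tool {
  return async (input) => {
    const task =
      typeof input === "object" && input !== null
        ? (input as { task?: unknown }).task
        : undefined;
    if (typeof task !== "string") return "Error: expected input of the form { task: string }";
    return agent(task);
  };
}
```

The parent then sees one tool (say, a research agent) instead of the sub-agent's own tool list, which keeps each agent's tool set small.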
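For item 8, here is a sketch of the request and response handling, assuming an Anthropic-style Messages API where a tool carries an `input_schema` and `tool_choice` can force one specific tool. The `Invoice` type, the `record_invoice` tool, and both function names are invented for illustration, and the model name is left as a parameter. No network call is made here; the code only builds the request body and reads the response content.

```typescript
// The output type we want back, and a tool whose input_schema mirrors it.
type Invoice = { customer: string; totalCents: number };

const invoiceTool = {
  name: "record_invoice",
  description: "Record the invoice fields extracted from the text.",
  input_schema: {
    type: "object",
    properties: {
      customer: { type: "string" },
      totalCents: { type: "integer" },
    },
    required: ["customer", "totalCents"],
  },
};

// Request body that forces the model to answer by calling record_invoice.
function buildRequest(model: string, text: string) {
  return {
    model,
    max_tokens: 1024,
    tools: [invoiceTool],
    tool_choice: { type: "tool", name: invoiceTool.name },
    messages: [{ role: "user", content: text }],
  };
}

type ContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown };

// Pull the typed result out of the response content; no prose parsing involved.
function extractInvoice(content: ContentBlock[]): Invoice {
  const block = content.find((b) => b.type === "tool_use" && b.name === invoiceTool.name);
  if (!block || block.type !== "tool_use") throw new Error("Model did not call record_invoice");
  const input = block.input;
  if (typeof input !== "object" || input === null) throw new Error("Tool input is not an object");
  const { customer, totalCents } = input as Record<string, unknown>;
  // A cheap runtime check before trusting the shape.
  if (typeof customer !== "string" || typeof totalCents !== "number" || !Number.isInteger(totalCents)) {
    throw new Error("Tool input does not match Invoice");
  }
  return { customer, totalCents };
}
```

The runtime check is optional, but it keeps a malformed tool input from leaking into typed code.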
That covers most of the gap between "works in the demo" and "works in production", and every item is a small composable piece you can read top to bottom, not magic hidden in a dependency.
The free core (the loop) and these eight patterns are all in the repo — read every line: https://github.com/mnifzied-create/agentloop
The point isn't the code. It's that you can own an agent instead of importing one.
What breaks for you in production that isn't on this list?