DEV Community

Cover image for Writing API docs an AI agent can actually consume
Virginia Nyambura  Mwega
Virginia Nyambura Mwega

Posted on • Originally published at virginiamwegahashnodedev.hashnode.dev

Writing API docs an AI agent can actually consume

Your docs are written for a human who can guess. The agent calling your API can't.

I found the gap the embarrassing way: one of my own agents couldn't call one of my own APIs.

FamNest runs a small agent graph — a router hands a parent's message to a retriever, the retriever grounds an answer in a vetted corpus, a coach agent (Groq, Llama 3.3 70B) drafts a reply, and a safety-reviewer agent signs off before anything reaches a human. The agents call internal endpoints the same way a third-party integrator would. And one afternoon the coach kept constructing malformed calls to the retrieval endpoint — wrong field name, missing a required filter, occasionally inventing a parameter that never existed.

The endpoint wasn't broken. The docs were. They were written for a human who could fill in the blanks, and the agent had no blanks to fill — only the tokens I gave it.

That's the whole lesson, and it's worth more than a trend.

The trend everyone's shipping — and where it stops

If you've touched developer tooling in 2026 you've watched llms.txt go from a September-2024 proposal to a routine piece of infrastructure. It's a Markdown file at your domain root that points AI systems at the content that matters, with a one-line summary of each link. Mintlify, Fern, and GitBook ship one-click toggles for it. IDE agents — Cursor, Windsurf, Claude Code, Copilot — fetch it when you point them at a docs site, then pull only the linked pages they need before writing code. LangChain even shipped an MCP server (mcpdoc) that hands those files to host apps as a fetch_docs tool.

People are calling this the Business-to-Agent web, and the framing is right: just as you once needed a site humans could navigate, you now need surfaces agents can route on. Ship the llms.txt. It's a half-day of work.

But notice what it actually solves: discovery. It answers "which page matters." It says nothing about the harder question that broke my coach agent:

Once the agent has found your endpoint, can it call it correctly on the first try — with no human in the loop to recover when your prose is ambiguous?

That's not a discovery problem. That's a contract problem. And it's where most docs quietly fail.

An agent is a different kind of reader

A human reading your docs brings a lifetime of priors. They infer that userId is probably a UUID. They notice the example uses snake_case and adjust. They hit a 400, shrug, read the error, and try again. If they're really stuck they ask a teammate. Human docs can be good enough because the human closes the gap.

An agent closes nothing. It has your tokens and a probability distribution. It pattern-matches structure: if your example shows one field, it produces one field; if you describe an error in a sentence, it treats the sentence as flavor, not as a branch it has to handle. Ambiguity doesn't make an agent cautious — it makes it confident and wrong.

So the doc stops being documentation and becomes the interface itself. Everything the agent will ever know about your endpoint is in the text. If a fact isn't on the page, it doesn't exist.

That reframes what a good endpoint doc has to contain. Here are the five things mine were missing.

  1. A typed schema, not a prose description

Prose says: "Send the user's question and an optional list of topic tags."

A schema says exactly what's allowed, and an agent can pattern-match it without guessing:

ts// retrieve — request
const RetrieveRequest = z.object({
query: z.string().min(1).max(2000),
topics: z.array(z.enum(["sleep", "feeding", "behavior", "self_care"]))
.max(4)
.default([]),
topK: z.number().int().min(1).max(10).default(5),
});

The difference is the enum, the bounds, the default. "Optional list of topic tags" let my coach invent "toddler_tantrums". z.enum([...]) makes the valid set unguessable-wrong. Publish the schema, not a paragraph about the schema.

  1. Exhaustive examples — including the unhappy paths

Agents copy examples. Whatever you show is what you'll get back. If your only example is the happy path, the happy path is the only thing the model knows how to produce.

So I document the empty result and the rejected request as first-class examples, not footnotes:

``jsonc// 200 — results found
{ "matches": [{ "id": "c_18", "score": 0.82, "text": "..." }], "truncated": false }

// 200 — valid query, nothing relevant (NOT an error)
{ "matches": [], "truncated": false }

// 422 — query failed validation
{ "error": "validation_error", "field": "topics", "detail": "unknown topic 'toddler_tantrums'" }`

The middle case is the one humans leave out and agents desperately need. "No matches" is a normal outcome, not a failure — and if you don't say so, the agent will treat an empty array as a bug and retry forever.

  1. An error taxonomy with recovery semantics

Most docs describe errors. Agents need to be told what to do about them. "Returns 429 when rate-limited" is a description. An agent needs a decision.

So I ship a table where every row ends in an action:

Code error Cause What the caller should do
422 validation_error Bad input Fix the field named in detail; do not retry unchanged
429 rate_limited Too many calls Back off using Retry-After; retry the same body
503 model_unavailable Upstream LLM down Fall back to cached/deterministic path; do not retry tightly
409 idempotency_conflict Key reused, different body Stop; surface to a human

A human reads that table for reference. An agent reads it as a control-flow graph. The "do not retry" cells are the ones that stop a confused agent from hammering your endpoint at 3am.

  1. An explicit determinism / idempotency contract

The single most useful sentence I added to any endpoint doc was: "Is it safe to retry this?"

Agents retry. Networks are flaky, and a retried call that isn't idempotent is how you double-charge a card or send two replies to one anxious parent. For anything with a side effect, I now state the contract in the doc itself:


Idempotency: required for POST /coach/reply and all payment routes.
Send an `Idempotency-Key` header (UUID). Replays with the same key + same
body return the original result. Same key + different body → 409.
Retrieval (GET /retrieve) is side-effect-free and safe to retry freely.

That paragraph is the difference between a retry loop that heals and one that does damage. It's also the kind of thing humans infer and agents simply won't — there is no prior that tells a model your payment webhook is replay-safe. You have to say it.

  1. Auth and limits as data, not folklore

"Authenticated requests only, please don't spam it" is not a contract. Scopes, the exact header, the rate limit, and the window belong in the doc as structured fields the agent can read and self-regulate against:

`plaintext


Auth: Bearer token in
Authorization. Scopecoach:readfor /retrieve.
Limits: 60 req/min/token. On exceed → 429 +
Retry-After` (seconds).

`plaintext

`

Now the agent can pace itself instead of discovering your limit by tripping it.

Keep it honest: one source of truth

All of this rots the moment your docs and your code disagree — and an agent can't smell a stale doc the way a human can. So the contract has to be generated, not hand-maintained.

My chain is boring on purpose: the typed Next.js handler validates with the Zod schema, the schema generates the OpenAPI spec, and my llms.txt links to the generated reference. The schema is the only thing I edit. The doc can't drift, because the doc is downstream of the thing that's actually true.

*Zod schema ──► request validation (runtime)

└────────► OpenAPI spec ──► /llms.txt entry ──► agent reads it
*

If the handler changes, every artifact downstream changes with it. The doc lies only if the code lies.

The test that actually proves it

Here's the check I run before I trust an endpoint doc: give a fresh model only the doc — no codebase, no context — and ask it to (a) construct a valid call and (b) handle a seeded error. If it can't, the gap is in the doc, not the model. I keep these as tiny snapshot tests next to the endpoint, so a doc regression fails CI like any other bug.

When my coach agent broke, this test would have caught it in seconds. The retrieval doc, fed to a cold model, produced exactly the malformed call I saw in production — because the ambiguity was right there on the page.

This isn't new. It's just a new reader.

None of this is novel discipline. "Unambiguous, complete, verifiable" is the spine of IEEE 29148 — the requirements standard I write everything against. What's changed is that the consumer of your interface is no longer guaranteed to be a person who can paper over a vague spec. Half your integrators in 2026 are agents, and they read your docs literally, exhaustively, and without charity.

Ship the llms.txt so agents can find you. But the thing that makes them succeed once they arrive is older and less glamorous: a contract precise enough that it can't be misread. The agentic web doesn't need prettier docs. It needs docs that can't be guessed wrong.

I build FamNest, an AI wellness coach for busy parents, and write about production reliability and safety for multi-agent systems. If you're documenting an agent-callable API and want a second pair of eyes on the contract, my notes are open.

Top comments (0)