Nobody Reads My Docs Anymore—Not Even the AI Agents

mixcode@github — Fri, 12 Jun 2026 06:04:28 +0000

A few days ago, I sat down to do something I've done a hundred times: write documentation for a library I made.

The project is called binarystruct—a Go library that helps serialize and deserialize binary data using struct tags. With tools like Claude Code, Cursor, and Gemini by my side, I had finally updated the library after years of neglect.

Satisfied with the code, I prepared to write the docs. And then, I had a sudden realization: No human is ever going to read this.

Almost none of the developers using my library today will read my documentation; they copy my repository URL, paste it into a coding agent, and say: "Implement a ZIP encoder using this library."

So I decided to adapt my repository for these new AI consumers. I read the community standards, scaffolded the emerging llms.txt and llms-full.txt files, wrote a detailed agent-facing manual, and prepared for a future of frictionless AI integrations.

Then I actually measured it. And that's when I discovered I was almost completely wrong about what helps.

The Twist: I Built a Manual No Agent Would Read

To verify that my new agent-facing manual actually helped, I ran a quantitative study—not the usual "hey agent, was this doc useful?" (agents are relentlessly agreeable), but a controlled experiment. I took fresh, sandboxed agents, gave them a real build task against my library, and counted two things: did they succeed, and how many tokens did they burn doing it—with the manual present versus stripped out.

The results were a wake-up call:

The manual went unread. On the library, agents almost never opened my carefully crafted llms-full.txt. They ran an ls, opened the example tests and the source, got what they needed, and stopped. In one library run, none of the self-directed agents opened the manual at all.
When forced to read it, they did worse. Prepending "read the manual first" to a well-documented library made the agent ingest ~8k tokens of prose on top of source it was going to read anyway—pushing token cost up ~20–30% with no gain in success.
But agents don't always distrust docs—and that's the dangerous part. When a fact is checkable (a signature, a macro, a flag), agents cross-check it against the source and route around a wrong doc just fine. But when a fact is a convention they can't verify—a unit, a default, an ordering—they trust the prose blindly. I planted one wrong line ("amount is a percentage 0–100" when the code meant a 0–1 fraction) and every agent failed the task, confidently, every time, never once checking the code.
Then flip the repo type, and the manual suddenly earns its keep. Point the same experiment at a CLI tool instead of a library and the result inverts. A CLI's operational contract—its flags, exit codes, side effects, the incantation for running unattended—lives in no function signature the agent can read, so this time the agents reached for the root docs on their own, and a focused manual cut their token cost by ~30%. The manual wasn't the thing that mattered; whether the answer already sat in the code was.

So the romantic vision—write a beautiful manual, agents read it, everything works—was wrong on its face. (And I'm not alone here: a recent paper from ETH Zurich, "Evaluating AGENTS.md", ran a massive study on coding agents and found the same split—repository context files added over 20% inference cost across the board, and the auto-generated ones, thick with redundant restatement, actually hurt success rates. But the human-written files—the ones that captured non-inferable detail the code couldn't tell you—came out ahead, and the authors' recommendation was to write down only what isn't already obvious from the code—the exact concept of "residue" arrived at independently.)

So my grand documentation plan had mostly failed—but a failed plan that tells you why is a result. The more I dug, the more everything seemed to fall out of one underlying law.

The One Law: A Manual's Value Is the Residue

Here's the result everything else falls out of:

A manual's value = what the task needs − what's already legible to the agent.

"Legible" means everything the agent gets for free: the source code it reads, the API names that advertise their own meaning (arrayRepeat, timeout_ms), and its training prior (it already knows HTTP 404, that manga reads right-to-left). Whatever the task needs beyond those three is the residue—and the residue is the only thing a manual supplies that nothing else can.

This law immediately explains the split I measured. It is not "agents ignore manuals." It's where the contract lives:

Source-legible library → the contract is right there in the signatures and docstrings. The agent reconstructs it cheaply, the residue is near-empty, and a separate manual is dead weight (neutral when ignored, a tax when forced).
CLI, service, compiled binary, or indirection-heavy code → the contract is not legible from what the agent can read. Exit codes, side effects, unattended flags, a remote API's schema—none of it lives in a signature. Here a lean, focused manual is the only surface that carries the contract—and it's exactly the case where the agent went looking for it.

Same artifact, opposite verdict—decided entirely by whether the contract was already legible. That single diagnostic is the heart of the whole method.
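Read as set arithmetic, the law is a plain set difference. Here is a minimal sketch in Go; the Residue helper and the fact labels are hypothetical, invented for illustration, and are not part of binarystruct or the guide.

```go
package main

import "fmt"

// Residue returns the facts in needs that are absent from legible,
// preserving the order of needs. Hypothetical helper for illustration.
func Residue(needs, legible []string) []string {
	known := make(map[string]bool, len(legible))
	for _, fact := range legible {
		known[fact] = true
	}
	residue := []string{}
	for _, fact := range needs {
		if !known[fact] {
			residue = append(residue, fact)
		}
	}
	return residue
}

func main() {
	// Source-legible library: everything the task needs is already readable.
	libNeeds := []string{"function signatures", "struct tag syntax"}
	libLegible := []string{"function signatures", "struct tag syntax", "HTTP 404"}
	fmt.Println(len(Residue(libNeeds, libLegible))) // prints 0

	// CLI: the operational contract lives in no signature.
	cliNeeds := []string{"flag names", "exit codes", "unattended mode"}
	cliLegible := []string{"flag names"}
	fmt.Println(Residue(cliNeeds, cliLegible)) // prints [exit codes unattended mode]
}
```

An empty result is the case where a separate manual is dead weight; a non-empty one is the list of things worth writing down.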

Working with the agent, I distilled this into the Agent-Friendly Guide. What follows are its principles, ordered by leverage—highest first.

The Principles of Agent-Friendly Codebases

1. Keep the Read Surface True—and Mind the Verifiability Gradient

If documentation disagrees with the code, the agent may trust the docs and fail silently. The asymmetry is the whole point: a missing fact costs tokens (the agent reads the source and recovers); a wrong fact costs correctness (the agent trusts it and fails).

But not all drift bites equally:

Verifiable drift self-corrects. A wrong comment about a signature, a macro, or a flag is harmless—the agent checks the code and ignores it (I measured this across Go, JS, C, and PHP: it recovered every time).
Unverifiable-convention drift is catastrophic. A wrong unit, default, or ordering—something with no checkable output—has nothing to catch it. The agent trusts it blindly and fails (every time, across five languages).

So when you diff docs against code, prioritize the unverifiable: conventions, units, defaults, semantics. For a source-shipped library, eliminating this drift is the single highest-value thing you can do—above writing any manual.
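To make the asymmetry concrete, here is a small sketch of the kind of drift I planted. ApplyDiscount and ApplyDiscountChecked are hypothetical functions, not part of any real library, and the range check is just one possible way to turn an unverifiable convention into something a caller can trip over.

```go
package main

import (
	"errors"
	"fmt"
)

// ApplyDiscount reduces price by amount.
//
// DRIFTED DOC (deliberately wrong, for illustration): "amount is a percentage 0-100".
// The code actually treats amount as a fraction in [0, 1], and nothing in the
// signature lets a reader check which one is true.
func ApplyDiscount(price, amount float64) float64 {
	return price * (1 - amount)
}

// ErrAmountRange reports an amount outside the fraction range.
var ErrAmountRange = errors.New("amount must be a fraction in [0, 1]")

// ApplyDiscountChecked does the same calculation but rejects out-of-range
// input, so a caller who trusted the drifted doc gets an error instead of a
// silently wrong number.
func ApplyDiscountChecked(price, amount float64) (float64, error) {
	if amount < 0 || amount > 1 {
		return 0, ErrAmountRange
	}
	return price * (1 - amount), nil
}

func main() {
	// A caller who believed the comment passes 20, meaning "20%".
	fmt.Println(ApplyDiscount(100, 20)) // prints -1900, with no error anywhere

	_, err := ApplyDiscountChecked(100, 20)
	fmt.Println(err) // prints amount must be a fraction in [0, 1]
}
```

The unchecked version is the failure mode described above: the call compiles, runs, and returns nonsense.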

2. Put Critical Info on the Verified Read Path

Agents find documentation by ls + filename relevance (e.g., example_test.go, main.go)—not by grepping content. They open the files whose names look task-relevant, get enough, and stop. A bolt-on manual is skipped once the task-relevant reads satisfy them—and you cannot bait them to it by naming. (I tested a file literally named read_first_for_coding_agent.md; it barely moved the open rate.)

So place the decisive traps and recipes directly where the agent is guaranteed to look. Which surface that is depends on the repo, so verify it rather than assuming. In one library the agents went straight to the tag-reference doc and the Example tests and never opened doc.go, so a pointer placed in godoc was never seen. Watch what a clean agent actually opens, then put the content there:

Inside runnable Example tests.
In the doc surface the agent actually reads (verified, not assumed).
In CLAUDE.md, which Claude Code is hardcoded to read at the start of every session.
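For the first option, Go's testing package runs any function named ExampleXxx that ends with an "// Output:" comment and compares what it prints against that comment, so a recipe placed there sits on the read path and is checked on every test run. A minimal sketch follows; EncodeRecord is a hypothetical stand-in for whatever your library exports, and in a real repository the Example function would live in example_test.go rather than next to main.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// EncodeRecord prefixes body with its length as a little-endian uint32.
// Hypothetical function used only for this illustration.
func EncodeRecord(body []byte) []byte {
	out := make([]byte, 4, 4+len(body))
	binary.LittleEndian.PutUint32(out, uint32(len(body)))
	return append(out, body...)
}

// ExampleEncodeRecord belongs in example_test.go. There, `go test` runs it and
// fails if stdout differs from the Output comment, so the recipe cannot drift
// from the code without someone noticing.
func ExampleEncodeRecord() {
	fmt.Printf("% x\n", EncodeRecord([]byte("hi")))
	// Output: 02 00 00 00 68 69
}

func main() {
	// Called directly here only so the sketch runs as a single file.
	ExampleEncodeRecord()
}
```

The Output comment is where an unverifiable convention (here, the byte order of the length prefix) becomes a checked fact.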

3. Design the API to Shrink the Error Surface (The Deepest Lever)

Documentation prevents mistakes about the API; code design prevents the opportunity for mistakes. In binarystruct, declarative struct tags do the bookkeeping that agents (and humans) get wrong:

type Header struct {
    Signature uint32 `binary:"const=0x04034b50"`      // validate the magic number automatically
    CompSize  uint32 `binary:"valueof=bytelen(Body)"` // derive the length; no manual bookkeeping
    Body      []byte `binary:"[CompSize]byte"`
}

By removing the need for the agent to manually calculate offsets or keep a length field in sync with its data, the most common source of integration bugs simply vanishes. Heuristic: every place your docs say "remember to…" is a candidate for a design change that removes the need to remember.
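For contrast, here is a rough sketch of the hand-written equivalent using only the standard library's encoding/binary. It is not how binarystruct works internally; the function names are made up, and little-endian is an assumption based on the ZIP-style signature in the struct above.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const magic uint32 = 0x04034b50

// encodeHeader writes signature, body length, then body. The length has to be
// recomputed by hand on every write, which is the bookkeeping the tags remove.
func encodeHeader(body []byte) []byte {
	buf := make([]byte, 8, 8+len(body))
	binary.LittleEndian.PutUint32(buf[0:4], magic)
	binary.LittleEndian.PutUint32(buf[4:8], uint32(len(body)))
	return append(buf, body...)
}

// decodeHeader checks the signature and the length field before slicing out
// the body. Each check is a step a caller could forget.
func decodeHeader(data []byte) ([]byte, error) {
	if len(data) < 8 {
		return nil, errors.New("short header")
	}
	if binary.LittleEndian.Uint32(data[0:4]) != magic {
		return nil, errors.New("bad signature")
	}
	n := int(binary.LittleEndian.Uint32(data[4:8]))
	if n > len(data)-8 {
		return nil, errors.New("body length exceeds input")
	}
	return data[8 : 8+n], nil
}

func main() {
	packed := encodeHeader([]byte("hello"))
	body, err := decodeHeader(packed)
	fmt.Println(string(body), err) // prints hello <nil>

	packed[0] ^= 0xff // corrupt the signature
	_, err = decodeHeader(packed)
	fmt.Println(err) // prints bad signature
}
```

Every offset and length check here is a place to get an integration wrong, and the declarative version has none of them.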

4. Name the Traps (Don't Hide Them)

APIs have warts. Document them explicitly with the why and a rule (e.g., an inconsistent argument order: "stream first, then order, then value; the value-first variant has no stream") rather than hiding them or making a breaking change. If a competent developer could write a wrong call from the signature alone, that's a trap—name it.

5. Copy-Pasteable Recipes for the Common Path

Give the agent a known-correct template for the one or two patterns that cover most real usage. A copy-pasteable recipe for a variable-length record layout means the agent copies a correct structure instead of stitching it together from primitives and guessing.

6. Selection Surface: Help the Agent Choose

Before use comes select. State in one line what the tool is for, what it is not for, and the trigger phrases that match how an agent would describe the task. For a library that's the README's first sentence; for a CLI, the --help description; for an MCP tool, the tool description. This matters most for tools, where the agent picks among many at invocation time.

7. Contributor Guidelines for Agents (AGENTS.txt)

If agents are going to extend your codebase, write down the architectural invariants they must preserve — above all, "when the same behavior lives in more than one place (a reference path plus an optimized or generated one), a change must land in all of them." Name the file AGENTS.txt (not .md) so it doesn't clobber a maintainer's own AGENTS.md or CLAUDE.md.

8. Prove It With a Clean-Agent Evaluation

You cannot grade your own homework. Spawn a fresh agent in a clean directory, have it build something real against your published package, and read its friction log. Measure on two axes, in this order:

Correctness first. Run N fresh agents per condition and compare pass-rate, not vibes.
Cost second. Count the novel tokens the reasoning model processes (intake + output, excluding cache re-reads).

And know the trap that makes this experiment lie: on small, well-documented targets the token axis saturates — at equal success, a capable agent solves from source either way and the delta collapses to ≈neutral. Read naively that says "artifacts don't matter." It doesn't; it says this task couldn't see the difference. The decisive experiment is the failure regime: construct the exact condition your artifact targets — an unverifiable drift, or a missing piece of residue — and watch the pass-rate move from near-zero to near-perfect.
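The two axes can be tallied with very little code. Below is a minimal sketch, assuming each run has already been reduced to a pass/fail flag and a novel-token count; the Run and Summary types are hypothetical, and the numbers in main are made up purely to exercise the code, not results from my study.

```go
package main

import "fmt"

// Run is one fresh-agent attempt. Hypothetical type for illustration.
type Run struct {
	Passed      bool
	NovelTokens int // intake + output, excluding cache re-reads
}

// Summary holds the two axes in priority order: correctness, then cost.
type Summary struct {
	PassRate   float64
	MeanTokens float64
}

// Summarize computes the pass rate and mean novel tokens over all runs of one
// condition. It returns a zero Summary for an empty slice.
func Summarize(runs []Run) Summary {
	if len(runs) == 0 {
		return Summary{}
	}
	var passed, tokens int
	for _, r := range runs {
		if r.Passed {
			passed++
		}
		tokens += r.NovelTokens
	}
	n := float64(len(runs))
	return Summary{PassRate: float64(passed) / n, MeanTokens: float64(tokens) / n}
}

func main() {
	// Made-up data: four runs per condition.
	withArtifact := []Run{{true, 52000}, {true, 48000}, {true, 50000}, {true, 50000}}
	withoutArtifact := []Run{{false, 61000}, {true, 59000}, {false, 60000}, {false, 60000}}

	a, b := Summarize(withArtifact), Summarize(withoutArtifact)
	fmt.Printf("with:    pass=%.2f tokens=%.0f\n", a.PassRate, a.MeanTokens)
	fmt.Printf("without: pass=%.2f tokens=%.0f\n", b.PassRate, b.MeanTokens)
}
```

Compare pass rates first; the token means are only worth reading once the pass rates are close.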

The Ouroboros: Automating Agent-Readiness

To make these principles easy to apply, I built a public repository, agent-friendly-guide, and a companion Claude Code plugin called agent-ready.

Install it, then run /agent-ready inside any repository. The skill offers three modes:

/agent-ready (or --full) — audits the codebase, scaffolds the documentation (llms.txt/llms-full.txt/AGENTS.txt), and runs a clean-agent evaluation.
--scaffold — runs the audit and scaffolds the templates, stopping before the evaluation.
--audit-only — generates a prioritized readiness gap report without writing any files.

It also carries a drift-audit step — the insurance for that one catastrophic case. It verifies every doc claim against the code; in my tests it caught and rewrote injected unverifiable-convention drift across Go, JS, C, and PHP, flipping the downstream task from all-fail to all-pass.

The method itself is vendor-neutral — it reasons about "an agent," not a product — and it's been validated across ten ecosystems (Go, Python, JavaScript, C, Rust, JVM, Shell, Swift, PHP, and Ruby) and on more than one model family.

Conclusion

When we write for AI agents, the instinct is to pile on more documentation. The measurements say otherwise: spend your effort on the residue—the project-specific knowledge that lives nowhere else—keep the surface the agent actually reads true, and make misuse hard by design. The rest, the agent already knows.

Even the AI agents don't read my documents, but now I know they don't have to, and I don't have to either. I tell the agents what they don't know (the story of our project, our workflow, and our difficulties), like a grandma telling folklore to her grandchildren. Then the agents tuck those notes somewhere they're likely to look. I don't remember where they put them, but they'll be there. Less work, less time, and I got my weekends back. Happy Life.

DEV Community: mixcode@github