I built a docs site without writing the docs: the part that was mine was a layer up

#ai #technicalwriting #documentation #devrel

I spent four articles on a single stubborn idea: that you cannot trust an AI agent to review documentation just because it tells you it did. The whole point was moving verification out of the agent's reach and onto disk, where a claim about a file and the file itself are not the same thing.

This piece is about the other direction. Not reviewing docs with an agent, but building them with one. I forked a popular LLM library whose entire documentation was a single flat README, pointed an agent at it, and a while later had a 12-page docs site. I wrote almost none of the prose.

I want to be honest about which parts of that were actually mine, because the answer is not the one the hype implies. The prose was the easy part to hand off. The work that mattered lived a layer up, and it is the part I would not delegate.

The canvas: a README pretending to be documentation

The library is LangExtract, an LLM structured-extraction tool with something north of thirty-six thousand GitHub stars. (Published by Google, though its own README is careful to say it is not an officially supported Google product, a caveat I kept in mind every time I was tempted to type the word "official.") For all that reach, the entire documentation was one long README. Install, a scattering of examples, some API notes, all stacked in a single scroll you navigate with the find command and a prayer.

That makes it a good canvas precisely because it is not a writing problem. The words mostly existed. What did not exist was a structure: a way for a developer to move from "what is this thing" to "how do I actually use it" to "something broke and I need the reference." Turning a README into a docs site is an architecture problem wearing a writing problem's clothes.

The part I would not hand off: the shape

Before the agent drafted a single page, the decision that mattered was already made, and I made it. What content types exist here, and how do they relate?

I used Diátaxis as the spine: concept pages that explain, how-to pages that walk you through a task, a reference you look things up in, a quickstart that gets you running. That sounds tidy on a slide. In practice it is a sequence of judgment calls the agent is perfectly happy to get wrong. A concept page wants to stay pure concept, and the instant it starts telling you which flag to pass, it has quietly become a how-to, and the boundary blurs until nothing is findable. So I trimmed the how-extraction-works page back to pure concept. I promoted the long-document workflow, the genuinely hard thing the library does well, into the flagship how-to instead of letting it rot in an examples dump.

The agent can draft any page you name. It cannot tell you which pages should exist.

That decision sits upstream of the prose, and it is the one that decides whether the finished site is navigable or merely tidy.

Splitting verification by what kind of thing it is

Here is where the gating work bled straight into the build. If you have read the earlier pieces, the reflex will be familiar: do not let the thing that produced the work be the only thing that vouches for it.

But "verify the docs" is not one question. It is at least three, and they need different instruments.

"Is the structure intact" is a mechanical question. Broken links, dead anchors, malformed markdown. So the build runs scans for exactly those, set to throw, which means a broken link fails the build instead of filing a polite warning nobody reads. Green or it does not ship.

"Is the code right" is a different question, and a scan cannot touch it. A code sample can be flawlessly formatted and completely wrong. So I added a small harness that runs the example code for real, against the live model, and checks that it does what the page claims. When I ran it, the examples passed with zero failures, and the long-document example came back with ten grounded extractions, which is the actual behavior the page promises. That is not the agent telling me the code works. That is the code working, in front of me.

"Is the architecture right, does this read on a phone, are these facts actually true" is a third kind of question, and that one is mine. No harness catches a content model that does not match how developers think.

The shape of the lesson has not changed. Confidence is not evidence. A green scan is evidence the structure holds. A passing harness is evidence the code runs. Neither is evidence the docs are any good, and that gap is where the human stays.

Where the agent drifted, and how I knew

An agent building documentation is confidently, fluently wrong in small ways, and the small ways are the dangerous ones, because they survive a casual read.

Two examples from this build. The first was a syntax landmine. Docusaurus changed how it parses admonitions, the little "Note" and "Warning" callout boxes, between major versions. The space-based form the agent had clearly absorbed from its training no longer parses the title in the current version, and the word "Title" leaks into the rendered page as literal text. The old syntax and the fix differ by a single character:

:::note Title
:::note[Title]

The first line is what the agent wrote. The second is what current Docusaurus actually wants. Nothing in the agent's prose flagged the difference. The rendered page flagged it, because part of verification was checking that zero stray colon-sequences survived into the final HTML.

The second was quieter and entirely mine to care about. The agent's prose was clean, professional, and absolutely riddled with em dashes. Ninety-eight of them. I have a rule, a slightly fussy personal one, that my published writing does not use em dashes, and a fork I am putting my name on counts as my published writing. So there was a pass that pulled all ninety-eight out of the prose and rewrote each into the colon or comma or parenthetical it should have been, while leaving the five that lived inside code blocks untouched (in code, a dash is a dash, not a stylistic opinion).

The thing no harness would have caught

I built and verified the site on a laptop, like a reasonable person. Then I opened it on my phone, like an actual reader, and it fell apart.

The API reference tables, perfectly comfortable on a wide screen, blew straight off the right edge on a narrow one. Long inline code in headings refused to wrap and shoved the whole layout sideways. The buttons on the homepage hero sat in a stubborn single row instead of stacking. None of this showed up in any scan, because none of it is wrong in the way a scan understands wrong. The HTML was valid. The links worked. It was just unusable in the hand of the person most likely to be holding it.

The fixes were small once I had seen the problem: let the tables scroll horizontally inside a contained box, break long words in headings, let the hero buttons wrap. But I only knew to make them because I held the thing the way a reader would, and that instinct, to go check the lived experience rather than the rendered output, is the most durable part of the job and the one I have least idea how to hand to an agent.

Believing the green light, for the right reason

The last pass was a separate read of the agent's factual claims against the actual source code, treating every confident assertion as a hypothesis until the source confirmed it.

It came back with zero accuracy errors. Which is a strange thing to be pleased about, so let me be precise about what it means. It does not mean the check was unnecessary. It means the check said the docs were accurate and I could believe that, because I had looked, not because the agent had assured me. The subtle things it confirmed were exactly the kind a casual read waves through: that the Ollama backend defaults its model URL to localhost:11434, that the library does not silently load a .env file because there is no load_dotenv call anywhere in the source, that the OpenAI batch setting's threshold is a count of queued prompts, not a token budget or a time window. All true. All checkable. None of it taken on faith.

This is the whole argument from the gating pieces, restated for building instead of reviewing. The agent does not get to be both the author and the only witness. Zero errors is a good outcome only when zero errors is a measured fact.

Velocity from rigor, not despite it

The story people expect about AI and documentation is a speed story: the agent writes fast, you ship fast, and quality is the tax you grudgingly pay for the pace. That is not what happened here, and the inversion is the part worth keeping.

The site came together fast because the verification was mechanical and sorted by type. The scans spared me re-reading for broken links. The harness spared me wondering whether the code ran. And that is precisely what freed my actual attention for the things that needed a person: the content model, the page that disintegrated on a phone, the claims that needed confirming against source.

The rigor is not friction dragging on the velocity. The rigor is what made the velocity safe to want.

Where I am right now: the site is live, it is a deliberately narrow slice, and it is built so the next slice does not demand a rebuild. A troubleshooting page, a custom-provider guide, a deeper reference, all have somewhere to land. I left the room on purpose, because the fastest thing you can do for future-you is not to finish. It is to build something future-you can extend without tearing it down.

I am still working out how much of this generalizes. The verification-by-type split feels portable. The "go hold it like a reader" instinct feels stubbornly manual. If you have built docs by directing an agent and found a cleaner seam between the parts you delegate and the parts you keep, I would like to hear where you drew the line.

I am writing this while it is still in motion, which is the only honest way I know to write about a practice I am still figuring out. If you are coming at the same problem from a different angle, the comments are open and I am reading them.

I write about AI-assisted documentation workflows, developer experience, and the evolving role of technical writers. If any of this resonates, let's connect on LinkedIn.