DEV Community

How I Documented an Entire Product in 4 Days with an AI Agent

Debbie O'Brien on May 13, 2026

I had 55 pages of documentation to write, 59 screenshots to capture, and a product that was still shipping features and being rebranded weeks befor...

Mininglamp • May 19

55 pages in 4 days with a single agent is solid, but the more interesting question is what happens when you need multiple agents collaborating on docs — one handling API references, another doing tutorials, a third generating screenshots. The coordination overhead between agents becomes the real bottleneck, not generation speed. We're moving from "agent as tool" to "agent teams as workforce."

Debbie O'Brien • May 19

love this...

Mykola Kondratiuk • May 17

screenshot sync is the real bottleneck when the product's still shipping. I've had the same velocity gains on writing but stale screenshots erode doc trust fast. curious how the mid-project rebranding affected which ones survived

Thomas Landgraf • May 14

The skill-as-markdown observation lands. We've been building toward the same conclusion from the spec-management side, and I think it's the same mechanical insight applied to a different artifact. Your write-docs skill is a Markdown file that encodes the voice plus structure plus verification checklist for a page. Strip "page" and substitute "requirement" and you get the same shape: a Markdown file per testable unit with frontmatter (status, ID, acceptance criteria) that the agent loads the way yours loads the write-docs skill. The "update once, every future session benefits" property comes from the same place in both: the artifact IS the instruction.

Where the two diverge for me is the regeneration mode. Your screenshot-recapture loop is idempotent: UI changes, agent reruns capture, the doc is correct again. For specs you don't actually want idempotent regeneration. If the implementation moved and a previously-reviewed requirement now diverges, silently rewriting the spec to match the code is the silent-drift failure mode (someone reviewed the OLD intent for a reason). What you want there is for the agent to surface the divergence as a Change Request rather than overwrite the source. Full disclosure: I'm building a VS Code extension called SPECLAN around this. Same Markdown + YAML structure you're using for skills, plus a status lifecycle (draft -> review -> approved) that turns regeneration into a gated operation instead of a freebie.
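
To make the gated-regeneration idea concrete, here is a minimal sketch, not SPECLAN's actual code. The names `Requirement`, `ChangeRequest`, and `regenerate` are hypothetical, and the rule that only drafts may be overwritten is an assumption drawn from the draft -> review -> approved lifecycle described above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    DRAFT = "draft"
    REVIEW = "review"
    APPROVED = "approved"


@dataclass
class Requirement:
    # Hypothetical shape: one Markdown file per requirement, reduced to three fields.
    req_id: str
    status: Status
    text: str


@dataclass
class ChangeRequest:
    # What the agent surfaces instead of overwriting a reviewed requirement.
    req_id: str
    current_text: str
    proposed_text: str


def regenerate(req: Requirement, proposed_text: str) -> Optional[ChangeRequest]:
    """Overwrite drafts freely; gate anything already in review or approved."""
    if proposed_text == req.text:
        return None  # no divergence, nothing to do
    if req.status is Status.DRAFT:
        req.text = proposed_text  # nobody has reviewed this yet, so rewriting is safe
        return None
    # Someone reviewed the old intent: report the divergence, leave the source alone.
    return ChangeRequest(req.req_id, req.text, proposed_text)
```

The point of returning a `ChangeRequest` rather than raising or silently skipping is that the divergence becomes a reviewable artifact in its own right.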

The question I keep getting stuck on, and I'd love to hear how you handle it: when the UI changes but nobody has run the recapture yet, are the docs "wrong" or just "stale"? Where in the workflow is that divergence visible to someone using the docs, and does your agent ever flag it on the next run instead of silently regenerating?

Debbie O'Brien • May 14

It's not an easy one, indeed. The UI is changing by the day, with releases happening weekly, so for now I am running it once the production release is out. I am planning to create a workflow for it so that it isn't such a manual task, but that's still a work in progress. But yes, the docs need updating and the screens need updating, though sometimes it's just the screens and sometimes just the docs as new features are added. Devs are not writing docs, so this could be a whole workflow too. At the moment the product hasn't launched, so we still have time to tidy these things up, but keeping docs up to date in an era of moving so fast is for sure a challenge on its own.

Thomas Landgraf • May 15

The screens-vs-docs split you mentioned is the part I keep coming back to, because they're actually two different failure modes wearing the same coat. A stale screenshot is mechanical: the feature still does what the doc says, the pixels just moved, and a recapture pass fixes it without anyone re-reading the prose. A stale doc is semantic: a feature shipped and nothing describes it, or the behavior changed and the words are now wrong. Recapture does nothing for that one. You need something that knows a feature exists with no doc pointing at it.

The reason I think separating them matters is that the mechanical half is automatable on a schedule (your post-release recapture run is exactly the right shape for it), but the semantic half needs a diff against something other than the UI. Feature flags flipping, new routes appearing, changelog entries with no matching doc page. That's the signal that catches "devs are not writing docs" before launch instead of after.

The whole-workflow thing you're gesturing at is real and I don't think anyone has nailed it yet. The closest I've gotten is making the gap visible rather than trying to auto-close it. An agent that silently regenerates will confidently document the wrong thing. One that says "these three routes have no doc page, here is my best guess, review before merge" keeps the human in the one spot where they actually add value. Curious whether your post-release run will try to close the doc gaps itself or just flag them for you.

Debbie O'Brien • May 15

Yeah, it's going to be interesting to see how it evolves and how it can be improved. For now it's the human overseeing things, pulling in checks, and asking whether everything has been documented. But perhaps there is another skill in there, separate from the generating script: a maintaining-docs skill. That is for sure worth looking at, because keeping up with docs is a huge problem. I myself have lived it.

Thomas Landgraf • May 16

The separate maintain skill is the right instinct, and I think the reason it has to be its own thing is that it optimizes for a different goal. A generate skill is rewarded for completeness: write everything, cover the surface. A maintain skill is rewarded for catching absence: find the feature that shipped with nothing pointing at it. Those are almost opposite objective functions, so bolting maintenance onto the generator tends to make it regenerate confidently instead of noticing the gap.

The part I still haven't cracked is what you feed the maintain skill. Pointing it at the running app just gets you the generator again. The only thing that's worked for me is diffing between releases: what routes or flags or changelog lines are new since the last doc pass, and treating anything with no matching page as a flag rather than something to auto-write.
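
A rough sketch of that release diff, under assumptions: routes, flags, and changelog lines are all reduced to plain strings, and each doc page declares which items it covers. The function name `find_doc_gaps` and the data shapes are hypothetical, not taken from any tool mentioned in this thread.

```python
from typing import Dict, List, Set


def find_doc_gaps(
    previous_release: Set[str],
    current_release: Set[str],
    doc_pages: Dict[str, Set[str]],
) -> List[str]:
    """Return items that are new in this release and that no doc page covers."""
    new_items = current_release - previous_release
    # Union of everything any page claims to document (empty if there are no pages).
    covered = set().union(*doc_pages.values())
    return sorted(new_items - covered)


# Hypothetical example data: routes seen at the last doc pass and in the new release.
previous = {"/login", "/projects"}
current = {"/login", "/projects", "/projects/share", "/billing"}
pages = {"docs/projects.md": {"/projects", "/projects/share"}}

for item in find_doc_gaps(previous, current, pages):
    # Flag for a human instead of auto-writing the page.
    print(f"FLAG: {item} shipped with no doc page, review before merge")
```

Only items that are both new and uncovered get flagged, so an old undocumented route does not resurface on every run; whether that is the right trade-off depends on how much backlog you want the maintain pass to carry.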

Having lived it is the right teacher for this though. The people who build the maintain skill well always seem to be the ones who got burned by stale docs first.

Andrii Krugliak • May 20

The recapturable-screenshots part is the actual unlock here, not the raw speed. I ran an agent across four docs sites last quarter and screenshots going stale two releases later cost me more time than the writing ever did. Pinning capture to source state means the docs rot at the same rate as the code instead of faster.
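
One way to read "pinning capture to source state", sketched under assumptions: each manifest entry records a hash of the source files a screenshot depends on, and a screenshot counts as stale when that hash no longer matches. The manifest layout and the names `source_fingerprint` and `stale_screenshots` are hypothetical, not the mechanism from the original post.

```python
import hashlib
from typing import Dict, List


def source_fingerprint(sources: Dict[str, str]) -> str:
    """Hash the paths and contents of the source files a screenshot depends on."""
    digest = hashlib.sha256()
    for path in sorted(sources):  # sorted so the result does not depend on dict order
        digest.update(path.encode("utf-8"))
        digest.update(b"\0")
        digest.update(sources[path].encode("utf-8"))
        digest.update(b"\0")
    return digest.hexdigest()


def stale_screenshots(
    manifest: Dict[str, str],
    current_sources: Dict[str, Dict[str, str]],
) -> List[str]:
    """List screenshots whose recorded fingerprint no longer matches the source."""
    return sorted(
        name
        for name, recorded in manifest.items()
        if source_fingerprint(current_sources.get(name, {})) != recorded
    )
```

A recapture run would then only touch the names this returns and rewrite their fingerprints afterwards, which is what keeps the screenshots rotting at the same rate as the code.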

Theo Valmis • May 20

The part that usually gets skipped in writeups like this is what happened when the agent was wrong — and it sounds like you actually ran into that with the rebrand and features still shipping during the process. That's the real story of agent-assisted documentation: not the 4 days, but understanding where the agent's confidence was miscalibrated and where you had to step in.

The drift problem is brutal for docs specifically. Code drifts from docs over time in most projects, but when AI generates the docs from source code and then the product keeps shipping, you have drift accumulating from day one. Curious what your catch mechanism looked like — was it manual review passes, or did you build something into the agent loop to flag when UI state didn't match what it expected?

AudioProducer.ai • May 20

The screenshot-manifest-as-database move translated cleanly into our world too. On our pipeline (AudioProducer.ai) the rendered audio drama is the screenshot, the editable artifact is the manuscript plus per-character voice assignments plus per-paragraph sound annotations, and when a chapter changes we re-resolve from the manifest instead of regenerating audio from scratch. Thomas Landgraf's generate-vs-maintain split lands hard from the audio side: per-line emotion tags can be mechanically re-resolved on new lines (no semantic decision), but per-character voice and per-scene soundscape choices are semantic, and silently re-running auto-assign on a changed chapter is the exact "confidently document the wrong thing" failure mode you and Thomas were circling. Our editor surfaces unresolved assignments on new content as a review queue rather than auto-resolving them, same shape as "these three routes have no doc page, here is my best guess, review before merge."

Sam H • May 14

How long has your system been in use? Sometimes the maintenance history gives me some idea of the complexity.

Debbie O'Brien • May 14

The system, as in the docs skills I wrote, is just a week old. The app is months old? Not too sure, to be honest, but in the last few months the acceleration has been through the roof, with so many new features, fixes, and design and DX improvements.