How to Document AI Agents (Because Traditional Docs Won't Cut It)

AI agents are everywhere. Browser automation, coding assistants, customer service bots. The tooling is maturing fast.

The documentation? Not so much.

Most AI agent projects copy-paste the same README template used for deterministic software. That's a problem. Agents don't behave like traditional software. They're non-deterministic. They fail in unexpected ways. They make decisions.

Your docs need to reflect that.

Why Traditional Documentation Fails for AI Agents

Standard technical docs assume predictable behavior. Input X produces output Y. Every time.

AI agents don't work that way. The same prompt can produce different results. External factors (model temperature, context window, API rate limits) affect behavior. Failures aren't always reproducible.

This means your documentation needs to cover:

  • What the agent is supposed to do (not just what it can do)
  • How it makes decisions (the logic, not just the API)
  • When it fails (and what "failure" even means for a non-deterministic system)
  • How to debug it (because "it didn't work" isn't actionable)

The AI Agent Documentation Framework

Here's what you need to document. I'll use Notte as a reference point—a browser automation agent framework from YC S25 with 1.7k GitHub stars and 41k+ PyPI downloads as of December 2025.

1. Agent Purpose and Boundaries

What does the agent do? More importantly, what does it NOT do?

Traditional docs: "Notte automates web tasks."

Better docs: "Notte executes browser-based workflows using LLM reasoning. It handles dynamic page interactions, form filling, and data extraction. It does NOT handle JavaScript-heavy SPAs without explicit wait conditions, CAPTCHA solving without the stealth module enabled, or multi-tab workflows in the current version."

Be explicit about limitations. Users will find them anyway. Save them the frustration.

2. Decision Logic

How does the agent decide what to do next?

For AI agents, this is critical. Document:

  • What inputs affect decisions (prompts, context, tools available)
  • How the agent prioritizes actions
  • What triggers fallback behavior

Notte uses a "perception layer" that converts web pages into structured maps for LLM consumption. That's a design decision users need to understand. It explains why some pages work better than others.
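
To make that concrete, here's a minimal sketch of what a perception layer can look like. This is illustrative only, not Notte's actual implementation; every name in it (PerceivedElement, perceive, the raw_dom shape) is hypothetical.

```python
# Hypothetical sketch of a perception layer: flatten a live DOM into a
# small, structured map an LLM can reason over. Not Notte's actual code.
from dataclasses import dataclass

@dataclass
class PerceivedElement:
    element_id: str  # stable handle the agent can act on
    role: str        # "button", "input", "link", ...
    label: str       # human-readable text the LLM sees

def perceive(raw_dom: list[dict]) -> list[PerceivedElement]:
    """Keep only interactive elements; drop layout noise the LLM doesn't need."""
    interactive_roles = {"button", "input", "link", "select"}
    return [
        PerceivedElement(node["id"], node["role"], node.get("text", ""))
        for node in raw_dom
        if node.get("role") in interactive_roles
    ]
```

Documenting even a simplified model like this explains agent behavior: if an element never makes it into the map, the LLM can't act on it, no matter how obvious it looks on screen.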

3. Failure Modes

This is where most agent docs fail completely.

Don't just document error codes. Document failure patterns:

| Failure Type | Symptom | Likely Cause | Recovery |
| --- | --- | --- | --- |
| Silent failure | Agent completes but wrong result | Ambiguous task description | Add specificity to prompt |
| Timeout | Agent loops indefinitely | Page state doesn't match expectations | Add explicit wait conditions |
| Partial completion | Some steps work, then stop | Context window exceeded | Break into smaller tasks |

Users don't need to know every possible error. They need to know how to diagnose and fix common problems.
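
Silent failure is the nastiest row in that table, so it's worth showing users a defensive pattern alongside it. Here's a minimal sketch that validates the agent's output against an explicit contract. run_agent is a hypothetical stand-in for your framework's entry point, and its canned result exists only so the example runs.

```python
def run_agent(task: str) -> dict:
    # Hypothetical stand-in for your agent framework's entry point.
    # Returns a canned result so this example runs on its own.
    return {"rows": [{"name": "Widget", "price": "N/A"}]}

def validate_extraction(result: dict) -> list[str]:
    """Return a list of problems; an empty list means the result looks sane."""
    problems = []
    if not result.get("rows"):
        problems.append("no rows extracted")
    for i, row in enumerate(result.get("rows", [])):
        if not isinstance(row.get("price"), (int, float)):
            problems.append(f"row {i}: price is not numeric")
    return problems

issues = validate_extraction(run_agent("Extract product names and prices"))
if issues:
    # Surface the problem instead of letting a wrong result flow downstream.
    raise RuntimeError(f"Agent run failed validation: {issues}")
```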

4. Observability

How do users know what the agent is doing?

Document:

  • Logging levels and what each captures
  • How to enable debug/verbose mode
  • Where to find execution traces
  • How to replay failed runs

Notte provides execution logs and session replay. Document how to use them. A user debugging a failed workflow needs to see exactly what the agent "saw" and what decisions it made.
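
If your framework doesn't give you traces out of the box, document a pattern users can adopt themselves. Here's a minimal structured-trace sketch using only the standard library; the event fields are assumptions, not any real framework's schema.

```python
import json
import logging
import time

logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger("agent.trace")

def trace_step(step: int, observation: str, decision: str, action: str) -> None:
    """Emit one machine-readable trace event per agent step."""
    log.debug(json.dumps({
        "ts": time.time(),
        "step": step,
        "observation": observation,  # what the agent "saw"
        "decision": decision,        # why it chose this action
        "action": action,            # what it actually did
    }))

trace_step(1, "login form with email/password inputs",
           "task says authenticate before anything else", "fill #email")
```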

5. Deterministic vs. Non-Deterministic Behavior

Be honest about what's predictable and what isn't.

Some parts of an agent system are deterministic:

  • Configuration parsing
  • API authentication
  • Tool availability checks

Some parts aren't:

  • LLM responses
  • Timing of page interactions
  • Order of operations in parallel tasks

Document which is which. Users building production systems need to know where to add retries, validation, and human-in-the-loop checkpoints.
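
As a sketch of where those checkpoints go, here's one way to wrap a non-deterministic agent call in a deterministic shell. All names are illustrative; run_fn and validate_fn stand in for your agent call and your acceptance check.

```python
def run_with_retries(task: str, run_fn, validate_fn, max_attempts: int = 3):
    """Retry a non-deterministic call until a deterministic check accepts it."""
    for attempt in range(1, max_attempts + 1):
        result = run_fn(task)    # non-deterministic: may differ every attempt
        if validate_fn(result):  # deterministic: your acceptance criteria
            return result
        print(f"Attempt {attempt}/{max_attempts} failed validation, retrying")
    # Human-in-the-loop checkpoint: escalate instead of shipping a bad result.
    raise RuntimeError(f"Task {task!r} needs human review after {max_attempts} attempts")
```

The split matters: retries belong around the non-deterministic call, while validation is plain deterministic code you can unit-test.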

6. Integration Patterns

How does this agent fit into a larger system?

Document:

  • Hybrid workflows (combining scripted and AI-driven steps)
  • Handoff patterns (when to use human oversight)
  • Idempotency (can you safely retry failed runs?)
  • State management (what persists between runs?)

Notte explicitly supports hybrid workflows—scripting deterministic parts and using AI only where needed. That's a documentation opportunity. Show users the pattern, not just the API.
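
A documented pattern might look like the sketch below. The agent object and its methods are hypothetical, not Notte's actual API; the point is the shape of the workflow, not the calls.

```python
def export_monthly_report(agent, credentials: dict):
    # Deterministic: login and navigation are scripted, so they're testable
    # and never depend on what an LLM decides.
    agent.goto("https://example.com/login")
    agent.fill("#email", credentials["email"])
    agent.fill("#password", credentials["password"])
    agent.click("#submit")

    # Non-deterministic: only the messy, layout-dependent part is delegated
    # to LLM reasoning.
    return agent.run(task="Find this month's report and download it as CSV")
```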

The Minimum Viable Agent README

If you're documenting an AI agent, start here:

```markdown
## What This Agent Does
[One paragraph. Be specific about capabilities AND limitations.]

## Quick Start
[Working example. Not "hello world"—a realistic use case.]

## How It Works
[Decision logic. What inputs matter. What triggers what.]

## When Things Go Wrong
[Common failure patterns. Symptoms. Fixes.]

## Debugging
[How to see what the agent is doing. Logs. Traces. Replay.]

## Known Limitations
[Be honest. List what doesn't work or isn't supported yet.]
```

The Real Problem

Most AI agent documentation is written by people who built the agent. They know how it works. They skip the parts that seem obvious.

But users don't know:

  • Why the agent made that decision
  • What "success" looks like for this task
  • How to tell if something went wrong silently

Document the thinking, not just the API.


AI agents are a new category. The documentation practices haven't caught up yet. If you're building agents, this is your chance to set the standard.

Write docs that assume non-determinism. Document failures as carefully as features. Show the decision logic, not just the endpoints.

Your future users (and your future self debugging at 2am) will thank you.


Building something with AI agents? I write technical documentation for developer tools. DM me on LinkedIn or check my work on GitHub.
