Originally published on Entropic Drift
From Two Files to Comprehensive Documentation
ripgrep is one of the most popular command-line search tools, with over 57,000 stars on GitHub. Despite its popularity and rich feature set, the documentation consisted of just two files: a feature-focused README and a tutorial-style GUIDE. While these files were well-written, they left gaps:
- No structured learning path from basics to advanced features
- Limited visual aids to explain complex concepts
- Features scattered across different sections
- No dedicated troubleshooting or reference sections
Using an evolved version of my documentation automation workflow (originally built for Prodigy's documentation), I generated 50+ pages of enhanced documentation, now live at https://iepathos.github.io/ripgrep.
This post documents how the workflow evolved to handle:
- Intelligent page splitting - Breaking monolithic chapters into focused subpages
- Automated visual enhancement - Adding diagrams, admonitions, and annotations where they help
- Mermaid diagram validation - Ensuring all generated diagrams render correctly
What Changed: MkDocs vs mdBook
The original workflow used mdBook, a simple static site generator popular in the Rust ecosystem. For the ripgrep docs I switched to MkDocs Material, which offers significantly more features:
| Feature | mdBook | MkDocs Material |
|---|---|---|
| Mermaid Diagrams | Plugin required | Native support |
| Admonitions | Limited | 12+ styled types |
| Code Annotations | No | Yes (numbered callouts) |
| Tabbed Content | No | Yes |
| Navigation Tabs | No | Yes |
| Search | Basic | Advanced with highlighting |
| Theme Customization | Limited | Extensive |
The Enhanced Workflow Architecture
The ripgrep workflow introduced critical improvements over previous iterations. The complete flow:
- Setup Phase:
  - Analyze codebase features
  - Detect documentation gaps and create stub pages
  - Analyze page sizes and structural complexity
  - Automatically split oversized pages into focused subpages
  - Auto-discover all markdown files (including newly split pages)
- Map Phase (per-page):
  - Analyze page for drift (subsection-aware)
  - Fix detected drift with validation
  - Enhance with visual features (diagrams, admonitions, annotations)
- Reduce Phase (holistic):
  - Validate with strict build (`mkdocs build --strict`)
  - Check structure and feature consistency
  - Validate Mermaid diagrams with official renderer
  - Auto-fix any issues found
Full workflow YAML available on GitHub
Four Key Innovations
1. Intelligent Page Splitting
The breakthrough feature: automatic page splitting based on structural analysis.
The original GUIDE.md was over 1,400 lines—a monolithic wall of text. The workflow now analyzes page structure before processing and automatically splits oversized pages into focused subpages based on:
- Line count and section depth (>300 lines or >5 major sections triggers splitting)
- Topic cohesion within sections
- Natural split points at heading boundaries
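For illustration, here is a minimal Python sketch of that trigger, assuming H2 headings count as "major sections" and serve as the split boundaries. The function names and details are hypothetical, not the workflow's actual code:

```python
import re

# Thresholds described above: >300 lines or >5 major sections triggers a split.
MAX_LINES = 300
MAX_SECTIONS = 5

def needs_split(markdown: str) -> bool:
    """Check whether a page exceeds the size or structure thresholds."""
    lines = markdown.splitlines()
    major_sections = sum(1 for line in lines if re.match(r"##\s", line))
    return len(lines) > MAX_LINES or major_sections > MAX_SECTIONS

def split_points(markdown: str) -> list[int]:
    """Line indices of H2 headings, the natural split boundaries."""
    return [i for i, line in enumerate(markdown.splitlines())
            if re.match(r"##\s", line)]
```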
For ripgrep, this transformed:
- GUIDE.md (1,400 lines) → basics/ directory with 8 focused pages
- Binary Data section (350 lines) → binary-data/ directory with 5 specialized pages
- Troubleshooting → troubleshooting/ directory with 6 topic-specific pages
Benefits:
- Context-aware agents: Each agent works on focused topics, not overwhelming monoliths
- Better enhancement: Visual features applied to coherent topics
- Improved navigation: Readers find specific information faster
- Maintainability: Smaller pages easier to update
2. Exhaustive Page Discovery
After page splitting creates new files, the workflow scans the filesystem directly to ensure every page gets processed:
```bash
find $DOCS_DIR -name "*.md" -type f | jq -R -s 'split("\n") | map(select(length > 0)) | ...'
```
This is critical because:
- Comprehensive coverage: Every .md file processed, including newly split pages
- No manual tracking: Filesystem is source of truth
- Orphaned pages caught: Pages that exist but aren't linked are discovered
- Deterministic: Same files every time
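For illustration, a Python equivalent of that discovery step; the JSON list format is an assumption about what the map phase consumes:

```python
import json
from pathlib import Path

def discover_pages(docs_dir: str) -> list[str]:
    """Scan the docs tree for every markdown file, including newly split pages."""
    return sorted(str(p) for p in Path(docs_dir).rglob("*.md"))

# Emit a JSON array of pages for the map phase, mirroring the find | jq pipeline.
# Comparing this list against the site nav also surfaces orphaned, unlinked pages.
print(json.dumps(discover_pages("docs"), indent=2))
```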
3. Visual Enhancement Per Page
After drift is fixed, each page gets enhanced in the map phase (not reduce), meaning each agent has full context about what the page covers:
- Mermaid diagrams for architecture and workflow visualization
- Admonitions (warnings, tips, notes) for important callouts
- Code annotations with numbered inline explanations
- Tabbed content for alternative approaches
The AI attempts context-aware decisions: "Pattern matching is decision-heavy, add a decision tree. Installation varies by platform, use tabbed content."
4. Mermaid Diagram Validation with Mermaid-Sonar
When you generate dozens of Mermaid diagrams automatically, some will have syntax errors and many will be too complex to be readable. What started as a simple syntax validation step in the workflow evolved into a standalone tool: mermaid-sonar.
The evolution: Initially, the workflow used @mermaid-js/mermaid-cli to catch syntax errors before build. But I noticed a pattern: AI agents would create syntactically valid diagrams that were cognitively overwhelming. An 80-node decision tree for a binary choice. A complex flowchart for a 3-item list. Diagrams that rendered correctly but hurt comprehension.
Syntax validation alone wasn't enough. I needed complexity validation to guide AI toward readable diagrams.
Mermaid-sonar uses static code analysis and research-backed heuristics to measure diagram complexity and provide actionable feedback to AI agents:
```yaml
- shell: "mermaid-sonar $DOCS_DIR --strict"
  on_failure:
    claude: "/prodigy-fix-mermaid-diagrams ${shell.output}"
```
What makes it effective for AI workflows:
- Static analysis: Parses diagram source code, calculates graph metrics (nodes, edges, density, branching)
- Research-backed thresholds: Based on cognitive load research (50/100 node limits), graph theory, and Mermaid performance characteristics
- Specific feedback: Instead of "too complex," provides "12 parallel branches (>8 max) → use LR layout or split"
- Fast execution: No rendering required, optimized for CI/CD loops
- Machine-readable output: JSON format allows AI agents to understand and fix issues automatically
The validation caught 8 broken diagrams in ripgrep's docs that would have failed during build, plus 12 overly complex diagrams that needed simplification.
Complexity issues detected:
- Diagrams with >50 nodes (high-density) or >100 nodes (low-density)
- Wide tree diagrams with >8 parallel branches
- Graphs exceeding 100 connections (Mermaid's O(n²) limit)
- High graph density (>0.3) indicating visual clutter
- Layout mismatches (TD layout for wide trees, LR for deep hierarchies)
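To show how those thresholds reduce to simple graph metrics, here is a rough Python sketch. It is not mermaid-sonar's actual implementation, and the node/edge extraction is deliberately naive:

```python
import re

def flowchart_metrics(mermaid_source: str) -> dict:
    """Naive node/edge extraction for a Mermaid flowchart."""
    # Matches "A --> B" and "A -->|label| B"; ignores bracketed node shapes.
    edges = re.findall(r"(\w+)\s*-->(?:\|[^|]*\|)?\s*(\w+)", mermaid_source)
    nodes = {name for edge in edges for name in edge}
    n, e = len(nodes), len(edges)
    # Directed-graph density: edges / (n * (n - 1)).
    density = e / (n * (n - 1)) if n > 1 else 0.0
    return {"nodes": n, "edges": e, "density": density}

def complexity_issues(m: dict) -> list[str]:
    """Compare metrics against the thresholds described above (simplified)."""
    issues = []
    if m["nodes"] > 50:
        issues.append(f"{m['nodes']} nodes (>50): split into smaller diagrams")
    if m["edges"] > 100:
        issues.append(f"{m['edges']} edges (>100): approaching rendering limits")
    if m["density"] > 0.3:
        issues.append(f"density {m['density']:.2f} (>0.3): visually cluttered")
    return issues
```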
The key insight: AI agents need quantitative feedback to understand "too complex." By converting subjective readability into measurable metrics with clear thresholds, mermaid-sonar guides AI toward diagrams that actually help readers instead of overwhelming them.
This is the difference between "catch errors" and "guide toward quality." Traditional validation tells you what's broken. Heuristic-based validation tells you what needs improvement and how to improve it—feedback AI agents can act on automatically.
Real Results: ripgrep Documentation Transformation
Before:
- 2 documentation files (README.md, GUIDE.md)
- ~1,800 total lines of markdown
- No visual diagrams
- Limited structure (linear reading path)
After:
- 55 focused documentation pages
- Organized into 7 major sections
- 87 Mermaid diagrams explaining workflows and decision trees
- 358 admonitions highlighting warnings and tips
- 180+ code annotations with inline explanations
- Tabbed installation instructions for different platforms
- 100% strict build success - no broken links or invalid syntax
Example: Pattern Matching Page Transformation
Before (from GUIDE.md): Plain text explaining regex vs literal patterns, scattered across multiple sections.
After (docs/basics/pattern-matching.md):
Added decision tree diagram helping users choose between literal and regex patterns, converted important notes to styled admonitions (tips for when to use -F flag), and annotated code examples with numbered callouts explaining each flag's purpose.
The page went from "walls of text" to "guided learning."
Key Insights
1. Page Splitting Unlocks Comprehensive Documentation
The most impactful innovation wasn't visual enhancements—it was automatic page splitting. No reader wants to scroll through 1,400 lines to find information about case sensitivity. The workflow handles splitting automatically by:
- Identifying oversized pages (>300 lines or >5 major sections)
- Determining logical split points (section boundaries)
- Creating focused subpages with proper hierarchy
- Updating cross-references and navigation
Documentation that expands organically from codebase analysis becomes navigable and learnable, not an overwhelming wall of text.
2. Visual Features Need Intelligence
You can't apply visual enhancements mechanically. A rule-based system would either over-apply (cluttering simple pages) or under-apply (missing opportunities). AI agents with context make nuanced decisions about which features help specific content.
3. Complexity Validation Catches AI Over-Engineering
Syntax validation alone isn't enough for AI-generated diagrams. AI agents tend to over-engineer visualizations, creating elaborate flowcharts for simple categorizations. Mermaid-sonar's complexity analysis catches these issues automatically by measuring:
- Cognitive load: Research shows 50-node limit for readable high-density graphs
- Visual hierarchy: Detects wide trees (>8 branches) that should use horizontal layout
- Performance limits: Flags graphs approaching Mermaid's O(n²) rendering threshold
Automated complexity validation → automated fixes → readable diagrams. Manual review still catches poor diagram type choices, but complexity issues are handled automatically.
4. Per-Page Enhancement Beats Bulk Processing
Running enhancement in the map phase (per-page) instead of reduce phase (bulk) means each agent has full context about what the page covers. Decisions are informed by content, not position. The extra parallelization complexity is worth it.
Comparing the Three Generations
| Aspect | Debtmap (Original) | Prodigy Book-Docs | ripgrep MkDocs |
|---|---|---|---|
| Granularity | Chapter-level only | Subsection-aware (H2, H3) | Subsection-aware |
| Page Structure | Fixed (manual SUMMARY.md) | Fixed (manual SUMMARY.md) | Dynamic (automatic splitting) |
| Page Discovery | Curated chapters.json | Gap detection → flattened-items.json | Gap detection + splitting + filesystem scan |
| Validation | None | Doc fix + holistic validation | Page + structure + consistency + Mermaid |
| Visual Enhancement | None | None | Diagrams, admonitions, annotations |
| Pages Generated | 27 (curated) | 47 (gap-detected) | 55 (split + discovered) |
| Diagrams | 0 | 0 | 87 Mermaid diagrams |
The evolution: accuracy → precision → transformation.
How to Adapt This Workflow
Want to build something similar? Here's the condensed path:
1. Start with MkDocs Material
Configure with features you want: admonitions, code annotations, Mermaid diagrams, tabbed content.
2. Build Auto-Discovery
Use `find` and `jq` to discover all markdown files as your source of truth.
3. Create Enhancement Logic
Write prompts that analyze page content and add:
- Mermaid diagrams for architecture/workflows
- Admonitions for warnings/tips/notes
- Code annotations for complex examples
- Tabs for alternative approaches
4. Add Multi-Layer Validation
- Per-page (map phase): Each page meets quality standards
- Build validation (reduce): `mkdocs build --strict`
- Consistency (reduce): Pages use similar enhancements
- Diagram validation: Verify Mermaid syntax with `mermaid-sonar docs/ --strict`
5. Iterate and Refine
Run the workflow, review output, adjust. Because agents enhance existing documentation rather than replacing it, manual improvements become context for the next run. The documentation becomes iterative: agents enhance, you refine, agents use refinements as context.
Full implementation details in the GitHub workflow
Future Enhancements
The workflow is production-ready, but opportunities abound:
Screenshot Management: Detect when UI screenshots are outdated and regenerate using browser automation.
Interactive Examples: Generate runnable code examples with embedded outputs.
Version-Specific Documentation: Integrate with mike to maintain documentation for multiple versions automatically. When a new version releases, run drift workflow against the new codebase and generate version-specific docs with accurate examples.
Conclusion
By combining intelligent page splitting, auto-discovery, subsection-aware drift detection, per-page visual enhancement, multi-layer validation, and strict build enforcement, we've created a workflow that can help transform a project's minimal documentation into more comprehensive guides with AI assistance.
The ripgrep documentation demonstrates this transformation in production. Every page split, diagram, admonition, and annotation was generated automatically from the original README, GUIDE, and code. The workflow preserved the authors' clarity and intent while making it dramatically more accessible.
The docs aren't perfect. We've only begun to codify the rules and heuristics needed to guide AI to mimic a skilled technical writer. This is an early run of the workflow on a third-party open-source library, exploring the limits of AI documentation automation. By the end of this journey, we hope to have an open-source solution that anyone can use to produce high-quality docs for their codebase.
The pattern is reusable. Point it at any project with well-written code, and watch it transform into comprehensive guides with visual aids and structured learning paths.
Resources
Live Documentation Examples:
- ripgrep Docs (transformed from 2 files): https://iepathos.github.io/ripgrep/
- Prodigy Docs (MkDocs Material): https://iepathos.github.io/prodigy/
- Debtmap Docs (mdBook): https://iepathos.github.io/debtmap/
Blog Posts:
- Original automation case study: Automating Documentation Maintenance with Prodigy
Workflows (showing evolution):
- Generation 1: Debtmap book-docs-drift.yml (chapter-level)
- Generation 2: Prodigy book-docs-drift.yml (subsection-aware)
- Generation 3: ripgrep mkdocs-drift.yml (page splitting + visual enhancements + diagram validation)
Tools:
- Prodigy: GitHub | Crates.io
- Debtmap: GitHub | Crates.io
- Mermaid-Sonar: GitHub | npm
- ripgrep: GitHub | Fork with MkDocs
- MkDocs Material: https://squidfunk.github.io/mkdocs-material/
This blog post documents the third generation of the documentation automation workflow. The first generation (Debtmap) proved the concept at the chapter level. The second generation (Prodigy book-docs) added subsection precision. This third generation (ripgrep mkdocs) adds intelligent page splitting, visual engagement, and diagram validation. The workflow transformed ripgrep's 2-file documentation into 55+ comprehensive pages—proving that AI automation can enhance existing open source projects without replacing their authors' expertise.
Want more content like this? Follow me on Dev.to or subscribe to Entropic Drift for posts on AI-powered development workflows, Rust tooling, and technical debt management.
Check out my open-source projects linked under Tools above.