The Documentation Intern That Never Sleeps

Harsh Chandgotia — Fri, 12 Jun 2026 12:41:02 +0000

When I joined QAPilot, I noticed something interesting.

Some of the most experienced people on the team were spending hours every sprint on work that was important, but highly repetitive: tracking engineering changes ticket by ticket, and updating our GitBook pages to keep the user-facing documentation in sync.

That meant reading through closed Jira tickets, figuring out which doc pages were affected, rewriting those pages, and drafting customer-facing release notes, every single sprint. The information needed for all of this already existed across Jira, GitLab, and GitBook. It just needed to be gathered, connected, and acted on.

The more I looked at it, the more it felt like a workflow orchestration problem rather than an expertise problem. So I built an AI-powered pipeline to handle documentation impact analysis and regeneration, orchestrated through GitHub Actions and designed around human review rather than blind automation.

The Shape of the Pipeline

Before getting into how each piece works, it's worth laying out the shape of the whole system, because everything below is really just a closer look at one part of this.

First, the pipeline gathers everything relevant to the sprint (tickets, code changes, screenshots, and the current state of the docs) into a knowledge base. Second, it works out what that knowledge base actually means for the documentation: which pages are affected, and why. A person reviews that before anything gets written. Third, and only after that review, it regenerates the affected pages and drafts release notes, which go through one more round of review before anything is published.

The same pattern repeats at every stage: plan first, act second, and put a person between the two.

Step 1: Building the Documentation Knowledge Base

Before the system can decide what's out of date, it needs to know two things: what changed, and what the docs currently say. So the pipeline starts each run by assembling a knowledge base for the sprint, drawn from four sources, each answering a different question.

From Jira, it pulls the sprint's tickets: what was supposed to change, in the team's own words. From GitLab, it optionally pulls the merge requests and commit diffs behind those tickets: what was actually built, which doesn't always match what was planned. From the tickets' attachments, it pulls screenshots and runs them through a vision-capable model to generate structured descriptions of what the feature actually looks like, which text alone often doesn't capture. And from GitBook, it pulls the entire existing documentation space (what's already written) so the system has something to compare against.

That last one turned out to be more involved than it sounds. GitBook doesn't store its content as markdown; it stores it as a proprietary JSON node tree, essentially a deeply nested structure of typed blocks (headings, paragraphs, lists, code blocks, images, links) that its editor uses internally. To strip out the editor-specific noise, I built a recursive converter that walks the tree and reconstructs it as clean markdown, preserving structure like nested lists and embedded images along the way.
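
To make the idea concrete, here is a minimal sketch of a recursive tree-to-markdown walk. The node schema is a simplified assumption for illustration (the keys `type`, `children`, `text`, `level`, `alt`, and `src` are hypothetical, not GitBook's actual format), and it is not the converter the pipeline uses.

```python
from typing import Any

Node = dict[str, Any]  # hypothetical node shape, not GitBook's real schema
FENCE = "`" * 3        # markdown code fence, built here to keep this example readable


def inline_text(node: Node) -> str:
    """Concatenate the text of a node and all of its descendants."""
    if "text" in node:
        return node["text"]
    return "".join(inline_text(child) for child in node.get("children", []))


def to_markdown(node: Node, depth: int = 0) -> str:
    """Recursively render a node; `depth` tracks list nesting for indentation."""
    kind = node.get("type")
    children = node.get("children", [])
    if kind == "document":
        blocks = [to_markdown(child, depth) for child in children]
        return "\n\n".join(block for block in blocks if block)
    if kind == "heading":
        return "#" * node.get("level", 1) + " " + inline_text(node)
    if kind == "paragraph":
        return inline_text(node)
    if kind == "code":
        return FENCE + "\n" + inline_text(node) + "\n" + FENCE
    if kind == "image":
        return f"![{node.get('alt', '')}]({node.get('src', '')})"
    if kind == "list":
        return "\n".join(to_markdown(child, depth) for child in children)
    if kind == "list-item":
        lines = []
        for child in children:
            if child.get("type") == "list":
                # A nested list is rendered one indentation level deeper.
                lines.append(to_markdown(child, depth + 1))
            else:
                lines.append("  " * depth + "- " + inline_text(child))
        return "\n".join(lines)
    # Unknown block types fall back to their plain text.
    return inline_text(node)
```

The recursion carries the nesting depth down the tree, which is what lets a nested list come out indented under its parent item instead of being flattened.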

It's also worth mentioning how the pipeline is able to access all these systems in the first place.

Our GitLab instance is self-hosted behind the company VPN, which means it isn't reachable from the public internet. GitHub-hosted runners execute in GitHub's infrastructure, so they have no network path to internal services such as GitLab. As a result, any workflow that needed to fetch merge requests, commit diffs, or repository metadata would simply fail because those systems were inaccessible from the runner.

To solve this, the entire workflow runs on a self-hosted EC2 runner deployed within the company's internal network. GitHub allows external machines to register themselves as self-hosted runners by installing the GitHub Actions runner agent and linking it to a repository or organization. Once registered, the EC2 instance appears as an available runner inside GitHub Actions and can receive workflow jobs just like GitHub-hosted runners.

Because the runner operates inside the same trusted environment as GitLab and other internal services, it can securely communicate with them without requiring additional exposure to the public internet.

Step 2: The Mapping Layer

With the knowledge base in place, here's the part of the pipeline that does the real thinking. The most interesting part of this system isn't writing documentation; it's figuring out what needs to change in the first place.

Before any page gets rewritten, the pipeline runs an impact analysis. For every ticket in the sprint, it asks the model to reason through a few questions: which product feature did this change touch? Is the change visible to users, or purely internal? Which existing documentation pages describe that feature? And given that, should one of those pages be updated, or does this need a brand-new page?

Take a hypothetical example: a ticket adds a two-factor authentication step to the password reset flow. The model recognizes this as touching account security and being user-facing, finds that the "Resetting Your Password" page already describes the old flow and needs updating, and flags that a new "Setting Up Two-Factor Authentication" page might be needed if one doesn't already exist.

The output of this stage isn't documentation; it's a structured map: this ticket affects these pages, for these reasons. Separating this from generation, as its own explicit stage, made a bigger difference to output quality than any prompt tweak I tried. It gives the system a plan to inspect before it writes anything, and it gives reviewers something concrete to check: a proposed relationship between a change and a page, with reasoning attached, rather than a wall of regenerated text to proofread.
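
As a rough illustration of what "structured map" can mean in practice, the sketch below parses a model response into typed records and rejects entries that lack a valid action or any reasoning. The field names (`ticket`, `page`, `action`, `reason`), the ticket ID, and the two allowed actions are assumptions made for this example, not the pipeline's actual schema.

```python
import json
from dataclasses import dataclass

VALID_ACTIONS = {"update", "create"}  # assumed: revise an existing page or write a new one


@dataclass(frozen=True)
class PageImpact:
    ticket: str  # hypothetical ticket key, e.g. "QA-101"
    page: str    # title of the affected documentation page
    action: str  # "update" or "create"
    reason: str  # the model's stated reasoning, shown to reviewers


def parse_mapping(raw: str) -> list[PageImpact]:
    """Parse the model's JSON output; raise ValueError on malformed entries."""
    impacts = []
    for entry in json.loads(raw):
        impact = PageImpact(entry["ticket"], entry["page"], entry["action"], entry["reason"])
        if impact.action not in VALID_ACTIONS:
            raise ValueError(f"unknown action {impact.action!r} for {impact.ticket}")
        if not impact.reason.strip():
            raise ValueError(f"missing reasoning for {impact.ticket} -> {impact.page}")
        impacts.append(impact)
    return impacts


# The two-factor example from above, expressed as mapping entries.
SAMPLE = """[
  {"ticket": "QA-101", "page": "Resetting Your Password", "action": "update",
   "reason": "The reset flow now includes a two-factor step."},
  {"ticket": "QA-101", "page": "Setting Up Two-Factor Authentication", "action": "create",
   "reason": "No existing page covers two-factor setup."}
]"""
```

Calling `parse_mapping(SAMPLE)` yields two `PageImpact` records. Refusing entries with empty reasoning keeps the reviewable unit intact: every proposed relationship arrives with a justification someone can agree or disagree with.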

Human Review Before Generation

Once the mapping is ready, the pipeline opens a GitHub Issue listing every proposed ticket-to-page relationship, along with the model's reasoning for each. A reviewer, usually the PM who ran the sprint, reads through it. Most relationships are correct as-is. When one isn't, the reviewer doesn't need a special interface: they leave a comment with a small JSON snippet describing the correction.

The pipeline picks this up on its next run and folds it into the approved mapping.
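
Here is one way that pickup step could look, sketched under assumptions: the correction format (`ticket`, `page`, and an `action` of `add` or `remove`) is hypothetical, and the mapping is simplified to a dictionary of ticket keys to page titles. Fetching the comments from the issue is left out.

```python
import json
from typing import Optional


def extract_correction(comment: str) -> Optional[dict]:
    """Return the first JSON object embedded in a comment, or None if there is none."""
    decoder = json.JSONDecoder()
    start = comment.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and ignores trailing text.
            obj, _ = decoder.raw_decode(comment, start)
            return obj
        except json.JSONDecodeError:
            start = comment.find("{", start + 1)
    return None


def apply_correction(mapping: dict[str, list[str]], correction: dict) -> dict[str, list[str]]:
    """Return a copy of the mapping with one reviewer correction folded in."""
    updated = {ticket: list(pages) for ticket, pages in mapping.items()}
    pages = updated.setdefault(correction["ticket"], [])
    page, action = correction["page"], correction["action"]
    if action == "add":
        if page not in pages:
            pages.append(page)
    elif action == "remove":
        if page in pages:
            pages.remove(page)
    else:
        raise ValueError(f"unknown correction action: {action!r}")
    return updated
```

Because the reviewer's prose and the JSON live in the same comment, the parser scans for the first decodable object rather than requiring the comment to be pure JSON. Returning a copy keeps the model's original proposal available for comparison.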

No database. No custom review portal. No separate workflow engine. GitHub Issues became the system of record for the entire mapping step, which sounds almost too simple, but it meant reviewers were working in a tool they already used every day, and every decision and correction was automatically logged and auditable.

Step 3: Controlled Document Regeneration

With the mapping approved, the second workflow runs, and this is where the actual writing happens. For each page flagged as needing an update, the system pulls the current markdown and asks the model to revise only the sections relevant to the change, with explicit instructions to leave everything else untouched. This matters for a few reasons: it keeps the diff small and reviewable, it stops the model from quietly rewriting an unrelated paragraph in a slightly different voice, and it means a reviewer's job is "does this new section make sense" rather than "re-read the whole page for unintended changes."
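
Instructions alone don't guarantee the model stays inside the lines, so a cheap structural check can help. The sketch below is an assumption about how such a check might work, not a description of the pipeline: it splits the original and revised markdown by heading and reports any section that changed without being on the allowed list.

```python
import re

HEADING = re.compile(r"^#{1,6}\s+(.*)$")


def split_sections(markdown: str) -> dict[str, str]:
    """Map each heading title to the body text beneath it.

    Text before the first heading is stored under the empty-string key.
    Simplification: duplicate titles collide, and '#' lines inside code
    fences are treated as headings.
    """
    sections: dict[str, str] = {}
    current, lines = "", []
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            sections[current] = "\n".join(lines).strip()
            current, lines = match.group(1).strip(), []
        else:
            lines.append(line)
    sections[current] = "\n".join(lines).strip()
    return sections


def unexpected_changes(original: str, revised: str, allowed: set[str]) -> list[str]:
    """List sections that were added, removed, or edited but are not in `allowed`."""
    before, after = split_sections(original), split_sections(revised)
    changed = [
        title
        for title in sorted(before.keys() | after.keys())
        if before.get(title) != after.get(title)
    ]
    return [title for title in changed if title not in allowed]
```

An empty result means the revision touched only the sections it was supposed to; anything else can be surfaced to the reviewer alongside the diff.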

For pages that don't exist yet, like our hypothetical "Setting Up Two-Factor Authentication" page, the model writes from scratch, but it's given a handful of existing pages from the same section as style references, so the new page reads like it belongs in the same documentation set rather than something a different author wrote.

Alongside the updated pages, the workflow also drafts customer-facing release notes for Statuspage. These are deliberately a separate output from the documentation updates, because the audience is different: docs explain how a feature works in full, while release notes are a short, plain-language summary of what changed for someone using the product. Both the updated pages and the release notes are posted back to GitHub for one final round of review before anything goes live.

Keeping It Fast Enough to Run Every Sprint

One more piece is worth mentioning, because it's what makes running this every sprint practical rather than painful.

The mapping stage in Step 2 doesn't hand the model the full markdown of every GitBook page; for a documentation site of any real size, that would be an enormous amount of context. Instead, each page gets summarized first, and those summaries are what is fed into the mapping step. But summarizing the entire documentation space on every single run was expensive, in both time and tokens, for pages that hadn't changed at all since the last sprint.

The fix was a caching layer: GitBook automatically syncs its documentation content to a GitHub repository, allowing the pipeline to use repository SHAs as a lightweight change detection mechanism. Page summaries are persisted between runs as GitHub Actions artifacts, and each new run compares the latest repository state against the previous one to identify which pages have actually changed. Only those pages are re-summarized, while unchanged summaries are loaded directly from the cache. It's a relatively small architectural addition, but it's the difference between a pipeline that's practical to run every week and one that gradually becomes too expensive and slow to justify.
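
The core of that comparison fits in a few lines. This sketch assumes a per-page SHA is available for each file in the synced repository and that the cache is a dictionary loaded from the previous run's artifact; `summarize` stands in for the model call, and all names here are hypothetical.

```python
from typing import Callable

# Cache shape (assumed): page path -> {"sha": ..., "summary": ...}
Cache = dict[str, dict[str, str]]


def refresh_summaries(
    current_shas: dict[str, str],
    cache: Cache,
    summarize: Callable[[str], str],
) -> tuple[Cache, list[str]]:
    """Reuse cached summaries for unchanged pages and re-summarize the rest.

    Returns the new cache and the list of page paths that were re-summarized.
    Pages missing from `current_shas` (deleted pages) drop out of the new cache.
    """
    fresh: Cache = {}
    recomputed: list[str] = []
    for path, sha in current_shas.items():
        cached = cache.get(path)
        if cached is not None and cached["sha"] == sha:
            fresh[path] = cached  # unchanged since the last run
        else:
            fresh[path] = {"sha": sha, "summary": summarize(path)}
            recomputed.append(path)
    return fresh, recomputed
```

Keying the cache on the SHA rather than a timestamp means a page is re-summarized exactly when its content differs, regardless of when or how often the sync ran.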

Engineering Lessons

The biggest lesson was that integration work is often harder than intelligence work. The LLM prompts were only one part of the system; most of the complexity came from stitching together Jira, GitLab, GitBook, GitHub Actions, VPN-restricted infrastructure, and multiple data formats into something reliable.

I also learned that building effective AI systems is less about finding the perfect prompt and more about designing the right architecture around the model. Planning stages, review gates, validation layers, and structured outputs had a far greater impact on quality than prompt tweaks ever did.

DEV Community: Harsh Chandgotia