When my company integrated an AI assistant into our documentation platform, I was part of one of the early teams selected to test it. The goal was to see how well the AI could answer real user questions by using our existing documentation as the source of truth.
What I did not expect was how much this exercise would change the way I think about technical writing.
This was not just about AI accuracy. It was about how documentation itself behaves when consumed by large language models (LLMs).
This article shares what that testing exposed and what it means for technical writers today.
Testing approach: real questions, real results
Testing started with questions that real Google Cloud users were already asking: questions pulled directly from support tickets, common search queries, and production environments using the QnA Analyzer tool.
For each question, the team compared three outputs:
• The AI assistant answer.
• The documentation pages it retrieved.
• The response that a subject matter expert or support engineer would give.
The team tracked several quality signals, including answer correctness, the number of follow-up prompts required, mixed product information, and missing prerequisites.
When the AI assistant produced a weak or confusing answer, the first question was always whether the documentation was the cause, not the model. This framing changed the entire analysis.
The first surprise: the same question, different answers
During testing, the assistant sometimes gave different answers to the same question. This happened even when nothing changed: the question was the same, the documentation was the same, and the same AI model was running. At first, this looked like a reliability problem.
On closer inspection, the behavior revealed something fundamental about how LLMs work. LLMs are probabilistic, not deterministic. Unlike a calculator that always returns the same result, an LLM generates answers based on token probabilities and semantic similarity search: it weighs the available content and produces the most likely answer each time. Asking the same question twice is not guaranteed to produce identical output.
The key factor is the quality of the content the AI model retrieves. When the content is clear and specific, the model has less room to vary and answers stay consistent. When the content is vague or incomplete, the model fills in the gaps, and those gaps introduce the variation that looks like unreliability.
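To make the probabilistic behavior concrete, here is a toy sketch of next-token selection. The distribution and token names are invented for illustration; real models work over vocabulary-sized distributions, but the contrast between sampling (varies run to run) and greedy decoding (always the same) is the same idea.

```python
import random
from collections import Counter

# Toy next-token distribution for a prompt like "To back up your instance, use ..."
# Probabilities are illustrative, not taken from a real model.
NEXT_TOKEN_PROBS = {
    "gcloud": 0.40,
    "the-console": 0.35,
    "snapshots": 0.25,
}

def sample_token(probs, rng):
    """Sample one continuation in proportion to its probability."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

def greedy_token(probs):
    """Always pick the single most likely continuation (deterministic)."""
    return max(probs, key=probs.get)

rng = random.Random(7)
samples = Counter(sample_token(NEXT_TOKEN_PROBS, rng) for _ in range(1000))
print(samples)                          # sampling spreads across all three continuations
print(greedy_token(NEXT_TOKEN_PROBS))   # greedy decoding always picks the top token
```

When the top option dominates the distribution (clear, specific source content), sampled answers barely vary; when probabilities are close together (vague content), runs diverge.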
Minimalist writing works for humans but not always for AI
Minimalist documentation (short pages, fewer words, fewer explanations) works well for experienced human readers who fill in context from prior knowledge. LLMs cannot do this. They need intent and scope to be written down.
A minimalist page that opens directly with a task list gives the model almost nothing to use when deciding whether that page is relevant to a given query. Adding a short description of one to three sentences, covering the product, the scenario, and the user goal, gives the model a reliable signal to retrieve the right page. Without that context, the model guesses and produces unreliable responses.
The principle is straightforward: when documentation leaves less to interpretation, the model varies less in its answers.
Multiple cross-references can hurt AI retrieval
Too many cross-references created a different kind of documentation problem. When pages were heavily linked to other topics, the AI assistant did not always stay focused on the main page. Instead, it sometimes pulled steps from linked pages, combined partial instructions from multiple locations, and lost the workflow context that the user actually needed. The result was answers that were fragmented and hard to follow.
To address this issue, the approach was adjusted to treat cross-references as a last resort rather than a default. Primary workflows were kept self-contained so the model could find everything it needed on a single page. Circular linking patterns, where page A links to page B which links back to page A, were removed because they created retrieval loops with no clear resolution.
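Circular linking patterns can be caught mechanically before they cause retrieval loops. The sketch below is a standard depth-first cycle check over a hypothetical page-link graph; the function name and the dictionary shape are assumptions, not part of any real tooling mentioned in this article.

```python
def has_link_cycle(links):
    """Detect circular cross-references in a page-link graph.

    `links` maps each page to the list of pages it links to.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    WHITE, GRAY, BLACK = 0, 1, 2
    color = dict.fromkeys(nodes, WHITE)

    def visit(page):
        color[page] = GRAY                 # page is on the current path
        for target in links.get(page, []):
            if color[target] == GRAY:      # back edge: A -> ... -> A
                return True
            if color[target] == WHITE and visit(target):
                return True
        color[page] = BLACK                # fully explored, no cycle through here
        return False

    return any(color[page] == WHITE and visit(page) for page in nodes)

# Page A links to B, and B links back to A: a retrieval loop.
print(has_link_cycle({"A": ["B"], "B": ["A"]}))   # True
print(has_link_cycle({"A": ["B"], "B": ["C"]}))   # False
```

Running a check like this over a doc set flags the A-to-B-to-A patterns described above so they can be broken deliberately rather than discovered through bad answers.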
A real example: "How do I back up my instance?"
Users rarely specify which Google Cloud product or service they are using when they ask a question. The documentation needs to carry that context so the AI assistant can match the question to the right product. The question "How do I back up my instance?" is a good example of why this matters.
Multiple Google Cloud services use the word "instance" to refer to different resources. Each service had its own backup process, its own console, and its own steps. However, the documentation pages for each service opened directly with steps and UI labels, with no introduction stating which service the page belonged to.
When the AI assistant received this question, it retrieved pages from multiple services and combined their steps into a single answer. The response looked complete and confident. It was not. A user following those steps would be mixing actions from different products and different consoles, which would either fail or cause unintended changes.
The root cause was not model hallucination. The documentation pages looked identical to the model because they all used the same term and none of them declared their scope upfront.
The fix was straightforward. Each page introduction was updated to open with a clear statement, for example: "This topic describes how to back up an instance in Google Cloud Databases." That one sentence gave the model the signal it needed to retrieve the correct page for the correct service. After the update, the assistant stopped blending workflows and returned accurate, product-specific answers.
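Because the fixed pages all open with the same sentence pattern, the declared scope can even be extracted mechanically. This is a hypothetical sketch, not the assistant's actual retrieval logic: the regex, page names, and product names are invented to show how a consistent opening sentence turns into a machine-readable signal.

```python
import re

# Matches an opening scope statement like:
# "This topic describes how to back up an instance in Cloud SQL."
SCOPE_PATTERN = re.compile(
    r"This topic describes how to .+? in (?P<product>[A-Z][A-Za-z ]+?)\."
)

def declared_scope(page_text):
    """Return the product named in the page's opening sentence, or None."""
    opening = page_text.strip().splitlines()[0]
    match = SCOPE_PATTERN.search(opening)
    return match.group("product") if match else None

pages = {
    "sql-backup": "This topic describes how to back up an instance in Cloud SQL.\n1. ...",
    "vm-backup": "This topic describes how to back up an instance in Compute Engine.\n1. ...",
    "old-page": "1. Open the console.\n2. Click Backups.",  # no scope statement
}

for name, text in pages.items():
    print(name, "->", declared_scope(text))
```

Pages that once looked identical now carry distinct, extractable scopes, while the unfixed page stands out immediately as having none.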
The experiment results
The team ran a targeted experiment on a selected set of pages. The changes included adding concise descriptions at the beginning of each page, introducing short summaries for topics and subtopics, enhancing step-level descriptions for clarity, and reducing unnecessary cross-references to keep key workflows self-contained.
The results were measurable. AI answers became more accurate, search relevance improved, mixed-product responses decreased, and trust in AI-generated answers increased.
Practical guidance
Apply these practices to make documentation more effective for AI retrieval:
• Add a concise description (one to three sentences) to every topic that states the product, scope, and scenario.
• State the product scope early in the page, particularly when similar products share terminology.
• Write preconditions and constraints explicitly rather than assuming the reader knows the context.
• Disambiguate common terms that appear across products by adding a brief clarifying phrase.
• Keep key workflows self-contained and limit cross-references to cases where they are necessary.
• Treat topic introductions as retrieval anchors that help the model decide when a page is relevant.
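Several of these practices can be checked automatically. The sketch below is a minimal, hypothetical page linter assuming a small product catalog and markdown-style links; the function name, product list, and link budget are all assumptions to adapt to your own doc set.

```python
import re

# Hypothetical product catalog; replace with your own.
PRODUCTS = ("Cloud SQL", "Compute Engine", "Memorystore")

def lint_page(text, max_links=3):
    """Flag pages that are likely to retrieve poorly.

    Checks three signals from the guidance above:
      - the page opens with a description, not a step or bullet,
      - that opening sentence names a product,
      - cross-references stay within a small budget.
    Returns a list of warning strings (empty means the page passes).
    """
    warnings = []
    first_line = text.strip().splitlines()[0]
    if re.match(r"^\s*(\d+\.|[-*•])", first_line):
        warnings.append("page opens with a step or bullet, not a description")
    if not any(p in first_line for p in PRODUCTS):
        warnings.append("opening sentence does not name a product")
    links = re.findall(r"\[[^\]]+\]\([^)]+\)", text)  # markdown-style links
    if len(links) > max_links:
        warnings.append(f"{len(links)} cross-references (budget: {max_links})")
    return warnings

good = "This topic describes how to back up an instance in Cloud SQL.\n1. Open the console."
bad = "1. Open the console.\n2. Click Backups."

print(lint_page(good))   # passes: scoped description, no link overload
print(lint_page(bad))    # two warnings: opens with a step, names no product
```

A check like this in CI keeps new pages from quietly reintroducing the retrieval problems described in this article.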
Final thought
LLMs don't replace technical writers. They amplify the quality of the documentation we create. When documentation is clear, scoped, and intentional, AI becomes a powerful assistant. When it isn't, AI simply reflects the confusion already present in the content.