The first time AI generated documentation for my project, it looked perfect.
Clear structure. Confident tone. Professional language.
That was exactly the problem.
A week later, when I tried to review it, I realized I couldn’t answer a basic question:
Which parts of this document came from my requirements, and which parts did the AI make up?
Everything was written with equal confidence. There was no way to tell where I should trust the content—and where I needed to verify it.
## Why This Is a Problem You Don't Notice at First
When AI creates documentation, it doesn’t distinguish between:
- facts you explicitly provided
- information inferred from existing documents
- assumptions made to fill gaps
- general industry conventions
All of them look the same on the page.
At first, that feels convenient.
Later, it becomes dangerous.
Because when you revisit the document—or when someone else relies on it—you can no longer tell what is actually true versus what merely sounds reasonable.
## The Problem After Import
In my previous article, I discussed how to safely import legacy documents using question-driven integration. That approach works well at the entry point.
But I ran into a new problem after the import.
Even with careful integration, AI-generated documents still mix different kinds of information without distinction.
Consider a typical API design section:
- The API uses REST architecture with JSON responses.
- Authentication requires Bearer tokens.
- Rate limiting is set to 100 requests per minute.
- Error responses follow RFC 7807 format.
Which of these came from my requirements?
Which did the AI infer?
Which are just defaults pulled from general knowledge?
I couldn’t tell.
And neither could the AI when it referenced this document later.
## The Solution: Source Attribution
The fix was simple in concept, but powerful in practice:
Require AI to tag every statement with its source.
Each claim must declare where it came from.
| Tag | Meaning | Trust Level |
|---|---|---|
| `[explicit]` | Directly provided by the user | High: use as-is |
| `[inferred]` | Derived from existing documents | Medium: verify |
| `[assumed]` | Placeholder due to missing info | Low: needs input |
| `[general]` | Filled from general knowledge | Low: override if needed |
The same section rewritten:
```
[explicit] The API uses REST architecture with JSON responses.

[inferred] Authentication requires Bearer tokens.
  └─ "All endpoints require authentication" (REQUIREMENTS.md L.23)

[assumed] Rate limiting is set to 100 requests per minute.

[general] Error responses follow RFC 7807 format.
```
Now the review effort is obvious.
I know exactly where to focus.
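One practical side effect: once every statement carries a tag, the rule is checkable by machine. Below is a minimal sketch in Python of what such a check could look like, assuming one statement per line and the citation format shown above; the helper names and skip rules are illustrative assumptions, not part of any standard tooling.

```python
import re
import sys
from pathlib import Path

# The four tags from the table above.
TAG = re.compile(r"^\[(explicit|inferred|assumed|general)\]\s+\S")

# Lines that don't carry a tag of their own: blanks, headings,
# table rows, and citation continuation lines ("└─ ...").
SKIP = re.compile(r"^(#|\||└─)|^$")

def untagged_statements(path: Path) -> list[tuple[int, str]]:
    """Return (line number, text) for every statement that has no source tag."""
    problems = []
    for lineno, raw in enumerate(path.read_text(encoding="utf-8").splitlines(), start=1):
        text = re.sub(r"^[-*]\s+", "", raw.strip())  # tags may sit inside list items
        if SKIP.match(text):
            continue
        if not TAG.match(text):
            problems.append((lineno, text))
    return problems

if __name__ == "__main__":
    for doc in sys.argv[1:]:
        for lineno, text in untagged_statements(Path(doc)):
            print(f"{doc}:{lineno}: missing source tag: {text[:60]}")
```

Anything it flags is either prose that still needs a tag, or a statement that has no business being in a spec at all.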
## Why Inferences Must Include Citations
The [inferred] tag turned out to be the most dangerous one.
AI is very good at post-hoc rationalization.
It can reach a conclusion first, then search for text that sounds supportive.
So I added a rule:
Every inferred statement must include a verbatim quotation from its source.
Example:
```
[inferred] Retry policy allows 3 attempts
  └─ "External API calls should retry up to 3 times" (API_DESIGN.md L.28)
```
The quote must appear exactly in the source.
If the quote doesn’t support the conclusion, the problem is immediately visible.
Without the quote, I’d have to hunt through documents myself.
With it, verification takes seconds.
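Because the quotation must match the source verbatim, this check is mechanical as well. Here is a small sketch, assuming the citation format shown above (a `└─` line with the quote, file, and line number); the parsing rules and the staleness handling are illustrative assumptions.

```python
import re
from pathlib import Path

# Matches continuation lines like:
#   └─ "External API calls should retry up to 3 times" (API_DESIGN.md L.28)
CITATION = re.compile(r'└─\s*"(?P<quote>.+)"\s+\((?P<file>[^\s)]+)\s+L\.(?P<line>\d+)\)')

def verify_citation(line: str, docs_root: Path) -> str:
    """Check that an [inferred] statement's quote really appears in its cited source."""
    match = CITATION.search(line)
    if not match:
        return "not a citation line"
    quote, lineno = match["quote"], int(match["line"])
    source = docs_root / match["file"]
    if not source.exists():
        return f"source file {match['file']} not found"
    lines = source.read_text(encoding="utf-8").splitlines()
    if lineno <= len(lines) and quote in lines[lineno - 1]:
        return "ok"
    # The quote may have moved if the source was edited after the citation was written.
    if any(quote in l for l in lines):
        return f"quote found, but not at L.{lineno} (stale line reference)"
    return "quote not found: this inference needs human review"
```

A failure here doesn't mean the conclusion is wrong, only that it can no longer be traced to its evidence, which is exactly the situation the tag is meant to expose.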
## Where Source Attribution Is Required
Not everything needs tags. The rule is simple:
Tag documents that others will rely on as truth.
| Document Type | Tags Required | Verification Method |
|---|---|---|
| Work logs | No | Point-in-time record |
| Design specs | Yes | Human review |
| README / Guides | Yes | Human review |
| Test specs | Yes | Cross-reference |
| Source code | No | Executable tests |
Source code already has a verification mechanism: tests.
Documentation doesn’t.
Source tags provide the missing verification metadata.
Code comments are excluded as well. Embedding design rationale in comments creates maintenance debt—comments rot silently when documents change.
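The boundary in the table can also be written down so a checker knows which files it applies to. The mapping below mirrors the table; the paths, pattern style, and default are illustrative assumptions rather than a fixed convention.

```python
from fnmatch import fnmatch

# Which document families require source tags, mirroring the table above.
# Paths are illustrative; adjust them to the actual repository layout.
TAG_POLICY = [
    ("docs/worklog/*.md", False),  # work logs: point-in-time records
    ("docs/design/*.md", True),    # design specs: human review
    ("README.md", True),           # README / guides: human review
    ("docs/tests/*.md", True),     # test specs: cross-reference
    ("src/*", False),              # source code: verified by executable tests
]

def requires_tags(path: str) -> bool:
    """Return True if the document at `path` must carry source tags."""
    for pattern, required in TAG_POLICY:
        if fnmatch(path, pattern):
            return required
    return False  # default: untagged; tighten this if most new documents are specs

# requires_tags("docs/design/api.md")         -> True
# requires_tags("docs/worklog/2025-01-10.md") -> False
```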
## The Terminology Drift Problem
Source attribution alone breaks down when terminology drifts.
```
# API_DESIGN.md
[explicit] Retry policy allows 3 attempts

# SERVICE_SPEC.md
[inferred] Re-execution strategy permits 3 tries

# TEST_SPEC.md
[inferred] Fault recovery mechanism uses 3 retries
```
Same concept. Three names.
Now grep fails.
Impact analysis fails.
Humans lose trust.
## Conversation-Based Terminology Unification
The solution wasn’t a static glossary—that’s just another document to forget.
Instead, terminology is unified during conversation.
```
Human: "The re-execution count should increase to 5"

AI:    "You mentioned 're-execution'. Is this the same as
        'retry policy (#retry-policy)' in existing docs?"

Human: "Yes."

AI:    "Understood. Updating retry policy to 5 attempts."
```
Drift is caught at the point of entry, not after the damage spreads.
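The mechanism behind such a conversation can be tiny: a registry of canonical concepts plus the aliases a human has already confirmed. When a term does not resolve, the assistant asks rather than silently adopting a new name. A minimal sketch follows, with the data structure and function names as illustrative assumptions.

```python
from __future__ import annotations

# Canonical concepts and the aliases a human has confirmed so far.
# Seeded with the retry-policy example above; everything else is illustrative.
GLOSSARY: dict[str, set[str]] = {
    "#retry-policy": {"retry policy", "retry", "retries"},
}

def resolve_term(term: str) -> str | None:
    """Return the canonical ID for a term, or None if it is unknown."""
    lowered = term.lower()
    for canonical, aliases in GLOSSARY.items():
        if lowered in aliases:
            return canonical
    return None

def question_for(term: str) -> str:
    """The question to ask instead of silently introducing a new name."""
    known = ", ".join(GLOSSARY)
    return f"You mentioned '{term}'. Is this the same as an existing concept ({known}), or a new one?"

def confirm_alias(term: str, canonical: str) -> None:
    """Record a human-confirmed alias so the same question is never asked twice."""
    GLOSSARY[canonical].add(term.lower())

# Mirroring the dialogue above:
#   resolve_term("re-execution")   -> None, so ask question_for("re-execution")
#   confirm_alias("re-execution", "#retry-policy")
#   resolve_term("re-execution")   -> "#retry-policy"
```

Whether the glossary lives in the prompt, a small file, or the assistant's memory matters less than the rule it enforces: unknown terms trigger a question, never a guess.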
## The Multi-User Reality Check
This works well for single-user workflows.
It breaks with multiple users.
Different people, different sessions, different terms—parallel truths emerge.
This is the limit of conversational unification.
Solving it requires shared infrastructure: synchronized glossaries, versioned terminology, or serialized workflows.
That is a different class of problem.
## The Pragmatic Boundary
Don’t retrofit this onto everything.
- New projects: AI involved from day one → tags and terminology stay clean
- Legacy systems: use question-driven integration, then enforce rules going forward
Draw a boundary.
New work follows the protocol.
Legacy stays untouched until it’s modified.
## Why This Matters
Source attribution doesn’t make AI perfect.
It doesn’t prevent mistakes.
What it does is make mistakes visible.
When you can see where AI was certain versus where it guessed, you know where to apply human judgment. That visibility is the foundation of trust in AI-collaborative development.
This article is part of the Beyond Prompt Engineering series, exploring systematic—not accidental—ways to work with AI.