Here's a problem that sounds trivial until you've felt it at scale: you publish a documentation page, an agent cites it, someone bookmarks it — and then the author renames the file or moves it into a different folder. The content is the same. The link is dead.
In a docs-as-code world this happens constantly. Files get reorganized, folders get restructured, a getting-started.md becomes onboarding/intro.md. Every one of those moves quietly breaks links, citations, and any agent answer that pointed at the old path. If your link is just "the file path," then the link is only as stable as the file path — which is to say, not stable at all.
We needed links that stay valid even when content moves in source control. Here's how we built them.
The core idea: identity that isn't the path
A path is a location, not an identity. The fix is to give every doc a stable identity that travels with the content, and to build the public link on that identity instead of the path.
So a deeplink looks like this:
https://{host}/cid/{contentId}/fid/{fileId}
No folder structure. No filename. No extension. Just two stable identifiers: which content source the doc came from, and a fileId that uniquely and durably names the document itself. Move the file anywhere you like — the link still resolves, because the link was never about where the file lived.
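As a minimal sketch of that link shape, the helpers below build and parse a deeplink. The function names, the URL-encoding of each identifier, and the example host are assumptions for illustration, not details taken from the production service.

```typescript
// Hypothetical helpers: names and encoding rules are illustrative assumptions.
interface DeeplinkParts {
  contentId: string;
  fileId: string;
}

function buildDeeplink(host: string, contentId: string, fileId: string): string {
  // Encode each identifier so it is safe to use as a single URL path segment.
  return `https://${host}/cid/${encodeURIComponent(contentId)}/fid/${encodeURIComponent(fileId)}`;
}

// Returns undefined when the URL does not follow /cid/{contentId}/fid/{fileId}.
// Note: new URL() throws if the input is not a valid absolute URL.
function parseDeeplink(link: string): DeeplinkParts | undefined {
  const segments = new URL(link).pathname.split("/").filter((s) => s.length > 0);
  const [cidTag, contentId, fidTag, fileId] = segments;
  if (segments.length !== 4 || cidTag !== "cid" || fidTag !== "fid" || !contentId || !fileId) {
    return undefined;
  }
  return { contentId: decodeURIComponent(contentId), fileId: decodeURIComponent(fileId) };
}

const exampleLink = buildDeeplink("docs.example.com", "handbook", "abc123");
console.log(exampleLink); // https://docs.example.com/cid/handbook/fid/abc123
```

Nothing in the parsed result says where the file lives, which is the point: the path is looked up from the id at resolution time.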
The whole trick is making fileId stable across moves. That's where git earns its keep.
Deriving a stable file id from git
The naive version is easy: hash the file path.
fileId = sha256(repositoryFilePath);
That gives you a clean, deterministic id — but it has the exact flaw we're trying to avoid: move the file and the hash changes, so you get a new id and a new link. Useless.
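To make the flaw concrete, here is a small sketch of the naive approach. The `sha256` helper is an assumed implementation built on Node's `crypto` module (hex-encoded digest of the path string); the article's own helper may differ.

```typescript
import { createHash } from "node:crypto";

// Assumed helper: hex-encoded SHA-256 of a UTF-8 string.
function sha256(input: string): string {
  return createHash("sha256").update(input, "utf8").digest("hex");
}

const idBeforeMove = sha256("getting-started.md");
const idAfterMove = sha256("onboarding/intro.md");

// Deterministic: the same path always hashes to the same id.
console.log(idBeforeMove === sha256("getting-started.md")); // true
// Not stable across moves: the new path produces a different id.
console.log(idBeforeMove === idAfterMove); // false
```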
The real work is detecting when a "new" path is actually an existing document that simply moved, and carrying its id forward. Git can tell us this: it does not store renames in a commit, but when it diffs two commits its rename detection reports a moved file as a source → target pair. So instead of hashing paths in isolation, we walk the commit history and let the diffs tell us what moved.
Walking from one commit to the next, for every change we ask: did this file exist before under a different path?
const changes = commitDiff.changes || [];
for (const change of changes) {
  const previousPath = change.sourceServerItem; // where it used to live
  const currentPath = change.item?.path;        // where it lives now

  // If we already had an id for the old path, carry it forward.
  // Otherwise, mint a fresh one from the path hash.
  const fileId = previousPathToId.has(previousPath)
    ? previousPathToId.get(previousPath)
    : sha256(previousPath);

  if (fileId && currentPath) {
    pathToId.set(currentPath, fileId);
  }
}
The key line is the carry-forward: when a file moves, its new path inherits the old path's id. The document keeps its identity through the move, and therefore keeps its link.
Files that didn't change in this commit simply keep whatever id they already had:
for (const item of allFilesAtThisCommit) {
  if (!pathToId.has(item.path)) {
    pathToId.set(item.path, previousPathToId.get(item.path) ?? sha256(item.path));
  }
}
Do this commit-by-commit across the history and you end up with a path → fileId map where the id is anchored to the document's lineage, not its current location.
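Putting the two loops together, the walk can be sketched end to end as below. The `Change` and `CommitSnapshot` shapes are simplified stand-ins, and the sketch assumes `sourceServerItem` is only present when a file moved; both are assumptions for illustration rather than the real diff payload.

```typescript
import { createHash } from "node:crypto";

// Assumed, simplified shapes standing in for the commit diff data.
interface Change {
  sourceServerItem?: string;  // previous path (assumed present only for moves)
  item?: { path?: string };   // current path
}
interface CommitSnapshot {
  changes: Change[];
  allFiles: string[];         // every file path that exists at this commit
}

const sha256 = (s: string): string => createHash("sha256").update(s, "utf8").digest("hex");

// Advance the path -> fileId map by one commit.
function stepCommit(previousPathToId: Map<string, string>, commit: CommitSnapshot): Map<string, string> {
  const pathToId = new Map<string, string>();
  for (const change of commit.changes) {
    const previousPath = change.sourceServerItem;
    const currentPath = change.item?.path;
    if (!previousPath || !currentPath) continue; // not a move; covered by the loop below
    // Carry the old path's id forward, or mint one from the old path's hash.
    pathToId.set(currentPath, previousPathToId.get(previousPath) ?? sha256(previousPath));
  }
  // Files that were not moved keep their existing id, or get a fresh path hash.
  for (const path of commit.allFiles) {
    if (!pathToId.has(path)) {
      pathToId.set(path, previousPathToId.get(path) ?? sha256(path));
    }
  }
  return pathToId;
}

// Fold the whole history, oldest commit first.
function deriveFileIds(history: CommitSnapshot[]): Map<string, string> {
  return history.reduce(stepCommit, new Map<string, string>());
}

const history: CommitSnapshot[] = [
  { changes: [], allFiles: ["getting-started.md"] },
  {
    changes: [{ sourceServerItem: "getting-started.md", item: { path: "onboarding/intro.md" } }],
    allFiles: ["onboarding/intro.md"],
  },
];
const ids = deriveFileIds(history);
// The moved file still carries the id minted at its original path.
console.log(ids.get("onboarding/intro.md") === sha256("getting-started.md")); // true
```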
Handling the awkward cases
Two real-world wrinkles are worth calling out, because they're where naive implementations fall over:
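Collisions. Two different documents can, in edge cases, resolve to the same hash-derived id. One way this can happen: a document moves away from a path and carries its id with it, then a new file is later created at that original path, so the newcomer's path hash equals the id the moved document already holds. We detect duplicates and disambiguate by prefixing the commit id onto the entry whose id is just its own path hash, so two distinct docs never share a link: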
if (duplicateFileIds.has(fileId) && sha256(repoFilePath) === fileId) {
  pathToId.set(repoFilePath, targetCommitId + fileId);
}
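A self-contained sketch of that check might look like the following. The `disambiguate` function, the way duplicates are counted, and the `c0ffee` commit id are assumptions for illustration; only the re-keying condition mirrors the snippet above.

```typescript
import { createHash } from "node:crypto";

const sha256 = (s: string): string => createHash("sha256").update(s, "utf8").digest("hex");

// Hypothetical helper: returns a copy of the map with colliding ids disambiguated.
function disambiguate(pathToId: Map<string, string>, targetCommitId: string): Map<string, string> {
  // Count how many paths share each id.
  const counts = new Map<string, number>();
  pathToId.forEach((id) => counts.set(id, (counts.get(id) ?? 0) + 1));

  const result = new Map(pathToId);
  pathToId.forEach((id, path) => {
    const isDuplicate = (counts.get(id) ?? 0) > 1;
    // Re-key only the entry whose id is its own path hash. An entry that
    // inherited the id from an earlier path keeps it, so its link survives.
    if (isDuplicate && sha256(path) === id) {
      result.set(path, targetCommitId + id);
    }
  });
  return result;
}

const originalId = sha256("getting-started.md");
const colliding = new Map<string, string>([
  ["onboarding/intro.md", originalId], // moved doc, carried its id forward
  ["getting-started.md", originalId],  // new file created at the old path
]);
const fixed = disambiguate(colliding, "c0ffee");
console.log(fixed.get("onboarding/intro.md") === originalId);           // true
console.log(fixed.get("getting-started.md") === "c0ffee" + originalId); // true
```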
Cost. Walking commit diffs across a large repository is not free — it's the same lesson from selective fetching, where commit data, not file content, is the expensive part. So this derivation rides on cached, immutable commit metadata and only processes the diff between the last processed commit and the current one. You pay for history once, then move incrementally.
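The incremental part can be sketched roughly as follows. Everything here is a simplified assumption: the `DerivationState` shape, the `cachedDiffs` wrapper, and a diff reduced to a list of renames per commit. The real pipeline handles more than renames, but the sketch shows the two ideas in the paragraph above: diffs are cached because commits are immutable, and each run starts after the last processed commit.

```typescript
// Assumed, simplified shapes for illustration only.
interface Rename { from: string; to: string }
interface DerivationState {
  lastProcessedCommitId: string | undefined;
  pathToId: Map<string, string>;
}
type DiffSource = (commitId: string) => Rename[];

// A commit never changes, so its diff can be cached indefinitely.
function cachedDiffs(fetchDiff: DiffSource): DiffSource {
  const cache = new Map<string, Rename[]>();
  return (commitId) => {
    let diff = cache.get(commitId);
    if (diff === undefined) {
      diff = fetchDiff(commitId);
      cache.set(commitId, diff);
    }
    return diff;
  };
}

// Process only the commits that come after the last processed one.
function advance(state: DerivationState, orderedCommitIds: string[], getDiff: DiffSource): DerivationState {
  let start = 0;
  if (state.lastProcessedCommitId !== undefined) {
    const index = orderedCommitIds.indexOf(state.lastProcessedCommitId);
    if (index < 0) throw new Error("last processed commit is not in the supplied history");
    start = index + 1;
  }
  const pathToId = new Map(state.pathToId);
  let lastProcessedCommitId = state.lastProcessedCommitId;
  for (const commitId of orderedCommitIds.slice(start)) {
    for (const { from, to } of getDiff(commitId)) {
      const id = pathToId.get(from);
      if (id !== undefined) {
        pathToId.delete(from); // the old path no longer exists
        pathToId.set(to, id);  // the new path inherits the id
      }
    }
    lastProcessedCommitId = commitId;
  }
  return { lastProcessedCommitId, pathToId };
}

let fetchCount = 0;
const renamesByCommit: Record<string, Rename[]> = {
  c1: [],
  c2: [{ from: "getting-started.md", to: "intro.md" }],
  c3: [{ from: "intro.md", to: "onboarding/intro.md" }],
};
const getDiff = cachedDiffs((commitId) => {
  fetchCount += 1;
  return renamesByCommit[commitId] ?? [];
});

const initial: DerivationState = {
  lastProcessedCommitId: undefined,
  pathToId: new Map([["getting-started.md", "id-1"]]),
};
const afterFirstRun = advance(initial, ["c1", "c2"], getDiff);
const afterSecondRun = advance(afterFirstRun, ["c1", "c2", "c3"], getDiff);
console.log(afterSecondRun.pathToId.get("onboarding/intro.md")); // id-1
console.log(fetchCount); // 3
```

The second run fetches only `c3`, so the cost of a run tracks the number of new commits rather than the length of the history.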
Closing the loop: embedding the link where it's used
A stable id is only useful if it actually reaches the consumer. When we publish a doc into the knowledge store that backs our agents, we stamp the deeplink directly into the content and alongside it as metadata:
const deeplink = `https://${host}/cid/${contentId}/fid/${fileId}`;
fileContents = `${fileContents}\n\ndocument_link:${deeplink}`;
filePathToDeepLink[blobPath] = deeplink;
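On the consuming side, the stamped link has to be read back out of the content. The sketch below pairs the stamping step with a hypothetical `extractDeeplink` helper; the helper, and the choice to scan from the end of the text, are assumptions for illustration.

```typescript
const LINK_MARKER = "document_link:";

// Append the deeplink as a trailer, matching the format shown above.
function stampDeeplink(fileContents: string, deeplink: string): string {
  return `${fileContents}\n\n${LINK_MARKER}${deeplink}`;
}

// Hypothetical consumer-side helper: recover the link from stamped content.
// Scans from the end so the trailer wins over any earlier line in the body
// that happens to start with the same marker.
function extractDeeplink(stampedContents: string): string | undefined {
  const lines = stampedContents.split("\n");
  for (let i = lines.length - 1; i >= 0; i--) {
    const line = lines[i];
    if (line !== undefined && line.startsWith(LINK_MARKER)) {
      return line.slice(LINK_MARKER.length).trim();
    }
  }
  return undefined;
}

const stamped = stampDeeplink("# Intro\nWelcome aboard.", "https://docs.example.com/cid/handbook/fid/abc123");
console.log(extractDeeplink(stamped)); // https://docs.example.com/cid/handbook/fid/abc123
```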
We also write a small sidecar record per document — {fileId}.metadata.json — carrying who last touched it and when:
const sidecar = {
  metadataSchemaVersion: '1',
  lastUpdatedBy: change.sourceLastUpdatedBy ?? '',
  lastUpdatedOn: change.sourceLastUpdatedTimestamp ?? '',
};
Now when an agent answers a question from one of these docs, it can cite a link that (a) points at the live page regardless of where the file currently sits in source control, and (b) comes with provenance — last editor, last edit time — so the answer can be trusted and traced.
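For the link to point at the live page, something has to map a fileId back to wherever the document currently lives. The article does not show that resolver, so the following is only a guess at its simplest form: invert the current path → fileId map and look the id up. The function names are hypothetical.

```typescript
// Hypothetical resolver: invert path -> fileId into fileId -> path.
function invertPathToId(pathToId: Map<string, string>): Map<string, string> {
  const idToPath = new Map<string, string>();
  pathToId.forEach((fileId, path) => {
    // Ids must be unique after collision handling; fail loudly if not.
    if (idToPath.has(fileId)) {
      throw new Error(`fileId ${fileId} maps to more than one path`);
    }
    idToPath.set(fileId, path);
  });
  return idToPath;
}

function resolveCurrentPath(idToPath: Map<string, string>, fileId: string): string | undefined {
  return idToPath.get(fileId);
}

// The same id resolves to a different path before and after the move.
const beforeMove = invertPathToId(new Map([["getting-started.md", "id-1"]]));
const afterMove = invertPathToId(new Map([["onboarding/intro.md", "id-1"]]));
console.log(resolveCurrentPath(beforeMove, "id-1")); // getting-started.md
console.log(resolveCurrentPath(afterMove, "id-1"));  // onboarding/intro.md
```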
Why it matters
The shift is small but the payoff is large: stop treating the file path as the link. Give each document a durable identity derived from its git lineage, build the public link on that identity, and embed it where your consumers — humans and agents alike — actually pick it up.
Content can move freely in source control. Authors can reorganize without a second thought. And every link, citation, and bookmark keeps pointing at the right page — because it was never pointing at a path in the first place.
This is part 3 of the *Docs-as-code at scale* series:
- Stop cloning entire repos for your doc builds
- The expensive part of selective doc fetching isn't the files — it's the commits
- Links that don't break when your docs move (you are here)
Sai Pramod Upadhyayula is a Senior Software Engineer at Microsoft working on AI-powered enterprise knowledge platforms, and a contributor to the DocFX open-source ecosystem.