<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: sai pramod upadhyayula</title>
    <description>The latest articles on DEV Community by sai pramod upadhyayula (@saipramod).</description>
    <link>https://dev.to/saipramod</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3946841%2F35033b4a-c342-4ee5-b174-34759d637f23.jpg</url>
      <title>DEV Community: sai pramod upadhyayula</title>
      <link>https://dev.to/saipramod</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/saipramod"/>
    <language>en</language>
    <item>
      <title>Links that don't break when your docs move</title>
      <dc:creator>sai pramod upadhyayula</dc:creator>
      <pubDate>Wed, 10 Jun 2026 03:04:36 +0000</pubDate>
      <link>https://dev.to/saipramod/links-that-dont-break-when-your-docs-move-1h13</link>
      <guid>https://dev.to/saipramod/links-that-dont-break-when-your-docs-move-1h13</guid>
      <description>&lt;p&gt;Here's a problem that sounds trivial until you've felt it at scale: you publish a documentation page, an agent cites it, someone bookmarks it — and then the author renames the file or moves it into a different folder. The content is the same. The link is dead.&lt;/p&gt;

&lt;p&gt;In a docs-as-code world this happens constantly. Files get reorganized, folders get restructured, a &lt;code&gt;getting-started.md&lt;/code&gt; becomes &lt;code&gt;onboarding/intro.md&lt;/code&gt;. Every one of those moves quietly breaks links, citations, and any agent answer that pointed at the old path. If your link is just "the file path," then the link is only as stable as the file path — which is to say, not stable at all.&lt;/p&gt;

&lt;p&gt;We needed links that stay valid even when content moves in source control. Here's how we built them.&lt;/p&gt;

&lt;h2&gt;
  
  
  The core idea: identity that isn't the path
&lt;/h2&gt;

&lt;p&gt;A path is a &lt;em&gt;location&lt;/em&gt;, not an &lt;em&gt;identity&lt;/em&gt;. The fix is to give every doc a stable identity that travels with the content, and to build the public link on that identity instead of the path.&lt;/p&gt;

&lt;p&gt;So a deeplink looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://{host}/cid/{contentId}/fid/{fileId}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No folder structure. No filename. No extension. Just two stable identifiers: which content source the doc came from, and a &lt;code&gt;fileId&lt;/code&gt; that uniquely and durably names the document itself. Move the file anywhere you like — the link still resolves, because the link was never about where the file lived.&lt;/p&gt;
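
&lt;p&gt;To make the "link was never about where the file lived" point concrete, here is a minimal sketch of resolving such a deeplink back to wherever the document currently sits. The in-memory &lt;code&gt;pathToId&lt;/code&gt; index, the host name, and both function names are assumptions made for illustration, not part of our service:&lt;/p&gt;

```typescript
// Hypothetical in-memory index: current repository path -> stable fileId.
const pathToId = new Map([
  ["onboarding/intro.md", "a1b2c3"],
  ["reference/cli.md", "d4e5f6"],
]);

// Parse "/cid/{contentId}/fid/{fileId}" out of a deeplink URL.
function parseDeeplink(link: string): { contentId: string; fileId: string } | undefined {
  const match = new URL(link).pathname.match(/^\/cid\/([^/]+)\/fid\/([^/]+)$/);
  return match ? { contentId: match[1], fileId: match[2] } : undefined;
}

// Find wherever the document lives right now by looking up its id.
function resolveCurrentPath(link: string): string | undefined {
  const parsed = parseDeeplink(link);
  if (!parsed) return undefined;
  for (const [path, id] of pathToId) {
    if (id === parsed.fileId) return path;
  }
  return undefined;
}

// prints "onboarding/intro.md"
console.log(resolveCurrentPath("https://docs.example.com/cid/handbook/fid/a1b2c3"));
```

&lt;p&gt;Renaming the file only changes a key in the index; the link itself never has to be reissued.&lt;/p&gt;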

&lt;p&gt;The whole trick is making &lt;code&gt;fileId&lt;/code&gt; stable across moves. That's where git earns its keep.&lt;/p&gt;

&lt;h2&gt;
  
  
  Deriving a stable file id from git
&lt;/h2&gt;

&lt;p&gt;The naive version is easy: hash the file path.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nx"&gt;fileId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sha256&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;repositoryFilePath&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That gives you a clean, deterministic id — but it has the exact flaw we're trying to avoid: move the file and the hash changes, so you get a &lt;em&gt;new&lt;/em&gt; id and a &lt;em&gt;new&lt;/em&gt; link. Useless.&lt;/p&gt;

&lt;p&gt;The real work is detecting when a "new" path is actually an &lt;em&gt;existing&lt;/em&gt; document that simply moved, and carrying its id forward. Git already knows this — every commit diff records renames as a &lt;code&gt;source → target&lt;/code&gt; pair. So instead of hashing paths in isolation, we walk the commit history and let the diffs tell us what moved.&lt;/p&gt;

&lt;p&gt;Walking from one commit to the next, for every change we ask: &lt;em&gt;did this file exist before under a different path?&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;changes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;commitDiff&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;changes&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;change&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;changes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;previousPath&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;change&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;sourceServerItem&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// where it used to live&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;currentPath&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;change&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;       &lt;span class="c1"&gt;// where it lives now&lt;/span&gt;

  &lt;span class="c1"&gt;// If we already had an id for the old path, carry it forward.&lt;/span&gt;
  &lt;span class="c1"&gt;// Otherwise, mint a fresh one from the path hash.&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;fileId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;previousPathToId&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;has&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;previousPath&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;previousPathToId&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;previousPath&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;sha256&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;previousPath&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;fileId&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;currentPath&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;pathToId&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;currentPath&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;fileId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key line is the carry-forward: when a file moves, its &lt;strong&gt;new&lt;/strong&gt; path inherits the &lt;strong&gt;old&lt;/strong&gt; path's id. The document keeps its identity through the move, and therefore keeps its link.&lt;/p&gt;

&lt;p&gt;Files that didn't change in this commit simply keep whatever id they already had:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;item&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;allFilesAtThisCommit&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;pathToId&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;has&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;pathToId&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;previousPathToId&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="nf"&gt;sha256&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Do this commit-by-commit across the history and you end up with a &lt;code&gt;path → fileId&lt;/code&gt; map where the id is anchored to the document's lineage, not its current location.&lt;/p&gt;
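
&lt;p&gt;Putting the two loops together, a self-contained sketch of the walk might look like the following. The &lt;code&gt;Change&lt;/code&gt; and &lt;code&gt;Commit&lt;/code&gt; shapes are simplified stand-ins for a provider's diff payload, and &lt;code&gt;deriveFileIds&lt;/code&gt; is a hypothetical name; treat it as an illustration of the carry-forward idea rather than our production code:&lt;/p&gt;

```typescript
import { createHash } from "node:crypto";

// Simplified, hypothetical shapes; real provider diffs carry more fields.
type Change = { previousPath?: string; currentPath?: string };
type Commit = { changes: Change[]; files: string[] };
type IdMap = { [path: string]: string };

function sha256(text: string): string {
  return createHash("sha256").update(text).digest("hex");
}

// Walk commits oldest-first and return the final path -> fileId map.
function deriveFileIds(history: Commit[]): IdMap {
  let previous: IdMap = {};
  for (const commit of history) {
    const current: IdMap = {};
    for (const { previousPath, currentPath } of commit.changes) {
      if (!previousPath || !currentPath) continue; // adds and deletes: handled below
      // Rename: the new path inherits the id the old path already had.
      current[currentPath] = previous[previousPath] ?? sha256(previousPath);
    }
    for (const path of commit.files) {
      // Untouched or newly added files keep their id, or get a fresh path hash.
      current[path] ??= previous[path] ?? sha256(path);
    }
    previous = current;
  }
  return previous;
}

const ids = deriveFileIds([
  { changes: [{ currentPath: "getting-started.md" }], files: ["getting-started.md"] },
  {
    changes: [{ previousPath: "getting-started.md", currentPath: "onboarding/intro.md" }],
    files: ["onboarding/intro.md"],
  },
]);

// The moved file still carries the hash of the path it was first seen under.
console.log(ids["onboarding/intro.md"] === sha256("getting-started.md")); // prints true
```

&lt;p&gt;Because each commit's map is rebuilt from the files present at that commit, deleted paths drop out on their own in this sketch.&lt;/p&gt;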

&lt;h2&gt;
  
  
  Handling the awkward cases
&lt;/h2&gt;

&lt;p&gt;Two real-world wrinkles are worth calling out, because they're where naive implementations fall over:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Collisions.&lt;/strong&gt; Two different documents can, in edge cases, resolve to the same hash-derived id, for example when a file moves away from a path and a new file is later created at that same path. We detect duplicates and disambiguate the colliding entries by prefixing the commit id, so two distinct docs never share a link:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;duplicateFileIds&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;has&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;fileId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nf"&gt;sha256&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;repoFilePath&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="nx"&gt;fileId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;pathToId&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;repoFilePath&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;targetCommitId&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;fileId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
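
&lt;p&gt;For context, here is one way the duplicate set and the re-keying step could fit together. The scenario (a file moved, then a new file created at the vacated path), the helper names, and the commit id &lt;code&gt;c0ffee&lt;/code&gt; are all assumptions for the sake of a runnable sketch:&lt;/p&gt;

```typescript
import { createHash } from "node:crypto";

type IdMap = { [path: string]: string };

const sha256 = (text: string): string =>
  createHash("sha256").update(text).digest("hex");

// Ids that are claimed by more than one path.
function findDuplicateFileIds(pathToId: IdMap): string[] {
  const counts: { [id: string]: number } = {};
  for (const id of Object.values(pathToId)) {
    counts[id] = (counts[id] ?? 0) + 1;
  }
  return Object.keys(counts).filter((id) => counts[id] > 1);
}

// Re-key only the entry whose id equals its own path hash;
// the entry that inherited the id through a move keeps it.
function disambiguate(pathToId: IdMap, commitId: string): IdMap {
  const duplicates = new Set(findDuplicateFileIds(pathToId));
  const result: IdMap = { ...pathToId };
  for (const [path, id] of Object.entries(pathToId)) {
    if (duplicates.has(id)) {
      if (sha256(path) === id) result[path] = commitId + id;
    }
  }
  return result;
}

// Hypothetical scenario: a.md moved to b.md, then a new a.md was created.
const original = sha256("a.md");
const fixed = disambiguate({ "b.md": original, "a.md": original }, "c0ffee");
console.log(fixed["b.md"] === original, fixed["a.md"] === "c0ffee" + original); // prints true true
```

&lt;p&gt;The moved document keeps the id that existing links already use, and only the newcomer gets the prefixed id.&lt;/p&gt;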



&lt;p&gt;&lt;strong&gt;Cost.&lt;/strong&gt; Walking commit diffs across a large repository is not free. It's the &lt;a href="https://dev.to/saipramod/the-expensive-part-of-selective-doc-fetching-isnt-the-files-its-the-commits-5dd2"&gt;same lesson from selective fetching&lt;/a&gt;, where commit data, not file content, is the expensive part. So this derivation rides on cached, immutable commit metadata and only processes the diff between the last processed commit and the current one. You pay for history once, then move incrementally.&lt;/p&gt;

&lt;h2&gt;
  
  
  Closing the loop: embedding the link where it's used
&lt;/h2&gt;

&lt;p&gt;A stable id is only useful if it actually reaches the consumer. When we publish a doc into the knowledge store that backs our agents, we stamp the deeplink directly into the content and alongside it as metadata:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;deeplink&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`https://&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;host&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/cid/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;contentId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/fid/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;fileId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="nx"&gt;fileContents&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;fileContents&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;\n\ndocument_link:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;deeplink&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="nx"&gt;filePathToDeepLink&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;blobPath&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;deeplink&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We also write a small sidecar record per document — &lt;code&gt;{fileId}.metadata.json&lt;/code&gt; — carrying who last touched it and when:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sidecar&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;metadataSchemaVersion&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;1&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;lastUpdatedBy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;change&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;sourceLastUpdatedBy&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;lastUpdatedOn&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;change&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;sourceLastUpdatedTimestamp&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now when an agent answers a question from one of these docs, it can cite a link that (a) points at the live page regardless of where the file currently sits in source control, and (b) comes with provenance — last editor, last edit time — so the answer can be trusted and traced.&lt;/p&gt;
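
&lt;p&gt;On the consuming side, a rough sketch of pulling the stamped link back out and pairing it with the sidecar fields could look like this. Only the &lt;code&gt;document_link:&lt;/code&gt; marker and the sidecar field names come from the snippets above; the function names, the citation wording, and the sample values are made up for illustration:&lt;/p&gt;

```typescript
// Pull the stamped deeplink back out of a stored document body.
function extractDocumentLink(fileContents: string): string | undefined {
  const marker = "\n\ndocument_link:";
  const index = fileContents.lastIndexOf(marker);
  if (index === -1) return undefined;
  const link = fileContents.slice(index + marker.length).trim();
  return link.length > 0 ? link : undefined;
}

// Hypothetical citation format combining the link with sidecar provenance.
function formatCitation(
  fileContents: string,
  sidecar: { lastUpdatedBy: string; lastUpdatedOn: string },
): string {
  const link = extractDocumentLink(fileContents) ?? "(no link)";
  return `${link} (last updated by ${sidecar.lastUpdatedBy} on ${sidecar.lastUpdatedOn})`;
}

const stored =
  "# Intro\nWelcome." + "\n\ndocument_link:https://docs.example.com/cid/handbook/fid/a1b2c3";

console.log(
  formatCitation(stored, { lastUpdatedBy: "alice", lastUpdatedOn: "2026-06-01T12:00:00Z" }),
);
```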

&lt;h2&gt;
  
  
  Why it matters
&lt;/h2&gt;

&lt;p&gt;The shift is small but the payoff is large: &lt;strong&gt;stop treating the file path as the link.&lt;/strong&gt; Give each document a durable identity derived from its git lineage, build the public link on that identity, and embed it where your consumers — humans and agents alike — actually pick it up.&lt;/p&gt;

&lt;p&gt;Content can move freely in source control. Authors can reorganize without a second thought. And every link, citation, and bookmark keeps pointing at the right page — because it was never pointing at a path in the first place.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This is part 3 of the &lt;strong&gt;Docs-as-code at scale&lt;/strong&gt; series:&lt;/em&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://dev.to/saipramod/stop-cloning-entire-repos-for-your-doc-builds-28i0"&gt;Stop cloning entire repos for your doc builds&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/saipramod/the-expensive-part-of-selective-doc-fetching-isnt-the-files-its-the-commits-5dd2"&gt;The expensive part of selective doc fetching isn't the files — it's the commits&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Links that don't break when your docs move&lt;/strong&gt; &lt;em&gt;(you are here)&lt;/em&gt;
&lt;/li&gt;
&lt;/ol&gt;




&lt;p&gt;&lt;em&gt;Sai Pramod Upadhyayula is a Senior Software Engineer at Microsoft working on AI-powered enterprise knowledge platforms, and a contributor to the DocFX open-source ecosystem.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>git</category>
      <category>documentation</category>
      <category>ai</category>
      <category>architecture</category>
    </item>
    <item>
      <title>The expensive part of selective doc fetching isn't the files — it's the commits</title>
      <dc:creator>sai pramod upadhyayula</dc:creator>
      <pubDate>Wed, 10 Jun 2026 03:04:02 +0000</pubDate>
      <link>https://dev.to/saipramod/the-expensive-part-of-selective-doc-fetching-isnt-the-files-its-the-commits-5dd2</link>
      <guid>https://dev.to/saipramod/the-expensive-part-of-selective-doc-fetching-isnt-the-files-its-the-commits-5dd2</guid>
      <description>&lt;p&gt;In my last post I argued for &lt;a href="https://dev.to/saipramod/stop-cloning-entire-repos-for-your-doc-builds-28i0"&gt;resolving before you fetch&lt;/a&gt;: use the doc manifest to figure out which 50 files matter before you pull 200,000. That solves the obvious waste. But once we put it into production across dozens of large repos — working closely with the Azure DevOps and content-version provider teams — we found the &lt;em&gt;real&lt;/em&gt; cost was hiding somewhere I didn't expect.&lt;/p&gt;

&lt;p&gt;It wasn't the file content. It was the commit data.&lt;/p&gt;

&lt;h2&gt;
  
  
  The surprise: walking commit history is the bottleneck
&lt;/h2&gt;

&lt;p&gt;When you ask a git provider "what changed, and at what version?", you're not making one cheap call. Commit data is nested: trees reference trees, commits reference parents, and a naive walk fans out into a deep recursive traversal. For a build that needs 50 markdown files, we were spending most of our time &lt;em&gt;not&lt;/em&gt; downloading those files, but resolving the commit graph around them.&lt;/p&gt;

&lt;p&gt;The content fetch was the easy part. The version resolution was the tax.&lt;/p&gt;

&lt;h2&gt;
  
  
  The fix: cache the metadata, not the content
&lt;/h2&gt;

&lt;p&gt;The instinct is to cache file content. We did the opposite.&lt;/p&gt;

&lt;p&gt;Commit data has one beautiful property: &lt;strong&gt;it's immutable.&lt;/strong&gt; A commit SHA points to exactly one tree, forever. So instead of caching the bytes of &lt;code&gt;intro.md&lt;/code&gt; (which change, and which we want fresh), we cache the &lt;em&gt;commit metadata&lt;/em&gt; — the tree structure, the file-to-blob mappings, the version graph.&lt;/p&gt;

&lt;p&gt;The pattern looks like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;One full recursive fetch, once.&lt;/strong&gt; Pay the deep-walk cost a single time per repo and store the resulting commit/tree metadata.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Build incrementally with diffs.&lt;/strong&gt; After that, never walk the full history again. Ask only for the commit diff since the last known SHA, and patch your cached metadata forward.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;An expensive recursive traversal becomes a cheap incremental update. And because the cached layer is immutable metadata, there's no invalidation headache — you're only ever &lt;em&gt;appending&lt;/em&gt; knowledge of new commits, never invalidating old ones.&lt;/p&gt;
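
&lt;p&gt;As a rough illustration of "patch your cached metadata forward", the sketch below applies a diff to a cached tree snapshot. The &lt;code&gt;TreeSnapshot&lt;/code&gt; and &lt;code&gt;Diff&lt;/code&gt; shapes, and the convention that a missing &lt;code&gt;blobId&lt;/code&gt; means a deletion, are assumptions; a real provider payload will look different:&lt;/p&gt;

```typescript
// Hypothetical shapes for cached tree metadata and a provider diff.
type TreeSnapshot = { commitSha: string; blobs: { [path: string]: string } };
type DiffEntry = { path: string; blobId?: string }; // no blobId means "deleted"
type Diff = { fromSha: string; toSha: string; entries: DiffEntry[] };

// Produce the snapshot for diff.toSha without re-walking the tree.
function applyDiff(cached: TreeSnapshot, diff: Diff): TreeSnapshot {
  if (diff.fromSha !== cached.commitSha) {
    throw new Error(`diff starts at ${diff.fromSha}, cache is at ${cached.commitSha}`);
  }
  const blobs = { ...cached.blobs };
  for (const entry of diff.entries) {
    if (entry.blobId === undefined) delete blobs[entry.path];
    else blobs[entry.path] = entry.blobId;
  }
  // A new object per commit: older snapshots are never mutated.
  return { commitSha: diff.toSha, blobs };
}

const base: TreeSnapshot = {
  commitSha: "aaa111",
  blobs: { "intro.md": "blob1", "old.md": "blob2" },
};
const next = applyDiff(base, {
  fromSha: "aaa111",
  toSha: "bbb222",
  entries: [{ path: "intro.md", blobId: "blob3" }, { path: "old.md" }],
});

// intro.md now maps to blob3 and old.md is gone; base is unchanged.
console.log(next);
```

&lt;p&gt;Keeping one snapshot per SHA is what makes the cache append-only: a snapshot for a given commit never needs to be revisited.&lt;/p&gt;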

&lt;h2&gt;
  
  
  The unexpected bonus: honest freshness
&lt;/h2&gt;

&lt;p&gt;This caching layer turned out to do more than save time. It made freshness &lt;em&gt;honest&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;If you're feeding docs into a RAG pipeline or an agent's knowledge base, the scariest failure mode is an answer sourced from a stale or irrelevant file. The commit diff is the cleanest possible signal for this: it tells you precisely which docs moved, were added, or were deleted — at the version level. You stop re-indexing on a blind schedule and start re-indexing on actual change. "Is this doc still current?" becomes a cheap lookup against &lt;code&gt;{repo, commitSha, path}&lt;/code&gt; instead of a re-fetch.&lt;/p&gt;
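
&lt;p&gt;A minimal sketch of re-indexing on actual change rather than on a schedule: turn the commit diff into index operations and leave everything else alone. The &lt;code&gt;DocChange&lt;/code&gt; shape and the &lt;code&gt;planReindex&lt;/code&gt; name are hypothetical:&lt;/p&gt;

```typescript
type ChangeType = "add" | "edit" | "rename" | "delete";
type DocChange = { type: ChangeType; path: string; previousPath?: string };

// Turn a commit diff into index operations; untouched docs need no work.
function planReindex(changes: DocChange[]): { upsert: string[]; remove: string[] } {
  const upsert: string[] = [];
  const remove: string[] = [];
  for (const change of changes) {
    if (change.type === "delete") {
      remove.push(change.path);
      continue;
    }
    upsert.push(change.path);
    // A rename also retires the entry stored under the old path.
    if (change.type === "rename") {
      if (change.previousPath) remove.push(change.previousPath);
    }
  }
  return { upsert, remove };
}

const plan = planReindex([
  { type: "edit", path: "docs/intro.md" },
  { type: "rename", path: "docs/setup.md", previousPath: "docs/install.md" },
  { type: "delete", path: "docs/legacy.md" },
]);

console.log(plan);
```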

&lt;h2&gt;
  
  
  Where this bites differently: monorepo vs. many small repos
&lt;/h2&gt;

&lt;p&gt;The same problem shows up in opposite shapes depending on how your org structures things.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;In a monorepo&lt;/strong&gt;, the hard part is &lt;em&gt;precision&lt;/em&gt;. One commit graph, but docs are buried among 200k files. The commit diff is your friend — one cached tree plus a diff tells you exactly which of those files changed, and the manifest's &lt;code&gt;src&lt;/code&gt; scoping and &lt;code&gt;exclude&lt;/code&gt; patterns keep a greedy &lt;code&gt;**/*.md&lt;/code&gt; from sweeping in changelogs and archived folders. Freshness is comparatively easy because everything shares one version.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Across many repos under one org&lt;/strong&gt;, it's a &lt;em&gt;coordination and provenance&lt;/em&gt; problem. Fifty repos, fifty commit graphs, fifty manifests — some with no manifest at all. This is where the cached commit metadata earns its keep: you can answer "what changed across all of them?" by diffing each cheaply, instead of re-walking fifty histories on every build. A few things that helped us:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Trust the per-repo manifest&lt;/strong&gt; as the definition of "what is a doc here." Don't impose a global glob — that's how irrelevant files leak across repo boundaries.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pin everything to &lt;code&gt;{repo, commitSha, path}&lt;/code&gt;&lt;/strong&gt; and re-resolve on a commit diff, not a timer. Staleness is almost always a re-index-cadence problem, not a fetch problem.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Treat no-manifest repos as out-of-scope&lt;/strong&gt;, explicitly. The "just index everything" fallback is the single biggest source of irrelevant answers downstream.&lt;/li&gt;
&lt;/ul&gt;
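
&lt;p&gt;Those three rules can be sketched as a small per-build planning step. The &lt;code&gt;Repo&lt;/code&gt; shape, the &lt;code&gt;lastIndexedSha&lt;/code&gt; store, and the repo names are assumptions chosen to keep the example self-contained:&lt;/p&gt;

```typescript
type Repo = {
  name: string;
  headSha: string;
  manifestGlobs?: string[]; // undefined when the repo has no doc manifest
};

// Decide, per repo, whether a build has anything to do.
function planBuild(repos: Repo[], lastIndexedSha: { [repo: string]: string }) {
  const skipped: string[] = [];
  const unchanged: string[] = [];
  const toDiff: { repo: string; fromSha: string | undefined; toSha: string }[] = [];
  for (const repo of repos) {
    if (!repo.manifestGlobs) {
      skipped.push(repo.name); // explicitly out of scope, never "index everything"
    } else if (lastIndexedSha[repo.name] === repo.headSha) {
      unchanged.push(repo.name); // same commit, nothing to re-resolve
    } else {
      toDiff.push({ repo: repo.name, fromSha: lastIndexedSha[repo.name], toSha: repo.headSha });
    }
  }
  return { skipped, unchanged, toDiff };
}

const buildPlan = planBuild(
  [
    { name: "billing-docs", headSha: "bbb222", manifestGlobs: ["docs/**/*.md"] },
    { name: "auth-docs", headSha: "ccc333", manifestGlobs: ["articles/**/*.md"] },
    { name: "scratchpad", headSha: "ddd444" },
  ],
  { "billing-docs": "aaa111", "auth-docs": "ccc333" },
);

console.log(buildPlan);
```

&lt;p&gt;Only &lt;code&gt;billing-docs&lt;/code&gt; needs a diff here; &lt;code&gt;auth-docs&lt;/code&gt; is already current and &lt;code&gt;scratchpad&lt;/code&gt; is skipped because it has no manifest.&lt;/p&gt;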

&lt;h2&gt;
  
  
  The takeaway
&lt;/h2&gt;

&lt;p&gt;Selective fetching gets you from 200,000 files to 50. But the speed and the &lt;em&gt;trustworthiness&lt;/em&gt; of the whole pipeline come from a layer underneath it: caching the immutable commit metadata once, then riding commit diffs forever. Monorepos push you toward better filtering; many-repos push you toward better provenance. The commit-diff layer is what makes both fast — and what lets an agent know its docs are actually current.&lt;/p&gt;

&lt;p&gt;That same cached commit history powers something else, too: links to your docs that don't break when files get moved or renamed. That's the subject of the &lt;a href="https://dev.to/saipramod/links-that-dont-break-when-your-docs-move-1h13"&gt;next post in this series&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This is part 2 of the &lt;strong&gt;Docs-as-code at scale&lt;/strong&gt; series:&lt;/em&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://dev.to/saipramod/stop-cloning-entire-repos-for-your-doc-builds-28i0"&gt;Stop cloning entire repos for your doc builds&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The expensive part of selective doc fetching isn't the files — it's the commits&lt;/strong&gt; &lt;em&gt;(you are here)&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/saipramod/links-that-dont-break-when-your-docs-move-1h13"&gt;Links that don't break when your docs move&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;p&gt;&lt;em&gt;Sai Pramod Upadhyayula is a Senior Software Engineer at Microsoft working on AI-powered enterprise knowledge platforms, and a contributor to the DocFX open-source ecosystem.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>git</category>
      <category>documentation</category>
      <category>opensource</category>
    </item>
    <item>
      <title>docfx-remote-include: inline remote markdown into DocFX, and govern the assembled page</title>
      <dc:creator>sai pramod upadhyayula</dc:creator>
      <pubDate>Tue, 09 Jun 2026 23:19:27 +0000</pubDate>
      <link>https://dev.to/saipramod/docfx-remote-include-inline-remote-markdown-into-docfx-and-govern-the-assembled-page-49hc</link>
      <guid>https://dev.to/saipramod/docfx-remote-include-inline-remote-markdown-into-docfx-and-govern-the-assembled-page-49hc</guid>
      <description>&lt;p&gt;&lt;a href="https://github.com/saipramod/docfx-remote-include" rel="noopener noreferrer"&gt;docfx-remote-include&lt;/a&gt; is a &lt;a href="https://github.com/xoofx/markdig" rel="noopener noreferrer"&gt;Markdig&lt;/a&gt; extension and a &lt;code&gt;dotnet&lt;/code&gt; CLI tool for &lt;a href="https://github.com/dotnet/docfx" rel="noopener noreferrer"&gt;DocFX&lt;/a&gt;. It does two things: it inlines markdown fetched from an HTTP service at build time, and it can send each fully-assembled page to a transform service for centralized governance.&lt;/p&gt;

&lt;p&gt;It is &lt;strong&gt;not a fork of DocFX&lt;/strong&gt;. It plugs into DocFX's public &lt;code&gt;BuildOptions.ConfigureMarkdig&lt;/code&gt; seam, so it tracks upstream DocFX releases as a regular NuGet dependency. It targets &lt;strong&gt;.NET 8, 9, and 10&lt;/strong&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If you read my earlier post on this project, it had a per-include AI &lt;em&gt;rewrite&lt;/em&gt; feature: you tagged an individual include and a model rewrote that fragment in place. That's gone now. Governance moved up to a single page-level transform that runs once on the assembled page, which is both simpler and a better fit for the problem. More on that below.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;
  
  
  The include directive
&lt;/h2&gt;

&lt;p&gt;In any markdown file processed by DocFX:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;Some local content.

&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;!remoteinclude[Welcome&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="sx"&gt;snippets/welcome.md&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;]

Inline usage also works: today's status is &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;!remoteinclude[s&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="sx"&gt;status/prod.md&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;].
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At build time the extension performs &lt;code&gt;GET {baseUrl}/{source}&lt;/code&gt;, parses the response as markdown, and inlines it. The directive has two shapes that share one syntax:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Block&lt;/strong&gt; — the directive is the only thing on its line. The fetched markdown is
inlined as block content (headings, lists, paragraphs).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Inline&lt;/strong&gt; — the directive appears mid-paragraph. The fetched markdown must reduce to
a single paragraph; only its inline content is spliced in, with no &lt;code&gt;&amp;lt;p&amp;gt;&lt;/code&gt; wrapper.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Nested directives, cycle detection (via an &lt;code&gt;AsyncLocal&lt;/code&gt; source stack, max depth 8), an in-process per-build cache, and concurrency capped at 8 in-flight requests are all built in. By default a missing source fails the build; &lt;code&gt;--allow-missing&lt;/code&gt; renders a visible error placeholder instead.&lt;/p&gt;
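
&lt;p&gt;To illustrate the source-stack idea, here is a small sketch of nested expansion with cycle and depth checks. It is written in TypeScript rather than the library's C#, passes the stack as an explicit argument instead of using &lt;code&gt;AsyncLocal&lt;/code&gt;, and replaces the HTTP fetch with a hypothetical in-memory &lt;code&gt;fakeService&lt;/code&gt;; the regex and the exact depth rule are assumptions, not the extension's implementation:&lt;/p&gt;

```typescript
const MAX_DEPTH = 8;

// Stand-in for the HTTP content service; keys are include sources.
const fakeService: { [source: string]: string } = {
  "a.md": "A then [!remoteinclude[b](b.md)]",
  "b.md": "B then [!remoteinclude[a](a.md)]",
  "page.md": "Intro. [!remoteinclude[leaf](leaf.md)]",
  "leaf.md": "just text",
};

// Expand directives recursively, tracking the chain of sources being resolved.
function expand(source: string, stack: string[] = []): string {
  if (stack.includes(source)) {
    throw new Error(`include cycle: ${[...stack, source].join(" -> ")}`);
  }
  if (stack.length >= MAX_DEPTH) {
    throw new Error(`include depth exceeds ${MAX_DEPTH}`);
  }
  const body = fakeService[source];
  if (body === undefined) throw new Error(`missing source: ${source}`);
  // Matches [!remoteinclude[Title](source)] and captures the source.
  const directive = /\[!remoteinclude\[[^\]]*\]\(([^)]+)\)\]/g;
  return body.replace(directive, (_whole: string, nested: string) =>
    expand(nested, [...stack, source]),
  );
}

console.log(expand("page.md")); // prints "Intro. just text"
```

&lt;p&gt;Calling &lt;code&gt;expand("a.md")&lt;/code&gt; in this sketch throws with the full chain &lt;code&gt;a.md -&gt; b.md -&gt; a.md&lt;/code&gt; in the message.&lt;/p&gt;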

&lt;p&gt;If your content service uses a non-trivial URL scheme, set &lt;code&gt;urlTemplate&lt;/code&gt; with a &lt;code&gt;{source}&lt;/code&gt; placeholder:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"baseUrl"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://api.example.com/"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"urlTemplate"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"content/GetFile?path={source}"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
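
&lt;p&gt;As a sketch of how such a config might turn into a request URL, the function below substitutes &lt;code&gt;{source}&lt;/code&gt; and resolves the result against &lt;code&gt;baseUrl&lt;/code&gt;. It is illustrative TypeScript, not the extension's code; in particular, percent-encoding the source inside a template is an assumption here, so check the project's README for the actual behavior:&lt;/p&gt;

```typescript
type IncludeConfig = { baseUrl: string; urlTemplate?: string };

// Build the request URL for one include source.
function resolveIncludeUrl(config: IncludeConfig, source: string): string {
  const relative = config.urlTemplate
    ? config.urlTemplate.replace("{source}", encodeURIComponent(source))
    : source;
  return new URL(relative, config.baseUrl).toString();
}

console.log(
  resolveIncludeUrl(
    { baseUrl: "https://api.example.com/", urlTemplate: "content/GetFile?path={source}" },
    "snippets/welcome.md",
  ),
);
// prints "https://api.example.com/content/GetFile?path=snippets%2Fwelcome.md"

console.log(resolveIncludeUrl({ baseUrl: "http://localhost:8080/content/" }, "snippets/welcome.md"));
// prints "http://localhost:8080/content/snippets/welcome.md"
```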



&lt;h2&gt;
  
  
  Optional page transform
&lt;/h2&gt;

&lt;p&gt;After all includes are resolved and a page is fully assembled, the build can send that page to a &lt;strong&gt;page transform service&lt;/strong&gt;. The service owns the rules; pages only declare intent via YAML frontmatter:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;transform&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;audience&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;engineer&lt;/span&gt;
  &lt;span class="na"&gt;intent&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;onboarding&lt;/span&gt;
  &lt;span class="na"&gt;overrides&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;prerequisites&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;target&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;macOS&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;users"&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The extension extracts this &lt;code&gt;transform:&lt;/code&gt; block, sends the assembled page plus the metadata to your configured endpoint, and uses the response. Governance happens once, at the page level. The library only defines the contract:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;public&lt;/span&gt; &lt;span class="k"&gt;interface&lt;/span&gt; &lt;span class="nc"&gt;IPageTransformService&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;PageTransformResponse&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;TransformAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;PageTransformRequest&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;CancellationToken&lt;/span&gt; &lt;span class="n"&gt;ct&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Omit the &lt;code&gt;transform&lt;/code&gt; config to disable the transform step. The library itself has no Azure dependency: implement &lt;code&gt;IPageTransformService&lt;/code&gt; with an LLM or with deterministic rules.&lt;/p&gt;
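&lt;p&gt;To make the "deterministic rules" option concrete, here is a minimal TypeScript sketch of a rule-based transform with the same request-in, response-out shape as the contract. The field names (&lt;code&gt;markdown&lt;/code&gt;, &lt;code&gt;metadata&lt;/code&gt;) and the rule itself are assumptions for illustration; the real &lt;code&gt;PageTransformRequest&lt;/code&gt; and &lt;code&gt;PageTransformResponse&lt;/code&gt; are C# types whose members are not shown here, and the real method is asynchronous.&lt;/p&gt;

```typescript
// Hypothetical shapes: these field names are assumptions, not the library's types.
type TransformMetadata = {
  audience?: string;
  intent?: string;
  overrides?: { [section: string]: string };
};
type PageTransformRequest = { markdown: string; metadata: TransformMetadata };
type PageTransformResponse = { markdown: string };

// Illustrative rule: surface each declared override as a note at the top of the page.
function applyRules(request: PageTransformRequest): PageTransformResponse {
  const overrides = request.metadata.overrides || {};
  const notes = Object.keys(overrides)
    .sort()
    .map(function (section) {
      return "> [!NOTE] " + section + ": " + overrides[section];
    });
  if (notes.length === 0) {
    // Nothing declared: the page passes through unchanged.
    return { markdown: request.markdown };
  }
  return { markdown: notes.join("\n") + "\n\n" + request.markdown };
}
```

&lt;p&gt;Because the rule is a pure function of the request, the same page and frontmatter always produce the same output, which is the main reason to prefer rules over an LLM when builds need to be reproducible.&lt;/p&gt;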
&lt;h2&gt;
  
  
  The reference content-and-transform service
&lt;/h2&gt;

&lt;p&gt;The repo ships a reference service (&lt;code&gt;samples/knowledge-service&lt;/code&gt;) that plays both roles, content source and page transform service, from one endpoint surface:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Endpoint&lt;/th&gt;
&lt;th&gt;Method&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/content/{path}&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;GET&lt;/td&gt;
&lt;td&gt;Serves markdown for &lt;code&gt;[!remoteinclude]&lt;/code&gt; directives&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/transform&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;POST&lt;/td&gt;
&lt;td&gt;Transforms the assembled page&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/health&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;GET&lt;/td&gt;
&lt;td&gt;Health check&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Point both the content base URL and the transform endpoint at the same service in &lt;code&gt;remoteinclude.json&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"baseUrl"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"http://localhost:8080/content/"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"transform"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"endpoint"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"http://localhost:8080/transform"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"auth"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"mode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"none"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;/transform&lt;/code&gt; endpoint has three behaviors depending on how you configure it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;With central guidance&lt;/strong&gt; (markdown under &lt;code&gt;content/guidance/&lt;/code&gt;): it treats that
guidance as the source of truth, compares each assembled page against it, and inserts
&lt;code&gt;&amp;gt; [!NOTE] **Team Override**&lt;/code&gt; callouts above content that deviates. It also harmonizes
tone and structure for the page's &lt;code&gt;audience&lt;/code&gt;/&lt;code&gt;intent&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Without guidance content&lt;/strong&gt;: it is a passthrough to the LLM — harmonizing tone and
structure for the declared audience/intent, with no comparison or override callouts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Without an &lt;code&gt;AiEndpoint&lt;/code&gt; configured&lt;/strong&gt;: it returns the page unchanged. &lt;code&gt;/content&lt;/code&gt;
still works.&lt;/li&gt;
&lt;/ul&gt;
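&lt;p&gt;The three behaviors above amount to a small decision table. The sketch below is one way to express it; the &lt;code&gt;ServiceConfig&lt;/code&gt; shape, the mode names, and the rule that a missing &lt;code&gt;AiEndpoint&lt;/code&gt; takes precedence over guidance are assumptions, not the reference service's actual code.&lt;/p&gt;

```typescript
// Hypothetical config shape; only AiEndpoint and content/guidance are named in the article.
type ServiceConfig = { aiEndpoint: string; guidanceFiles: string[] };
type TransformMode = "compare-with-guidance" | "harmonize-only" | "unchanged";

function selectTransformMode(config: ServiceConfig): TransformMode {
  // Assumption: without an AI endpoint nothing else applies, so the page is returned as-is.
  if (config.aiEndpoint.trim() === "") {
    return "unchanged";
  }
  // An endpoint but no guidance markdown: harmonize tone and structure only.
  if (config.guidanceFiles.length === 0) {
    return "harmonize-only";
  }
  // Endpoint plus guidance: compare the page against guidance and flag deviations.
  return "compare-with-guidance";
}
```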

&lt;h3&gt;
  
  
  Running it as a sidecar
&lt;/h3&gt;

&lt;p&gt;The container binds &lt;code&gt;0.0.0.0:8080&lt;/code&gt; (set via &lt;code&gt;ASPNETCORE_URLS&lt;/code&gt; in the Dockerfile), reads all configuration from environment variables, and has no required dependencies, so it runs cleanly as a sidecar next to a docs build. With no &lt;code&gt;content/guidance&lt;/code&gt; and no &lt;code&gt;AiEndpoint&lt;/code&gt; it still serves &lt;code&gt;/content&lt;/code&gt; and returns pages unchanged from &lt;code&gt;/transform&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Kubernetes — run it in the same Pod and probe &lt;code&gt;/health&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;docs-build&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;your-docs-builder:latest&lt;/span&gt;
    &lt;span class="c1"&gt;# baseUrl/transform.endpoint point at http://localhost:8080&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;knowledge-service&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ghcr.io/saipramod/knowledge-service:latest&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;containerPort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;8080&lt;/span&gt;
    &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Transform__AiEndpoint&lt;/span&gt;
        &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;            &lt;span class="c1"&gt;# empty = no-op transform (page returned unchanged)&lt;/span&gt;
    &lt;span class="na"&gt;readinessProbe&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;httpGet&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{&lt;/span&gt; &lt;span class="nv"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;/health&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;8080&lt;/span&gt; &lt;span class="pi"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Because containers in a Pod share &lt;code&gt;localhost&lt;/code&gt;, the build container reaches the sidecar at &lt;code&gt;http://localhost:8080&lt;/code&gt;: set &lt;code&gt;baseUrl&lt;/code&gt; to &lt;code&gt;http://localhost:8080/content/&lt;/code&gt; and &lt;code&gt;transform.endpoint&lt;/code&gt; to &lt;code&gt;http://localhost:8080/transform&lt;/code&gt;. A Docker Compose example is in the sample's README.&lt;/p&gt;
&lt;h2&gt;
  
  
  Using it
&lt;/h2&gt;

&lt;p&gt;As a CLI (&lt;code&gt;dotnet tool&lt;/code&gt;), with a &lt;code&gt;remoteinclude.json&lt;/code&gt; next to your &lt;code&gt;docfx.json&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;dotnet&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;tool&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;install&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-g&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Documentation.DocfxRemoteInclude.Cli&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;docfx-ri&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;build&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;docs/docfx.json&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As a library, for hosts that call &lt;code&gt;Docset.Build(...)&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;Docfx&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;Docfx.RemoteInclude&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;var&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;HttpRemoteContentClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;baseUri&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;Uri&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"https://internal.example.com/"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;authHandler&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt;
        &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Headers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Authorization&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Bearer"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;GetJwtAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;)));&lt;/span&gt;

&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;Docset&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Build&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"docs/docfx.json"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="n"&gt;BuildOptions&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;ConfigureMarkdig&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pipeline&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;UseRemoteInclude&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="n"&gt;RemoteIncludeOptions&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;PageTransformService&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;myTransformService&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// optional IPageTransformService&lt;/span&gt;
    &lt;span class="p"&gt;}),&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Provide your own &lt;code&gt;IRemoteContentClient&lt;/code&gt; for non-HTTP sources, custom auth (mTLS, signed URLs), or on-disk caching.&lt;/p&gt;
&lt;h3&gt;
  
  
  Auth and credentials
&lt;/h3&gt;

&lt;p&gt;Both the content and transform &lt;code&gt;auth&lt;/code&gt; blocks accept &lt;code&gt;{ "mode": "none" | "default" | "managedIdentity" | "jwt" | "key", "value": "...", "scope": "..." }&lt;/code&gt;. &lt;code&gt;value&lt;/code&gt; can reference an environment variable as &lt;code&gt;$VAR&lt;/code&gt; or &lt;code&gt;${VAR}&lt;/code&gt;, and &lt;code&gt;scope&lt;/code&gt; overrides the OAuth audience for the &lt;code&gt;default&lt;/code&gt; and &lt;code&gt;managedIdentity&lt;/code&gt; modes. Credentials are read from environment variables or a host-supplied callback. They are never read from &lt;code&gt;docfx.json&lt;/code&gt; and never written to disk.&lt;/p&gt;
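&lt;p&gt;As a rough illustration of the &lt;code&gt;$VAR&lt;/code&gt; / &lt;code&gt;${VAR}&lt;/code&gt; indirection, the sketch below resolves an auth &lt;code&gt;value&lt;/code&gt; against an environment map. It assumes the whole value is a single variable reference and that a missing variable is an error; the library's actual resolution rules may differ, and &lt;code&gt;resolveAuthValue&lt;/code&gt; is a hypothetical name.&lt;/p&gt;

```typescript
// Resolve "$VAR" or "${VAR}" against an environment map; anything else is a literal value.
function resolveAuthValue(value: string, env: { [name: string]: string | undefined }): string {
  const match = /^\$(?:\{([A-Za-z_][A-Za-z0-9_]*)\}|([A-Za-z_][A-Za-z0-9_]*))$/.exec(value);
  if (match === null) {
    return value; // not a variable reference
  }
  // Group 1 is the ${VAR} form, group 2 the $VAR form.
  const name = match[1] || match[2] || "";
  const resolved = env[name];
  if (resolved === undefined) {
    // Assumption: fail loudly rather than send an empty credential.
    throw new Error("Environment variable " + name + " is not set");
  }
  return resolved;
}
```

&lt;p&gt;In a Node.js host you would pass &lt;code&gt;process.env&lt;/code&gt; as the second argument, so the secret itself never appears in a config file.&lt;/p&gt;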
&lt;h2&gt;
  
  
  Try it end to end
&lt;/h2&gt;

&lt;p&gt;The repo includes a runnable &lt;code&gt;samples/basic&lt;/code&gt; DocFX site wired to the &lt;code&gt;samples/knowledge-service&lt;/code&gt; reference service. Run the service, build the sample, and you get shared snippets and page-level governance working together:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="c"&gt;# terminal 1&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;cd&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;samples/knowledge-service&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;dotnet&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;run&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="c"&gt;# terminal 2&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;docfx-ri&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;build&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;samples/basic/docfx.json&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;MIT-licensed. Issues and PRs welcome.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/saipramod/docfx-remote-include" rel="noopener noreferrer"&gt;github.com/saipramod/docfx-remote-include&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;NuGet:&lt;/strong&gt; &lt;a href="https://www.nuget.org/packages/Documentation.DocfxRemoteInclude" rel="noopener noreferrer"&gt;Documentation.DocfxRemoteInclude&lt;/a&gt; · &lt;a href="https://www.nuget.org/packages/Documentation.DocfxRemoteInclude.Cli" rel="noopener noreferrer"&gt;Documentation.DocfxRemoteInclude.Cli&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Sai Pramod Upadhyayula is a Senior Software Engineer at Microsoft working on AI-powered enterprise knowledge platforms, and a contributor to the DocFX open-source ecosystem.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>docfx</category>
      <category>dotnet</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Stop Cloning Entire Repos for Your Doc Builds</title>
      <dc:creator>sai pramod upadhyayula</dc:creator>
      <pubDate>Wed, 27 May 2026 01:27:33 +0000</pubDate>
      <link>https://dev.to/saipramod/stop-cloning-entire-repos-for-your-doc-builds-28i0</link>
      <guid>https://dev.to/saipramod/stop-cloning-entire-repos-for-your-doc-builds-28i0</guid>
      <description>&lt;p&gt;Your docs live next to your code. That's the docs-as-code promise — version control, pull request reviews, CI/CD pipelines. It works beautifully.&lt;/p&gt;

&lt;p&gt;Until your repo hits 100,000 files.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem nobody talks about
&lt;/h2&gt;

&lt;p&gt;Our team runs a documentation portal that pulls content from dozens of large repositories. Each doc build needs a handful of markdown files and images from repos containing hundreds of thousands of files. The naive approach — &lt;code&gt;git clone&lt;/code&gt; — is painfully slow and wasteful.&lt;/p&gt;

&lt;p&gt;We tried sparse checkout. We tried shallow clones. We tried the git provider APIs directly. Each came with its own problems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Full clones&lt;/strong&gt;: Minutes of download time for a build that needs 50 files&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API file downloads&lt;/strong&gt;: Hit rate limits after a few hundred files&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sparse checkout&lt;/strong&gt;: Still requires git history negotiation and doesn't help with API-based pipelines&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The irony? &lt;strong&gt;The manifest already declares exactly which files are needed.&lt;/strong&gt; The &lt;code&gt;docfx.json&lt;/code&gt; (or whatever config your static site generator uses) lists every content glob, every resource pattern. We just weren't using that information early enough.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this matters even more now: AI agents
&lt;/h2&gt;

&lt;p&gt;This isn't just a build-speed problem anymore. If you're building AI agents that answer questions about your product, help onboard developers, or assist with internal processes — they need access to your documentation. Not your code. Not your tests. The &lt;em&gt;docs&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;The challenge scales fast:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;RAG pipelines&lt;/strong&gt; need to ingest documentation from dozens of repos — cloning all of them is absurd&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Incremental indexing&lt;/strong&gt; requires knowing which files &lt;em&gt;are&lt;/em&gt; documentation vs. code — the manifest already tells you&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-repo knowledge bases&lt;/strong&gt; need a fast, selective way to pull only content files across many repos&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The faster and more precisely you can extract documentation from your repositories, the fresher and more accurate your agents' knowledge becomes. Solving the selective fetch problem unlocks both faster builds &lt;em&gt;and&lt;/em&gt; reliable AI-powered documentation experiences.&lt;/p&gt;

&lt;h2&gt;
  
  
  The idea: resolve before you fetch
&lt;/h2&gt;

&lt;p&gt;What if we flipped the order?&lt;/p&gt;

&lt;p&gt;Instead of: clone everything → build → throw away 99% of the files&lt;/p&gt;

&lt;p&gt;We do: get the file listing → match against manifest → fetch only what matches&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────┐     ┌──────────────────────┐     ┌─────────────────┐
│  Git Provider    │     │ selective-repo-fetch  │     │  Doc Pipeline   │
│  (file listing)  │────▶│  (manifest matching   │────▶│  (build only    │
│                  │     │   + reference filter) │     │   matched files)│
└─────────────────┘     └──────────────────────┘     └─────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



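&lt;p&gt;A file tree listing from GitHub/Azure DevOps/GitLab is a cheap metadata request: it returns paths, not file contents. It is usually a single call, although very large trees can come back truncated (GitHub sets a &lt;code&gt;truncated&lt;/code&gt; flag) or paginated (GitLab), so check for that before trusting the listing. Match the listing against your manifest patterns, and you know exactly what to fetch.&lt;/p&gt;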
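&lt;p&gt;To show what "match the listing against your manifest patterns" means in practice, here is a hand-rolled sketch that handles only a small glob subset (&lt;code&gt;**/&lt;/code&gt;, &lt;code&gt;*&lt;/code&gt;, and &lt;code&gt;{a,b}&lt;/code&gt;). The function names &lt;code&gt;globToRegExp&lt;/code&gt; and &lt;code&gt;matchFiles&lt;/code&gt; are hypothetical, and a production matcher would need exclusions and many more edge cases.&lt;/p&gt;

```typescript
// Translate a small glob subset into a regular expression.
// Supported: "**/" (any directory depth), "*" (within one segment), "{a,b}" (alternatives).
function globToRegExp(glob: string): RegExp {
  let source = "";
  let i = 0;
  let braceDepth = 0;
  while (i !== glob.length) {
    const ch = glob.charAt(i);
    if (glob.slice(i, i + 3) === "**/") {
      source += "(?:.*/)?"; // zero or more leading directories
      i += 3;
      continue;
    }
    if (ch === "*") source += "[^/]*";
    else if (ch === "{") { source += "(?:"; braceDepth += 1; }
    else if (ch === "}") { source += ")"; braceDepth -= 1; }
    else if (ch === ",") source += braceDepth === 0 ? "," : "|";
    else if ("\\.+?^$()|[]".indexOf(ch) !== -1) source += "\\" + ch; // escape regex metacharacters
    else source += ch;
    i += 1;
  }
  return new RegExp("^" + source + "$");
}

// Keep only the listed paths that sit under `src` and match `glob` relative to it.
function matchFiles(paths: string[], src: string, glob: string): string[] {
  const prefix = "/" + src + "/";
  const regex = globToRegExp(glob);
  return paths.filter(function (path) {
    if (path.indexOf(prefix) !== 0) return false;
    return regex.test(path.slice(prefix.length));
  });
}
```

&lt;p&gt;Everything here runs on path strings alone, which is why the listing step never needs file contents.&lt;/p&gt;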

&lt;h2&gt;
  
  
  Introducing selective-repo-fetch
&lt;/h2&gt;

&lt;p&gt;We open-sourced this logic as a TypeScript library: &lt;a href="https://github.com/microsoft/selective-repo-fetch" rel="noopener noreferrer"&gt;&lt;strong&gt;selective-repo-fetch&lt;/strong&gt;&lt;/a&gt;. It's MIT-licensed and provider-agnostic.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install &lt;/span&gt;github:microsoft/selective-repo-fetch
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here's the core workflow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;resolveFileMatches&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;filterReferencedResources&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;selective-repo-fetch&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// Your manifest declares what your doc site needs&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;manifest&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;files&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;**/*.md&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="na"&gt;src&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;docs&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="na"&gt;resource&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;files&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;**/*.{png,jpg,svg}&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="na"&gt;src&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;docs/images&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}],&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="c1"&gt;// Step 1: Get file listing from any git API (one cheap metadata call)&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;repoFiles&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;getTreeListing&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;// returns [{ path: '/docs/intro.md' }, ...]&lt;/span&gt;

&lt;span class="c1"&gt;// Step 2: Resolve manifest → content + resource matches&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;matched&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;resolveFileMatches&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;repoFiles&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;manifest&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/docfx.json&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="c1"&gt;// matched.contentMatches → only the markdown files your build needs&lt;/span&gt;
&lt;span class="c1"&gt;// matched.resourceMatches → only images/videos matching resource globs&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From 200,000 files down to the 50 that matter. One function call.&lt;/p&gt;

&lt;h2&gt;
  
  
  Going further: filtering unreferenced resources
&lt;/h2&gt;

&lt;p&gt;Glob matching is great, but it can be too generous. A &lt;code&gt;**/*.png&lt;/code&gt; pattern in your resource section will match every image under that folder — even the ones no markdown file actually references.&lt;/p&gt;

&lt;p&gt;For large repos, this matters. Unreferenced images can be megabytes of wasted downloads.&lt;/p&gt;

&lt;p&gt;So we added a second pass:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Step 3: Fetch the content files (small text — fast and cheap)&lt;/span&gt;
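&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;contentFileTexts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Record&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{};&lt;/span&gt;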
&lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;filePath&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;matched&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;contentMatches&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;contentFileTexts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;filePath&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetchFileContent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;filePath&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Step 4: Filter resources to only those actually referenced&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;referencedResources&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;filterReferencedResources&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;matched&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;resourceMatches&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;contentFileTexts&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="c1"&gt;// Scans markdown/HTML for ![](path), &amp;lt;img src="path"&amp;gt;, [text](path), etc.&lt;/span&gt;
&lt;span class="c1"&gt;// Drops any resource not referenced by any content file&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This scans your content files for markdown image references (&lt;code&gt;![](path)&lt;/code&gt;), links (&lt;code&gt;[text](path)&lt;/code&gt;), and HTML attributes (&lt;code&gt;src="path"&lt;/code&gt;, &lt;code&gt;href="path"&lt;/code&gt;). If a resource file isn't referenced anywhere in your content, it gets dropped.&lt;/p&gt;
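&lt;p&gt;The scanning idea can be sketched in a few lines. This is not the library's implementation: the regular expressions are deliberately loose, and the helper names (&lt;code&gt;extractReferences&lt;/code&gt;, &lt;code&gt;resolveRelative&lt;/code&gt;, &lt;code&gt;keepReferenced&lt;/code&gt;) are hypothetical. It assumes repo paths rooted at &lt;code&gt;/&lt;/code&gt;, as in the examples above.&lt;/p&gt;

```typescript
// Pull link targets out of markdown links/images and src/href attributes.
function extractReferences(text: string): string[] {
  const patterns = [
    /!?\[[^\]]*\]\(([^)\s]+)/g, // ![alt](path) and [text](path)
    /(?:src|href)\s*=\s*["']([^"']+)["']/gi, // src="path" and href="path"
  ];
  const found: string[] = [];
  for (const pattern of patterns) {
    let match = pattern.exec(text);
    while (match !== null) {
      const target = match[1] || "";
      if (target !== "") found.push(target);
      match = pattern.exec(text);
    }
  }
  return found;
}

// Resolve a reference relative to the file that contains it.
function resolveRelative(fromFile: string, ref: string): string {
  // Drop any #fragment or ?query before resolving.
  const clean = (ref.split("#")[0] || "").split("?")[0] || "";
  if (clean.charAt(0) === "/") return clean;
  const parts = fromFile.split("/").slice(0, -1); // directory of the content file
  for (const segment of clean.split("/")) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") parts.pop();
    else parts.push(segment);
  }
  return parts.join("/");
}

// Keep only the resources that at least one content file points at.
function keepReferenced(resources: string[], contentTexts: { [path: string]: string }): string[] {
  const referenced: string[] = [];
  for (const file of Object.keys(contentTexts)) {
    for (const ref of extractReferences(contentTexts[file] || "")) {
      referenced.push(resolveRelative(file, ref));
    }
  }
  return resources.filter(function (resource) {
    return referenced.indexOf(resource) !== -1;
  });
}
```

&lt;p&gt;External URLs resolve to paths that never appear in the resource list, so they fall out of the result without special handling.&lt;/p&gt;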

&lt;h2&gt;
  
  
  The full pipeline
&lt;/h2&gt;

&lt;p&gt;Here's what it looks like end-to-end with the GitHub API:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Octokit&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@octokit/rest&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;resolveFileMatches&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;filterReferencedResources&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;selective-repo-fetch&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;octokit&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Octokit&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;token&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// 1. One API call to get the full file tree (metadata only, no content)&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;octokit&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;git&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getTree&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="nx"&gt;owner&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;repo&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;tree_sha&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;HEAD&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;recursive&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;true&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;files&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tree&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;item&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;blob&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;item&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;path&lt;/span&gt; &lt;span class="p"&gt;}));&lt;/span&gt;

&lt;span class="c1"&gt;// 2. Resolve manifest patterns&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;manifest&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="cm"&gt;/* your docfx.json */&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;matched&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;resolveFileMatches&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;files&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;manifest&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/docfx.json&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// 3. Fetch content files (small text)&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;contentTexts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Record&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{};&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;path&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;matched&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;contentMatches&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;octokit&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;repos&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getContent&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;owner&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;repo&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="nx"&gt;contentTexts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;Buffer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;base64&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toString&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// 4. Filter resources to only referenced ones&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;resources&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;filterReferencedResources&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;matched&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;resourceMatches&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;contentTexts&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// 5. Fetch only referenced resources&lt;/span&gt;
&lt;span class="c1"&gt;// You now have the exact list — nothing wasted&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What it handles
&lt;/h2&gt;

&lt;p&gt;The manifest matching is thorough:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Glob patterns&lt;/strong&gt; with brace expansion (&lt;code&gt;*.{md,yml}&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;src&lt;/code&gt; path resolution&lt;/strong&gt; relative to manifest location&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-section excludes&lt;/strong&gt; (&lt;code&gt;exclude: ["**/draft/**"]&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Templates, metadata files, &lt;code&gt;.order&lt;/code&gt; files&lt;/strong&gt; — auto-included&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;External references&lt;/strong&gt; via &lt;code&gt;src: "../other-folder"&lt;/code&gt; — discovered before you fetch&lt;/li&gt;
&lt;/ul&gt;
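&lt;p&gt;To make the brace-expansion bullet concrete, here is a minimal sketch of how a pattern such as &lt;code&gt;*.{md,yml}&lt;/code&gt; can be turned into plain globs before matching. The helper name &lt;code&gt;expandBraces&lt;/code&gt; is hypothetical and this is not the library's actual implementation; it only illustrates the idea for simple, comma-separated groups.&lt;/p&gt;

```typescript
// Hypothetical helper (not part of the library): expands the first {a,b,c}
// group found in a glob and recurses until no groups remain.
function expandBraces(pattern: string): string[] {
  const match = /\{([^{}]*)\}/.exec(pattern);
  if (match === null) {
    return [pattern]; // nothing left to expand
  }
  const whole = match[0] ?? "";
  const group = match[1] ?? "";
  const before = pattern.slice(0, match.index);
  const after = pattern.slice(match.index + whole.length);

  const results: string[] = [];
  for (const alternative of group.split(",")) {
    // Each alternative yields one pattern; recursion handles later groups.
    results.push(...expandBraces(before + alternative + after));
  }
  return results;
}

console.log(expandBraces("docs/**/*.{md,yml}"));
// prints [ 'docs/**/*.md', 'docs/**/*.yml' ]
```

&lt;p&gt;Expanding up front keeps the matcher itself simple: every resulting pattern is an ordinary glob, so per-section excludes can be applied to the same flat list.&lt;/p&gt;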

&lt;p&gt;The reference filter handles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Markdown images and links: &lt;code&gt;![alt](path)&lt;/code&gt;, &lt;code&gt;[text](path)&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;HTML attributes: &lt;code&gt;&amp;lt;img src="path"&amp;gt;&lt;/code&gt;, &lt;code&gt;&amp;lt;video src="path"&amp;gt;&lt;/code&gt;, &lt;code&gt;&amp;lt;a href="path"&amp;gt;&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Path normalization: strips &lt;code&gt;~/&lt;/code&gt;, leading &lt;code&gt;/&lt;/code&gt;, query strings, anchors&lt;/li&gt;
&lt;li&gt;Skips external URLs, data URIs, &lt;code&gt;mailto:&lt;/code&gt;, &lt;code&gt;javascript:&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Case-insensitive filename matching&lt;/li&gt;
&lt;/ul&gt;
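&lt;p&gt;The reference filter described above can be sketched roughly as follows. All names here (&lt;code&gt;normalizeReference&lt;/code&gt;, &lt;code&gt;collectReferences&lt;/code&gt;, &lt;code&gt;filterByReference&lt;/code&gt;) are hypothetical, the regular expressions are deliberately simple, and matching on the lower-cased file name alone is an assumption about what "case-insensitive filename matching" means. Treat it as an illustration of the steps, not the library's code.&lt;/p&gt;

```typescript
// Hypothetical sketch of a reference filter; not the library's implementation.

// Prefixes that never point at a file inside the repository.
const SKIPPED_PREFIXES = ["http://", "https://", "data:", "mailto:", "javascript:"];

// Strip "~/", leading slashes, query strings and anchors, then lower-case.
function normalizeReference(raw: string): string | null {
  const lower = raw.trim().toLowerCase();
  if (SKIPPED_PREFIXES.some((prefix) => lower.startsWith(prefix))) {
    return null;
  }
  const withoutAnchor = lower.split("#")[0] ?? "";
  const withoutQuery = withoutAnchor.split("?")[0] ?? "";
  const path = withoutQuery.replace(/^~\//, "").replace(/^\/+/, "");
  return path.length > 0 ? path : null;
}

// Collect Markdown targets and HTML src/href attribute values from one file.
function collectReferences(text: string): string[] {
  const patterns = [
    /!?\[[^\]]*\]\(\s*([^)\s]+)[^)]*\)/g, // ![alt](path) and [text](path)
    /\b(?:src|href)\s*=\s*["']([^"']+)["']/gi, // src="path" and href="path"
  ];
  const found: string[] = [];
  for (const pattern of patterns) {
    let match = pattern.exec(text);
    while (match !== null) {
      const normalized = normalizeReference(match[1] ?? "");
      if (normalized !== null) {
        found.push(normalized);
      }
      match = pattern.exec(text);
    }
  }
  return found;
}

function baseName(path: string): string {
  const parts = path.split("/");
  return (parts[parts.length - 1] ?? "").toLowerCase();
}

// Keep a resource only when some content file references its file name.
function filterByReference(
  resources: string[],
  contentTexts: { [path: string]: string },
): string[] {
  const referenced: string[] = [];
  for (const key of Object.keys(contentTexts)) {
    for (const reference of collectReferences(contentTexts[key] ?? "")) {
      referenced.push(baseName(reference));
    }
  }
  return resources.filter((resource) => referenced.indexOf(baseName(resource)) !== -1);
}
```

&lt;p&gt;Matching on file name alone can over-include when two folders contain files with the same name. That errs on the side of fetching slightly too much rather than producing a broken image.&lt;/p&gt;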

&lt;h2&gt;
  
  
  When to use this
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Documentation portals&lt;/strong&gt; pulling from multiple repos — resolve before you clone&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monorepo doc builds&lt;/strong&gt; — your manifest knows what matters, use it&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CI/CD pipelines&lt;/strong&gt; — cut build times by fetching only what changed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Any static site generator&lt;/strong&gt; (DocFX, MkDocs, Sphinx, Docusaurus) that uses a manifest&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why this matters for AI agents
&lt;/h2&gt;

&lt;p&gt;There's a downstream benefit we didn't anticipate when we first built this: &lt;strong&gt;making documentation efficiently available to AI agents&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If you're building agents that answer questions about your product, help onboard developers, or assist with internal processes — they need access to your documentation. But they don't need your entire codebase. They need the &lt;em&gt;docs&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;The manifest-driven approach gives you exactly that separation:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Selective ingestion&lt;/strong&gt; — only pull documentation into your RAG pipeline, not code, tests, or CI configs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Incremental updates&lt;/strong&gt; — when a doc changes, you know it's a doc (not a code file) because the manifest says so&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-repo knowledge bases&lt;/strong&gt; — pull docs from 50+ repos without cloning any of them
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Feed docs from multiple repos into your agent's knowledge base&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;repo&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;repositories&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;files&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;getTreeListing&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;repo&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;matched&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;resolveFileMatches&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;files&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;repo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;manifest&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/docfx.json&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// Only index documentation — not code, not tests, not configs&lt;/span&gt;
  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;docPath&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;matched&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;contentMatches&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetchFile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;repo&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;docPath&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;knowledgeBase&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ingest&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;docPath&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;repo&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;repo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;content&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The faster and more precisely you can extract documentation from your repos, the fresher and more accurate your agents' knowledge becomes. Efficient content fetching is the foundation of a reliable AI-powered docs experience.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it out
&lt;/h2&gt;

&lt;p&gt;The library is MIT-licensed and has no opinion about your Git provider: it works with any API that can give you a file listing.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install &lt;/span&gt;github:microsoft/selective-repo-fetch
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;GitHub: &lt;a href="https://github.com/microsoft/selective-repo-fetch" rel="noopener noreferrer"&gt;microsoft/selective-repo-fetch&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If your doc builds are slow because of large repos, give it a try. And if you have ideas for improvements, PRs are welcome.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;What's the worst monorepo doc build experience you've had? I'd love to hear about it in the comments.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>documentation</category>
      <category>opensource</category>
      <category>typescript</category>
      <category>devops</category>
    </item>
    <item>
      <title>Building a Markdig Extension for DocFX: Remote Content Inclusion with AI Rewriting</title>
      <dc:creator>sai pramod upadhyayula</dc:creator>
      <pubDate>Fri, 22 May 2026 22:35:09 +0000</pubDate>
      <link>https://dev.to/saipramod/building-a-markdig-extension-for-docfx-remote-content-inclusion-with-ai-rewriting-hb7</link>
      <guid>https://dev.to/saipramod/building-a-markdig-extension-for-docfx-remote-content-inclusion-with-ai-rewriting-hb7</guid>
      <description>&lt;h2&gt;
  
  
  The Problem Every Documentation Team Hits
&lt;/h2&gt;

&lt;p&gt;If you've worked on a documentation platform at any scale, you've hit this problem: content lives in multiple places, and you need to compose it into a single, coherent site at build time.&lt;/p&gt;

&lt;p&gt;Maybe your API reference is generated from code. Maybe your troubleshooting guides live in a separate service. Maybe different teams own different sections, and you need to pull them together into one DocFX site without copy-pasting content that goes stale the moment it's duplicated.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/dotnet/docfx" rel="noopener noreferrer"&gt;DocFX&lt;/a&gt; — Microsoft's open-source documentation&lt;br&gt;
generator — is excellent at building static documentation from local markdown&lt;br&gt;
files. But it doesn't natively support fetching and inlining content from remote&lt;br&gt;
sources at build time.&lt;/p&gt;

&lt;p&gt;I needed exactly that capability. So I built it.&lt;/p&gt;


&lt;h2&gt;
  
  
  Introducing docfx-remote-include
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/saipramod/docfx-remote-include" rel="noopener noreferrer"&gt;&lt;strong&gt;docfx-remote-include&lt;/strong&gt;&lt;/a&gt; is&lt;br&gt;
a standalone &lt;a href="https://github.com/xoofx/markdig" rel="noopener noreferrer"&gt;Markdig&lt;/a&gt; extension and CLI tool&lt;br&gt;
that adds remote content inclusion to DocFX.&lt;/p&gt;

&lt;p&gt;It's &lt;strong&gt;not a fork of DocFX&lt;/strong&gt;. It hooks into DocFX's public &lt;code&gt;BuildOptions.ConfigureMarkdig&lt;/code&gt; extension point, so it tracks upstream releases as a regular NuGet dependency. When DocFX updates, your remote include capability doesn't break.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Directive
&lt;/h3&gt;

&lt;p&gt;In any Markdown file processed by DocFX, you can write:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;Some local content.

&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;!remoteinclude[Welcome&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="sx"&gt;path/to/snippet.md&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;]

More local content.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At build time, the extension fetches &lt;code&gt;{baseUrl}/path/to/snippet.md&lt;/code&gt; via HTTP, parses the response as Markdown, and inlines the result. It works in two modes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Block mode&lt;/strong&gt; — when the directive is the only thing on its line, the fetched
content is inlined as full block content (headings, lists, paragraphs).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Inline mode&lt;/strong&gt; — when the directive appears mid-paragraph, only inline content
is spliced in (no wrapping &lt;code&gt;&amp;lt;p&amp;gt;&lt;/code&gt; tags).&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  The AI Twist
&lt;/h3&gt;

&lt;p&gt;Here's where it gets interesting. You can optionally add a rewrite hint:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;!remoteinclude[Install&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="sx"&gt;snippets/install.md&lt;/span&gt; &lt;span class="nn"&gt;"match this page's tone and tense"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When a hint is provided, the fetched content is passed through a pluggable &lt;code&gt;IRewriteService&lt;/code&gt;, backed by any LLM you choose (Azure OpenAI, local models, or anything else), which adapts the content to match the surrounding page's voice and style.&lt;/p&gt;

&lt;p&gt;Without a hint, the content is inlined verbatim. The AI capability is entirely opt-in and has zero vendor lock-in.&lt;/p&gt;
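&lt;p&gt;As an illustration of the directive syntax, here is a small TypeScript sketch that picks apart both forms shown so far (with and without a hint) and decides between block and inline mode. The real extension is a C# Markdig parser; the regular expression, the &lt;code&gt;parseDirective&lt;/code&gt; name, and the "only thing on its line" test below are assumptions made for the example.&lt;/p&gt;

```typescript
// Illustrative only: the actual extension parses this in C# via Markdig.
interface RemoteIncludeDirective {
  title: string;
  source: string;
  hint: string | null; // optional rewrite hint in double quotes
  mode: "block" | "inline";
}

// [!remoteinclude[Title](path "optional hint")]
const DIRECTIVE = /\[!remoteinclude\[([^\]]*)\]\(\s*([^\s)]+)(?:\s+"([^"]*)")?\s*\)\]/;

function parseDirective(line: string): RemoteIncludeDirective | null {
  const match = DIRECTIVE.exec(line);
  if (match === null) {
    return null;
  }
  const whole = match[0] ?? "";
  return {
    title: match[1] ?? "",
    source: match[2] ?? "",
    hint: match[3] ?? null,
    // Block mode when nothing but whitespace surrounds the directive.
    mode: line.trim() === whole ? "block" : "inline",
  };
}

console.log(parseDirective("[!remoteinclude[Welcome](path/to/snippet.md)]"));
console.log(parseDirective('Run [!remoteinclude[Install](snippets/install.md "match this page\'s tone and tense")] first.'));
```

&lt;p&gt;The first call yields a block-mode directive with no hint; the second yields an inline-mode directive whose hint would be handed to the rewrite service.&lt;/p&gt;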


&lt;h2&gt;
  
  
  Architecture Decisions
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Why an Extension, Not a Fork?
&lt;/h3&gt;

&lt;p&gt;Forking DocFX would mean maintaining a parallel codebase and falling behind on upstream improvements. Instead, &lt;code&gt;docfx-remote-include&lt;/code&gt; uses the public &lt;code&gt;ConfigureMarkdig&lt;/code&gt; seam that DocFX exposes:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;Docset&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Build&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"docs/docfx.json"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="n"&gt;BuildOptions&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;ConfigureMarkdig&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pipeline&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;UseRemoteInclude&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Zero maintenance burden from DocFX internals&lt;/li&gt;
&lt;li&gt;Works with any DocFX version that exposes &lt;code&gt;ConfigureMarkdig&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Can be combined with other Markdig extensions&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Auth Flexibility
&lt;/h3&gt;

&lt;p&gt;Enterprise documentation often lives behind authentication. The extension supports multiple auth modes out of the box:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Mode&lt;/th&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;none&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Public content services&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;default&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Azure Default Credential (local dev, CI/CD)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;managedIdentity&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Azure Managed Identity (production)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;jwt&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Bearer token (custom auth)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;key&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;API key header&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;All credentials are read from environment variables or host callbacks, never from config files committed to source control.&lt;/p&gt;
&lt;h3&gt;
  
  
  Safety Features
&lt;/h3&gt;

&lt;p&gt;When you're pulling remote content into a build pipeline, things can go wrong:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cycle detection&lt;/strong&gt; — an &lt;code&gt;AsyncLocal&lt;/code&gt; source stack prevents infinite recursion
when remote content includes other remote content. Max depth defaults to 8.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Concurrency control&lt;/strong&gt; — in-flight requests are capped at 8 by default to
avoid overwhelming the content service.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;In-process caching&lt;/strong&gt; — each source URL is fetched once per build, regardless
of how many pages reference it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hard fail by default&lt;/strong&gt; — if a remote source returns 404, the build fails.
Use &lt;code&gt;--allow-missing&lt;/code&gt; to render a visible error placeholder instead.&lt;/li&gt;
&lt;/ul&gt;
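&lt;p&gt;The cycle-detection, depth-limit, and fetch-once behaviours can be sketched together in a few lines. This TypeScript version is only an illustration: the extension itself is C# and keeps its source stack in an &lt;code&gt;AsyncLocal&lt;/code&gt;, whereas this sketch passes the stack explicitly, runs synchronously, and takes a hypothetical &lt;code&gt;fetchSource&lt;/code&gt; callback in place of real HTTP calls. How depth is counted is also an assumption.&lt;/p&gt;

```typescript
// Hypothetical sketch: names and the exact depth rule are assumptions.
type Fetcher = (source: string) => string;

function createExpander(fetchSource: Fetcher, maxDepth: number = 8) {
  // In-process cache: each source is fetched at most once per expander.
  const cache: { [source: string]: string } = Object.create(null);

  function fetchOnce(source: string): string {
    const cached = cache[source];
    if (cached !== undefined) {
      return cached;
    }
    const fresh = fetchSource(source);
    cache[source] = fresh;
    return fresh;
  }

  function expand(source: string, stack: string[]): string {
    // A source already on the stack means the includes form a cycle.
    if (stack.indexOf(source) !== -1) {
      throw new Error("Include cycle: " + stack.concat(source).join(" -> "));
    }
    if (stack.length >= maxDepth) {
      throw new Error("Include depth exceeds " + maxDepth);
    }
    const include = /\[!remoteinclude\[[^\]]*\]\(\s*([^\s)]+)[^)]*\)\]/g;
    // Replace every nested directive with its own expanded content.
    return fetchOnce(source).replace(include, (_whole: string, nested: string) =>
      expand(nested, stack.concat(source)),
    );
  }

  return (source: string): string => expand(source, []);
}
```

&lt;p&gt;Checking the stack before fetching means a cyclic include fails fast without issuing another request, and the shared cache keeps the request count equal to the number of distinct sources.&lt;/p&gt;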


&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;
&lt;h3&gt;
  
  
  As a CLI Tool
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Add the NuGet source (one-time)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;dotnet&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;nuget&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;add&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;source&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://nuget.pkg.github.com/saipramod/index.json"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;\&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nt"&gt;--name&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"docfx-tools"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;--username&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;YOUR_GITHUB_USERNAME&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;--password&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;YOUR_GITHUB_PAT&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="c"&gt;# Install the tool&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;dotnet&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;tool&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;install&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-g&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Docfx.RemoteInclude.Cli&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;--source&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"docfx-tools"&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="c"&gt;# Build your docs&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;docfx-ri&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;build&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;docs/docfx.json&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  Configuration
&lt;/h3&gt;

&lt;p&gt;Create &lt;code&gt;remoteinclude.json&lt;/code&gt; next to your &lt;code&gt;docfx.json&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"baseUrl"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://your-content-service.com/"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"allowMissing"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"urlTemplate"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"api/content/GetFile?path={source}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"auth"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"mode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"managedIdentity"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"scope"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"api://your-app-id/.default"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"ai"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"endpoint"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://your-aoai.openai.azure.com/"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"deployment"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"gpt-4o-mini"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"contextStrategy"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"section"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
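&lt;p&gt;One plausible reading of how &lt;code&gt;baseUrl&lt;/code&gt; and &lt;code&gt;urlTemplate&lt;/code&gt; combine is sketched below. This is an assumption, not documented behaviour: it supposes that &lt;code&gt;{source}&lt;/code&gt; is replaced with the URL-encoded directive path and that the result is resolved against &lt;code&gt;baseUrl&lt;/code&gt;. The function name &lt;code&gt;resolveSourceUrl&lt;/code&gt; is hypothetical; check the project's README for the actual rules.&lt;/p&gt;

```typescript
// Assumption: {source} is substituted with the URL-encoded directive path and
// the result is resolved relative to baseUrl. The tool's real rules may differ.
function resolveSourceUrl(baseUrl: string, urlTemplate: string, source: string): string {
  const relative = urlTemplate.replace("{source}", encodeURIComponent(source));
  return new URL(relative, baseUrl).toString();
}

console.log(
  resolveSourceUrl(
    "https://your-content-service.com/",
    "api/content/GetFile?path={source}",
    "snippets/install.md",
  ),
);
// prints https://your-content-service.com/api/content/GetFile?path=snippets%2Finstall.md
```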



&lt;h3&gt;
  
  
  As a Library
&lt;/h3&gt;

&lt;p&gt;For full control, use the library directly:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight csharp"&gt;&lt;code&gt;&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;Docfx&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;Docfx.RemoteInclude&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;using&lt;/span&gt; &lt;span class="nn"&gt;var&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;HttpRemoteContentClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;baseUri&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;Uri&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"https://your-content-service.com/"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;authHandler&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Headers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Authorization&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt;
            &lt;span class="k"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Bearer"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;GetJwtAsync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;Docset&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Build&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"docs/docfx.json"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="n"&gt;BuildOptions&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;ConfigureMarkdig&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pipeline&lt;/span&gt; &lt;span class="p"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;UseRemoteInclude&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="n"&gt;RemoteIncludeOptions&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;RewriteService&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;myRewriter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// optional&lt;/span&gt;
        &lt;span class="p"&gt;}),&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Implement &lt;code&gt;IRemoteContentClient&lt;/code&gt; for non-HTTP sources (file systems, databases, signed URLs). Implement &lt;code&gt;IRewriteService&lt;/code&gt; to plug in any LLM.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why This Matters
&lt;/h2&gt;

&lt;p&gt;Documentation platforms at scale need to compose content from multiple authoritative sources. Copy-pasting creates drift. Git submodules add complexity. Custom build scripts are fragile.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;docfx-remote-include&lt;/code&gt; solves this with a clean, declarative syntax that works within DocFX's existing pipeline. The optional AI rewriting capability means content from different sources can read as if it were written for the page it appears on.&lt;/p&gt;

&lt;p&gt;The project is MIT-licensed, open source, and accepting contributions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/saipramod/docfx-remote-include" rel="noopener noreferrer"&gt;github.com/saipramod/docfx-remote-include&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Sai Pramod Upadhyayula is a Senior Software Engineer at Microsoft, where he works on AI-powered enterprise knowledge platforms. He co-authored "AutoTSG: Learning and Synthesis for Incident Troubleshooting" (ESEC/FSE 2022) and contributes to the DocFX open-source ecosystem.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>dotnet</category>
      <category>opensource</category>
      <category>markdig</category>
    </item>
  </channel>
</rss>
