sai pramod upadhyayula

Building a Markdig Extension for DocFX: Remote Content Inclusion with AI Rewriting

The Problem Every Documentation Team Hits

If you've worked on a documentation platform at any scale, you've hit this
problem: content lives in multiple places, and you need to compose it into
a single, coherent site at build time.

Maybe your API reference is generated from code. Maybe your troubleshooting
guides live in a separate service. Maybe different teams own different sections,
and you need to pull them together into one DocFX site without copy-pasting
content that goes stale the moment it's duplicated.

DocFX, Microsoft's open-source documentation generator, is excellent at building
static documentation from local markdown files. But it doesn't natively support
fetching and inlining content from remote sources at build time.

I needed exactly that capability. So I built it.


Introducing docfx-remote-include

docfx-remote-include is a standalone Markdig extension and CLI tool that adds
remote content inclusion to DocFX.

It's not a fork of DocFX. It hooks into DocFX's public BuildOptions.ConfigureMarkdig
extension point, so DocFX stays a regular NuGet dependency that you upgrade as usual.
As long as that extension point remains available, remote includes keep working
across DocFX updates.

The Directive

In any markdown file processed by DocFX, you can write:

Some local content.

[!remoteinclude[Welcome](path/to/snippet.md)]

More local content.

At build time, the extension fetches {baseUrl}/path/to/snippet.md via HTTP,
parses the response as markdown, and inlines the result. It works in two modes:

  • Block mode — when the directive is the only thing on its line, the fetched content is inlined as full block content (headings, lists, paragraphs).
  • Inline mode — when the directive appears mid-paragraph, only inline content is spliced in (no wrapping <p> tags).
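
To make the two modes concrete, here is a minimal sketch of how a directive could be recognized on a line and classified as block or inline. It uses a plain regular expression rather than Markdig's parser API, so it illustrates the rule described above and is not the extension's actual implementation. The DirectiveSketch class and its members are hypothetical names chosen for this example.

```csharp
#nullable enable
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of docfx-remote-include.
public static class DirectiveSketch
{
    // Matches the basic form: [!remoteinclude[Title](path)]
    private static readonly Regex Directive = new Regex(
        @"\[!remoteinclude\[(?<title>[^\]]*)\]\((?<path>[^)\s]+)\)\]");

    public sealed record Parsed(string Title, string Path, bool IsBlock);

    // Returns null when the line contains no directive.
    public static Parsed? Parse(string line)
    {
        Match m = Directive.Match(line);
        if (!m.Success) return null;

        // Block mode: the directive is the only thing on its line.
        // Anything else (text before or after it) is treated as inline mode.
        bool isBlock = line.Trim() == m.Value;

        return new Parsed(m.Groups["title"].Value, m.Groups["path"].Value, isBlock);
    }
}
```

Under this sketch, a line containing only the directive yields IsBlock = true, while a sentence with the directive embedded in it yields IsBlock = false.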

The AI Twist

Here's where it gets interesting. You can optionally add a rewrite hint:

[!remoteinclude[Install](snippets/install.md "match this page's tone and tense")]

When a hint is provided, the fetched content is passed through a pluggable
IRewriteService — backed by any LLM you choose (Azure OpenAI, local models,
anything) — which adapts the content to match the surrounding page's voice
and style.

Without a hint, the content is inlined verbatim. The AI capability is entirely
opt-in and has zero vendor lock-in.
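
The opt-in behavior can be sketched as a single decision point: no hint means the fetched text passes through untouched. The interface below is a guess at what a pluggable rewriter might look like. The real IRewriteService signature may differ, so IRewriterSketch, RewriteStep, and EchoRewriter are all hypothetical names, and treating a missing rewriter as "inline verbatim" is an assumption of this sketch.

```csharp
#nullable enable
using System.Threading;
using System.Threading.Tasks;

// Hypothetical shape; the library's actual IRewriteService may look different.
public interface IRewriterSketch
{
    Task<string> RewriteAsync(string content, string hint, string pageContext, CancellationToken ct);
}

public static class RewriteStep
{
    // Without a hint (or without a configured rewriter, an assumption here),
    // the fetched content is returned unchanged.
    public static Task<string> ApplyAsync(
        string fetched, string? hint, string pageContext,
        IRewriterSketch? rewriter, CancellationToken ct = default)
    {
        if (string.IsNullOrWhiteSpace(hint) || rewriter is null)
            return Task.FromResult(fetched);

        return rewriter.RewriteAsync(fetched, hint, pageContext, ct);
    }
}

// Stand-in for local experiments: tags the content instead of calling an LLM.
public sealed class EchoRewriter : IRewriterSketch
{
    public Task<string> RewriteAsync(string content, string hint, string pageContext, CancellationToken ct)
        => Task.FromResult($"<!-- rewritten: {hint} -->\n{content}");
}
```

Keeping the model call behind one small interface is what makes the vendor choice swappable: an Azure OpenAI client, a local model, or a test double like EchoRewriter all fit the same slot.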


Architecture Decisions

Why an Extension, Not a Fork?

Forking DocFX would mean maintaining a parallel codebase and falling behind on
upstream improvements. Instead, docfx-remote-include uses the public
ConfigureMarkdig seam that DocFX exposes:

await Docset.Build("docs/docfx.json", new BuildOptions
{
    ConfigureMarkdig = pipeline => pipeline.UseRemoteInclude(client, options),
});

This means:

  • No DocFX internals to patch or keep in sync
  • Works with any DocFX version that exposes ConfigureMarkdig
  • Can be combined with other Markdig extensions

Auth Flexibility

Enterprise documentation often lives behind authentication. The extension
supports multiple auth modes out of the box:

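| Mode            | Use case                                      |
|-----------------|-----------------------------------------------|
| none            | Public content services                       |
| default         | Azure Default Credential (local dev, CI/CD)   |
| managedIdentity | Azure Managed Identity (production)           |
| jwt             | Bearer token (custom auth)                    |
| key             | API key header                                |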

All credentials are read from environment variables or host callbacks — never
from config files committed to source control.
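
As a rough illustration of the "environment variables, not config files" rule, the sketch below applies the jwt and key modes to an outgoing request. The environment variable names (REMOTEINCLUDE_JWT, REMOTEINCLUDE_API_KEY) and the x-api-key header name are placeholders invented for this example, not names the tool is known to use. The Azure-backed modes are left out to keep the sketch free of external packages.

```csharp
#nullable enable
using System;
using System.Net.Http;
using System.Net.Http.Headers;

// Hypothetical helper; variable and header names are placeholders.
public static class AuthSketch
{
    public static void Apply(HttpRequestMessage request, string mode, Func<string, string?> getEnv)
    {
        switch (mode)
        {
            case "none":
                break; // public content, nothing to attach
            case "jwt":
                request.Headers.Authorization =
                    new AuthenticationHeaderValue("Bearer", Require(getEnv, "REMOTEINCLUDE_JWT"));
                break;
            case "key":
                request.Headers.Add("x-api-key", Require(getEnv, "REMOTEINCLUDE_API_KEY"));
                break;
            default:
                // "default" and "managedIdentity" would acquire a token from an
                // Azure identity library; that part is omitted in this sketch.
                throw new NotSupportedException($"Auth mode '{mode}' is not covered by this sketch.");
        }
    }

    // Fails fast when a secret is missing instead of sending an unauthenticated request.
    private static string Require(Func<string, string?> getEnv, string name)
    {
        string? value = getEnv(name);
        if (string.IsNullOrEmpty(value))
            throw new InvalidOperationException($"Environment variable {name} is not set.");
        return value;
    }
}
```

In a real build you would pass Environment.GetEnvironmentVariable as getEnv. Taking the lookup as a delegate keeps the helper easy to exercise with fake values in tests.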

Safety Features

When you're pulling remote content into a build pipeline, things can go wrong:

  • Cycle detection — an AsyncLocal source stack prevents infinite recursion when remote content includes other remote content. Max depth defaults to 8.
  • Concurrency control — in-flight requests are capped at 8 by default to avoid overwhelming the content service.
  • In-process caching — each source URL is fetched once per build, regardless of how many pages reference it.
  • Hard fail by default — if a remote source returns 404, the build fails. Use --allow-missing to render a visible error placeholder instead.
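
The first three safeguards can be sketched with standard .NET primitives: an AsyncLocal chain for cycle and depth checks, a SemaphoreSlim for the concurrency cap, and a dictionary of lazily started tasks for the per-build cache. This is one possible arrangement under those assumptions, not the extension's actual code. SafeFetcher and its members are hypothetical names, and the fetch delegate stands in for the real HTTP call.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;
using System.Collections.Immutable;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper for illustration; not part of docfx-remote-include.
public sealed class SafeFetcher
{
    // Chain of sources being expanded in the current async flow.
    private static readonly AsyncLocal<ImmutableStack<string>?> Chain = new();

    private readonly Func<string, Task<string>> _fetch;
    private readonly SemaphoreSlim _gate;
    private readonly int _maxDepth;
    private readonly ConcurrentDictionary<string, Lazy<Task<string>>> _cache = new();

    public SafeFetcher(Func<string, Task<string>> fetch, int maxConcurrency = 8, int maxDepth = 8)
    {
        _fetch = fetch;
        _gate = new SemaphoreSlim(maxConcurrency);
        _maxDepth = maxDepth;
    }

    // Each URL is fetched at most once per instance; later callers share the same task.
    public Task<string> GetAsync(string url) =>
        _cache.GetOrAdd(url, u => new Lazy<Task<string>>(() => FetchGatedAsync(u))).Value;

    private async Task<string> FetchGatedAsync(string url)
    {
        await _gate.WaitAsync(); // cap the number of in-flight requests
        try { return await _fetch(url); }
        finally { _gate.Release(); }
    }

    // Wraps the expansion of one include; throws on a cycle or excessive depth.
    public async Task<T> ExpandAsync<T>(string url, Func<Task<T>> expand)
    {
        ImmutableStack<string> parent = Chain.Value ?? ImmutableStack<string>.Empty;
        int depth = 0;
        foreach (string seen in parent)
        {
            if (seen == url) throw new InvalidOperationException($"Include cycle detected at {url}");
            depth++;
        }
        if (depth >= _maxDepth)
            throw new InvalidOperationException($"Maximum include depth {_maxDepth} exceeded");

        Chain.Value = parent.Push(url);
        try { return await expand(); }
        finally { Chain.Value = parent; }
    }
}
```

Because the chain lives in an AsyncLocal, nested includes started from within expand see their ancestors, while unrelated pages being built in parallel each get their own chain. One trade-off of caching the task itself is that a failed fetch is also remembered for the rest of the build.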

Getting Started

As a CLI Tool

# Add the NuGet source (one-time)
dotnet nuget add source "https://nuget.pkg.github.com/saipramod/index.json" \
  --name "docfx-tools" --username YOUR_GITHUB_USERNAME --password YOUR_GITHUB_PAT

# Install the tool
dotnet tool install -g Docfx.RemoteInclude.Cli --source "docfx-tools"

# Build your docs
docfx-ri build docs/docfx.json

Configuration

Create remoteinclude.json next to your docfx.json:

{
  "baseUrl": "https://your-content-service.com/",
  "allowMissing": false,
  "urlTemplate": "api/content/GetFile?path={source}",
  "auth": {
    "mode": "managedIdentity",
    "scope": "api://your-app-id/.default"
  },
  "ai": {
    "endpoint": "https://your-aoai.openai.azure.com/",
    "deployment": "gpt-4o-mini",
    "contextStrategy": "section"
  }
}
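
To show how baseUrl and urlTemplate might combine, here is a small sketch that substitutes the directive's path for {source} and resolves the result against the base address. Whether the tool URL-escapes the path is not stated in this post, so the escaping below is an assumption, and UrlSketch is a hypothetical helper name.

```csharp
#nullable enable
using System;

// Hypothetical helper for illustration; not part of docfx-remote-include.
public static class UrlSketch
{
    public static Uri Resolve(string baseUrl, string? urlTemplate, string source)
    {
        // Without a template, the source path is appended to the base URL as-is.
        // With one, {source} is replaced by the escaped path (escaping is assumed here).
        string relative = urlTemplate is null
            ? source
            : urlTemplate.Replace("{source}", Uri.EscapeDataString(source));

        return new Uri(new Uri(baseUrl), relative);
    }
}
```

Note the trailing slash on baseUrl in the config above: System.Uri resolves relative references against the last path segment, so a base such as https://host/api without the slash would drop the api segment.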

As a Library

For full control, use the library directly:

using Docfx;
using Docfx.RemoteInclude;

using var client = new HttpRemoteContentClient(
    baseUri: new Uri("https://your-content-service.com/"),
    authHandler: async (request, ct) =>
    {
        request.Headers.Authorization =
            new("Bearer", await GetJwtAsync(ct));
    });

await Docset.Build("docs/docfx.json", new BuildOptions
{
    ConfigureMarkdig = pipeline => pipeline.UseRemoteInclude(client,
        new RemoteIncludeOptions
        {
            RewriteService = myRewriter, // optional
        }),
});

Implement IRemoteContentClient for non-HTTP sources (file systems, databases,
signed URLs). Implement IRewriteService to plug in any LLM.


Why This Matters

Documentation platforms at scale need to compose content from multiple
authoritative sources. Copy-pasting creates drift. Git submodules add complexity.
Custom build scripts are fragile.

docfx-remote-include solves this with a clean, declarative syntax that works
within DocFX's existing pipeline. The optional AI rewriting capability means
content from different sources can read as if it were written for the page it
appears on.

The project is MIT-licensed, open source, and accepting contributions.

GitHub: github.com/saipramod/docfx-remote-include


Sai Pramod Upadhyayula is a Senior Software Engineer at Microsoft, where he
works on AI-powered enterprise knowledge platforms. He co-authored "AutoTSG:
Learning and Synthesis for Incident Troubleshooting" (ESEC/FSE 2022) and
contributes to the DocFX open-source ecosystem.