Ilia Sokolov

Word Document Editing for AI Agents - Introducing OfficeAgent.NET 0.1

A few weeks ago I wrote about why AI agents struggle with Word documents. The short version: agents produce text, Markdown, JSON, and HTML, but a .docx is an OOXML package, and most of what makes it a Word document lives in XML parts outside the visible text. Something has to translate the agent's output into a valid Word file, and the Word file back into something the agent can reason about. I called this the agent-document layer, and announced an open-source project to build it for .NET.

Today the first version is available. OfficeAgent.NET 0.1 is an open-source (MIT) .NET library that lets an AI agent describe Word document changes as a typed plan, while the library handles the Open XML details.

If your agents generate proposals, contracts, reports, or review packs, this layer is the missing piece of document automation between the model and the final .docx. This article is a first look at what the release contains and how it works.

The core idea: the agent never writes document bytes

The design decision behind OfficeAgent.NET is simple: the language model never produces .docx content directly.

Instead, the agent expresses intent as a plan: a typed, JSON-serializable list of operations. Replace this clause as a tracked change. Fill that content control. Add a row to this table. Attach a comment to that paragraph. JSON works well here because models produce it reliably and .NET code can validate it strictly. The library translates the plan into the Open XML manipulations that carry it out.
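As a rough illustration of what "typed and strictly validated" can mean in practice, here is a minimal sketch that parses a JSON plan with System.Text.Json. Everything specific in it is an assumption made up for this example: the type names (PlanSketch, OperationSketch, ReplaceTextSketch, AddCommentSketch), the "op" discriminator, and the JSON shape are not the OfficeAgent.NET schema. It also assumes .NET 8 or later for UnmappedMemberHandling.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical plan model for illustration only; NOT the OfficeAgent.NET schema.
// The "op" property selects the concrete operation type during deserialization.
[JsonPolymorphic(TypeDiscriminatorPropertyName = "op")]
[JsonDerivedType(typeof(ReplaceTextSketch), "replaceText")]
[JsonDerivedType(typeof(AddCommentSketch), "addComment")]
public abstract class OperationSketch
{
    public string Anchor { get; set; } = "";
}

public sealed class ReplaceTextSketch : OperationSketch
{
    public string With { get; set; } = "";
    public bool Tracked { get; set; }
}

public sealed class AddCommentSketch : OperationSketch
{
    public string Text { get; set; } = "";
}

public sealed class PlanSketch
{
    public OperationSketch[] Operations { get; set; } = Array.Empty<OperationSketch>();
}

public static class PlanParser
{
    // Web defaults map camelCase JSON names to the properties; Disallow (.NET 8+)
    // rejects any property the types do not declare instead of ignoring it.
    private static readonly JsonSerializerOptions Options = new(JsonSerializerDefaults.Web)
    {
        UnmappedMemberHandling = JsonUnmappedMemberHandling.Disallow
    };

    // Returns true with a typed plan, or false with an error message
    // that the caller can hand back to the model for another attempt.
    public static bool TryParse(string json, out PlanSketch? plan, out string? error)
    {
        try
        {
            plan = JsonSerializer.Deserialize<PlanSketch>(json, Options);
            error = plan is null ? "Plan was null." : null;
            return plan is not null;
        }
        catch (Exception ex) when (ex is JsonException or NotSupportedException)
        {
            plan = null;
            error = ex.Message;
            return false;
        }
    }
}
```

In this sketch the "op" property has to come first in each operation object, because System.Text.Json reads the type discriminator before the other properties by default. A plan that names an operation the types do not know is rejected at parse time, before any document is touched.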

Version 0.1 ships 15 operations. They cover text, table, image, and styling edits; comments and document properties; and accepting or rejecting existing tracked revisions.

The workflow: inspect → find → preview → commit

Every edit follows the same four steps:

  1. Inspect - the agent gets a structured map of the document: the outline, paragraphs with stable ids, styles, content controls, tables, images, and revisions.
  2. Find - the agent searches for text and gets an address back for each match.
  3. Preview - the plan is checked against the current document. Nothing is written; the caller gets a before/after report and any validation errors.
  4. Commit - the plan is applied as one all-or-nothing transaction. If any step fails, nothing is written.

The addresses are called anchors, and they are the safety mechanism. The library issues anchors from inspect and find; the agent reuses them and never invents one. Each anchor carries the content it expects to find, and at commit time the library re-checks it against the live document. If the document changed in the meantime, the operation fails safely instead of editing the wrong place.
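The anchor re-check and the all-or-nothing commit can be sketched over a toy document model, where a "document" is just a dictionary of paragraph id to text. This is not how OfficeAgent.NET is implemented and none of these names exist in the library: AnchorSketch, EditSketch, and AnchoredCommit are hypothetical and only show the shape of the idea.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical types for illustration only; NOT the OfficeAgent.NET API.
// An anchor records where the text is AND what text it expects to find there.
public sealed record AnchorSketch(string ParagraphId, int Start, string ExpectedText);
public sealed record EditSketch(AnchorSketch Target, string Replacement);

public static class AnchoredCommit
{
    // Applies every edit to a working copy and returns it, or returns null
    // if any anchor no longer matches. The input dictionary is never modified.
    public static Dictionary<string, string>? TryCommit(
        IReadOnlyDictionary<string, string> paragraphs,
        IEnumerable<EditSketch> edits)
    {
        var working = new Dictionary<string, string>(paragraphs);

        foreach (var edit in edits)
        {
            var anchor = edit.Target;
            var length = anchor.ExpectedText.Length;

            // The paragraph must still exist and be long enough to hold the anchored text.
            if (!working.TryGetValue(anchor.ParagraphId, out var text)) return null;
            if (anchor.Start < 0 || anchor.Start + length > text.Length) return null;

            // Re-check: the live text at the anchor must equal what the anchor recorded.
            var live = text.Substring(anchor.Start, length);
            if (!string.Equals(live, anchor.ExpectedText, StringComparison.Ordinal)) return null;

            working[anchor.ParagraphId] =
                text.Remove(anchor.Start, length).Insert(anchor.Start, edit.Replacement);
        }

        // Only a fully successful run hands back the edited copy.
        return working;
    }
}
```

If the paragraph was reworded between find and commit, the expected text is no longer at the recorded position, so the sketch returns null and the original dictionary stays as it was. A real implementation would also have to deal with several edits shifting offsets within the same paragraph, which this example leaves out.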

Document providers: filesystem now, SharePoint and more later

OfficeAgent.NET does not read and write files directly. Documents come from document providers, an abstraction that lets the library work with any file source. The application registers a document with a provider (for the filesystem provider, by its path) and receives an opaque document id in return. From that point on, inspect, find, preview, and commit all address the document by that id, and the provider takes care of loading from and saving to the underlying store.

Version 0.1 ships a filesystem provider. Other providers, such as SharePoint, a database, or any custom store, can implement the same interface, and more will become available in later versions.
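To give a feel for the provider idea, here is a minimal sketch of what such an abstraction could look like, with an in-memory store standing in for a disk, SharePoint, or a database. The interface and its member names (IDocumentProviderSketch, RegisterAsync, OpenReadAsync, SaveAsync) are assumptions for this example; the actual provider contract in OfficeAgent.NET may look different.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical contract for illustration only; NOT the OfficeAgent.NET interface.
public interface IDocumentProviderSketch
{
    // Maps a store-specific location (a path, a URL, a row key) to an opaque id.
    Task<string> RegisterAsync(string location, CancellationToken ct = default);

    // Loads the bytes behind an id.
    Task<Stream> OpenReadAsync(string documentId, CancellationToken ct = default);

    // Stores new content and returns the id of the saved result.
    Task<string> SaveAsync(Stream content, CancellationToken ct = default);
}

public sealed class InMemoryProviderSketch : IDocumentProviderSketch
{
    private readonly IReadOnlyDictionary<string, byte[]> _source; // stands in for the real store
    private readonly ConcurrentDictionary<string, byte[]> _byId = new();

    public InMemoryProviderSketch(IReadOnlyDictionary<string, byte[]> source) => _source = source;

    public Task<string> RegisterAsync(string location, CancellationToken ct = default)
    {
        if (!_source.TryGetValue(location, out var bytes))
            throw new FileNotFoundException($"No document at '{location}'.");

        var id = Guid.NewGuid().ToString("N"); // opaque: reveals nothing about the location
        _byId[id] = bytes;
        return Task.FromResult(id);
    }

    public Task<Stream> OpenReadAsync(string documentId, CancellationToken ct = default)
    {
        if (!_byId.TryGetValue(documentId, out var bytes))
            throw new KeyNotFoundException($"Unknown document id '{documentId}'.");

        return Task.FromResult<Stream>(new MemoryStream(bytes, writable: false));
    }

    public async Task<string> SaveAsync(Stream content, CancellationToken ct = default)
    {
        using var buffer = new MemoryStream();
        await content.CopyToAsync(buffer, ct);

        var id = Guid.NewGuid().ToString("N");
        _byId[id] = buffer.ToArray();
        return id;
    }
}
```

The point of the opaque id is that callers, including the model, only ever see the id and never learn where or how the file is stored. In this sketch a save produces a new id rather than overwriting the registered document.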

Editing Word documents from a Microsoft Agent Framework agent

The main scenario for OfficeAgent.NET is an agent doing the editing itself. For that, the workflow is exposed as agent tools over Microsoft.Extensions.AI: inspect_document, find_in_document, preview_plan, and apply_plan. Wiring them into a Microsoft Agent Framework agent looks like this:

using System.Linq;                    // Cast<T>() and ToList() below (redundant with implicit usings)
using Microsoft.Agents.AI;
using Microsoft.Extensions.AI;
using Microsoft.Extensions.DependencyInjection;
using OfficeAgent.AgentFramework;
using OfficeAgent.Core;
using OfficeAgent.Core.DocumentProviders;
using OfficeAgent.Word;

var services = new ServiceCollection()
    .AddWordFormat()
    .AddFileSystemDocumentProvider("workspace", "/srv/officeagent/workspace")
    .AddOfficeAgent()
    .BuildServiceProvider();

var client = services.GetRequiredService<OfficeAgentClient>();

// Register the document with the provider; get an opaque id back.
var doc = await client.RegisterAsync(
    "workspace", "/srv/officeagent/workspace/contract.docx");

var tools  = new OfficeAgentTools(client).AsAIFunctions();
var prompt = $"You are editing documentId={doc.ItemId} on connectionId=workspace.\n\n"
           + OfficeAgentTools.SystemPromptGuidance;

AIAgent agent = new ChatClientAgent(
    chatClient,                       // any Microsoft.Extensions.AI IChatClient
    instructions: prompt,
    name:         "OfficeAgent",
    description:  "Edits Word documents using OfficeAgent.NET.",
    tools:        tools.Cast<AITool>().ToList(),
    services:     services);

From here the agent drives the document itself: it inspects, finds anchors, builds a plan, previews it, and applies it through the tools. OfficeAgentTools.SystemPromptGuidance provides ready-made instructions that teach the model the inspect → plan → apply protocol. apply_plan saves the result and returns an output document id rather than sending .docx bytes back through the model; the host reads the result from storage and delivers the file through its own download or attachment API. A runnable Azure OpenAI example is in the samples/AgentEdit project.

Using OfficeAgent.NET as a .NET library

The same workflow can also be driven directly from C# code, without a model in the loop:

// service registration and document setup are the same as in the agent example

var client = services.GetRequiredService<OfficeAgentClient>();
var doc = await client.RegisterAsync(
    "workspace", "/srv/officeagent/workspace/contract.docx");

// inspect → find → preview → commit, addressing the document by its id
var inspect = await client.InspectAsync("workspace", doc.ItemId);
var hit     = (await client.FindAsync(
    "workspace", doc.ItemId, new FindQuery("Acme Corp"))).First();

var plan = new DocumentPlan
{
    Snapshot   = inspect.Snapshot,          // opt in to drift detection
    Operations = new PlanOperation[]
    {
        new ChangeTextOp { Target = hit.Anchor, With = "Globex Inc.", Mode = ChangeMode.Tracked }
    }
};

var preview = await client.PreviewAsync("workspace", doc.ItemId, plan);
if (!preview.IsValid) { /* surface preview.Errors */ return; }

var result = await client.CommitAsync("workspace", doc.ItemId, plan);
if (result.Committed)
{
    using var saved = await client.OpenReadAsync(result.Document);
    // saved.Stream holds the edited bytes; result.Document.ItemId is the new id.
}

The tracked edit lands as a real Word revision, with insert and delete runs that show up in Word's review pane, not as flattened text. The samples/QuickEdit project in the repository contains a runnable version of this flow.

What 0.1 covers, and what it doesn't yet

Version 0.1 covers Word .docx documents and the full editing workflow: inspect, find, preview, and transactional commit, protected by content-verified anchors and optional snapshot-based drift detection. The release includes the 15 operations described above, the filesystem document provider together with the interface other file sources can implement, the agent tools for Microsoft Agent Framework, two runnable samples (QuickEdit and AgentEdit), and a documentation set.

This is a 0.1 preview release. Expect rough edges, and expect some APIs to change based on feedback.

Try it, and tell me where it breaks

dotnet add package OfficeAgent.Core --prerelease
dotnet add package OfficeAgent.Word --prerelease
dotnet add package OfficeAgent.AgentFramework --prerelease   # for the agent-tools path

The source, samples, and documentation are on GitHub: https://github.com/ilia-sokolov/OfficeAgent.NET

This is early work, but the direction is clear: make real Word documents usable from agent workflows without requiring every agent developer to become an OOXML expert. If your agents need to produce real Office documents and this is (or isn't) the shape you would want, open an issue and tell me what's missing. That feedback is the whole point of shipping this early. And if the direction is useful to you, a star on GitHub helps other agent developers find it.
