DEV Community

YaHey

C#: Find and Replace Text in Word Documents

Updating Word documents programmatically is a common requirement whenever content has to change dynamically. Typical cases include generating personalized reports, refreshing product catalogs, or standardizing legal documents in which placeholders or outdated information must be replaced. Doing these updates by hand is slow and error-prone. This article shows how to find and replace text in Word documents with C# and the Spire.Doc for .NET library. It starts with basic search and replace and then moves on to regular expressions and formatted replacements.


Understanding the Need for Programmatic Text Replacement

Automated find and replace is useful in many scenarios, for example:

  • Template Generation: Automatically populating predefined templates with user-specific data (e.g., invoices, contracts, offer letters).
  • Data Merging: Integrating data from databases or external sources into structured Word documents.
  • Content Standardization: Ensuring consistency across multiple documents by replacing non-standard terms with approved terminology.
  • Bulk Updates: Modifying specific information (e.g., company addresses, dates, product names) across a large set of documents.

Direct string manipulation on the raw file is not a workable substitute. A .docx file is a ZIP package of XML parts rather than plain text. Alongside the visible text it stores formatting, embedded objects, and other document elements. Word may also split a single word across several runs (<w:r> elements). As a result, a naive string replacement on the file can miss matches or corrupt the package. Spire.Doc for .NET parses this structure, so you can search and modify the document's text without handling the underlying markup yourself.
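The run-splitting problem is easier to see with a concrete example. The sketch below uses a simplified, hypothetical WordprocessingML paragraph in which "Hello" is stored in two runs. It compares a plain string search on the markup with a search on the text reassembled from the <w:t> elements. Only System.Xml.Linq from the .NET base library is used, not Spire.Doc, and real document.xml content is considerably more complex.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// WordprocessingML namespace used by the w: prefix in document.xml.
const string WordNs = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

// Hypothetical, simplified paragraph: "Hello" is split across two runs (<w:r>),
// and the second run carries its own formatting (bold).
string paragraphXml =
    "<w:p xmlns:w=\"" + WordNs + "\">" +
    "<w:r><w:t>Hel</w:t></w:r>" +
    "<w:r><w:rPr><w:b/></w:rPr><w:t>lo world</w:t></w:r>" +
    "</w:p>";

// Naive approach: search the raw markup as if it were plain text.
bool foundInRawXml = paragraphXml.Contains("Hello");

// Structure-aware approach: concatenate the text of every <w:t> element in document order.
string VisibleText(string xml)
{
    XNamespace w = WordNs;
    return string.Concat(XElement.Parse(xml).Descendants(w + "t").Select(t => t.Value));
}

string visible = VisibleText(paragraphXml);
bool foundInVisibleText = visible.Contains("Hello");

Console.WriteLine($"Raw markup contains \"Hello\": {foundInRawXml}");
Console.WriteLine($"Visible text: {visible}");
Console.WriteLine($"Visible text contains \"Hello\": {foundInVisibleText}");
// Raw markup contains "Hello": False
// Visible text: Hello world
// Visible text contains "Hello": True
```

Stitching runs back together, and then writing changes back without breaking their formatting, is the bookkeeping that a document library handles for you.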


Getting Started with Spire.Doc for .NET

Spire.Doc for .NET is a comprehensive component that enables developers to create, read, write, convert, and print Word documents from any .NET application. It supports DOC, DOCX, RTF, HTML, XML, and other popular formats.

To begin, you'll need to install the Spire.Doc NuGet package in your C# project.

  • Dependency: Spire.Doc for .NET
  • Installation Method: NuGet Package Manager

    Install-Package Spire.Doc
    

Once installed, you can load a Word document as follows:

```csharp
using Spire.Doc;           // Document, FileFormat
using Spire.Doc.Documents; // Paragraph, TextSelection, etc.

public class DocumentReplacer
{
    public static void LoadAndProcessDocument(string filePath)
    {
        // Create a Document object
        Document document = new Document();

        // Load the Word document
        document.LoadFromFile(filePath);

        // Document is now loaded and ready for operations
        System.Console.WriteLine($"Document '{filePath}' loaded successfully.");
        // Further operations will go here
    }
}
```

Implementing Basic Find and Replace

For straightforward search and replace, Spire.Doc for .NET provides the Document.Replace() method.

Let's illustrate with an example of replacing a specific word throughout a document.

```csharp
using Spire.Doc;

public class DocumentReplacer
{
    public static void PerformBasicFindAndReplace(string inputFilePath, string outputFilePath)
    {
        Document document = new Document();
        document.LoadFromFile(inputFilePath);

        // Define the text to find and the replacement text
        string textToFind = "old_company_name";
        string replacementText = "NewTech Solutions Inc.";

        // Perform the replacement
        // Parameters:
        // 1. textToFind: The string to search for.
        // 2. replacementText: The string to replace with.
        // 3. caseSensitive: Set to true for case-sensitive search, false otherwise.
        // 4. wholeWord: Set to true to match whole words only, false otherwise.
        document.Replace(textToFind, replacementText, false, true); // Case-insensitive, whole word match

        // Save the modified document
        document.SaveToFile(outputFilePath, FileFormat.Docx);
        System.Console.WriteLine($"Basic find and replace completed. Document saved to '{outputFilePath}'.");
    }
}
```

In this example, a single call to document.Replace() replaces every match in the document. Passing false for caseSensitive means "Old_Company_Name", "OLD_COMPANY_NAME", and "old_company_name" are all replaced. Passing true for wholeWord prevents partial matches, so "behold_company_name" is left unchanged.
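You can approximate the effect of the two flags on a plain string with .NET's Regex class. The helper ReplaceWholeWordIgnoreCase below is a hypothetical illustration of the matching rules, not Spire.Doc's implementation. In particular, Spire.Doc's exact definition of a word boundary may differ from Regex's \b, which treats letters, digits, and underscores as word characters.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical plain-string analogue of Replace(find, replacement, caseSensitive: false, wholeWord: true).
string ReplaceWholeWordIgnoreCase(string input, string find, string replacement)
{
    // Regex.Escape makes the search text literal; \b requires a word boundary at each end.
    var pattern = new Regex(@"\b" + Regex.Escape(find) + @"\b", RegexOptions.IgnoreCase);
    // A MatchEvaluator inserts the replacement literally, so '$' in it has no special meaning.
    return pattern.Replace(input, _ => replacement);
}

string sample = "Old_Company_Name and OLD_COMPANY_NAME, but not behold_company_name.";
string result = ReplaceWholeWordIgnoreCase(sample, "old_company_name", "NewTech Solutions Inc.");
Console.WriteLine(result);
// NewTech Solutions Inc. and NewTech Solutions Inc., but not behold_company_name.
```

"behold_company_name" survives because the "h" before "old_company_name" is a word character, so no word boundary precedes the match.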


Advanced Find and Replace Techniques

Beyond simple string matching, Spire.Doc for .NET supports regular expressions and lets you apply new formatting to the replacement text.

Using Regular Expressions for Pattern Matching

To match patterns rather than fixed strings, Document.Replace() also accepts a Regex. This is useful for dynamic placeholders (e.g., {{placeholder_name}}), dates, and other structured data.
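Placeholders such as {{placeholder_name}} usually map to values looked up by name. The sketch below shows that lookup on a plain string, using .NET's Regex.Replace with a MatchEvaluator. The dictionary keys, values, and template text are hypothetical, and the code does not touch a Word document.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical placeholder values; in practice these might come from a database row.
var values = new Dictionary<string, string>
{
    ["customer_name"] = "Jane Doe",
    ["invoice_total"] = "$1,250.00",
};

// Matches {{name}}, capturing the name (letters, digits, underscores) in group 1.
var placeholder = new Regex(@"\{\{(\w+)\}\}");

// Known keys are replaced; unknown placeholders are left as-is so they are easy to spot.
string FillTemplate(string template) =>
    placeholder.Replace(template, m =>
        values.TryGetValue(m.Groups[1].Value, out var value) ? value : m.Value);

string filled = FillTemplate("Dear {{customer_name}}, your total is {{invoice_total}}. Ref: {{order_id}}");
Console.WriteLine(filled);
// Dear Jane Doe, your total is $1,250.00. Ref: {{order_id}}
```

The MatchEvaluator also avoids a subtle pitfall. With a string replacement pattern, a value like "$1,250.00" would be read as a reference to capture group 1. To apply such values to a document, one option is to call document.Replace("{{" + key + "}}", value, true, false) once per dictionary entry.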

```csharp
using Spire.Doc;
using System.Text.RegularExpressions;

public class DocumentReplacer
{
    public static void PerformRegexFindAndReplace(string inputFilePath, string outputFilePath)
    {
        Document document = new Document();
        document.LoadFromFile(inputFilePath);

        // Example: rewrite dates in MM/DD/YYYY format as YYYY-MM-DD.
        // \b keeps the pattern from matching digits inside longer numbers.
        Regex regex = new Regex(@"\b(\d{2})/(\d{2})/(\d{4})\b");

        // Replace(Regex, string) overload: every match of the pattern is replaced.
        // Capture groups: $1 = month, $2 = day, $3 = year.
        document.Replace(regex, "$3-$1-$2");

        // Save the modified document
        document.SaveToFile(outputFilePath, FileFormat.Docx);
        System.Console.WriteLine($"Regex find and replace completed. Document saved to '{outputFilePath}'.");
    }
}
```

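This example rewrites every date in the document that matches the pattern. Before you rely on group references such as $3, check how your Spire.Doc version treats them. If the saved document contains the literal text "$3-$1-$2", collect the matches with document.FindAllPattern(regex) and build each replacement from the matched text yourself.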
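The sketch below applies the same expression to plain strings with .NET's own Regex.Replace, which does expand $n group references, so you can see what the "$3-$1-$2" pattern produces. The \b anchors are an addition that keeps the pattern from matching digits inside longer numbers. The helper name ToIsoDates and the sample strings are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

// MM/DD/YYYY with word boundaries at both ends.
var usDate = new Regex(@"\b(\d{2})/(\d{2})/(\d{4})\b");

// $3 = year, $1 = month, $2 = day, so each match becomes YYYY-MM-DD.
string ToIsoDates(string text) => usDate.Replace(text, "$3-$1-$2");

Console.WriteLine(ToIsoDates("Signed 03/15/2024, renewed 12/01/2025."));
// Signed 2024-03-15, renewed 2025-12-01.
Console.WriteLine(ToIsoDates("Serial 1203/15/20245 is unchanged."));
// Serial 1203/15/20245 is unchanged.
```

Running the same inputs through a small test document is a quick way to confirm that your Spire.Doc version gives the same results.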

Replacing Text with Formatted Text

Spire.Doc also lets you replace text and apply new formatting to the replacement. This is useful when the new content must stand out or follow a different style from the surrounding text.

```csharp
using Spire.Doc;
using Spire.Doc.Documents; // TextSelection, Paragraph
using Spire.Doc.Fields;    // TextRange
using System.Drawing;      // Color

public class DocumentReplacer
{
    public static void ReplaceWithFormattedText(string inputFilePath, string outputFilePath)
    {
        Document document = new Document();
        document.LoadFromFile(inputFilePath);

        string textToFind = "Important Notice";
        string replacementText = "CRITICAL UPDATE";

        // Find all occurrences (case-insensitive, whole word)
        TextSelection[] selections = document.FindAllString(textToFind, false, true);

        // FindAllString may return null when there are no matches
        if (selections != null)
        {
            foreach (TextSelection selection in selections)
            {
                // Get the matched text as a single TextRange
                TextRange range = selection.GetAsOneRange();

                // Record the owning paragraph and position before modifying it
                Paragraph paragraph = range.OwnerParagraph;
                int index = paragraph.ChildObjects.IndexOf(range);

                // Build the replacement with its own formatting
                TextRange newRange = new TextRange(document);
                newRange.Text = replacementText;
                newRange.CharacterFormat.Bold = true;
                newRange.CharacterFormat.TextColor = Color.Red;
                newRange.CharacterFormat.FontSize = 14;

                // Insert the new range at the original position, then remove the old one.
                // Removing first would detach the range, so IndexOf would no longer find it.
                paragraph.ChildObjects.Insert(index, newRange);
                paragraph.ChildObjects.Remove(range);
            }
        }

        // Save the modified document
        document.SaveToFile(outputFilePath, FileFormat.Docx);
        System.Console.WriteLine($"Formatted find and replace completed. Document saved to '{outputFilePath}'.");
    }
}
```

This technique finds each match and builds a new TextRange with the desired formatting (bold, red, 14 pt). It then inserts that range at the match's position and removes the original. The new range does not copy the original text's character formatting, so set every property you need, such as the font name, explicitly.


Conclusion

With C# and Spire.Doc for .NET, you can find and replace text in Word documents programmatically and automate content updates. Document.Replace() covers both plain-string and regular-expression replacements. FindAllString() combined with TextRange gives you control over how the replacement text is formatted. Because the library works with the document's structure rather than its raw markup, these techniques apply directly to template generation, data merging, and bulk updates across many documents.
