In office automation, document auditing, and data extraction workflows, developers often need to locate specific keywords or complex patterns within large batches of Word documents and highlight them for quick review.
While the traditional Microsoft Office Interop approach can achieve this, it requires Microsoft Word to be installed on the server. This dependency often leads to stability issues, permission errors, and performance bottlenecks in headless or server-side environments.
A more robust alternative is using Free Spire.Doc for .NET, a standalone library that allows you to read, write, and manipulate Word documents without needing Microsoft Office installed. In this tutorial, we will walk through two practical examples demonstrating how to:
Find and highlight exact strings (e.g., specific terms).
Find and highlight complex patterns using Regular Expressions (Regex).
Prerequisites
To get started, add the Free Spire.Doc package to your .NET project via the NuGet Package Manager:
Install-Package FreeSpire.Doc
Alternatively, you can download the DLLs directly from the official website and add them as references manually.
Example 1: Finding and Highlighting Exact Strings
Let’s say you have a literary analysis document (input.docx) and need to locate every instance of the term "transcendentalism" and highlight it in yellow.
Implementation Logic
Load the target Word document.
Use the FindAllString method to retrieve all occurrences.
Iterate through the results and apply a yellow highlight color.
Save the modified document.
Complete Code Example
using System;
using System.Drawing;
using Spire.Doc;
using Spire.Doc.Documents;

namespace FindHighlightSimple
{
    class Program
    {
        static void Main(string[] args)
        {
            // 1. Initialize a new Document instance
            Document document = new Document();
            // 2. Load the source Word document
            // Ensure "input.docx" exists in your execution directory
            document.LoadFromFile("input.docx");
            Console.WriteLine("Searching for 'transcendentalism'...");
            // 3. Find all occurrences of the string
            // Parameters:
            // "transcendentalism" -> The search term
            // false -> Case-insensitive search
            // true -> Match whole words only (prevents matching inside other words)
            // The "??" fallback is a defensive guard in case a search with no
            // matches comes back as null rather than an empty array.
            TextSelection[] matches = document.FindAllString("transcendentalism", false, true)
                ?? new TextSelection[0];
            Console.WriteLine($"Found {matches.Length} matches.");
            // 4. Apply yellow highlighting to each match
            foreach (TextSelection selection in matches)
            {
                // GetAsOneRange() ensures the selection is treated as a single continuous range
                // even if it spans multiple formatting blocks.
                selection.GetAsOneRange().CharacterFormat.HighlightColor = Color.Yellow;
            }
            // 5. Save the output file
            string outputPath = "HighlightResult.docx";
            document.SaveToFile(outputPath, FileFormat.Docx);
            Console.WriteLine($"Success! Saved to: {outputPath}");
        }
    }
}
Explanation
FindAllString: The straightforward choice for exact text matching. It returns an array of TextSelection objects, one for each match in the document.
GetAsOneRange(): Crucial for consistent formatting. If a found word is split across different internal XML nodes, this method merges them into a single range object so styles apply uniformly.
CharacterFormat.HighlightColor: Takes standard .NET System.Drawing.Color values (e.g., Color.Yellow, Color.Red) to apply Word’s native highlighting.
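The same three calls extend naturally to more than one search term. The following is a minimal sketch, not a definitive implementation: the term-to-color map, the second term ("romanticism"), and the output file name are hypothetical, and the null check is a defensive assumption about what a search with no matches returns.

```csharp
using System;
using System.Collections.Generic;
using System.Drawing;
using Spire.Doc;
using Spire.Doc.Documents;

class MultiTermHighlighter
{
    static void Main()
    {
        // Hypothetical glossary: each term gets its own highlight color.
        var termColors = new Dictionary<string, Color>
        {
            { "transcendentalism", Color.Yellow },
            { "romanticism", Color.Cyan }
        };

        Document document = new Document();
        document.LoadFromFile("input.docx");

        foreach (KeyValuePair<string, Color> pair in termColors)
        {
            // Same arguments as Example 1: case-insensitive, whole words only.
            TextSelection[] matches = document.FindAllString(pair.Key, false, true);

            // Defensive assumption: treat a null result as "no matches".
            if (matches == null)
            {
                Console.WriteLine($"'{pair.Key}': 0 matches");
                continue;
            }

            foreach (TextSelection selection in matches)
            {
                selection.GetAsOneRange().CharacterFormat.HighlightColor = pair.Value;
            }
            Console.WriteLine($"'{pair.Key}': {matches.Length} matches");
        }

        // Hypothetical output name; the document is saved once, after all terms are processed.
        document.SaveToFile("MultiHighlightResult.docx", FileFormat.Docx);
    }
}
```

Loading and saving once, with all searches in between, keeps file I/O out of the per-term loop.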
Example 2: Using Regular Expressions for Pattern Matching
Real-world scenarios often require finding dynamic patterns rather than static text. For instance, you might need to identify all template placeholders like [Name], [Date], or [ID_123] to verify they have been filled.
Free Spire.Doc supports Regular Expressions (Regex) via the FindAllPattern method, making it easy to target complex structures.
Scenario
We want to find all placeholders formatted as [Word] (e.g., [Username], [Address]) and highlight them in light green with bold text.
Implementation Logic
Define a Regex pattern to match brackets and alphanumeric content.
Execute FindAllPattern against the document.
Loop through matches to apply green highlighting and bold styling.
Complete Code Example
using System;
using System.Drawing;
using System.Text.RegularExpressions;
using Spire.Doc;
using Spire.Doc.Documents;

namespace FindHighlightRegex
{
    class Program
    {
        static void Main(string[] args)
        {
            // 1. Initialize Document
            Document document = new Document();
            // 2. Load the document containing placeholders
            document.LoadFromFile("Template.docx");
            Console.WriteLine("Scanning for placeholders using Regex...");
            // 3. Define the Regex pattern
            // \[  : Escaped left bracket
            // \w+ : One or more word characters (letters, digits, underscore)
            // \]  : Escaped right bracket
            Regex pattern = new Regex(@"\[\w+\]", RegexOptions.IgnoreCase);
            // 4. Find all matches based on the pattern
            // The "??" fallback is a defensive guard in case a search with no
            // matches comes back as null rather than an empty array.
            TextSelection[] selections = document.FindAllPattern(pattern)
                ?? new TextSelection[0];
            Console.WriteLine($"Found {selections.Length} placeholders.");
            // 5. Apply formatting to each match
            foreach (TextSelection selection in selections)
            {
                var range = selection.GetAsOneRange();
                // Log the found text (optional)
                Console.WriteLine($" - Detected: {range.Text}");
                // Apply Light Green highlight
                range.CharacterFormat.HighlightColor = Color.LightGreen;
                // Optional: Make the text bold for extra visibility
                range.CharacterFormat.Bold = true;
            }
            // 6. Save the result
            string outputPath = "RegexHighlightResult.docx";
            document.SaveToFile(outputPath, FileFormat.Docx);
            Console.WriteLine($"\nDone! Output saved to: {outputPath}");
        }
    }
}
Understanding the Regex
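- @"\[\w+\]":
  - The @ symbol creates a verbatim string, so backslashes need no extra escaping.
  - \[ and \] explicitly match the literal bracket characters.
  - \w+ matches any sequence of letters, digits, or underscores inside the brackets.
- RegexOptions.IgnoreCase: Makes matching case-insensitive. For this particular pattern it changes nothing, because \w already matches both uppercase and lowercase letters, so [NAME], [name], and [Name] are all caught either way. It starts to matter once the pattern contains literal letters.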
Extending the Pattern
You can adapt the Regex for various data extraction tasks:
Dates: @"\d{4}-\d{2}-\d{2}" (Matches 2023-10-01)
Emails: @"\w+@\w+\.\w+" (Basic email matcher)
Variables: @"\$[A-Za-z]+" (Matches variables like $Variable)
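Before running a pattern against real documents, it can help to check it against a few sample strings using only System.Text.RegularExpressions. The sketch below does that for the placeholder pattern from Example 2 (written as \[\w+\]) and the three patterns above. The PatternCheck class, the FindAll helper, and the sample sentences are made up for illustration and are not part of Free Spire.Doc.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PatternCheck
{
    // Hypothetical helper: returns every match of `pattern` in `text`, in order.
    public static List<string> FindAll(string pattern, string text)
    {
        var results = new List<string>();
        foreach (Match match in Regex.Matches(text, pattern))
        {
            results.Add(match.Value);
        }
        return results;
    }

    static void Print(string label, string pattern, string text)
    {
        string joined = string.Join(", ", FindAll(pattern, text));
        Console.WriteLine(label + ": " + joined);
    }

    public static void Main()
    {
        Print("Placeholders", @"\[\w+\]",
            "Dear [Name], ref [ID_123], see [not a placeholder].");
        Print("Dates", @"\d{4}-\d{2}-\d{2}",
            "Signed 2023-10-01, renewed 2024-01-15.");
        Print("Emails", @"\w+@\w+\.\w+",
            "Contact jane.doe@example.com today.");
        Print("Variables", @"\$[A-Za-z]+",
            "Total $Amount is due by $Date, not $5.");
    }
}

// Output:
// Placeholders: [Name], [ID_123]
// Dates: 2023-10-01, 2024-01-15
// Emails: doe@example.com
// Variables: $Amount, $Date
```

Note the email result: because \w does not match a dot, the basic pattern picks up doe@example.com rather than the full address, which is why it is only suitable as a rough matcher.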
Best Practices & Considerations
1. Required Namespaces
Ensure you include the following at the top of your file:
using Spire.Doc;
using Spire.Doc.Documents;
using System.Drawing; // For Color definitions
using System.Text.RegularExpressions; // For Regex logic
2. Performance Tips
FindAllString and FindAllPattern are optimized for speed and handle documents with hundreds of pages efficiently. However, if you are performing heavy additional processing inside the foreach loop (e.g., network calls or database writes), consider batching operations or profiling memory usage first.
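One way to keep slow work out of the highlighting loop is to collect results in memory and process them once at the end. The sketch below applies that idea to a folder of documents. It is an illustration under assumptions: the "docs" and "highlighted" folders and the report.txt file are hypothetical, the null check is a defensive guess about the no-match case, and Document.Close() is assumed to be available in your version of the library.

```csharp
using System;
using System.Collections.Generic;
using System.Drawing;
using System.IO;
using Spire.Doc;
using Spire.Doc.Documents;

class BatchHighlighter
{
    static void Main()
    {
        string inputDir = "docs";          // hypothetical input folder
        string outputDir = "highlighted";  // hypothetical output folder
        Directory.CreateDirectory(outputDir);

        // Per-file results are buffered here instead of being written one by one.
        var reportLines = new List<string>();

        foreach (string path in Directory.EnumerateFiles(inputDir, "*.docx"))
        {
            Document document = new Document();
            try
            {
                document.LoadFromFile(path);
                TextSelection[] matches = document.FindAllString("transcendentalism", false, true);

                // Defensive assumption: treat a null result as "no matches".
                int count = matches == null ? 0 : matches.Length;
                if (matches != null)
                {
                    foreach (TextSelection selection in matches)
                    {
                        selection.GetAsOneRange().CharacterFormat.HighlightColor = Color.Yellow;
                    }
                }

                string fileName = Path.GetFileName(path);
                document.SaveToFile(Path.Combine(outputDir, fileName), FileFormat.Docx);
                reportLines.Add(fileName + ": " + count + " matches");
            }
            finally
            {
                // Assumed API: release the document's resources before the next file.
                document.Close();
            }
        }

        // The "expensive" step (here just a file write) runs once, after the loop.
        File.WriteAllLines(Path.Combine(outputDir, "report.txt"), reportLines);
    }
}
```

If the end-of-run step is a database write or network call rather than a file write, the same shape applies: buffer inside the loop, send in one batch afterwards.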
Conclusion
Automating text highlighting in Word documents doesn't require Microsoft Office installation. By leveraging C# and Free Spire.Doc for .NET, you can build robust, server-friendly solutions for:
Exact Keyword Search: Ideal for spell-checking, compliance auditing, or terminology enforcement.
Regex Pattern Matching: Perfect for validating templates, extracting data fields, or flagging dynamic content.
This approach offers a lightweight, stable alternative to Interop, making it perfect for ASP.NET Core backends, Azure Functions, and Windows desktop utilities.
Have questions about implementing Word automation in your project? Leave a comment below!