Are you looking for a lightweight way to extract hyperlinks from Word documents in your .NET application? Whether you are performing a large-scale content audit, migrating website data, or automating document processing workflows, programmatically reading hyperlinks is a common developer requirement.
This tutorial provides a complete, low-code solution using the Free Spire.Doc for .NET component. We will cover how to read URL addresses, anchor text (display text), and even screen tips from .doc and .docx files—without requiring Microsoft Office to be installed on your machine.
Understanding Word Hyperlink Fields (Technical Background)
Before diving into the code, it helps to understand how Word stores links. In Microsoft Word, hyperlinks are not plain text; they are stored as Field objects within the document structure.
Each hyperlink field consists of two critical components:
| Component | Description |
|---|---|
| Anchor Text (Display Text) | The visible, clickable text the user sees in the document. |
| URL / Link Address | The hidden target destination stored in the field code. |
By iterating through the document's hierarchical object model (Document → Sections → Paragraphs → Fields), we can capture these elements precisely.
Setting Up Your C# Environment
To get started, you need to install the required library and include the necessary namespaces.
1. Install the Free Spire.Doc Package
The easiest way to integrate the library is via the NuGet Package Manager. Run the following command in the Package Manager Console:
Install-Package FreeSpire.Doc
⚠️ Note: The free edition caps document size rather than features. At the time of writing, the vendor lists a limit of 500 paragraphs and 25 tables per document, so it is best suited to small and medium-sized files.
2. Import Required Namespaces
Add the following using statements at the top of your code file to access the document parsing methods:
using Spire.Doc;
using Spire.Doc.Documents;
using Spire.Doc.Fields;
using System.Collections.Generic;
using System.IO;
using System.Text;
Step-by-Step: C# Code to Extract Hyperlinks
Let's break down the core extraction logic into manageable steps. We will traverse the document, filter for hyperlink fields, and output the results.
Step 1: Load the Word Document
Create a Document object and load your target file (supports both .doc and .docx).
Step 2: Traverse Sections and Paragraphs
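Since Word uses a nested structure, we iterate through every Section and then through the Paragraphs in its body. Note that the sample code only visits paragraphs that sit directly in the section body; links inside tables, headers, or footers need additional traversal.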
Step 3: Filter Hyperlink Fields
Within each paragraph, check if a child object is of type Field. If so, verify that its Type equals FieldType.FieldHyperlink.
Step 4: Extract and Export Data
For every identified hyperlink, read the FieldText property (the anchor text) and the Code property (the raw field code, which contains the URL in quotes), then save them to a text file.
Full Source Code Example
Here is the complete, ready-to-run C# program for extracting hyperlinks:
using Spire.Doc;
using Spire.Doc.Documents;
using Spire.Doc.Fields;
using System.Collections.Generic;
using System.IO;
using System.Text;

namespace ExtractHyperlinks
{
    class Program
    {
        static void Main(string[] args)
        {
            // 1. Initialize and load the Word document
            Document doc = new Document();
            doc.LoadFromFile(@"sample.docx");

            // 2. Storage list for hyperlink fields
            List<Field> hyperlinks = new List<Field>();

            // 3. Traverse the document hierarchy (body-level paragraphs only)
            foreach (Section section in doc.Sections)
            {
                foreach (DocumentObject secObj in section.Body.ChildObjects)
                {
                    if (secObj.DocumentObjectType == DocumentObjectType.Paragraph)
                    {
                        Paragraph paragraph = secObj as Paragraph;
                        foreach (DocumentObject paraObj in paragraph.ChildObjects)
                        {
                            // Identify Field objects
                            if (paraObj.DocumentObjectType == DocumentObjectType.Field)
                            {
                                Field field = paraObj as Field;
                                // Filter specifically for hyperlinks
                                if (field.Type == FieldType.FieldHyperlink)
                                {
                                    hyperlinks.Add(field);
                                }
                            }
                        }
                    }
                }
            }

            // 4. Build and write the output
            StringBuilder sb = new StringBuilder();
            foreach (Field hyperlink in hyperlinks)
            {
                // Assumes Code holds the raw field code, e.g. HYPERLINK "https://example.com":
                // take the first quoted value as the URL, or fall back to the raw code.
                string[] parts = (hyperlink.Code ?? string.Empty).Split('"');
                string url = parts.Length > 1 ? parts[1] : hyperlink.Code;
                sb.AppendLine("Display Text: " + hyperlink.FieldText);
                sb.AppendLine("URL Address: " + url);
                sb.AppendLine();
            }
            File.WriteAllText("Hyperlinks.txt", sb.ToString());
            doc.Close();
        }
    }
}
Going Further: Extracting Screen Tips
Many hyperlinks contain a ScreenTip—the descriptive text that appears when a user hovers over the link. While the basic extraction above gets the URL and text, you can parse the Field Code to capture the tip.
The field code typically looks like this:
- Standard: HYPERLINK "https://example.com"
- With ScreenTip: HYPERLINK "https://example.com" \o "Visit our homepage"
Add this logic to extract the hover text:
// Get the display text
string anchorText = field.FieldText;
// Get the raw field code string
string fieldCode = field.GetFieldCode();
// Split on double quotes: the odd-indexed elements are the quoted values
string[] parts = fieldCode.Split('"');
// The URL is the first quoted value (empty if the code contains no quotes)
string url = parts.Length > 1 ? parts[1] : string.Empty;
// The screen tip is the quoted value that directly follows the \o switch
string screenTip = string.Empty;
for (int i = 2; i + 1 < parts.Length; i += 2)
{
    if (parts[i].Trim().EndsWith("\\o"))
    {
        screenTip = parts[i + 1].Trim();
        break;
    }
}
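Splitting on quotes ties each value to its position, which gets fragile once a field code carries several switches. As a more defensive alternative, the sketch below reads every quoted value together with the switch in front of it. `HyperlinkFieldCode` and `Parse` are hypothetical names introduced for illustration and are not part of Spire.Doc. The sketch assumes values contain no escaped quotes, and it only recognizes the `\o` (screen tip) and `\l` (location inside the target, such as a bookmark) switches.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not a Spire.Doc API): reads the address and the
// \l and \o switch values out of a raw HYPERLINK field code string.
public static class HyperlinkFieldCode
{
    // Matches an optional single-letter switch (\o, \l, ...) followed by a quoted value.
    private static readonly Regex Token = new Regex(
        @"(?:\\(?<sw>[a-z])\s*)?""(?<val>[^""]*)""",
        RegexOptions.IgnoreCase);

    public static (string Url, string Bookmark, string ScreenTip) Parse(string fieldCode)
    {
        string url = string.Empty, bookmark = string.Empty, tip = string.Empty;

        foreach (Match m in Token.Matches(fieldCode ?? string.Empty))
        {
            string value = m.Groups["val"].Value;
            switch (m.Groups["sw"].Value.ToLowerInvariant())
            {
                case "":   // quoted value with no switch: treat the first one as the address
                    if (url.Length == 0) url = value;
                    break;
                case "l":  // \l: location within the target
                    bookmark = value;
                    break;
                case "o":  // \o: screen tip
                    tip = value;
                    break;
                // any other switch is ignored in this sketch
            }
        }
        return (url, bookmark, tip);
    }
}
```

Calling `HyperlinkFieldCode.Parse(field.GetFieldCode())` would then give the three values as a tuple, with empty strings for anything the field code does not contain. A link that points only at a bookmark comes back with an empty `Url` and a populated `Bookmark`, which the positional approach would report as a URL.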
Summary
Extracting hyperlinks from Word documents in C# is a straightforward process once you understand the Field structure. By using the Free Spire.Doc for .NET library, you can reliably read display text, URLs, and screen tips with just a few dozen lines of code.
Key Benefits of this Approach:
- ✅ No Microsoft Office interop required.
- ✅ Ideal for server-side and automated batch scripts.
- ✅ Easy to extend: you can update URLs by modifying the Code property.
Try this solution today to supercharge your document data extraction workflows. If you need to process hyperlinks in bulk, simply loop through multiple files and aggregate the results into a database or JSON file.
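The bulk scenario could be sketched roughly as follows. `LinkEntry`, `BatchExport`, and the `extract` callback are hypothetical names; the callback stands in for the per-document extraction shown earlier, so the sketch itself makes no library calls beyond the .NET base class library. It assumes .NET 5 or later (for records and the built-in System.Text.Json) and, for brevity, only picks up .docx files.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.Json;

// Hypothetical record describing one extracted link and the file it came from.
public record LinkEntry(string File, string DisplayText, string Url);

public static class BatchExport
{
    // 'extract' is a placeholder for the per-document logic: given a file path,
    // it returns the (display text, URL) pairs found in that document.
    public static string ToJson(
        string folder,
        Func<string, IEnumerable<(string Text, string Url)>> extract)
    {
        var entries = new List<LinkEntry>();

        foreach (string path in Directory.EnumerateFiles(folder, "*.docx"))
        {
            foreach (var (text, url) in extract(path))
            {
                entries.Add(new LinkEntry(Path.GetFileName(path), text, url));
            }
        }

        // Indented output keeps the file readable for a manual audit.
        return JsonSerializer.Serialize(
            entries, new JsonSerializerOptions { WriteIndented = true });
    }
}
```

The returned string can be written out with `File.WriteAllText`, as in the single-document example. Passing the extraction step in as a callback keeps the file loop independent of how each document is parsed, which also makes it easy to try the loop with stub data first.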