Jeremy K.

How to Read Hyperlinks from Word in C# (Free .NET Library)

Are you looking for a lightweight way to extract hyperlinks from Word documents in your .NET application? Whether you are performing a large-scale content audit, migrating website data, or automating document processing workflows, programmatically reading hyperlinks is a common developer requirement.

This tutorial provides a complete, low-code solution using the Free Spire.Doc for .NET component. We will cover how to read URL addresses, anchor text (display text), and even screen tips from .doc and .docx files—without requiring Microsoft Office to be installed on your machine.


Understanding Word Hyperlink Fields (Technical Background)

Before diving into the code, it helps to understand how Word stores links. In Microsoft Word, hyperlinks are not plain text; they are stored as Field objects within the document structure.

Each hyperlink field consists of two critical components:

  • Anchor Text (Display Text): the visible, clickable text the user sees in the document.
  • URL / Link Address: the hidden target destination stored in the field code.

By iterating through the document's hierarchical object model (Document → Sections → Paragraphs → Fields), we can capture these elements precisely.


Setting Up Your C# Environment

To get started, you need to install the required library and include the necessary namespaces.

1. Install the Free Spire.Doc Package

The easiest way to integrate the library is via the NuGet Package Manager. Run the following command in the Package Manager Console:

Install-Package FreeSpire.Doc

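⚠️ Note: The free edition caps document size rather than features. At the time of writing, the vendor lists a limit of 500 paragraphs and 25 tables per document, so check the current limits before relying on it for larger files. It is best suited to small and medium-sized document processing tasks.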

2. Import Required Namespaces

Add the following using statements at the top of your code file to access the document parsing methods:

using Spire.Doc;
using Spire.Doc.Documents;
using Spire.Doc.Fields;
using System.Collections.Generic;
using System.IO;
using System.Text;

Step-by-Step: C# Code to Extract Hyperlinks

Let's break down the core extraction logic into manageable steps. We will traverse the document, filter for hyperlink fields, and output the results.

Step 1: Load the Word Document

Create a Document object and load your target file (supports both .doc and .docx).

Step 2: Traverse Sections and Paragraphs

Since Word uses a nested structure, we must iterate through all Sections and their Paragraphs.

Step 3: Filter Hyperlink Fields

Within each paragraph, check if a child object is of type Field. If so, verify that its Type equals FieldType.FieldHyperlink.

Step 4: Extract and Export Data

For every identified hyperlink, read the FieldText property (the anchor text) and the Code property (the raw field code, which contains the URL), then save them to a text file.
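
Because the field code typically has the form HYPERLINK "https://example.com" rather than being a bare address, you may want to strip the keyword and quotes before storing the link. The helper below is a minimal sketch of one way to do that with a regular expression. The class and method names (FieldCodeUrl, ExtractUrl) are illustrative and not part of Spire.Doc, and the code operates only on the field code string.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FieldCodeUrl
{
    // Matches the HYPERLINK keyword followed directly by its first quoted value.
    private static readonly Regex UrlPattern =
        new Regex("^\\s*HYPERLINK\\s+\"([^\"]*)\"", RegexOptions.IgnoreCase);

    // Returns the quoted target that follows HYPERLINK, or an empty string
    // when the code has no such target (for example, a bookmark-only link
    // written as HYPERLINK \l "name").
    public static string ExtractUrl(string fieldCode)
    {
        if (string.IsNullOrEmpty(fieldCode))
        {
            return string.Empty;
        }

        Match match = UrlPattern.Match(fieldCode);
        return match.Success ? match.Groups[1].Value : string.Empty;
    }

    public static void Main()
    {
        string code = "HYPERLINK \"https://example.com\" \\o \"Visit our homepage\"";
        Console.WriteLine(ExtractUrl(code)); // prints https://example.com
    }
}
```

In the extraction loop, you would pass the hyperlink's field code to ExtractUrl and write the result instead of the raw string.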


Full Source Code Example

Here is the complete, ready-to-run C# program for extracting hyperlinks:

using Spire.Doc;
using Spire.Doc.Documents;
using Spire.Doc.Fields;
using System.Collections.Generic;
using System.IO;
using System.Text;

namespace ExtractHyperlinks
{
    class Program
    {
        static void Main(string[] args)
        {
            // 1. Initialize and load the Word document
            Document doc = new Document();
            doc.LoadFromFile(@"sample.docx");

            // 2. Storage list for hyperlink fields
            List<Field> hyperlinks = new List<Field>();

            // 3. Traverse the document hierarchy
            foreach (Section section in doc.Sections)
            {
                foreach (DocumentObject secObj in section.Body.ChildObjects)
                {
                    if (secObj.DocumentObjectType == DocumentObjectType.Paragraph)
                    {
                        Paragraph paragraph = secObj as Paragraph;

                        foreach (DocumentObject paraObj in paragraph.ChildObjects)
                        {
                            // Identify Field objects
                            if (paraObj.DocumentObjectType == DocumentObjectType.Field)
                            {
                                Field field = paraObj as Field;

                                // Filter specifically for hyperlinks
                                if (field.Type == FieldType.FieldHyperlink)
                                {
                                    hyperlinks.Add(field);
                                }
                            }
                        }
                    }
                }
            }

            // 4. Build and write the output
            StringBuilder sb = new StringBuilder();
            foreach (Field hyperlink in hyperlinks)
            {
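                // FieldText is the visible anchor text
                sb.AppendLine("Display Text: " + hyperlink.FieldText);
                // Code is the raw field code, typically HYPERLINK "https://example.com",
                // so the URL appears inside the quotes
                sb.AppendLine("URL Address: " + hyperlink.Code);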
                sb.AppendLine();
            }

            File.WriteAllText("Hyperlinks.txt", sb.ToString());
            doc.Close();

        }
    }
}

Going Further: Extracting Screen Tips

Many hyperlinks contain a ScreenTip—the descriptive text that appears when a user hovers over the link. While the basic extraction above gets the URL and text, you can parse the Field Code to capture the tip.

The field code typically looks like this:

  • Standard: HYPERLINK "https://example.com"
  • With ScreenTip: HYPERLINK "https://example.com" \o "Visit our homepage"

Add this logic to extract the hover text:

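// Get the display text
string anchorText = field.FieldText;

// Get the raw field code string
string fieldCode = field.GetFieldCode();

// Split on double quotes: the odd-numbered parts are the quoted values
string[] parts = fieldCode.Split('"');

// Extract the URL (inside the first pair of quotes), guarding against
// field codes that contain no quoted value
string url = parts.Length > 1 ? parts[1] : string.Empty;

// Extract the screen tip if the \o switch exists
string screenTip = string.Empty;
if (fieldCode.Contains("\\o") && parts.Length > 3)
{
    // When the code holds only a URL and a \o switch, the screen tip
    // is inside the second pair of quotes
    screenTip = parts[3].Trim();
}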
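
Index-based splitting works for the two forms shown above, but it can pick the wrong value when another switch, such as \l (a location inside the target), comes before \o. As a hedged alternative, the sketch below looks up a switch by its letter instead of by position. FieldCodeSwitches and GetSwitchValue are hypothetical names for illustration, not Spire.Doc APIs, and the code assumes switch values are always wrapped in double quotes.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FieldCodeSwitches
{
    // Returns the quoted value that follows the given switch letter
    // (for example "o" for \o), or an empty string when the switch is absent.
    public static string GetSwitchValue(string fieldCode, string switchLetter)
    {
        if (string.IsNullOrEmpty(fieldCode))
        {
            return string.Empty;
        }

        // Literal backslash, the switch letter, whitespace, then a quoted value.
        string pattern = "\\\\" + Regex.Escape(switchLetter) + "\\s+\"([^\"]*)\"";
        Match match = Regex.Match(fieldCode, pattern);
        return match.Success ? match.Groups[1].Value : string.Empty;
    }

    public static void Main()
    {
        string code = "HYPERLINK \"https://example.com\" \\l \"pricing\" \\o \"Visit our homepage\"";

        Console.WriteLine(GetSwitchValue(code, "o")); // prints Visit our homepage
        Console.WriteLine(GetSwitchValue(code, "l")); // prints pricing
    }
}
```

For this sample field code, Split('"')[3] would return "pricing" rather than the screen tip, which is why looking up the switch by name is safer.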

Summary

Extracting hyperlinks from Word documents in C# is a straightforward process once you understand the Field structure. By using the Free Spire.Doc for .NET library, you can reliably read display text, URLs, and screen tips with just a few dozen lines of code.

Key Benefits of this Approach:

  • ✅ No Microsoft Office interop required.
  • ✅ Ideal for server-side and automated batch scripts.
  • ✅ Easy to extend—you can update URLs by modifying the Code property.

Try this solution today to supercharge your document data extraction workflows. If you need to process hyperlinks in bulk, simply loop through multiple files and aggregate the results into a database or JSON file.
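
The bulk scenario can be sketched as follows. This is only an outline under stated assumptions: LinkRecord and BulkExport are hypothetical names, the per-file extractor is passed in as a delegate so that the Spire.Doc loop from the full example can be plugged in, and the code targets .NET 5 or later (records and the built-in System.Text.Json serializer).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.Json;

// One row per hyperlink found in a document.
public record LinkRecord(string FileName, string DisplayText, string FieldCode);

public static class BulkExport
{
    // Runs the supplied extractor over every .docx file in a folder and
    // returns all results as a single indented JSON array.
    public static string ToJson(string folder, Func<string, IEnumerable<LinkRecord>> extract)
    {
        var all = new List<LinkRecord>();
        foreach (string path in Directory.EnumerateFiles(folder, "*.docx"))
        {
            all.AddRange(extract(path));
        }

        var options = new JsonSerializerOptions { WriteIndented = true };
        return JsonSerializer.Serialize(all, options);
    }

    public static void Main(string[] args)
    {
        // Placeholder extractor: replace its body with the Spire.Doc
        // traversal from the article, returning one LinkRecord per field.
        Func<string, IEnumerable<LinkRecord>> extract = path => new[]
        {
            new LinkRecord(Path.GetFileName(path), "Example", "HYPERLINK \"https://example.com\"")
        };

        string folder = args.Length > 0 ? args[0] : Directory.GetCurrentDirectory();
        File.WriteAllText("Hyperlinks.json", ToJson(folder, extract));
    }
}
```

Keeping the extractor as a delegate separates file discovery and serialization from the document parsing, so the aggregation logic can be exercised without any Word files.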
