DEV Community

Allen Yang

How to Access Built-in and Custom Word Properties in C#

Read Word Document Properties Using C#

Managing documents well often goes beyond creating and editing content. For developers working with C# applications, programmatic access to the metadata embedded in Word documents supports automation, data extraction, and compliance checks. You might need to categorize documents by author, track revisions, or pull custom project information directly from a Word file. This is where Word document properties come in.

While Microsoft Office Interop has historically been an option, it is cumbersome and resource-intensive, and it requires Office to be installed on the machine that runs the code, which makes it a poor fit for servers. Third-party libraries such as Spire.Doc for .NET offer an alternative that lets developers work with Word documents without an Office installation.

This tutorial aims to provide a practical, hands-on guide for C# developers to read both built-in and custom Word document properties using Spire.Doc for .NET. By the end of this article, you will have a clear understanding of how to extract valuable metadata from your Word files programmatically, empowering your applications with enhanced document intelligence.


What Are Word Document Properties and Why Do They Matter?

Word document properties are essentially metadata associated with a Word file, providing crucial information about its content, origin, and management. They are broadly categorized into two types: Built-in Properties and Custom Properties. Understanding these categories is fundamental to effective document management and automation.

Built-in Properties

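These are standard metadata fields defined by Microsoft Word. Some, such as the dates and the word and page counts, are updated by Word itself, while others, such as the title, subject, and keywords, are filled in by the author. Together they support document organization, search, and compliance. Common built-in properties include: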

Property Name   Description                    Data Type
Title           The document's title.          string
Author          The primary author.            string
Subject         A brief description.           string
Keywords        Search terms.                  string
Comments        General remarks.               string
CreateDate      Date and time of creation.     DateTime
LastSaveDate    Date and time of last save.    DateTime
WordCount       Total words in the document.   int
PageCount       Total pages in the document.   int

Custom Properties

Unlike built-in properties, custom properties are user-defined key-value pairs that allow users or applications to store additional, application-specific metadata within a Word document. This flexibility makes them incredibly powerful for extending the utility of Word files beyond their standard functionality.

For instance, you might use custom properties to store:

  • Project ID: An identifier linking the document to a specific project.
  • Approval Status: Tracking the workflow status (e.g., "Draft," "Approved," "Pending Review").
  • Version Number: Maintaining specific versioning beyond what a document management system might provide.
  • Client Name: Associating a document with a particular client.

The ability to programmatically read both built-in and custom properties is a cornerstone of building robust document processing solutions in C#. Libraries like Spire.Doc for .NET abstract away the complexities of the Word document format, providing a straightforward API to access this critical metadata, avoiding the overhead and dependencies of Microsoft Office Interop.
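
To make the key-value idea concrete, here is a minimal sketch that maps such pairs onto a typed object. It uses a plain Dictionary<string, object> as a stand-in for a document's custom properties, so it runs without any Word library. The DocumentMetadata class and the property names ApprovalStatus and ClientName are hypothetical, chosen only to mirror the examples listed above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical typed view over a document's custom properties.
sealed class DocumentMetadata
{
    public string ProjectId { get; private set; } = "(none)";
    public string ApprovalStatus { get; private set; } = "Draft";
    public string ClientName { get; private set; } = "(none)";

    // Builds the typed view from name/value pairs; missing or non-string values keep the defaults.
    public static DocumentMetadata FromPairs(IReadOnlyDictionary<string, object> pairs)
    {
        var meta = new DocumentMetadata();
        if (pairs.TryGetValue("ProjectID", out object id) && id is string idText)
            meta.ProjectId = idText;
        if (pairs.TryGetValue("ApprovalStatus", out object status) && status is string statusText)
            meta.ApprovalStatus = statusText;
        if (pairs.TryGetValue("ClientName", out object client) && client is string clientText)
            meta.ClientName = clientText;
        return meta;
    }
}

static class Program
{
    static void Main()
    {
        // Stand-in for values that would normally come from the Word file.
        var pairs = new Dictionary<string, object>
        {
            ["ProjectID"] = "PRJ-001",
            ["ApprovalStatus"] = "Approved"
        };

        DocumentMetadata meta = DocumentMetadata.FromPairs(pairs);
        Console.WriteLine($"{meta.ProjectId} / {meta.ApprovalStatus} / {meta.ClientName}");
        // prints "PRJ-001 / Approved / (none)"
    }
}
```

Keeping the mapping in one place means the rest of the application works with named, typed fields instead of repeating string keys.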


Getting Started: Integrating Spire.Doc for .NET

Before we dive into extracting document properties, you need to set up your C# project to use Spire.Doc for .NET. This library simplifies Word document manipulation significantly. The easiest way to integrate it is via NuGet Package Manager in Visual Studio.

Installation via NuGet

  1. Open your C# project in Visual Studio.
  2. Right-click on your project in the Solution Explorer.
  3. Select "Manage NuGet Packages...".
  4. In the "Browse" tab, search for Spire.Doc.
  5. Select Spire.Doc (published by e-iceblue) and click "Install".

Once installed, you can start using the library in your code. The first step in any operation involving a Word document is to load it into a Document object.

using Spire.Doc;
using System;

namespace WordDocumentProperties
{
    class Program
    {
        static void Main(string[] args)
        {
            // Path to your Word document
            string filePath = "SampleDocument.docx"; 

            // Load the document
            Document doc = new Document();
            try
            {
                doc.LoadFromFile(filePath);
                Console.WriteLine($"Successfully loaded document: {filePath}");

                // Proceed to read properties...
                // (Code for reading built-in and custom properties will go here)
            }
            catch (Exception ex)
            {
                Console.WriteLine($"Error loading document: {ex.Message}");
            }
            finally
            {
                doc.Close(); // Always close the document after use
            }
            Console.ReadKey();
        }
    }
}

This basic setup loads a Word document named SampleDocument.docx. A relative path like this one is resolved against the process's current working directory (usually the build output folder when you run from Visual Studio), so place the file there or provide a full path. The Document object is your gateway to all aspects of the Word file, including its properties.
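
If loading fails with a file-not-found error, it can help to check where a relative path actually points before handing it to the library. The following is a small sketch using only System.IO; the DocumentPathCheck class and its TryResolve helper are illustrative names, not part of Spire.Doc.

```csharp
using System;
using System.IO;

static class DocumentPathCheck
{
    // Expands a (possibly relative) path to an absolute one and reports whether the file exists.
    public static bool TryResolve(string path, out string fullPath)
    {
        // Relative paths are resolved against the current working directory.
        fullPath = Path.GetFullPath(path);
        return File.Exists(fullPath);
    }

    static void Main()
    {
        if (TryResolve("SampleDocument.docx", out string fullPath))
        {
            Console.WriteLine($"Ready to load: {fullPath}");
        }
        else
        {
            Console.WriteLine($"File not found: {fullPath}");
            Console.WriteLine($"Working directory: {Directory.GetCurrentDirectory()}");
        }
    }
}
```

Printing the resolved path and the working directory usually makes it obvious whether the file is in the wrong folder or the application was started from an unexpected location.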


Accessing Standard Information: Built-in Document Properties

Spire.Doc for .NET provides a dedicated BuiltinDocumentProperties object within the Document class, which exposes all the standard metadata fields. Accessing these properties is straightforward, typically involving direct property calls.

Let's look at how to extract some common built-in properties:

using Spire.Doc;
using System;

namespace WordDocumentProperties
{
    class Program
    {
        static void Main(string[] args)
        {
            string filePath = "SampleDocument.docx"; 
            Document doc = new Document();
            try
            {
                doc.LoadFromFile(filePath);
                Console.WriteLine($"Successfully loaded document: {filePath}\n");

                // Retrieve built-in document properties
                string title = doc.BuiltinDocumentProperties.Title;
                string author = doc.BuiltinDocumentProperties.Author;
                string subject = doc.BuiltinDocumentProperties.Subject;
                string keywords = doc.BuiltinDocumentProperties.Keywords;
                string comments = doc.BuiltinDocumentProperties.Comments;
                // Spire.Doc names the date properties CreateDate and LastSaveDate
                DateTime creationDate = doc.BuiltinDocumentProperties.CreateDate;
                DateTime lastModifiedDate = doc.BuiltinDocumentProperties.LastSaveDate;
                int wordCount = doc.BuiltinDocumentProperties.WordCount;
                int pageCount = doc.BuiltinDocumentProperties.PageCount;

                Console.WriteLine("--- Built-in Document Properties ---");
                Console.WriteLine($"Title: {title ?? "N/A"}");
                Console.WriteLine($"Author: {author ?? "N/A"}");
                Console.WriteLine($"Subject: {subject ?? "N/A"}");
                Console.WriteLine($"Keywords: {keywords ?? "N/A"}");
                Console.WriteLine($"Comments: {comments ?? "N/A"}");
                Console.WriteLine($"Creation Date: {creationDate.ToLocalTime()}");
                Console.WriteLine($"Last Modified Date: {lastModifiedDate.ToLocalTime()}");
                Console.WriteLine($"Word Count: {wordCount}");
                Console.WriteLine($"Page Count: {pageCount}");
            }
            catch (Exception ex)
            {
                Console.WriteLine($"Error processing document: {ex.Message}");
            }
            finally
            {
                doc.Close(); 
            }
            Console.ReadKey();
        }
    }
}

In this example, we read properties such as doc.BuiltinDocumentProperties.Author directly. The null-coalescing operator (?? "N/A") on the string properties substitutes a readable placeholder when a property comes back as null. String interpolation renders null as an empty string rather than throwing, so the fallback is about clearer output, not about preventing a NullReferenceException. Note that ?? does not catch empty strings; use string.IsNullOrEmpty if an unset property may be returned as "". Date properties are returned as DateTime values, while counts are int.

  • Title: The document's title. (string)
  • Author: The primary author. (string)
  • Subject: A brief description. (string)
  • Keywords: Search terms. (string)
  • Comments: General remarks. (string)
  • CreateDate: Date and time of creation. (DateTime)
  • LastSaveDate: Date and time of last save. (DateTime)
  • WordCount: Total words in the document. (int)
  • PageCount: Total pages in the document. (int)

This list highlights some of the most commonly used built-in properties. Spire.Doc provides access to the full range of these properties, allowing you to extract comprehensive document information with minimal code.
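
The placeholder handling for string properties can be pulled into a small helper that also covers empty or whitespace-only values, which the ?? operator alone does not. This is a minimal sketch in plain C#; the PropertyDisplay class and OrPlaceholder method are hypothetical names introduced for illustration.

```csharp
using System;

static class PropertyDisplay
{
    // Returns the placeholder when the value is null, empty, or only whitespace.
    public static string OrPlaceholder(string value, string placeholder = "N/A")
    {
        return string.IsNullOrWhiteSpace(value) ? placeholder : value;
    }

    static void Main()
    {
        // The literals stand in for values read from a document.
        Console.WriteLine($"Title: {OrPlaceholder(null)}");                  // prints "Title: N/A"
        Console.WriteLine($"Author: {OrPlaceholder("")}");                   // prints "Author: N/A"
        Console.WriteLine($"Subject: {OrPlaceholder("Quarterly report")}");  // prints "Subject: Quarterly report"
    }
}
```

In the built-in properties example, a call such as OrPlaceholder(title) could replace title ?? "N/A" if blank values should also show the placeholder.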


Beyond the Defaults: Working with Custom Document Properties

Custom document properties offer unparalleled flexibility for embedding domain-specific metadata directly into your Word files. Spire.Doc for .NET makes it just as easy to read these user-defined properties as it does the built-in ones. Custom properties are stored as a collection, accessible via doc.CustomDocumentProperties.

Each custom property has a name and a value, and importantly, a specific data type (e.g., string, number, date, boolean). When retrieving these values, it's crucial to handle their types appropriately.

Here's how you can iterate through and extract custom properties:

using Spire.Doc;
using System;

namespace WordDocumentProperties
{
    class Program
    {
        static void Main(string[] args)
        {
            string filePath = "SampleDocumentWithCustomProps.docx"; 
            Document doc = new Document();
            try
            {
                doc.LoadFromFile(filePath);
                Console.WriteLine($"Successfully loaded document: {filePath}\n");

                Console.WriteLine("--- Custom Document Properties ---");

                // Iterate through all custom document properties
                if (doc.CustomDocumentProperties.Count > 0)
                {
                    foreach (DocumentProperty customProp in doc.CustomDocumentProperties)
                    {
                        Console.WriteLine($"Name: {customProp.Name}, Value: {customProp.Value}, Type: {customProp.PropertyType}");
                    }

                    // Access a specific custom property by name
                    // Assuming a custom property named "ProjectID" exists
                    if (doc.CustomDocumentProperties.Contains("ProjectID"))
                    {
                        object projectIDValue = doc.CustomDocumentProperties["ProjectID"].Value;
                        Console.WriteLine($"\nSpecific Custom Property 'ProjectID': {projectIDValue ?? "N/A"}");
                    }
                    else
                    {
                        Console.WriteLine("\nCustom property 'ProjectID' not found.");
                    }

                    // Access a specific custom property with type casting
                    // Assuming a custom property named "IsApproved" of type Boolean
                    if (doc.CustomDocumentProperties.Contains("IsApproved"))
                    {
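                        // A type test avoids an InvalidCastException if the value is not stored as a Boolean
                        if (doc.CustomDocumentProperties["IsApproved"].Value is bool isApproved)
                        {
                            Console.WriteLine($"Specific Custom Property 'IsApproved': {isApproved}");
                        }
                        else
                        {
                            Console.WriteLine("Custom property 'IsApproved' is not stored as a Boolean.");
                        }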
                    }
                    else
                    {
                        Console.WriteLine("Custom property 'IsApproved' not found.");
                    }
                }
                else
                {
                    Console.WriteLine("No custom document properties found in this document.");
                }
            }
            catch (Exception ex)
            {
                Console.WriteLine($"Error processing document: {ex.Message}");
            }
            finally
            {
                doc.Close(); 
            }
            Console.ReadKey();
        }
    }
}

In this example, we first check if any custom properties exist. Then, we iterate through the CustomDocumentProperties collection, where each item is a DocumentProperty object. This object exposes the Name, Value, and PropertyType of the custom property.

We also demonstrate how to access a specific custom property by name using doc.CustomDocumentProperties["PropertyName"]. Check for the property with Contains("PropertyName") before accessing it, so that a missing property does not cause an error. Value is returned as an object, so you need to convert it to the expected type before any type-specific operation. A direct cast such as (bool)customProp.Value throws an InvalidCastException if the stored value has a different type; a type test such as customProp.Value is bool approved is the safer choice.
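
Because a property value arrives as object, one way to handle the different types safely is C# pattern matching. The sketch below is plain C# and does not call Spire.Doc; the PropertyValueFormatter class and its methods are hypothetical, and the sample values stand in for whatever a property's Value returns.

```csharp
using System;
using System.Globalization;

static class PropertyValueFormatter
{
    // Produces a display string based on the runtime type of the value.
    public static string Describe(object value)
    {
        switch (value)
        {
            case null: return "(not set)";
            case bool b: return b ? "yes" : "no";
            case DateTime dt: return dt.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture);
            case int i: return i.ToString(CultureInfo.InvariantCulture);
            case double d: return d.ToString("0.##", CultureInfo.InvariantCulture);
            default: return value.ToString();
        }
    }

    // Reads a Boolean without throwing: accepts a boxed bool or a "true"/"false" string.
    public static bool TryGetBool(object value, out bool result)
    {
        if (value is bool b) { result = b; return true; }
        if (value is string s && bool.TryParse(s, out result)) return true;
        result = false;
        return false;
    }

    static void Main()
    {
        Console.WriteLine(Describe(true));                       // prints "yes"
        Console.WriteLine(Describe(new DateTime(2024, 3, 15)));  // prints "2024-03-15"
        Console.WriteLine(Describe(null));                       // prints "(not set)"

        if (TryGetBool("True", out bool approved))
            Console.WriteLine($"IsApproved: {approved}");        // prints "IsApproved: True"
    }
}
```

The Try pattern keeps a single unexpected value, such as "Yes" typed into a text property, from aborting the whole run; the caller decides how to treat values that fail to convert.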

This capability to define and retrieve custom metadata significantly enhances the programmatic control you have over your Word documents, allowing for deeper integration with business logic and data management systems.


Conclusion

Programmatically reading Word document properties in C# is a powerful technique for enhancing document automation, data extraction, and metadata management within your applications. This tutorial has demonstrated how Spire.Doc for .NET provides an efficient and straightforward solution for this task, bypassing the complexities often associated with traditional Office Interop.

We've explored both the built-in properties, which offer standard document information, and the highly flexible custom properties, allowing you to embed application-specific metadata. By leveraging Spire.Doc for .NET, you can seamlessly integrate these capabilities into your C# projects, whether for indexing, reporting, content management, or specialized workflows.

The ability to accurately and reliably extract this embedded information empowers developers to build more intelligent and responsive applications. We encourage you to explore the full range of features offered by Spire.Doc for .NET and consider how programmatic access to document properties can streamline your document processing tasks and unlock new possibilities for data utilization within your C# ecosystem. Start integrating this robust functionality today and elevate your document handling capabilities.
