Extract Tables from Word Documents in C#


In office work and software development, we often need to pull table data out of Word documents for further processing, such as database imports or data analysis. This article walks through using C# with the Spire.Doc library to extract tables from a Word document and save each one as a tab-separated text file.

Tools and Environment Setup

To implement Word table extraction, you’ll need the following tools and components:

Development Environment: Visual Studio (or any C# IDE)
Framework: .NET Framework/.NET Core (compatible with mainstream versions)
Third-party Library: Spire.Doc (a robust library for parsing Word document structures and handling table data)

Spire.Doc supports reading, editing, and generating Word documents, including tables, paragraphs, and other core elements. Install it via the NuGet Package Manager:

Right-click your project in Visual Studio
Select "Manage NuGet Packages"
Search for "Spire.Doc"
Click "Install" to add the library to your project

How to Extract Word Tables with C#

Implementation Approach
Extracting tables from Word means walking the document structure layer by layer, in these steps:

Load the Word document and initialize a document object
Traverse all "Sections" (the basic structural units of a Word document)
For each Section, retrieve and iterate through its table collection
For each table, walk its rows and read the text of each cell
Save the formatted extracted data to a text file

C# Code Example
Below is the full code for table extraction. It reads Tables.docx from the working directory and writes one tab-separated text file per table into a Tables folder:

using Spire.Doc;
using Spire.Doc.Collections;
using Spire.Doc.Interface;
using System.IO;
using System.Text;

namespace ExtractWordTable
{
    internal class Program
    {
        static void Main(string[] args)
        {
            // Create a document object
            Document doc = new Document();
            // Load the Word document
            doc.LoadFromFile("Tables.docx");

            // Traverse all sections in the document
            for (int sectionIndex = 0; sectionIndex < doc.Sections.Count; sectionIndex++)
            {
                Section section = doc.Sections[sectionIndex];

                // Get all tables in the current section
                TableCollection tables = section.Tables;

                // Traverse all tables in the current section
                for (int tableIndex = 0; tableIndex < tables.Count; tableIndex++)
                {
                    ITable table = tables[tableIndex];

                    // Accumulate the current table's text; StringBuilder avoids repeated string copies
                    StringBuilder tableData = new StringBuilder();

                    // Traverse all rows in the table
                    for (int rowIndex = 0; rowIndex < table.Rows.Count; rowIndex++)
                    {
                        TableRow row = table.Rows[rowIndex];
                        // Traverse all cells in the row
                        for (int cellIndex = 0; cellIndex < row.Cells.Count; cellIndex++)
                        {
                            TableCell cell = row.Cells[cellIndex];

                            // Extract cell text (a cell may contain multiple paragraphs)
                            StringBuilder cellText = new StringBuilder();
                            for (int paraIndex = 0; paraIndex < cell.Paragraphs.Count; paraIndex++)
                            {
                                cellText.Append(cell.Paragraphs[paraIndex].Text.Trim()).Append(' ');
                            }

                            // Append the cell text, separating cells with tabs
                            tableData.Append(cellText.ToString().Trim());
                            if (cellIndex < row.Cells.Count - 1)
                            {
                                tableData.Append('\t');
                            }
                        }

                        // End the row with a line break
                        tableData.Append('\n');
                    }

                    // Save table data to a text file, creating the output folder if it does not exist
                    Directory.CreateDirectory("Tables");
                    string filePath = Path.Combine("Tables", $"Section{sectionIndex + 1}_Table{tableIndex + 1}.txt");
                    File.WriteAllText(filePath, tableData.ToString(), Encoding.UTF8);
                }
            }

            doc.Close();
        }
    }
}
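
The tab-and-newline formatting used in the example can also be factored into a small helper that works on plain strings, which makes the output layout testable without a Word file. This is a minimal sketch, not part of Spire.Doc: `TableTextFormatter` and `ToTabSeparated` are hypothetical names, and the helper assumes the text of each cell has already been read from the document.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of Spire.Doc): formats cell text that was already extracted.
public static class TableTextFormatter
{
    // Joins the cells of each row with tabs and ends every row with "\n",
    // mirroring the layout the example writes to its text files.
    public static string ToTabSeparated(IEnumerable<IEnumerable<string>> rows)
    {
        if (rows == null) throw new ArgumentNullException(nameof(rows));

        return string.Concat(
            rows.Select(cells =>
                string.Join("\t", cells.Select(c => (c ?? string.Empty).Trim())) + "\n"));
    }
}
```

One way to use it would be to collect each row's cell text into a `List<string>` inside the row loop, then pass the list of rows to `ToTabSeparated` just before writing the file.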

Practical Extension Scenarios

Build on this foundation to extend functionality for real-world use cases:

Export to Excel: Integrate the Spire.XLS library to save data directly as Excel files (.xlsx)
Text Cleansing: Add logic to remove special characters, normalize line breaks, or convert data formats (e.g., dates, numbers)
Batch Processing: Extend the code to scan a folder and process all .docx files recursively
Database Import: Add ADO.NET or Entity Framework code to insert extracted data directly into SQL Server, MySQL, or other databases
Data Validation: Implement checks for missing values, duplicate entries, or format inconsistencies
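
Three of these extensions (text cleansing, batch processing, and a basic missing-value check) need only the standard library. The sketch below shows one possible shape under stated assumptions: `ExtractionHelpers` and its method names are hypothetical, "cleansing" here only collapses whitespace so that stray tabs or line breaks inside a cell cannot break tab-separated output, and the batch helper only locates files, leaving the extraction itself to the code shown earlier.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helpers (names are illustrative, not from Spire.Doc).
public static class ExtractionHelpers
{
    // Text cleansing: collapses every run of whitespace (tabs, line breaks,
    // repeated spaces) into a single space and trims the result.
    public static string CleanCellText(string raw)
    {
        if (string.IsNullOrEmpty(raw)) return string.Empty;
        return Regex.Replace(raw, @"\s+", " ").Trim();
    }

    // Batch processing: lazily lists .docx files under rootFolder and its
    // subfolders, skipping the "~$" lock files Word creates for open documents.
    public static IEnumerable<string> FindWordFiles(string rootFolder)
    {
        return Directory
            .EnumerateFiles(rootFolder, "*.docx", SearchOption.AllDirectories)
            .Where(path => !Path.GetFileName(path).StartsWith("~$", StringComparison.Ordinal));
    }

    // Data validation: returns the zero-based indexes of rows that contain
    // at least one null, empty, or whitespace-only cell.
    public static List<int> FindRowsWithBlankCells(IEnumerable<IEnumerable<string>> rows)
    {
        return rows
            .Select((cells, index) => new { cells, index })
            .Where(r => r.cells.Any(c => string.IsNullOrWhiteSpace(c)))
            .Select(r => r.index)
            .ToList();
    }
}
```

In a batch run, each path returned by `FindWordFiles` could be passed to `doc.LoadFromFile` in place of the hard-coded "Tables.docx", with `CleanCellText` applied to each cell's text before it is appended to the output.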

Why This Approach Works

Compared to Microsoft Office Interop, Spire.Doc offers several advantages:

No Office Dependency: Runs without requiring Microsoft Office to be installed on the server or development machine
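Lightweight: No Word process is launched per document, which helps when processing files in batches
Robust Parsing: Exposes complex Word structures (merged cells, nested tables, and formatted text) through its object model. As written, the example iterates section.Tables and cell.Paragraphs, so tables nested inside cells would likely need separate handling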
Maintainable Code: Clear, modular logic that’s easy to debug and extend

This solution provides a solid starting point for extracting and processing Word table data, with little setup and plenty of room for customization. Before using it in production, add error handling for missing or unreadable files.
