DEV Community

jelizaveta

Efficient Word to Excel Conversion in C#: Preserving Text, Tables, and Styles

In office and document automation work, Word and Excel are two of the most commonly used document formats. Word is better suited to explanatory text and complex layouts, while Excel excels at structured data processing, statistics, and analysis. In practice, developers often need to convert the content of a Word document (paragraphs, tables, styles, and even images) into an Excel file for later analysis or archiving.

This article shows how to use Spire.Doc for .NET and Spire.XLS for .NET to implement Word to Excel conversion in C#, preserving as much of the original text styles, table structures, and image content as possible.


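Installation of Required Libraries

Both libraries are available as NuGet packages and can be installed from the Package Manager Console:

PM> Install-Package Spire.Doc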
PM> Install-Package Spire.XLS

Overview of Implementation Ideas

The content structure of a Word document is relatively complex and mainly consists of the following object types:

  • Paragraph
  • Table
  • TextRange
  • Image (DocPicture)

The core structure of Excel, on the other hand, is:

  • Workbook
  • Worksheet
  • CellRange
  • RichText

Therefore, the basic idea for conversion is:

  1. Read the Word document;
  2. Traverse the Sections in the document;
  3. Process paragraphs and tables in order;
  4. Write paragraph content into Excel cells;
  5. Map Word tables to Excel row by row and column by column;
  6. Copy text styles, alignments, and images.

Loading Word and Creating Excel Workbook

The program first creates a Document object and loads the Word file. It then creates a Workbook, clears its default worksheets, and adds a single empty sheet to receive the output.

Document doc = new Document();
doc.LoadFromFile(@"C:\Users\Administrator\Desktop\Invoice.docx");

Workbook wb = new Workbook();
wb.Worksheets.Clear();
Worksheet worksheet = wb.CreateEmptySheet("WordToExcel");

All content from the Word document is then written to this one worksheet.


Traversing Word Document Content

A Word document may contain multiple Sections, each of which contains paragraphs and tables. The code reads these objects using a nested loop:

  • If it is a Paragraph, write it directly into a single Excel cell;
  • If it is a Table, call a dedicated method to export it to multiple rows and columns.

foreach (Section section in doc.Sections)
{
    foreach (DocumentObject documentObject in section.Body.ChildObjects)
    {
        if (documentObject is Paragraph)
        {
            // Write paragraph
        }
        else if (documentObject is Table)
        {
            // Export table
        }
    }
}

Because the child objects are visited in document order, the Excel output keeps the same content sequence as the Word source.


Exporting Word Table to Excel

For Word tables, the program traverses TableRow row by row, then TableCell column by column, writing the content into the corresponding Excel cells. To make the Excel sheet clearer, borders are added to each cell:

cell.BorderAround(LineStyleType.Thin, Color.Black);

The text and line breaks in each table cell are copied, together with the font styles described in the next section, so the data stays readable.
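The row-by-row, column-by-column mapping can be sketched without either library. The following is a minimal illustration, not the Spire API: the grid dictionary and the TableMapping.Export name are hypothetical stand-ins for the worksheet and for the export method in the complete listing. It highlights the detail that matters to the caller, namely that the method returns the next free row so that content after the table continues below it.

```csharp
using System;
using System.Collections.Generic;

public static class TableMapping
{
    // Writes each table cell to (row, column) in a 1-based grid, starting at
    // startRow, and returns the first free row after the table.
    public static int Export(
        IDictionary<(int Row, int Column), string> grid,
        int startRow,
        IEnumerable<IEnumerable<string>> tableRows)
    {
        int row = startRow;
        foreach (var tableRow in tableRows)
        {
            int column = 1; // every table row restarts at the first column
            foreach (string text in tableRow)
            {
                grid[(row, column)] = text;
                column++;
            }
            row++;
        }
        return row;
    }

    public static void Main()
    {
        // Hypothetical two-row table placed at worksheet row 5.
        var grid = new Dictionary<(int Row, int Column), string>();
        var table = new[]
        {
            new[] { "Item", "Qty" },
            new[] { "Widget", "3" },
        };

        int nextRow = Export(grid, 5, table);
        Console.WriteLine(nextRow);       // 7
        Console.WriteLine(grid[(6, 1)]);  // Widget
    }
}
```

Returning the row index rather than tracking it in shared state keeps the traversal loop simple: a paragraph advances the cursor by one, and a table advances it by however many rows it contains.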


Copying Text Styles and Images

This is the core of the conversion. Each TextRange in a Word paragraph can carry its own formatting, so the program uses the cell's RichText object to apply the following properties to the matching character range in Excel:

  • Font Name
  • Font Size
  • Bold
  • Text Color

Additionally, if a paragraph contains an image (DocPicture), the image is inserted at the corresponding Excel cell position, and the row height is set from the image height so that the image is not obscured.
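Applying a font to part of a cell's text requires knowing where each run begins in the combined string. As a library-independent sketch (the RunOffsets class and the sample strings are assumptions made for illustration), the offsets can be derived by accumulating run lengths:

```csharp
using System;
using System.Collections.Generic;

public static class RunOffsets
{
    // Returns (start, length) for each run, assuming the cell text is the
    // concatenation of the runs in their original order.
    public static List<(int Start, int Length)> Compute(IEnumerable<string> runs)
    {
        var offsets = new List<(int Start, int Length)>();
        int start = 0;
        foreach (string run in runs)
        {
            offsets.Add((start, run.Length));
            start += run.Length; // the next run begins where this one ends
        }
        return offsets;
    }

    public static void Main()
    {
        // Hypothetical runs: "Total: " in regular weight, "42.00" in bold.
        string[] runs = { "Total: ", "42.00" };
        foreach (var (start, length) in Compute(runs))
        {
            Console.WriteLine($"start={start}, length={length}");
        }
        // Prints:
        // start=0, length=7
        // start=7, length=5
    }
}
```

Rich-text APIs differ on whether the end position they expect is inclusive or exclusive, so it is worth checking the Spire.XLS documentation for SetFont before turning a (start, length) pair into an end index.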


Alignment and Formatting Optimization

To further improve conversion quality, the code also handles paragraph alignment, mapping left, center, and right alignments from Word to Excel cell styles; any other alignment is left at the cell's default. After all content is written, the following actions are performed:

  • Automatically adjust row heights and column widths
  • Enable cell text wrapping

These adjustments make the generated Excel file easier to read.


Saving as Excel File

Finally, the generated workbook is saved as an Excel 2013 format file:

wb.SaveToFile("WordToExcel.xlsx", ExcelVersion.Version2013);

At this point, the program has produced an Excel file containing the paragraphs, tables, styles, and images from the Word document.

Complete Example Code (C# Word to Excel Conversion)

using Spire.Doc;
using Spire.Doc.Documents;
using Spire.Doc.Fields;
using Spire.Xls;
using System;
using System.Drawing;

namespace ConvertWordToExcel
{
    class Program
    {
        static void Main(string[] args)
        {
            // Create Document object
            Document doc = new Document();

            // Load Word document
            doc.LoadFromFile(@"C:\Users\Administrator\Desktop\Invoice.docx");

            // Create Workbook object
            Workbook wb = new Workbook();

            // Remove default worksheet
            wb.Worksheets.Clear();

            // Create a worksheet named "WordToExcel"
            Worksheet worksheet = wb.CreateEmptySheet("WordToExcel");

            int row = 1;
            int column = 1;

            // Traverse all Sections in the Word document
            foreach (Section section in doc.Sections)
            {
                // Traverse all document objects in the Section
                foreach (DocumentObject documentObject in section.Body.ChildObjects)
                {
                    // If the object is a paragraph
                    if (documentObject is Paragraph paragraph)
                    {
                        CellRange cell = worksheet.Range[row, column];

                        // Copy paragraph content and styles to Excel cell
                        CopyTextAndStyle(cell, paragraph);
                        row++;
                    }
                    // If the object is a table
                    else if (documentObject is Table table)
                    {
                        // Export Word table to Excel; the method returns the next free row
                        row = ExportTableInExcel(worksheet, row, table);
                    }
                }
            }

            // Automatically adjust row heights and column widths
            worksheet.AllocatedRange.AutoFitRows();
            worksheet.AllocatedRange.AutoFitColumns();

            // Set cells to wrap text automatically
            worksheet.AllocatedRange.IsWrapText = true;

            // Save as Excel file
            wb.SaveToFile("WordToExcel.xlsx", ExcelVersion.Version2013);
        }

        /// <summary>
        /// Export Word table data to Excel
        /// </summary>
        private static int ExportTableInExcel(Worksheet worksheet, int row, Table table)
        {
            CellRange cell;
            int column;

            // Iterate through each row in the table
            foreach (TableRow tbRow in table.Rows)
            {
                column = 1;

                // Iterate through each cell in the current row
                foreach (TableCell tbCell in tbRow.Cells)
                {
                    cell = worksheet.Range[row, column];

                    // Add border to Excel cell
                    cell.BorderAround(LineStyleType.Thin, Color.Black);

                    // Copy Word table cell content to Excel
                    CopyContentInTable(tbCell, cell);
                    column++;
                }
                row++;
            }
            return row;
        }

        /// <summary>
        /// Copy content in Word table cell to Excel cell
        /// </summary>
        private static void CopyContentInTable(TableCell tbCell, CellRange cell)
        {
            // Create a new paragraph object
            Paragraph newPara = new Paragraph(tbCell.Document);

            // Iterate through all child objects in the Word table cell
            for (int i = 0; i < tbCell.ChildObjects.Count; i++)
            {
                DocumentObject documentObject = tbCell.ChildObjects[i];

                if (documentObject is Paragraph)
                {
                    Paragraph paragraph = documentObject as Paragraph;

                    // Copy all child objects in the paragraph (text, images, etc.)
                    foreach (DocumentObject cObj in paragraph.ChildObjects)
                    {
                        newPara.ChildObjects.Add(cObj.Clone());
                    }

                    // If not the last paragraph, add a line break
                    if (i < tbCell.ChildObjects.Count - 1)
                    {
                        newPara.AppendText("\n");
                    }
                }
            }

            // Copy the merged paragraph content to Excel
            CopyTextAndStyle(cell, newPara);
        }

        /// <summary>
        /// Copy paragraph text content and style to Excel cell
        /// </summary>
        private static void CopyTextAndStyle(CellRange cell, Paragraph paragraph)
        {
            RichText richText = cell.RichText;

            // Set cell text content
            richText.Text = paragraph.Text;

            int startIndex = 0;

            // Iterate through child objects in the paragraph
            foreach (DocumentObject documentObject in paragraph.ChildObjects)
            {
                // If it is text content
                if (documentObject is TextRange)
                {
                    TextRange textRange = documentObject as TextRange;

                    // Get Word text style
                    string fontName = textRange.CharacterFormat.FontName;
                    bool isBold = textRange.CharacterFormat.Bold;
                    Color textColor = textRange.CharacterFormat.TextColor;
                    float fontSize = textRange.CharacterFormat.FontSize;
                    string textRangeText = textRange.Text;
                    int strLength = textRangeText.Length;

                    // Create Excel font
                    ExcelFont font = cell.Worksheet.Workbook.CreateFont();
                    font.Color = textColor;
                    font.IsBold = isBold;
                    font.Size = fontSize;
                    font.FontName = fontName;

                    // Apply font style to specified text range
                    int endIndex = startIndex + strLength;
                    richText.SetFont(startIndex, endIndex, font);
                    startIndex += strLength;
                }

                // If it is an image
                if (documentObject is DocPicture)
                {
                    DocPicture picture = documentObject as DocPicture;

                    // Insert image into Excel cell
                    cell.Worksheet.Pictures.Add(cell.Row, cell.Column, picture.Image);

                    // Adjust row height according to image height
                    cell.Worksheet.SetRowHeightInPixels(cell.Row, 1, picture.Image.Height);
                }
            }

            // Set Excel cell's horizontal alignment
            switch (paragraph.Format.HorizontalAlignment)
            {
                case HorizontalAlignment.Left:
                    cell.Style.HorizontalAlignment = HorizontalAlignType.Left;
                    break;
                case HorizontalAlignment.Center:
                    cell.Style.HorizontalAlignment = HorizontalAlignType.Center;
                    break;
                case HorizontalAlignment.Right:
                    cell.Style.HorizontalAlignment = HorizontalAlignType.Right;
                    break;
            }
        }
    }
}

Conclusion

As demonstrated in this article, Spire.Doc for .NET and Spire.XLS for .NET make it possible to convert Word to Excel in C# while:

  • Preserving text content and order
  • Restoring font styles and alignments
  • Fully exporting table structures
  • Supporting image copying

This solution is well suited to invoice conversion, report organization, and document data structuring. If you are building document automation or enterprise office systems, it offers a practical starting point.
