In office and document automation scenarios, Word and Excel are two of the most commonly used document formats. Word is better suited to explanatory text and complex layouts, while Excel excels at structured data processing, statistics, and analysis. In practice, developers often need to convert the content of a Word document (paragraphs, tables, styles, and even images) into an Excel file for subsequent analysis or archiving.
This article shows how to use Spire.Doc for .NET and Spire.XLS for .NET to implement Word-to-Excel conversion in C#, preserving as much of the original text styles, table structures, and image content as possible.
Installation of Required Libraries
PM> Install-Package Spire.Doc
PM> Install-Package Spire.XLS
Overview of Implementation Ideas
The content structure of a Word document is relatively complex and mainly consists of the following object types:
- Paragraph
- Table
- TextRange
- Image (DocPicture)
The core structure of Excel, on the other hand, is:
- Workbook
- Worksheet
- CellRange
- RichText
Therefore, the basic idea for conversion is:
- Read the Word document;
- Traverse the Sections in the document;
- Process paragraphs and tables in order;
- Write paragraph content into Excel cells;
- Map Word tables to Excel row by row and column by column;
- Copy text styles, alignments, and images.
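The steps above can be sketched independently of either library. The following is a minimal illustration, not the article's implementation: it models a paragraph as a plain string and a table as a string[][] (both are stand-ins chosen for this sketch, not Spire.Doc types) and shows how a single row counter keeps paragraphs and tables in document order. The names LayoutPlanner and Plan are hypothetical.

```csharp
using System;
using System.Collections.Generic;

static class LayoutPlanner
{
    // Assumption for this sketch: a string is a paragraph, a string[][] is a table.
    // Returns (row, column, text) placements using 1-based Excel coordinates.
    public static List<(int Row, int Column, string Text)> Plan(IEnumerable<object> bodyItems)
    {
        var placements = new List<(int Row, int Column, string Text)>();
        int row = 1;
        foreach (object item in bodyItems)
        {
            switch (item)
            {
                case string paragraphText:
                    // A paragraph occupies one cell in column 1, then the cursor moves down.
                    placements.Add((row, 1, paragraphText));
                    row++;
                    break;
                case string[][] tableRows:
                    // A table occupies one Excel row per table row, one column per cell.
                    foreach (string[] cells in tableRows)
                    {
                        for (int c = 0; c < cells.Length; c++)
                        {
                            placements.Add((row, c + 1, cells[c]));
                        }
                        row++;
                    }
                    break;
            }
        }
        return placements;
    }

    static void Main()
    {
        var items = new object[]
        {
            "Invoice #1001",
            new[] { new[] { "Item", "Qty" }, new[] { "Widget", "2" } },
            "Thank you"
        };
        foreach (var p in Plan(items))
        {
            Console.WriteLine($"R{p.Row}C{p.Column}: {p.Text}");
        }
    }
}
```

In this example the two-row table takes rows 2 and 3, so the closing paragraph lands on row 4. The complete code later in the article uses the same row-counter idea with real Spire.Doc objects.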
Loading Word and Creating Excel Workbook
The program first creates a Document object and loads the Word file, then creates a Workbook and clears the default worksheet to ensure a cleaner and more controlled output.
Document doc = new Document();
doc.LoadFromFile(@"C:\Users\Administrator\Desktop\Invoice.docx");
Workbook wb = new Workbook();
wb.Worksheets.Clear();
Worksheet worksheet = wb.CreateEmptySheet("WordToExcel");
As a result, all content from the Word document is written to a single, named worksheet.
Traversing Word Document Content
A Word document may contain multiple Sections, each of which contains paragraphs and tables. The code reads these objects using a nested loop:
- If it is a Paragraph, write it directly into a single Excel cell;
- If it is a Table, call a dedicated method to export it to multiple rows and columns.
foreach (Section section in doc.Sections)
{
    foreach (DocumentObject documentObject in section.Body.ChildObjects)
    {
        if (documentObject is Paragraph)
        {
            // Write paragraph
        }
        if (documentObject is Table)
        {
            // Export table
        }
    }
}
Because objects are processed in the order they appear in each Section body, the output keeps the original content order of the Word document.
Exporting Word Table to Excel
For Word tables, the program traverses TableRow row by row, then TableCell column by column, writing the content into the corresponding Excel cells. To make the Excel sheet clearer, borders are added to each cell:
cell.BorderAround(LineStyleType.Thin, Color.Black);
The text in each table cell, the line breaks between its paragraphs, and the font styles of its text are copied so that the data stays readable.
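When a Word table cell holds several paragraphs, they have to share one Excel cell. A minimal sketch of that merge, using plain strings in place of Spire.Doc paragraphs (CellTextMerger and Merge are hypothetical names used only here): the paragraph texts are joined with a line break between them and none after the last one.

```csharp
using System;
using System.Collections.Generic;

static class CellTextMerger
{
    // Joins the paragraph texts of one table cell with '\n' separators,
    // without adding a trailing break after the last paragraph.
    public static string Merge(IEnumerable<string> paragraphTexts)
    {
        return string.Join("\n", paragraphTexts);
    }

    static void Main()
    {
        string merged = Merge(new[] { "Item: Widget", "SKU: W-100" });
        // Each '\n' becomes a line break inside the Excel cell.
        Console.WriteLine(merged.Split('\n').Length);
    }
}
```

Excel only displays these in-cell line breaks when text wrapping is enabled for the cell, which is why the conversion turns on wrapping for the whole used range at the end.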
Copying Text Styles and Images
This is the core of the conversion. The program uses the cell's RichText object to apply each Word TextRange's formatting to the matching span of text in the Excel cell. The following properties are mapped:
- Font Name
- Font Size
- Bold
- Text Color
Additionally, if an image (DocPicture) is detected in a paragraph, it is inserted at the corresponding Excel cell position, and the row height is set to the image height so that the image is not obscured.
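The per-run font mapping described above depends on knowing where each run of text starts and ends inside the cell's combined text. The sketch below isolates only that arithmetic, using plain strings instead of TextRange objects; RunOffsets and Compute are hypothetical names. It mirrors the calculation in the complete code (end = start + length) and makes no claim about whether SetFont treats the end position as inclusive, which the article does not state. It also assumes the cell text is exactly the concatenation of the run texts.

```csharp
using System;
using System.Collections.Generic;

static class RunOffsets
{
    // For each run, returns its start offset and the value start + length,
    // i.e. the pair of positions computed for that run.
    public static List<(int Start, int End)> Compute(IEnumerable<string> runTexts)
    {
        var offsets = new List<(int Start, int End)>();
        int start = 0;
        foreach (string text in runTexts)
        {
            int end = start + text.Length;
            offsets.Add((start, end));
            // The next run begins where this one stopped.
            start = end;
        }
        return offsets;
    }

    static void Main()
    {
        string[] runs = { "Total: ", "42.00", " USD" };
        List<(int Start, int End)> offsets = Compute(runs);
        for (int i = 0; i < runs.Length; i++)
        {
            Console.WriteLine($"\"{runs[i]}\" -> {offsets[i].Start}..{offsets[i].End}");
        }
    }
}
```

For the three runs in Main, the computed pairs are 0..7, 7..12, and 12..16, and the final end value equals the length of the combined text.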
Alignment and Formatting Optimization
To further improve conversion quality, the code also handles paragraph alignment, mapping left, center, and right alignments from Word to Excel cell styles. After all content is written, the following actions are performed:
- Automatically adjust row heights and column widths
- Enable cell text wrapping
These adjustments make the generated Excel file easier to read.
Saving as Excel File
Finally, the generated workbook is saved as an Excel 2013 format file:
wb.SaveToFile("WordToExcel.xlsx", ExcelVersion.Version2013);
At this point, an Excel file containing the paragraphs, tables, styles, and images has been generated.
Complete Example Code (C# Word to Excel Conversion)
using Spire.Doc;
using Spire.Doc.Documents;
using Spire.Doc.Fields;
using Spire.Xls;
using System;
using System.Drawing;

namespace ConvertWordToExcel
{
    class Program
    {
        static void Main(string[] args)
        {
            // Create Document object
            Document doc = new Document();
            // Load Word document
            doc.LoadFromFile(@"C:\Users\Administrator\Desktop\Invoice.docx");
            // Create Workbook object
            Workbook wb = new Workbook();
            // Remove default worksheet
            wb.Worksheets.Clear();
            // Create a worksheet named "WordToExcel"
            Worksheet worksheet = wb.CreateEmptySheet("WordToExcel");
            int row = 1;
            int column = 1;
            // Traverse all Sections in the Word document
            foreach (Section section in doc.Sections)
            {
                // Traverse all document objects in the Section
                foreach (DocumentObject documentObject in section.Body.ChildObjects)
                {
                    // If the object is a paragraph
                    if (documentObject is Paragraph)
                    {
                        CellRange cell = worksheet.Range[row, column];
                        Paragraph paragraph = documentObject as Paragraph;
                        // Copy paragraph content and styles to Excel cell
                        CopyTextAndStyle(cell, paragraph);
                        row++;
                    }
                    // If the object is a table
                    if (documentObject is Table)
                    {
                        Table table = documentObject as Table;
                        // Export Word table to Excel
                        int currentRow = ExportTableInExcel(worksheet, row, table);
                        row = currentRow;
                    }
                }
            }
            // Automatically adjust row heights and column widths
            worksheet.AllocatedRange.AutoFitRows();
            worksheet.AllocatedRange.AutoFitColumns();
            // Set cells to wrap text automatically
            worksheet.AllocatedRange.IsWrapText = true;
            // Save as Excel file
            wb.SaveToFile("WordToExcel.xlsx", ExcelVersion.Version2013);
        }

        /// <summary>
        /// Export Word table data to Excel
        /// </summary>
        private static int ExportTableInExcel(Worksheet worksheet, int row, Table table)
        {
            CellRange cell;
            int column;
            // Iterate through each row in the table
            foreach (TableRow tbRow in table.Rows)
            {
                column = 1;
                // Iterate through each cell in the current row
                foreach (TableCell tbCell in tbRow.Cells)
                {
                    cell = worksheet.Range[row, column];
                    // Add border to Excel cell
                    cell.BorderAround(LineStyleType.Thin, Color.Black);
                    // Copy Word table cell content to Excel
                    CopyContentInTable(tbCell, cell);
                    column++;
                }
                row++;
            }
            return row;
        }

        /// <summary>
        /// Copy content in Word table cell to Excel cell
        /// </summary>
        private static void CopyContentInTable(TableCell tbCell, CellRange cell)
        {
            // Create a new paragraph object
            Paragraph newPara = new Paragraph(tbCell.Document);
            // Iterate through all child objects in the Word table cell
            for (int i = 0; i < tbCell.ChildObjects.Count; i++)
            {
                DocumentObject documentObject = tbCell.ChildObjects[i];
                if (documentObject is Paragraph)
                {
                    Paragraph paragraph = documentObject as Paragraph;
                    // Copy all child objects in the paragraph (text, images, etc.)
                    foreach (DocumentObject cObj in paragraph.ChildObjects)
                    {
                        newPara.ChildObjects.Add(cObj.Clone());
                    }
                    // If not the last paragraph, add a line break
                    if (i < tbCell.ChildObjects.Count - 1)
                    {
                        newPara.AppendText("\n");
                    }
                }
            }
            // Copy the merged paragraph content to Excel
            CopyTextAndStyle(cell, newPara);
        }

        /// <summary>
        /// Copy paragraph text content and style to Excel cell
        /// </summary>
        private static void CopyTextAndStyle(CellRange cell, Paragraph paragraph)
        {
            RichText richText = cell.RichText;
            // Set cell text content
            richText.Text = paragraph.Text;
            int startIndex = 0;
            // Iterate through child objects in the paragraph
            foreach (DocumentObject documentObject in paragraph.ChildObjects)
            {
                // If it is text content
                if (documentObject is TextRange)
                {
                    TextRange textRange = documentObject as TextRange;
                    // Get Word text style
                    string fontName = textRange.CharacterFormat.FontName;
                    bool isBold = textRange.CharacterFormat.Bold;
                    Color textColor = textRange.CharacterFormat.TextColor;
                    float fontSize = textRange.CharacterFormat.FontSize;
                    string textRangeText = textRange.Text;
                    int strLength = textRangeText.Length;
                    // Create Excel font
                    ExcelFont font = cell.Worksheet.Workbook.CreateFont();
                    font.Color = textColor;
                    font.IsBold = isBold;
                    font.Size = fontSize;
                    font.FontName = fontName;
                    // Apply font style to specified text range
                    int endIndex = startIndex + strLength;
                    richText.SetFont(startIndex, endIndex, font);
                    startIndex += strLength;
                }
                // If it is an image
                if (documentObject is DocPicture)
                {
                    DocPicture picture = documentObject as DocPicture;
                    // Insert image into Excel cell
                    cell.Worksheet.Pictures.Add(cell.Row, cell.Column, picture.Image);
                    // Adjust row height according to image height
                    cell.Worksheet.SetRowHeightInPixels(cell.Row, 1, picture.Image.Height);
                }
            }
            // Set Excel cell's horizontal alignment
            switch (paragraph.Format.HorizontalAlignment)
            {
                case HorizontalAlignment.Left:
                    cell.Style.HorizontalAlignment = HorizontalAlignType.Left;
                    break;
                case HorizontalAlignment.Center:
                    cell.Style.HorizontalAlignment = HorizontalAlignType.Center;
                    break;
                case HorizontalAlignment.Right:
                    cell.Style.HorizontalAlignment = HorizontalAlignType.Right;
                    break;
            }
        }
    }
}
Conclusion
As demonstrated in this article, Spire.Doc for .NET and Spire.XLS for .NET can be combined to implement Word to Excel conversion in C#. During the conversion, the solution covers:
- Preserving text content and order
- Restoring font styles and alignments
- Fully exporting table structures
- Supporting image copying
This solution is well suited to invoice conversion, report organization, and document data structuring scenarios. If you are developing document automation or enterprise office systems, this Word to Excel implementation can serve as a practical starting point.