In daily work, we often need to handle data in various formats. Word documents, known for their flexible layout capabilities, are commonly used for writing reports and documentation. However, when these documents contain a large number of tables, manually copying and pasting the data into Excel for further analysis or statistics becomes a time-consuming and error-prone task. Imagine dealing with tables across dozens or even hundreds of Word files—this repetitive work is inefficient and frustrating.
Fortunately, Python, as a powerful automation tool, can effectively solve this challenge. In this article, we will explore how to use Python together with Spire.Doc for Python and Spire.XLS for Python to efficiently and accurately extract table data from Word documents and convert it into editable Excel spreadsheets. By automating this process, you can significantly improve productivity, reduce human error, and focus on generating valuable data insights.
Environment Setup and Library Installation
Before getting started, make sure your Python environment is properly configured and that the required libraries are installed. This tutorial assumes you are running Python 3.x.
First, open your command-line tool (such as CMD, PowerShell, or Terminal) and install Spire.Doc for Python and Spire.XLS for Python using the pip command. These two libraries are essential for this task: Spire.Doc for Python is responsible for reading and parsing Word documents—especially for identifying and extracting table data—while Spire.XLS for Python is used to create, write, and save Excel files.
pip install Spire.Doc
pip install Spire.XLS
Once installed, you can import and use them in your Python scripts.
Reading Word Tables and Extracting Data
Data extraction is the key step in the entire conversion process. We will use Spire.Doc for Python to load a Word document and iterate through all tables to extract their contents one by one.
Here is an example Word document:
Next, we will write code to identify and extract the data.
from spire.doc import *
from spire.doc.common import *
def extract_tables_from_word(word_file_path):
"""
Extract all table data from a Word document.
Returns a list where each element represents a table.
Each table contains a list of rows, and each row contains a list of cell values.
"""
document = Document()
document.LoadFromFile(word_file_path)
all_tables_data = []
# Iterate through all sections in the document
for sec_index in range(document.Sections.Count):
section = document.Sections.get_Item(sec_index)
# Iterate through all tables in the section
for table_index in range(section.Tables.Count):
table = section.Tables.get_Item(table_index)
current_table_data = []
# Iterate through all rows in the table
for row_index in range(table.Rows.Count):
table_row = table.Rows.get_Item(row_index)
current_row_data = []
# Iterate through all cells in the row
for cell_index in range(table_row.Cells.Count):
table_cell = table_row.Cells.get_Item(cell_index)
# Extract cell text while preserving the original paragraph structure
paras = [table_cell.Paragraphs.get_Item(i).Text.rstrip('\r\n')
for i in range(table_cell.Paragraphs.Count)
if table_cell.Paragraphs.get_Item(i).Text.strip()]
current_cell_data = "\n".join(paras)
current_row_data.append(current_cell_data)
current_table_data.append(current_row_data)
all_tables_data.append(current_table_data)
document.Close()
return all_tables_data
# Assume your Word file is named 'input.docx' and located in the current directory
word_file = "input.docx"
extracted_data = extract_tables_from_word(word_file)
# Print the extracted data for verification
for i, table_data in enumerate(extracted_data):
print(f"--- Table {i + 1} Data ---")
for row in table_data:
print(row)
Console output:
Code Explanation:
-
Document()creates an instance used to load the Word document. -
document.LoadFromFile()loads the document from the specified path. - We iterate through all sections using
document.Sections, and then iterate through the tables in each section viasection.Tables. - For each table, we further iterate through
table.Rowsto get rows, and thenrow.Cellsto access individual cells. -
cell.Text.strip()retrieves the plain text content of a cell and removes any extra whitespace. - All extracted data is stored in a nested list.
all_tables_datacontains every table, each table contains rows, and each row contains cell values.
Writing Data to Excel and Saving the File
After extracting the data, the next step is to write it into an Excel file. We will use Spire.XLS for Python to create a new workbook and populate worksheets with the extracted data.
from spire.xls import *
from spire.xls.common import *
def write_data_to_excel(extracted_data, excel_file_path):
"""
Write the extracted table data to an Excel file.
Each Word table will be written to a new worksheet.
"""
workbook = Workbook()
# Clear the default worksheet
workbook.Worksheets.Clear()
# Do not create an Excel file if no data was extracted
if not extracted_data:
print("No table data was extracted from the Word document.")
return
# Iterate through all extracted tables
for i, table_data in enumerate(extracted_data):
# Create a new worksheet for each table
sheet = workbook.Worksheets.Add(f"Table_{i + 1}")
# Write table data into the worksheet
for r_idx, row_data in enumerate(table_data):
for c_idx, cell_value in enumerate(row_data):
# Excel row and column indexes start at 1
sheet.Range[r_idx + 1, c_idx + 1].Value = cell_value
# (Optional) Apply basic formatting
# For example, auto-fit column widths
sheet.AllocatedRange.AutoFitColumns()
# Save the Excel file
workbook.SaveToFile(excel_file_path, ExcelVersion.Version2016)
workbook.Dispose()
print(f"Data has been successfully written to {excel_file_path}")
# Call the function to write data to Excel
excel_file = "output.xlsx"
write_data_to_excel(extracted_data, excel_file)
Result after writing:
Code Explanation:
-
Workbook()creates a new Excel workbook. - We iterate through each Word table stored in
extracted_data. - A new worksheet is created for each table using
workbook.Worksheets.Add(). - We then loop through rows and cell values, writing them into Excel using
sheet.Range[r_idx + 1, c_idx + 1].Value = cell_value. Remember that Excel indexes start at 1, so we add+1. -
workbook.SaveToFile()saves the workbook to the specified Excel file, whileExcelVersion.Version2016defines the output format.
By combining the two Python code sections above, you will have a complete automated script for converting Word tables into Excel spreadsheets.
Conclusion and Future Outlook
In this tutorial, we demonstrated how to use Python with Spire.Doc for Python and Spire.XLS for Python to automate the conversion of table data from Word documents into Excel spreadsheets. This approach eliminates the need for tedious manual copying and pasting, significantly improves data processing efficiency, and minimizes the risk of human error.
This type of automation has broad application potential across many scenarios, such as:
- Report Data Consolidation – Extract key data from multiple Word reports and compile it into Excel for analysis.
- Enterprise Data Migration – Batch import structured data from legacy Word documents into new databases or systems.
- Office Workflow Automation – Simplify repetitive data entry and format conversion tasks, allowing you to focus more on core business activities.




Top comments (0)