DEV Community

Allen Yang
Allen Yang

Posted on

Batch Extract Word Tables to Excel with Python

Easily Convert Word Tables to Excel Sheets with Python

In daily work, we often need to handle data in various formats. Word documents, known for their flexible layout capabilities, are commonly used for writing reports and documentation. However, when these documents contain a large number of tables, manually copying and pasting the data into Excel for further analysis or statistics becomes a time-consuming and error-prone task. Imagine dealing with tables across dozens or even hundreds of Word files—this repetitive work is inefficient and frustrating.

Fortunately, Python, as a powerful automation tool, can effectively solve this challenge. In this article, we will explore how to use Python together with Spire.Doc for Python and Spire.XLS for Python to efficiently and accurately extract table data from Word documents and convert it into editable Excel spreadsheets. By automating this process, you can significantly improve productivity, reduce human error, and focus on generating valuable data insights.


Environment Setup and Library Installation

Before getting started, make sure your Python environment is properly configured and that the required libraries are installed. This tutorial assumes you are running Python 3.x.

First, open your command-line tool (such as CMD, PowerShell, or Terminal) and install Spire.Doc for Python and Spire.XLS for Python using the pip command. These two libraries are essential for this task: Spire.Doc for Python is responsible for reading and parsing Word documents—especially for identifying and extracting table data—while Spire.XLS for Python is used to create, write, and save Excel files.

pip install Spire.Doc
pip install Spire.XLS
Enter fullscreen mode Exit fullscreen mode

Once installed, you can import and use them in your Python scripts.


Reading Word Tables and Extracting Data

Data extraction is the key step in the entire conversion process. We will use Spire.Doc for Python to load a Word document and iterate through all tables to extract their contents one by one.

Here is an example Word document:

Sample Word Document with Tables

Next, we will write code to identify and extract the data.

from spire.doc import *
from spire.doc.common import *


def extract_tables_from_word(word_file_path):
    """
    Extract all table data from a Word document.
    Returns a list where each element represents a table.
    Each table contains a list of rows, and each row contains a list of cell values.
    """
    document = Document()
    document.LoadFromFile(word_file_path)

    all_tables_data = []
    # Iterate through all sections in the document
    for sec_index in range(document.Sections.Count):
        section = document.Sections.get_Item(sec_index)
        # Iterate through all tables in the section
        for table_index in range(section.Tables.Count):
            table = section.Tables.get_Item(table_index)
            current_table_data = []
            # Iterate through all rows in the table
            for row_index in range(table.Rows.Count):
                table_row = table.Rows.get_Item(row_index)
                current_row_data = []
                # Iterate through all cells in the row
                for cell_index in range(table_row.Cells.Count):
                    table_cell = table_row.Cells.get_Item(cell_index)
                    # Extract cell text while preserving the original paragraph structure
                    paras = [table_cell.Paragraphs.get_Item(i).Text.rstrip('\r\n')
                             for i in range(table_cell.Paragraphs.Count)
                             if table_cell.Paragraphs.get_Item(i).Text.strip()]
                    current_cell_data = "\n".join(paras)
                    current_row_data.append(current_cell_data)
                current_table_data.append(current_row_data)
            all_tables_data.append(current_table_data)

    document.Close()
    return all_tables_data


# Assume your Word file is named 'input.docx' and located in the current directory
word_file = "input.docx"
extracted_data = extract_tables_from_word(word_file)

# Print the extracted data for verification
for i, table_data in enumerate(extracted_data):
    print(f"--- Table {i + 1} Data ---")
    for row in table_data:
        print(row)
Enter fullscreen mode Exit fullscreen mode

Console output:

Console Output of Extracted Word Tables

Code Explanation:

  1. Document() creates an instance used to load the Word document.
  2. document.LoadFromFile() loads the document from the specified path.
  3. We iterate through all sections using document.Sections, and then iterate through the tables in each section via section.Tables.
  4. For each table, we further iterate through table.Rows to get rows, and then row.Cells to access individual cells.
  5. cell.Text.strip() retrieves the plain text content of a cell and removes any extra whitespace.
  6. All extracted data is stored in a nested list. all_tables_data contains every table, each table contains rows, and each row contains cell values.

Writing Data to Excel and Saving the File

After extracting the data, the next step is to write it into an Excel file. We will use Spire.XLS for Python to create a new workbook and populate worksheets with the extracted data.

from spire.xls import *
from spire.xls.common import *


def write_data_to_excel(extracted_data, excel_file_path):
    """
    Write the extracted table data to an Excel file.
    Each Word table will be written to a new worksheet.
    """
    workbook = Workbook()
    # Clear the default worksheet
    workbook.Worksheets.Clear()

    # Do not create an Excel file if no data was extracted
    if not extracted_data:
        print("No table data was extracted from the Word document.")
        return

    # Iterate through all extracted tables
    for i, table_data in enumerate(extracted_data):
        # Create a new worksheet for each table
        sheet = workbook.Worksheets.Add(f"Table_{i + 1}")

        # Write table data into the worksheet
        for r_idx, row_data in enumerate(table_data):
            for c_idx, cell_value in enumerate(row_data):
                # Excel row and column indexes start at 1
                sheet.Range[r_idx + 1, c_idx + 1].Value = cell_value

        # (Optional) Apply basic formatting
        # For example, auto-fit column widths
        sheet.AllocatedRange.AutoFitColumns()

    # Save the Excel file
    workbook.SaveToFile(excel_file_path, ExcelVersion.Version2016)
    workbook.Dispose()
    print(f"Data has been successfully written to {excel_file_path}")

# Call the function to write data to Excel
excel_file = "output.xlsx"
write_data_to_excel(extracted_data, excel_file)
Enter fullscreen mode Exit fullscreen mode

Result after writing:

Writing Word Tables to Excel Worksheets

Code Explanation:

  1. Workbook() creates a new Excel workbook.
  2. We iterate through each Word table stored in extracted_data.
  3. A new worksheet is created for each table using workbook.Worksheets.Add().
  4. We then loop through rows and cell values, writing them into Excel using sheet.Range[r_idx + 1, c_idx + 1].Value = cell_value. Remember that Excel indexes start at 1, so we add +1.
  5. workbook.SaveToFile() saves the workbook to the specified Excel file, while ExcelVersion.Version2016 defines the output format.

By combining the two Python code sections above, you will have a complete automated script for converting Word tables into Excel spreadsheets.


Conclusion and Future Outlook

In this tutorial, we demonstrated how to use Python with Spire.Doc for Python and Spire.XLS for Python to automate the conversion of table data from Word documents into Excel spreadsheets. This approach eliminates the need for tedious manual copying and pasting, significantly improves data processing efficiency, and minimizes the risk of human error.

This type of automation has broad application potential across many scenarios, such as:

  • Report Data Consolidation – Extract key data from multiple Word reports and compile it into Excel for analysis.
  • Enterprise Data Migration – Batch import structured data from legacy Word documents into new databases or systems.
  • Office Workflow Automation – Simplify repetitive data entry and format conversion tasks, allowing you to focus more on core business activities.

Top comments (0)