DEV Community

Leon Davis
Leon Davis

Posted on

How to Merge Excel Files into One Using Python

When working with multiple Excel files, you may need to merge them into one file for easier analysis or reporting. This is a common task for data analysts, business professionals, and anyone who handles large amounts of Excel data. Instead of manually copying and pasting data from different sheets, you can automate the process using Python.

In this guide, we'll show you two methods for merging Excel files into one using Python. These methods will help you streamline the process, saving you time and effort, especially when dealing with large datasets.

Prerequisites

To get started, you'll need to install the Spire.XLS for Python library. This library allows you to interact with Excel files, read data, and write data without needing Excel installed on your machine. It’s useful for tasks like merging files, generating reports, and automating Excel-related operations.

To install Spire.XLS for Python, run the following command:

pip install spire.xls
Enter fullscreen mode Exit fullscreen mode

Once installed, you can start working with Excel files in Python.

Method 1: Merge Excel Files into One Workbook (Multiple Sheets)

In this method, we will merge multiple Excel files into a single workbook, preserving the original worksheets from each file. This is useful when you want to keep the structure of the original files intact, but still combine them into one file.

Code Example:

import os
from spire.xls import *

# Folder containing Excel files to merge
input_folder = './sample_files'
# Output file name for the merged workbook
output_file = 'merged_workbook.xlsx'

# Initialize merged workbook as None
merged_workbook = None

# Iterate over all files in the input folder
for filename in os.listdir(input_folder):
    # Process only Excel files with .xls or .xlsx extensions
    if filename.endswith('.xlsx') or filename.endswith('.xls'):
        file_path = os.path.join(input_folder, filename)

        # Load the current Excel file into a Workbook object
        source_workbook = Workbook()
        source_workbook.LoadFromFile(file_path)

        if merged_workbook is None:
            # For the first file, assign it as the base merged workbook
            merged_workbook = source_workbook
        else:
            # For subsequent files, copy each worksheet into the merged workbook
            for i in range(source_workbook.Worksheets.Count):
                sheet = source_workbook.Worksheets.get_Item(i)
                merged_workbook.Worksheets.AddCopy(sheet, WorksheetCopyType.CopyAll)

# Save the combined workbook to the specified output file
merged_workbook.SaveToFile(output_file, ExcelVersion.Version2016)
Enter fullscreen mode Exit fullscreen mode

How It Works:

  1. Set Folder Path : We specify the folder (input_folder) where your Excel files are stored.
  2. Iterate Over Files : The script goes through each file in the folder and checks if it has a .xls or .xlsx extension.
  3. Load Workbooks : For each file, it loads the data into a Workbook object.
  4. Merge Worksheets : The first file initializes the merged_workbook, and subsequent files are added with their worksheets.
  5. Save the Output : Finally, the merged workbook is saved to the specified output file.

The output will be a single Excel file, merged_workbook.xlsx, containing all the worksheets from each of the Excel files.

Method 2: Merge Excel Files into One Worksheet

In the second method, we merge multiple Excel files into a single worksheet. This approach is ideal when you want to combine data from different sheets into one sheet, stacking the data from each file below one another.

Code Example:

import os
from spire.xls import *

# Folder containing Excel files to merge
input_folder = './excel_worksheets'
# Output file name for the merged workbook
output_file = 'merged_into_one_sheet.xlsx'

# Create a new workbook to hold merged data
merged_workbook = Workbook()
# Use the first worksheet in the new workbook as the merge target
merged_sheet = merged_workbook.Worksheets[0]

# Initialize the starting row for copying data
current_row = 1

# Loop through all files in the input folder
for filename in os.listdir(input_folder):
    # Process only Excel files (.xls or .xlsx)
    if filename.endswith('.xlsx') or filename.endswith('.xls'):
        file_path = os.path.join(input_folder, filename)

        # Load the current Excel file
        workbook = Workbook()
        workbook.LoadFromFile(file_path)

        # Get the first worksheet from the current workbook
        sheet = workbook.Worksheets[0]

        # Get the used range from the first worksheet
        source_range = sheet.Range

        # Set the destination range in the merged worksheet starting at current_row
        dest_range = merged_sheet.Range[current_row, 1]

        # Copy data from the used range to the destination range
        source_range.Copy(dest_range)

        # Update current_row to the row after the last copied row to prevent overlap
        current_row += sheet.LastRow

# Save the merged workbook to the specified output file in Excel 2016 format
merged_workbook.SaveToFile(output_file, ExcelVersion.Version2016)
Enter fullscreen mode Exit fullscreen mode

How It Works:

  1. Initialize Workbook : A new workbook is created to hold all the merged data.
  2. Iterate Over Files : Like the previous method, the script loops over all Excel files in the specified folder.
  3. Copy Data : For each file, it copies the data from the first worksheet and appends it to the destination worksheet.
  4. Update Row Index : It keeps track of the row index (current_row) to ensure that the data from each file is copied without overlapping.
  5. Save the Output : The merged data is saved into a new file, merged_into_one_sheet.xlsx.

The result will be a single worksheet containing all the combined data from the various Excel files, stacked below one another.

Conclusion

Merging Excel files into a single file can be a tedious task, especially if you have many files to work with. By using Python and the Spire.XLS library, you can automate this process to save time and effort.

  • Method 1 is useful when you want to preserve the structure of the original files and keep each worksheet in separate tabs.
  • Method 2 is ideal for combining data into one worksheet, making it easier to aggregate or analyze the information.

Both methods can be customized depending on your needs, and once set up, they can save you a significant amount of time when working with multiple Excel files.

Top comments (0)